[jira] [Created] (HIVE-21520) Query "Submit plan" time reported is incorrect
Rajesh Balamohan created HIVE-21520: Summary: Query "Submit plan" time reported is incorrect Key: HIVE-21520 URL: https://issues.apache.org/jira/browse/HIVE-21520 Project: Hive Issue Type: Bug Reporter: Rajesh Balamohan

Hive master branch + LLAP

{noformat}
Query Execution Summary
--
OPERATION                      DURATION
--
Compile Query                  0.00s
Prepare Plan                   0.00s
Get Query Coordinator (AM)     0.00s
Submit Plan                    1553658149.89s
Start DAG                      0.53s
Run DAG                        0.43s
--
{noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
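The bogus "Submit Plan" figure is itself a clue: 1553658149.89 is a Unix epoch timestamp from late March 2019 (around when this issue was filed), which is what an elapsed-time calculation degenerates to when the phase's start timestamp is left at its zero default. A minimal sketch of that failure mode (hypothetical names, not Hive's actual timing code):

```java
public class DurationBug {
    // If startMillis was never recorded (stays at the default 0), the
    // "duration" collapses to the end event's raw epoch time in seconds.
    static double durationSeconds(long startMillis, long endMillis) {
        return (endMillis - startMillis) / 1000.0;
    }

    public static void main(String[] args) {
        long end = 1553658149890L;   // epoch millis, ~2019-03-27
        long start = 0L;             // bug: phase start never set
        // Degenerates to 1553658149.89 -- the value shown in the summary above.
        System.out.println(durationSeconds(start, end));
        // With a real start timestamp the duration is sane (here 1.2s).
        System.out.println(durationSeconds(end - 1200, end));
    }
}
```

Whatever the actual cause in the timing code, the symptom is diagnostic: a phase duration printed as an epoch-sized number means one operand of the subtraction was an absolute timestamp and the other was not.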
[jira] [Created] (HIVE-21519) The meta data of table is not updated after setting a new data location
Boying Lu created HIVE-21519: Summary: The meta data of table is not updated after setting a new data location Key: HIVE-21519 URL: https://issues.apache.org/jira/browse/HIVE-21519 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 1.1.0 Reporter: Boying Lu

Reproduce steps:
1. create a new hive table T1 (without any partition)
2. persist some data into T1
3. create a new hive table T2 (without any partition)
4. alter table T2 set location path-to-T1-HDFS-folder

Expected result: the metadata of T2 is changed accordingly.
Actual result: the metadata of T2 is not changed, even after running the commands:
- analyze table T2 compute statistics (only the number of rows was updated)
- msck repair table T2 (no effect)
[jira] [Created] (HIVE-21518) GenericUDFOPNotEqualNS does not run in LLAP
Jason Dere created HIVE-21518: Summary: GenericUDFOPNotEqualNS does not run in LLAP Key: HIVE-21518 URL: https://issues.apache.org/jira/browse/HIVE-21518 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-21518.1.patch GenericUDFOPNotEqualNS (Not equal nullsafe operator) does not run in LLAP mode, because it is not registered as a built-in function.
[jira] [Created] (HIVE-21517) Fix AggregateStatsCache
Miklos Gergely created HIVE-21517: Summary: Fix AggregateStatsCache Key: HIVE-21517 URL: https://issues.apache.org/jira/browse/HIVE-21517 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.1.1 Reporter: Miklos Gergely Assignee: Miklos Gergely Fix For: 4.0.0 Due to a bug, AggregateStatsCache is not returning the best matching result.
[jira] [Created] (HIVE-21516) Fix spark downloading for q tests
Miklos Gergely created HIVE-21516: Summary: Fix spark downloading for q tests Key: HIVE-21516 URL: https://issues.apache.org/jira/browse/HIVE-21516 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 3.1.1 Reporter: Miklos Gergely Assignee: Miklos Gergely Fix For: 4.0.0 Currently itests/pom.xml declares a command to generate the download script for spark, so it is re-generated every time any maven command is executed for any sub-project of itests. As a side effect it leaves download.sh files everywhere. The download.sh file is almost entirely static; there is no need to recreate it every time, as it only requires $spark.version as a parameter. Also, it only works properly under Linux, as it relies on the md5sum program, which is not present on OS X. This means that if the spark tarball is partially downloaded on OS X, it will never be re-downloaded. This should be fixed by making it work on OS X as well, using md5.
[jira] [Created] (HIVE-21515) Improvement to MoveTrash Facilities
David Mollitor created HIVE-21515: Summary: Improvement to MoveTrash Facilities Key: HIVE-21515 URL: https://issues.apache.org/jira/browse/HIVE-21515 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Attachments: HIVE-21515.1.patch
[jira] [Created] (HIVE-21514) Map data
Simon poortman created HIVE-21514: Summary: Map data Key: HIVE-21514 URL: https://issues.apache.org/jira/browse/HIVE-21514 Project: Hive Issue Type: Bug Reporter: Simon poortman Fix For: 0.10.1
[jira] [Created] (HIVE-21513) ACID: Running merge concurrently with minor compaction causes a later select * to throw exception
Vaibhav Gumashta created HIVE-21513: Summary: ACID: Running merge concurrently with minor compaction causes a later select * to throw exception Key: HIVE-21513 URL: https://issues.apache.org/jira/browse/HIVE-21513 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.1.1 Reporter: Vaibhav Gumashta

Repro steps:
- Create table
- Load some data
- Run merge so records get updated and delete_delta dirs are created
- Manually initiate minor compaction: ALTER TABLE ... COMPACT 'minor';
- While the compaction is running, keep executing the merge statement
- After some time, try to do a simple select *;
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269262756 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: This replWriteId is just a placeholder for the writeId from the event message. It need not be in CreateTableDesc; it can be maintained in local variables and passed around. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269220469 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && +addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = Review comment: In the replication flow, it is fine to use a hardcoded ValidWriteIdList, as we want to forcefully set this writeId into the table or partition objects. Getting it from the current state might be wrong, as we don't update ValidTxnList in conf for repl-created txns. ValidWriteIdList is just used to check whether writeIds in metastore objects were updated by any concurrent inserts. In the repl load flow that is not possible, as we replicate one event at a time, and in bootstrap no two threads write into the same table.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269136269 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java ## @@ -1247,17 +1244,37 @@ private static void createReplImportTasks( } else if (!replicationSpec.isMetadataOnly() && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) { x.getLOG().debug("adding dependent CopyWork/MoveWork for table"); -t.addDependentTask(loadTable(fromURI, table, replicationSpec.isReplace(), -new Path(tblDesc.getLocation()), replicationSpec, x, writeId, stmtId)); +dependentTasks = new ArrayList<>(1); +dependentTasks.add(loadTable(fromURI, table, replicationSpec.isReplace(), + new Path(tblDesc.getLocation()), replicationSpec, + x, writeId, stmtId)); } - if (dropTblTask != null) { -// Drop first and then create -dropTblTask.addDependentTask(t); -x.getTasks().add(dropTblTask); + // During replication, by the time we reply a commit transaction event, the table should + // have been already created when replaying previous events. So no need to create table + // again. For some reason we need create table task for partitioned table though. Review comment: The comment says that a create table task is needed for a partitioned table, but in the code it is always skipped for the commit txn event. Which one is correct?
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269098036 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { +isTxn = false; + } else { +isTxn = true; + } +} +// TODO: Somehow we have to signal alterPartitions that it's part of replication and +// should use replication's valid writeid list instead of creating one. Review comment: What do you mean by replication's valid writeid list in this comment? Even in repl flow, we get validWriteIdList from HMS based on incoming writeId in the event msg. Are you suggesting to cache this ValidWriteIdList somewhere and use it instead of invoking HMS API?
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269060256 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java ## @@ -118,7 +118,8 @@ List notNullConstraints; List defaultConstraints; List checkConstraints; - private ColumnStatistics colStats; + private ColumnStatistics colStats; // For the sake of replication + private long writeId = -1; // For the sake of replication Review comment: Can we re-use the replWriteId variable that we already have?
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269110947 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, List newParts, new ArrayList(); try { AcidUtils.TableSnapshot tableSnapshot = null; + // TODO: In case of replication use the writeId and valid write id list constructed for Review comment: Is it done or still TODO?
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269247183 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean parallelBootstrap, boolean metad } @Test - public void testForNonAcidTables() throws Throwable { + public void testNonParallelBootstrapLoad() throws Throwable { +LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName()); testStatsReplicationCommon(false, false); } @Test - public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable { -testStatsReplicationCommon(true, false); + public void testForParallelBootstrapLoad() throws Throwable { +LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName()); +testStatsReplicationCommon(true, false ); } @Test - public void testNonAcidMetadataOnlyDump() throws Throwable { + public void testMetadataOnlyDump() throws Throwable { Review comment: Add more tests for the following scenarios. 1. REPL LOAD fails after replicating table or partition objects with stats, but before setting the last replId. Now retry, which takes the alter table/partition replace flows; stats should be valid after successful replication. Need this for all of the non-transactional, transactional, and migration cases. 2. Parallel inserts with autogather enabled. Now we will have events such that multiple txns are open when the stats update event is generated. Also, try to simulate that one stats update was successful and the other one invalidated it due to concurrent writes.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269223302 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && Review comment: addPartitionDesc.getReplicationSpec() will never be null. Can remove this check.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269161871 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = Review comment: Shall use meaningful names instead of "vwil".
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269257547 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists, tTbl.setPrivileges(principalPrivs); } } - // Set table snapshot to api.Table to make it persistent. - TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); - if (tableSnapshot != null) { -tbl.getTTable().setWriteId(tableSnapshot.getWriteId()); + // Set table snapshot to api.Table to make it persistent. A transactional table being + // replicated may have a valid write Id copied from the source. Use that instead of + // crafting one on the replica. + if (tTbl.getWriteId() <= 0) { Review comment: DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. Or else in autogather mode at target, it will be updated automatically.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269172695 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) { } // Update partition column statistics if available -for (Partition newPart : newParts) { - if (newPart.isSetColStats()) { -updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId()); +int cnt = 0; +for (ColumnStatistics partColStats: partsColStats) { + long writeId = partsWriteIds.get(cnt++); + // On replica craft a valid snapshot out of the writeId in the partition + String validWriteIds = null; + if (writeId > 0) { +ValidWriteIdList vwil = Review comment: Same as above.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269156935 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl, List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException { + + ColumnStatistics colStats = null; + // If the given table has column statistics, save it here. We will update it later. + // We don't want it to be part of the Table object being created, lest the create table Review comment: Shall simplify the comment. "Column stats are not expected to be part of Create table event and also shouldn't be persisted. So remove it from Table object."
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269169210 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = + new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(), Review comment: Shall add a comment on why the hardcoded validWriteIdList is used in this flow instead of taking current state of txns.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269154738 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ## @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId); } + /** + * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a + * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to + * read the files, and thus treats both open and aborted transactions as invalid. + * + * This API is used by Hive replication which may have multiple transactions open at a time. + * + * @param txns open txn list from the metastore + * @param currentTxns Current transactions that the replication has opened. If any of the + *transactions is greater than 0 it will be removed from the exceptions + *list so that the replication sees its own transaction as valid. + * @return a valid txn list. + */ + public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, Review comment: The complete logic of considering all txns opened in a batch by the open txn event as current txns is incorrect. Multiple txns are opened by the repl task only when replicating the Hive Streaming case, where we allocate a batch of txns but use one at a time. Also, we don't update stats in that case. Even if we did update stats, it should refer to one txn as the current txn, with the rest of the txns left open. Shall remove the replTxnIds cache in TxnManager as well. All callers shall create a hardcoded ValidWriteIdList using the writeId received from the event msg.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269081532 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: ReplUtils.REPL_CHECKPOINT_KEY is another prop we set in the repl flow which is not transactional. This check doesn't seem to be clean, as in future we might add more such alters in the repl flow. Can we check replicationSpec.isReplicationScope instead, or another flag in AlterTableDesc, to skip this?
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269103325 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java ## @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath, // Right now, we do not have a way of associating a writeId with statistics for a table // converted to a transactional table if it was non-transactional on the source. So, do not Review comment: Comment needs to be corrected.
[jira] [Created] (HIVE-21512) Upgrade jms-api to 2.0.2
Zoltan Haindrich created HIVE-21512: Summary: Upgrade jms-api to 2.0.2 Key: HIVE-21512 URL: https://issues.apache.org/jira/browse/HIVE-21512 Project: Hive Issue Type: Improvement Reporter: Zoltan Haindrich Assignee: Zoltan Haindrich I've noticed that for some time there are occasional issues with the javax.jms:jms:1.1 artifact, because it doesn't seem to be available from Maven Central for some reason: https://issues.sonatype.org/browse/MVNCENTRAL-4708 Alternatively, I think we might try to just upgrade to the 2.0.2 version of the jms-api.
[jira] [Created] (HIVE-21511) beeline -f report no such file if file is not on local fs
Bruno Pusztahazi created HIVE-21511: Summary: beeline -f report no such file if file is not on local fs Key: HIVE-21511 URL: https://issues.apache.org/jira/browse/HIVE-21511 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.3.0, beeline-cli-branch Environment: java version: 1.8.0_112-b15 hadoop version: 2.7.2 hive version: 1.3.0 hive JDBC version: 1.3.0 beeline version: 1.3.0 Reporter: Bruno Pusztahazi Assignee: Bruno Pusztahazi Fix For: 1.3.0

I test like this:

HQL=hdfs://hacluster/tmp/ff.hql
if hadoop fs -test -f ${HQL}
then
  beeline -f ${HQL}
fi

The hadoop fs -test -f check on ${HQL} succeeds, but beeline reports "${HQL} no such file or directory".
[jira] [Created] (HIVE-21510) Vectorization: add support for and/or for (constant,column) cases
Zoltan Haindrich created HIVE-21510: Summary: Vectorization: add support for and/or for (constant,column) cases Key: HIVE-21510 URL: https://issues.apache.org/jira/browse/HIVE-21510 Project: Hive Issue Type: Improvement Reporter: Zoltan Haindrich

After HIVE-21001, some selectExpressions will start using VectorUDFAdaptor for "null and x" expressions. Because right now there are 2-3 places from which a rewrite into the "null and/or x" form may be done, it would be better to support it.

{code}
[...]
selectExpressions: VectorUDFAdaptor((null and dt1 is null))
[...]
usesVectorUDFAdaptor: true
[...]
{code}
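For context, the reason "null and x" cannot simply be folded to the constant null is SQL's three-valued logic: the constant-NULL operand only partially determines the result, so a vectorized (constant, column) kernel still has to branch per row. A small illustrative sketch of the truth table (plain Java with nullable Boolean, not Hive's vectorized expression API):

```java
public class ThreeValuedLogic {
    // SQL AND: false dominates, otherwise null propagates.
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return false;
        if (a == null || b == null) return null;
        return true;
    }

    // SQL OR: true dominates, otherwise null propagates.
    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) return true;
        if (a == null || b == null) return null;
        return false;
    }

    public static void main(String[] args) {
        System.out.println(and(null, false)); // false: "null and x" is not always null
        System.out.println(and(null, true));  // null
        System.out.println(or(null, true));   // true
        System.out.println(or(null, false));  // null
    }
}
```

Because `null and col` is false exactly where the column is false and null everywhere else, a dedicated vectorized expression is cheap to implement, which is what makes falling back to VectorUDFAdaptor here unnecessary overhead.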
[jira] [Created] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result
Adam Szita created HIVE-21509: Summary: LLAP may cache corrupted column vectors and return wrong query result Key: HIVE-21509 URL: https://issues.apache.org/jira/browse/HIVE-21509 Project: Hive Issue Type: Bug Components: llap Reporter: Adam Szita Assignee: Adam Szita
[jira] [Created] (HIVE-21508) ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
Adar Dembo created HIVE-21508: - Summary: ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer Key: HIVE-21508 URL: https://issues.apache.org/jira/browse/HIVE-21508 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 2.3.4, 3.2.0 Reporter: Adar Dembo There's this block of code in {{HiveMetaStoreClient:resolveUris}} (called from the constructor) on master: {noformat} private URI metastoreUris[]; ... if (MetastoreConf.getVar(conf, ConfVars.THRIFT_URI_SELECTION).equalsIgnoreCase("RANDOM")) { List uriList = Arrays.asList(metastoreUris); Collections.shuffle(uriList); metastoreUris = (URI[]) uriList.toArray(); } {noformat} The cast to {{URI[]}} throws a {{ClassCastException}} beginning with JDK 10, possibly with JDK 9 as well. Note that {{THRIFT_URI_SELECTION}} defaults to {{RANDOM}} so this should affect anyone who creates a {{HiveMetaStoreClient}}. On master this can be overridden with {{SEQUENTIAL}} to avoid the broken case; I'm working against 2.3.4 where there's no such workaround. [Here's|https://stackoverflow.com/questions/51372788/array-cast-java-8-vs-java-9] a StackOverflow post that explains the issue in more detail. Interestingly, the author described the issue in the context of the HMS; not sure why there was no follow up with a Hive bug report. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
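The JDK-version dependence comes from Arrays.asList's private List implementation: before JDK 9 its no-argument toArray() returned the backing typed array, so the cast to URI[] happened to succeed, while newer JDKs return a fresh Object[]. A minimal sketch of the fix using the typed toArray(T[]) overload; the helper name shuffled is hypothetical, mirroring the pattern in HiveMetaStoreClient#resolveUris:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UriShuffle {
    // Shuffle the metastore URIs, as resolveUris does for THRIFT_URI_SELECTION=RANDOM.
    static URI[] shuffled(URI[] uris) {
        List<URI> uriList = Arrays.asList(uris);
        Collections.shuffle(uriList);
        // Passing a typed array avoids the Object[] -> URI[] cast that throws
        // ClassCastException on JDK 9+ (plain toArray() returns Object[] there).
        return uriList.toArray(new URI[0]);
    }

    public static void main(String[] args) {
        URI[] out = shuffled(new URI[] {
            URI.create("thrift://a:9083"), URI.create("thrift://b:9083")});
        System.out.println(out.length);
    }
}
```

The typed overload works identically on every JDK version, so it is also a safe backport candidate for the 2.3 line.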
[jira] [Created] (HIVE-21507) Hive swallows NPE if no delegation token found
Denes Bodo created HIVE-21507: - Summary: Hive swallows NPE if no delegation token found Key: HIVE-21507 URL: https://issues.apache.org/jira/browse/HIVE-21507 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 3.1.1 Reporter: Denes Bodo Assignee: Denes Bodo If there is no delegation token in the token file, this [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777] causes a NullPointerException which is not handled, and the user is not notified in any way. One way to trigger the NPE is an Oozie Sqoop import to Hive on a kerberized cluster: Oozie puts the delegation token into the token file under the id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so the lookup with id *hive* finds nothing. However, the fallback code does use the key that Oozie provides, as seen [here|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]. I suggest warning the user that the key with id *hive* cannot be used and that we are falling back to getting the delegation token from the session. I am creating the patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21506) Memory based TxnHandler implementation
Peter Vary created HIVE-21506: - Summary: Memory based TxnHandler implementation Key: HIVE-21506 URL: https://issues.apache.org/jira/browse/HIVE-21506 Project: Hive Issue Type: New Feature Components: Transactions Reporter: Peter Vary The current TxnHandler implementations use the backend RDBMS to store every Hive lock and transaction datum, so multiple TxnHandler instances can run simultaneously and serve requests. The continuous communication/locking done on the RDBMS side puts serious load on the backend databases and also restricts the possible throughput. If it is possible to have only a single active TxnHandler instance (with the current design, a single HMS), then we can provide much better performance using only Java-based locking. We still have to store the committed write transactions in the RDBMS (or later some other persistent storage), but other lock and transaction operations could remain memory-only. The most important drawbacks of this solution are that we definitely lose scalability once one TxnHandler instance is no longer able to serve the requests (see NameNode), and fault tolerance in the sense that ongoing transactions have to be terminated when the TxnHandler fails. If these drawbacks are acceptable in certain situations, then we can provide better throughput for the users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
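The split the proposal describes, with open/abort kept in memory and only commits touching persistent storage, can be sketched as follows. This is purely illustrative; the class and method names are hypothetical and do not reflect the actual Hive TxnHandler interface:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class InMemoryTxns {
    private final AtomicLong nextTxnId = new AtomicLong(1);
    // Open-transaction table lives only in the single active instance's heap.
    private final Map<Long, String> openTxns = new ConcurrentHashMap<>();

    long open(String user) {
        long id = nextTxnId.getAndIncrement();
        openTxns.put(id, user);
        return id;
    }

    void commit(long txnId) {
        if (openTxns.remove(txnId) == null) {
            throw new IllegalStateException("Unknown or already closed txn " + txnId);
        }
        // Only here would the committed write transaction be persisted to the
        // RDBMS (or other durable storage), per the proposal.
    }

    void abort(long txnId) {
        openTxns.remove(txnId); // memory-only; nothing touches the backend
    }
}
```

The sketch also makes the stated drawback concrete: if the process holding openTxns dies, every ongoing transaction is lost and must be terminated, exactly the fault-tolerance trade-off the description notes.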