[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269880708 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ## @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId); } + /** + * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a + * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to + * read the files, and thus treats both open and aborted transactions as invalid. + * + * This API is used by Hive replication which may have multiple transactions open at a time. + * + * @param txns open txn list from the metastore + * @param currentTxns Current transactions that the replication has opened. If any of the + *transactions is greater than 0 it will be removed from the exceptions + *list so that the replication sees its own transaction as valid. + * @return a valid txn list. + */ + public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, Review comment: Yes, even I think, for REPL LOAD, we should always hardcode the ValidWriteIdList using current writeId so that stats are always valid while applying current event. Even if it is invalid, the subsequent alterTable/partition event would set it so in the table/partition parameters. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269880083 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && +addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = Review comment: Even that logic to create ValidWriteIdList based on all repl opened txns isn't right as it says that stats are valid for all these open txns but it isn't. Also, it sets high water mark based on 0th index in the replTxnsList map which might be pointing to wrong writeId compared to current txn's writeId. So, I doubt, this logic should be anyways removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269861070 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: If we unify writeId and replWriteId in CreateTableDesc into one, then it's fine. In fact, they are one and the same. So, no point in having 2 members for same value. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269861070 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: If we unify writeId and replWriteId in CreateTableDesc, then it's fine. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269844369 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: I don't know. I would suggest to keep it non-transactional only in repl flow to avoid any impacts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269843944 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl, List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException { + + ColumnStatistics colStats = null; + // If the given table has column statistics, save it here. We will update it later. + // We don't want it to be part of the Table object being created, lest the create table Review comment: I think, it's fine. Ignore this comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269843519 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: I meant, prepareImport already takes writeId (which comes from event message) as input parameter which is being set in CreateTableDesc and later read back by getBaseAddPartitionDescFromPartition. Instead, writeId itself can be passed to getBaseAddPartitionDescFromPartition. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269257547 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists, tTbl.setPrivileges(principalPrivs); } } - // Set table snapshot to api.Table to make it persistent. - TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); - if (tableSnapshot != null) { -tbl.getTTable().setWriteId(tableSnapshot.getWriteId()); + // Set table snapshot to api.Table to make it persistent. A transactional table being + // replicated may have a valid write Id copied from the source. Use that instead of + // crafting one on the replica. + if (tTbl.getWriteId() <= 0) { Review comment: DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. Or else in autogather mode at target, it will be updated automatically. Not sure if it is needed as table itself is not there in metastore. Anyways, please check if needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269843519 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: I meant, prepareImport already takes writeId as input parameter which is being set in CreateTableDesc and later read back by getBaseAddPartitionDescFromPartition. Instead, writeId itself can be passed to getBaseAddPartitionDescFromPartition. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269262756 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: This replWriteId is just a place holder for the writeId from the event message. It need not be in CreateTableDesc. Can be maintained in local variables and pass around. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269220469 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && +addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = Review comment: In replication flow, it is fine to use hardcoded ValidWriteIdList as we want to forcefully set this writeId into table or partition objects. Getting it from current state might be wrong as we don't update ValidTxnList in conf for repl created txns. ValidWriteIdList is just used to check if writeId in metastore objects are updated by any concurrent inserts. In repl load flow, it is not possible as we replicate one event at a time or in bootstrap, no 2 threads writes into same table. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269136269 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java ## @@ -1247,17 +1244,37 @@ private static void createReplImportTasks( } else if (!replicationSpec.isMetadataOnly() && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) { x.getLOG().debug("adding dependent CopyWork/MoveWork for table"); -t.addDependentTask(loadTable(fromURI, table, replicationSpec.isReplace(), -new Path(tblDesc.getLocation()), replicationSpec, x, writeId, stmtId)); +dependentTasks = new ArrayList<>(1); +dependentTasks.add(loadTable(fromURI, table, replicationSpec.isReplace(), + new Path(tblDesc.getLocation()), replicationSpec, + x, writeId, stmtId)); } - if (dropTblTask != null) { -// Drop first and then create -dropTblTask.addDependentTask(t); -x.getTasks().add(dropTblTask); + // During replication, by the time we reply a commit transaction event, the table should + // have been already created when replaying previous events. So no need to create table + // again. For some reason we need create table task for partitioned table though. Review comment: The comment says for partitioned table, create table task needed but in the code it is skipped always for commit txn event. Which one is correct? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269098036 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { +isTxn = false; + } else { +isTxn = true; + } +} +// TODO: Somehow we have to signal alterPartitions that it's part of replication and +// should use replication's valid writeid list instead of creating one. Review comment: What do you mean by replication's valid writeid list in this comment? Even in repl flow, we get validWriteIdList from HMS based on incoming writeId in the event msg. Are you suggesting to cache this ValidWriteIdList somewhere and use it instead of invoking HMS API? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269060256 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java ## @@ -118,7 +118,8 @@ List notNullConstraints; List defaultConstraints; List checkConstraints; - private ColumnStatistics colStats; + private ColumnStatistics colStats; // For the sake of replication + private long writeId = -1; // For the sake of replication Review comment: Can we re-use the replWriteId variable that we already have? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269110947 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, List newParts, new ArrayList(); try { AcidUtils.TableSnapshot tableSnapshot = null; + // TODO: In case of replication use the writeId and valid write id list constructed for Review comment: Is it done or still TODO? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269247183 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean parallelBootstrap, boolean metad } @Test - public void testForNonAcidTables() throws Throwable { + public void testNonParallelBootstrapLoad() throws Throwable { +LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName()); testStatsReplicationCommon(false, false); } @Test - public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable { -testStatsReplicationCommon(true, false); + public void testForParallelBootstrapLoad() throws Throwable { +LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName()); +testStatsReplicationCommon(true, false ); } @Test - public void testNonAcidMetadataOnlyDump() throws Throwable { + public void testMetadataOnlyDump() throws Throwable { Review comment: Add more tests for the following scenarios. 1. REPL LOAD fails after replicating table or partition objects with stats but before setting last replId. Now, retry which takes alter table/partition replace flows and stats should be valid after successful replication. Need this for all non-transactional, transactional and migration cases. 2. Parallel inserts with autogather enabled. Now, we will have events such that multiple txns open when updating stats event. Also, try to simulate that one stats update was successful and the other one invalidates it due to concurrent writes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269223302 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && Review comment: addPartitionDesc.getReplicationSpec() will never be null. Can remove this check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269161871 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = Review comment: Shall use meaningful names instead of "vwil". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269257547 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists, tTbl.setPrivileges(principalPrivs); } } - // Set table snapshot to api.Table to make it persistent. - TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); - if (tableSnapshot != null) { -tbl.getTTable().setWriteId(tableSnapshot.getWriteId()); + // Set table snapshot to api.Table to make it persistent. A transactional table being + // replicated may have a valid write Id copied from the source. Use that instead of + // crafting one on the replica. + if (tTbl.getWriteId() <= 0) { Review comment: DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. Or else in autogather mode at target, it will be updated automatically. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269172695 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) { } // Update partition column statistics if available -for (Partition newPart : newParts) { - if (newPart.isSetColStats()) { -updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId()); +int cnt = 0; +for (ColumnStatistics partColStats: partsColStats) { + long writeId = partsWriteIds.get(cnt++); + // On replica craft a valid snapshot out of the writeId in the partition + String validWriteIds = null; + if (writeId > 0) { +ValidWriteIdList vwil = Review comment: Same as above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269156935 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl, List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException { + + ColumnStatistics colStats = null; + // If the given table has column statistics, save it here. We will update it later. + // We don't want it to be part of the Table object being created, lest the create table Review comment: Shall simplify the comment. "Column stats are not expected to be part of Create table event and also shouldn't be persisted. So remove it from Table object." This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269169210 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = + new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(), Review comment: Shall add a comment on why the hardcoded validWriteList is used in this flow instead of taking current state of txns. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269154738 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ## @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId); } + /** + * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a + * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to + * read the files, and thus treats both open and aborted transactions as invalid. + * + * This API is used by Hive replication which may have multiple transactions open at a time. + * + * @param txns open txn list from the metastore + * @param currentTxns Current transactions that the replication has opened. If any of the + *transactions is greater than 0 it will be removed from the exceptions + *list so that the replication sees its own transaction as valid. + * @return a valid txn list. + */ + public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, Review comment: The complete logic of considering all txns opened in a batch by open txn event as current txns is incorrect. Multiple txns are opened by repl task only for replicating Hive Streaming case where we allocate txns batch but use one at a time. Also, we don't update stats in that case. Even if we update stats, it should refer to one txn as current txn and rest of the txns are left open. Shall remove replTxnIds cache in TxnManager as well. All callers shall create a hardcoded ValidWriteIdList using the writeId received from event msg. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269081532 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: ReplUtils.REPL_CHECKPOINT_KEY is another prop we set it in repl flow which is not transactional. This check doesn't seems to be clean as in future we might add more such alters in repl flow. Can we check replicationSpec.isReplicationScope instead or another flag in AlterTableDesc to skip this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269103325 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java ## @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath, // Right now, we do not have a way of associating a writeId with statistics for a table // converted to a transactional table if it was non-transactional on the source. So, do not Review comment: Comment needs to be corrected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services