[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-28 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269880708
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
 ##
 @@ -84,6 +86,73 @@ public static ValidTxnList 
createValidReadTxnList(GetOpenTxnsResponse txns, long
 return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, 
minOpenTxnId);
   }
 
+  /**
+   * Transform a {@link 
org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that 
the caller intends to
+   * read the files, and thus treats both open and aborted transactions as 
invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions 
open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened.  
If any of the
+   *transactions is greater than 0 it will be removed from 
the exceptions
+   *list so that the replication sees its own transaction 
as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
 
 Review comment:
   Yes, even I think, for REPL LOAD, we should always hardcode the 
ValidWriteIdList using current writeId so that stats are always valid while 
applying current event. Even if it is invalid, the subsequent 
alterTable/partition event would set it so in the table/partition parameters.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-28 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269880083
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
+addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
+  writeId = addPartitionDesc.getPartition(0).getWriteId();
+  validWriteIdList =
 
 Review comment:
   Even that logic to create ValidWriteIdList based on all repl opened txns 
isn't right as it says that stats are valid for all these open txns but it 
isn't. Also, it sets high water mark based on 0th index in the replTxnsList map 
which might be pointing to wrong writeId compared to current txn's writeId. So, 
I doubt, this logic should be anyways removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-27 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269861070
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   If we unify writeId and replWriteId in CreateTableDesc into one, then it's 
fine. In fact, they are one and the same. So, no point in having 2 members for 
same value.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-27 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269861070
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   If we unify writeId and replWriteId in CreateTableDesc, then it's fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-27 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269844369
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc 
alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that 
operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on 
partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's 
not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && 
props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
   I don't know. I would suggest to keep it non-transactional only in repl flow 
to avoid any impacts.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-27 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843944
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final 
Table tbl,
List checkConstraints)
 throws AlreadyExistsException, MetaException,
 InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+  ColumnStatistics colStats = null;
+  // If the given table has column statistics, save it here. We will 
update it later.
+  // We don't want it to be part of the Table object being created, lest 
the create table
 
 Review comment:
   I think, it's fine. Ignore this comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-27 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843519
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   I meant, prepareImport already takes writeId (which comes from event 
message) as input parameter which is being set in CreateTableDesc and later 
read back by getBaseAddPartitionDescFromPartition. Instead, writeId itself can 
be passed to getBaseAddPartitionDescFromPartition.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-27 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269257547
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
   tTbl.setPrivileges(principalPrivs);
 }
   }
-  // Set table snapshot to api.Table to make it persistent.
-  TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, 
true);
-  if (tableSnapshot != null) {
-tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+  // Set table snapshot to api.Table to make it persistent. A 
transactional table being
+  // replicated may have a valid write Id copied from the source. Use that 
instead of
+  // crafting one on the replica.
+  if (tTbl.getWriteId() <= 0) {
 
 Review comment:
   DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. Or else 
in autogather mode at target, it will be updated automatically. Not sure if it 
is needed as table itself is not there in metastore. Anyways, please check if 
needed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-27 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269843519
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   I meant, prepareImport already takes writeId as input parameter which is 
being set in CreateTableDesc and later read back by 
getBaseAddPartitionDescFromPartition. Instead, writeId itself can be passed to 
getBaseAddPartitionDescFromPartition.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269262756
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   This replWriteId is just a place holder for the writeId from the event 
message. It need not be in CreateTableDesc. Can be maintained in local 
variables and pass around.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269220469
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
+addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
+  writeId = addPartitionDesc.getPartition(0).getWriteId();
+  validWriteIdList =
 
 Review comment:
   In replication flow, it is fine to use hardcoded ValidWriteIdList as we want 
to forcefully set this writeId into table or partition objects. Getting it from 
current state might be wrong as we don't update ValidTxnList in conf for repl 
created txns. 
   ValidWriteIdList is just used to check if writeId in metastore objects are 
updated by any concurrent inserts. In repl load flow, it is not possible as we 
replicate one event at a time or in bootstrap, no 2 threads writes into same 
table.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269136269
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1247,17 +1244,37 @@ private static void createReplImportTasks(
   } else if (!replicationSpec.isMetadataOnly()
   && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) {
 x.getLOG().debug("adding dependent CopyWork/MoveWork for table");
-t.addDependentTask(loadTable(fromURI, table, 
replicationSpec.isReplace(),
-new Path(tblDesc.getLocation()), replicationSpec, x, writeId, 
stmtId));
+dependentTasks = new ArrayList<>(1);
+dependentTasks.add(loadTable(fromURI, table, 
replicationSpec.isReplace(),
+  new Path(tblDesc.getLocation()), 
replicationSpec,
+  x, writeId, stmtId));
   }
 
-  if (dropTblTask != null) {
-// Drop first and then create
-dropTblTask.addDependentTask(t);
-x.getTasks().add(dropTblTask);
+  // During replication, by the time we reply a commit transaction event, 
the table should
+  // have been already created when replaying previous events. So no need 
to create table
+  // again. For some reason we need create table task for partitioned 
table though.
 
 Review comment:
   The comment says for partitioned table, create table task needed but in the 
code it is skipped always for commit txn event. Which one is correct?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269098036
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc 
alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that 
operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on 
partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's 
not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && 
props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
+isTxn = false;
+  } else {
+isTxn = true;
+  }
+}
+// TODO: Somehow we have to signal alterPartitions that it's part of 
replication and
+//  should use replication's valid writeid list instead of creating 
one.
 
 Review comment:
   What do you mean by replication's valid writeid list in this comment? Even 
in repl flow, we get validWriteIdList from HMS based on incoming writeId in the 
event msg. Are you suggesting to cache this ValidWriteIdList somewhere and use 
it instead of invoking HMS API?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269060256
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java
 ##
 @@ -118,7 +118,8 @@
   List notNullConstraints;
   List defaultConstraints;
   List checkConstraints;
-  private ColumnStatistics colStats;
+  private ColumnStatistics colStats;  // For the sake of replication
+  private long writeId = -1; // For the sake of replication
 
 Review comment:
   Can we re-use the replWriteId variable that we already have?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269110947
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, 
List newParts,
   new ArrayList();
 try {
   AcidUtils.TableSnapshot tableSnapshot = null;
+  // TODO: In case of replication use the writeId and valid write id list 
constructed for
 
 Review comment:
   Is it done or still TODO?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269247183
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean 
parallelBootstrap, boolean metad
   }
 
   @Test
-  public void testForNonAcidTables() throws Throwable {
+  public void testNonParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + 
testName.getMethodName());
 testStatsReplicationCommon(false, false);
   }
 
   @Test
-  public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable {
-testStatsReplicationCommon(true, false);
+  public void testForParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + 
testName.getMethodName());
+testStatsReplicationCommon(true, false );
   }
 
   @Test
-  public void testNonAcidMetadataOnlyDump() throws Throwable {
+  public void testMetadataOnlyDump() throws Throwable {
 
 Review comment:
   Add more tests for the following scenarios.
   1. REPL LOAD fails after replicating table or partition objects with stats 
but before setting last replId. Now, retry which takes alter table/partition 
replace flows and stats should be valid after successful replication. Need this 
for all non-transactional, transactional and migration cases.
   2. Parallel inserts with autogather enabled. Now, we will have events such 
that multiple txns open when updating stats event. Also, try to simulate that 
one stats update was successful and the other one invalidates it due to 
concurrent writes. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269223302
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
 
 Review comment:
   addPartitionDesc.getReplicationSpec() will never be null. Can remove this 
check.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269161871
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, 
final Table tbl,
 
   // If the table has column statistics, update it into the metastore. 
This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now 
and hence we do not
-// expect a transactional table to have column statistics here. So 
passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, 
tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
 
 Review comment:
   Shall use meaningful names instead of "vwil".


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269257547
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
   tTbl.setPrivileges(principalPrivs);
 }
   }
-  // Set table snapshot to api.Table to make it persistent.
-  TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, 
true);
-  if (tableSnapshot != null) {
-tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+  // Set table snapshot to api.Table to make it persistent. A 
transactional table being
+  // replicated may have a valid write Id copied from the source. Use that 
instead of
+  // crafting one on the replica.
+  if (tTbl.getWriteId() <= 0) {
 
 Review comment:
   DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. Or else 
in autogather mode at target, it will be updated automatically.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269172695
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) {
 }
 
 // Update partition column statistics if available
-for (Partition newPart : newParts) {
-  if (newPart.isSetColStats()) {
-updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, 
newPart.getWriteId());
+int cnt = 0;
+for (ColumnStatistics partColStats: partsColStats) {
+  long writeId = partsWriteIds.get(cnt++);
+  // On replica craft a valid snapshot out of the writeId in the 
partition
+  String validWriteIds = null;
+  if (writeId > 0) {
+ValidWriteIdList vwil =
 
 Review comment:
   Same as above.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269156935
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final 
Table tbl,
List checkConstraints)
 throws AlreadyExistsException, MetaException,
 InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+  ColumnStatistics colStats = null;
+  // If the given table has column statistics, save it here. We will 
update it later.
+  // We don't want it to be part of the Table object being created, lest 
the create table
 
 Review comment:
   Shall simplify the comment. "Column stats are not expected to be part of 
Create table event and also shouldn't be persisted. So remove it from Table 
object."
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269169210
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, 
final Table tbl,
 
   // If the table has column statistics, update it into the metastore. 
This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now 
and hence we do not
-// expect a transactional table to have column statistics here. So 
passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, 
tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
+  new 
ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(),
 
 Review comment:
   Shall add a comment on why the hardcoded validWriteList is used in this flow 
instead of taking current state of txns.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269154738
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
 ##
 @@ -84,6 +86,73 @@ public static ValidTxnList 
createValidReadTxnList(GetOpenTxnsResponse txns, long
 return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, 
minOpenTxnId);
   }
 
+  /**
+   * Transform a {@link 
org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that 
the caller intends to
+   * read the files, and thus treats both open and aborted transactions as 
invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions 
open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened.  
If any of the
+   *transactions is greater than 0 it will be removed from 
the exceptions
+   *list so that the replication sees its own transaction 
as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
 
 Review comment:
   The complete logic of considering all txns opened in a batch by open txn 
event as current txns is incorrect. 
   Multiple txns are opened by repl task only for replicating Hive Streaming 
case where we allocate txns batch but use one at a time. Also, we don't update 
stats in that case. Even if we update stats, it should refer to one txn as 
current txn and rest of the txns are left open. 
   Shall remove replTxnIds cache in TxnManager as well. All callers shall 
create a hardcoded ValidWriteIdList using the writeId received from event msg.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269081532
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc 
alterTbl) throws HiveException {
   } else {
 // Note: this is necessary for UPDATE_STATISTICS command, that 
operates via ADDPROPS (why?).
 //   For any other updates, we don't want to do txn check on 
partitions when altering table.
-boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS;
+boolean isTxn = false;
+if (alterTbl.getPartSpec() != null && alterTbl.getOp() == 
AlterTableTypes.ADDPROPS) {
+  // ADDPROPS is used to add repl.last.id during replication. That's 
not a transactional
+  // change.
+  Map props = alterTbl.getProps();
+  if (props.size() <= 1 && 
props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
   ReplUtils.REPL_CHECKPOINT_KEY is another prop we set it in repl flow which 
is not transactional. This check doesn't seems to be clean as in future we 
might add more such alters in repl flow. Can we check 
replicationSpec.isReplicationScope instead or another flag in AlterTableDesc to 
skip this?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.

2019-03-26 Thread GitBox
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats 
replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269103325
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java
 ##
 @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath,
   // Right now, we do not have a way of associating a writeId with 
statistics for a table
   // converted to a transactional table if it was non-transactional on the 
source. So, do not
 
 Review comment:
   Comment needs to be corrected.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services