[GitHub] [hive] shawnweeks commented on issue #575: HIVE-21409 Add Jars to Session Conf ClassLoader
shawnweeks commented on issue #575: HIVE-21409 Add Jars to Session Conf ClassLoader URL: https://github.com/apache/hive/pull/575#issuecomment-475092978 I suspect I should be doing the same thing for unregisterJar as it has the risk of overwriting the classloader in SessionState.get().getConf() as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] shawnweeks opened a new pull request #575: HIVE-21409 Add Jars to Session Conf ClassLoader
shawnweeks opened a new pull request #575: HIVE-21409 Add Jars to Session Conf ClassLoader URL: https://github.com/apache/hive/pull/575 It is possible that the current thread's classloader may be modified after the SessionState HiveConf has been attached to the current thread. This ensures we are always adding jars to the correct classloader.
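The fix described above can be sketched in plain Java. This is a hypothetical stand-in, not Hive's actual code: `AddJarSketch`, `CONF_LOADER`, and `addJar` are illustrative names, with `CONF_LOADER` playing the role of the classloader held by `SessionState.get().getConf()`.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative sketch: add jars to the classloader held by the session's
// conf rather than the thread's current loader, which may have been swapped
// after the conf was attached to the thread.
public class AddJarSketch {

    // Hypothetical stand-in for SessionState.get().getConf().getClassLoader().
    static final ClassLoader CONF_LOADER =
            new URLClassLoader(new URL[0], AddJarSketch.class.getClassLoader());

    // Wrap the conf's loader with one that also sees the new jar, so classes
    // already visible through the conf's loader and the jar's classes both resolve.
    static URLClassLoader addJar(ClassLoader confLoader, URL jarUrl) {
        return new URLClassLoader(new URL[] { jarUrl }, confLoader);
    }
}
```

The key point is that the base loader is taken from the conf, never from `Thread.currentThread().getContextClassLoader()`, so a later swap of the thread's loader cannot hide the registered jars.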
[GitHub] [hive] rmsmani commented on issue #540: HIVE-21283 Synonyms for the existing functions
rmsmani commented on issue #540: HIVE-21283 Synonyms for the existing functions URL: https://github.com/apache/hive/pull/540#issuecomment-475115627 Hi @sankarh, at last the unit tests came back green. Please review and merge the code to master.
[GitHub] [hive] rmsmani closed pull request #540: HIVE-21283 Synonyms for the existing functions
rmsmani closed pull request #540: HIVE-21283 Synonyms for the existing functions URL: https://github.com/apache/hive/pull/540
[GitHub] [hive] rmsmani commented on issue #540: HIVE-21283 Synonyms for the existing functions
rmsmani commented on issue #540: HIVE-21283 Synonyms for the existing functions URL: https://github.com/apache/hive/pull/540#issuecomment-476459266 Source committed to master, so closing.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269425412 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java ## @@ -1247,17 +1244,37 @@ private static void createReplImportTasks( } else if (!replicationSpec.isMetadataOnly() && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) { x.getLOG().debug("adding dependent CopyWork/MoveWork for table"); -t.addDependentTask(loadTable(fromURI, table, replicationSpec.isReplace(), -new Path(tblDesc.getLocation()), replicationSpec, x, writeId, stmtId)); +dependentTasks = new ArrayList<>(1); +dependentTasks.add(loadTable(fromURI, table, replicationSpec.isReplace(), + new Path(tblDesc.getLocation()), replicationSpec, + x, writeId, stmtId)); } - if (dropTblTask != null) { -// Drop first and then create -dropTblTask.addDependentTask(t); -x.getTasks().add(dropTblTask); + // During replication, by the time we reply a commit transaction event, the table should + // have been already created when replaying previous events. So no need to create table + // again. For some reason we need create table task for partitioned table though. Review comment: Corrected. The partition case is already fixed, but the comment wasn't corrected.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269468871 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && Review comment: Done.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269476213 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: In getBaseAddPartitionDescFromPartition() where we use this function, we don't have access to the event message. Instead we are passing the writeId through ImportTableDesc by calling setReplWriteId(). This function just introduces the missing getReplWriteId() method, symmetric to setReplWriteId(). If we used a local variable and passed it around, the local writeId could go out of sync with the one in ImportTableDesc.
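The pattern argued for above, carrying the event's writeId through the descriptor via a symmetric getter/setter pair instead of a parallel local copy, can be sketched minimally. `ImportDescSketch` is a hypothetical stand-in, not Hive's `ImportTableDesc`:

```java
// Minimal mirror of carrying the event's writeId through the descriptor:
// the setter stores it once and the symmetric getter returns the same value
// later, so there is no second local copy that could drift out of sync.
public class ImportDescSketch {
    private Long replWriteId; // null until the replication flow sets it

    public void setReplWriteId(long writeId) {
        this.replWriteId = writeId;
    }

    public Long getReplWriteId() {
        return replWriteId;
    }
}
```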
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269463941 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl, List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException { + + ColumnStatistics colStats = null; + // If the given table has column statistics, save it here. We will update it later. + // We don't want it to be part of the Table object being created, lest the create table Review comment: " and also shouldn't be persisted". That's not true. We will persist the table stats but later. If you let me know which part of the comment is complex (needs simplification), will come up with alternate wording reflecting the same meaning.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467626 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = Review comment: Done.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467699 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = + new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(), Review comment: Done. Please check.
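The idea in the diff above, "crafting a valid snapshot out of the writeId" on the replica, can be illustrated standalone. `SnapshotSketch` is a hypothetical stand-in for `ValidReaderWriteIdList`, capturing only the semantics under discussion: the replicated writeId becomes the high-water mark and everything at or below it is treated as committed.

```java
// Illustrative snapshot crafted on the replica from a single replicated
// writeId: no open or aborted write ids exist below the high-water mark.
// This is NOT Hive's ValidReaderWriteIdList, just a sketch of its role here.
public class SnapshotSketch {
    final String fullTableName; // e.g. "db.tbl", as produced by TableName.getDbTable
    final long highWatermark;   // the writeId carried over from the source

    SnapshotSketch(String fullTableName, long highWatermark) {
        this.fullTableName = fullTableName;
        this.highWatermark = highWatermark;
    }

    // Every write id at or below the replicated one is visible on the replica.
    boolean isWriteIdValid(long writeId) {
        return writeId > 0 && writeId <= highWatermark;
    }
}
```

This also motivates the sibling review comment asking to document why a hardcoded valid-write-id list is used here instead of the current transaction state: on the replica the source's writeId is authoritative.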
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269467769 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) { } // Update partition column statistics if available -for (Partition newPart : newParts) { - if (newPart.isSetColStats()) { -updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId()); +int cnt = 0; +for (ColumnStatistics partColStats: partsColStats) { + long writeId = partsWriteIds.get(cnt++); + // On replica craft a valid snapshot out of the writeId in the partition + String validWriteIds = null; + if (writeId > 0) { +ValidWriteIdList vwil = Review comment: Done. Please check.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269452642 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java ## @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath, // Right now, we do not have a way of associating a writeId with statistics for a table // converted to a transactional table if it was non-transactional on the source. So, do not Review comment: Done. Looks like I missed pushing an entire commit fixing the comments. Done now.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269508934 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java ## @@ -118,7 +118,8 @@ List notNullConstraints; List defaultConstraints; List checkConstraints; - private ColumnStatistics colStats; + private ColumnStatistics colStats; // For the sake of replication + private long writeId = -1; // For the sake of replication Review comment: I was initially afraid that there could be other side-effects of this change. Your suggestion will bring all writeId replication through replWriteId, which is good. Done.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269523732 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: The comment // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. itself looks wrong. I do not see any ADDPROPS usage which is updating statistics properties. All those seem to come through AddPartition and not alterTable for partitioned table. So, maybe we can safely mark this as non-transactional always. Does that look right?
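The `isTxn` decision in the diff above can be sketched as a standalone predicate: an ADDPROPS whose only property is `repl.last.id` (the literal behind `ReplicationSpec.KEY.CURR_STATE_ID`, assumed here) is replication bookkeeping, not a transactional change. `AddPropsTxnCheck` is a stand-in, not Hive's `DDLTask`.

```java
import java.util.Map;

// Sketch of the txn-check decision for an ADDPROPS alter on a partition:
// skip the transactional check when the only property being added is the
// replication state id. The key literal is an assumption for illustration.
public class AddPropsTxnCheck {
    static final String CURR_STATE_ID = "repl.last.id"; // assumed literal

    static boolean isTxn(Map<String, String> props) {
        // Only repl.last.id present: replication bookkeeping, not a txn change.
        return !(props.size() <= 1 && props.get(CURR_STATE_ID) != null);
    }
}
```

Extracting the predicate like this also makes the reviewer's question testable: if no caller ever passes statistics properties through ADDPROPS, the `true` branch is dead and the check could indeed be hardcoded to non-transactional.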
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269424107 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, List newParts, new ArrayList(); try { AcidUtils.TableSnapshot tableSnapshot = null; + // TODO: In case of replication use the writeId and valid write id list constructed for Review comment: I have addressed this comment and removed it as well, but I didn't commit the change, so it wasn't part of the PR. I have updated the PR; this TODO is no longer there. Sorry.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269423978 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { +isTxn = false; + } else { +isTxn = true; + } +} +// TODO: Somehow we have to signal alterPartitions that it's part of replication and +// should use replication's valid writeid list instead of creating one. Review comment: I have addressed this comment and removed it as well, but I didn't commit the change, so it wasn't part of the PR. I have updated the PR; this TODO is no longer there. Sorry.
[GitHub] [hive] miklosgergely opened a new pull request #580: HIVE-21231 HiveJoinAddNotNullRule support for range predicates
miklosgergely opened a new pull request #580: HIVE-21231 HiveJoinAddNotNullRule support for range predicates URL: https://github.com/apache/hive/pull/580
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269262756 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: This replWriteId is just a placeholder for the writeId from the event message. It need not be in CreateTableDesc; it can be maintained in a local variable and passed around.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269110947 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, List newParts, new ArrayList(); try { AcidUtils.TableSnapshot tableSnapshot = null; + // TODO: In case of replication use the writeId and valid write id list constructed for Review comment: Is it done or still TODO?
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269247183 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java ## @@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean parallelBootstrap, boolean metad } @Test - public void testForNonAcidTables() throws Throwable { + public void testNonParallelBootstrapLoad() throws Throwable { +LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName()); testStatsReplicationCommon(false, false); } @Test - public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable { -testStatsReplicationCommon(true, false); + public void testForParallelBootstrapLoad() throws Throwable { +LOG.info("Testing " + testName.getClass().getName() + "." + testName.getMethodName()); +testStatsReplicationCommon(true, false ); } @Test - public void testNonAcidMetadataOnlyDump() throws Throwable { + public void testMetadataOnlyDump() throws Throwable { Review comment: Add more tests for the following scenarios. 1. REPL LOAD fails after replicating table or partition objects with stats but before setting last replId. Now, retry which takes alter table/partition replace flows and stats should be valid after successful replication. Need this for all non-transactional, transactional and migration cases. 2. Parallel inserts with autogather enabled. Now, we will have events such that multiple txns open when updating stats event. Also, try to simulate that one stats update was successful and the other one invalidates it due to concurrent writes.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269223302 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && Review comment: addPartitionDesc.getReplicationSpec() will never be null. Can remove this check.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269161871 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = Review comment: Shall use meaningful names instead of "vwil".
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269257547 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists, tTbl.setPrivileges(principalPrivs); } } - // Set table snapshot to api.Table to make it persistent. - TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); - if (tableSnapshot != null) { -tbl.getTTable().setWriteId(tableSnapshot.getWriteId()); + // Set table snapshot to api.Table to make it persistent. A transactional table being + // replicated may have a valid write Id copied from the source. Use that instead of + // crafting one on the replica. + if (tTbl.getWriteId() <= 0) { Review comment: DO_NOT_UPDATE_STATS flag should be set in createTable flow as well. Or else in autogather mode at target, it will be updated automatically.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269172695 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) { } // Update partition column statistics if available -for (Partition newPart : newParts) { - if (newPart.isSetColStats()) { -updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId()); +int cnt = 0; +for (ColumnStatistics partColStats: partsColStats) { + long writeId = partsWriteIds.get(cnt++); + // On replica craft a valid snapshot out of the writeId in the partition + String validWriteIds = null; + if (writeId > 0) { +ValidWriteIdList vwil = Review comment: Same as above.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269156935 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl, List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException { + + ColumnStatistics colStats = null; + // If the given table has column statistics, save it here. We will update it later. + // We don't want it to be part of the Table object being created, lest the create table Review comment: Shall simplify the comment. "Column stats are not expected to be part of Create table event and also shouldn't be persisted. So remove it from Table object."
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269169210 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl, // If the table has column statistics, update it into the metastore. This feature is used // by replication to replicate table level statistics. - if (tbl.isSetColStats()) { -// We do not replicate statistics for a transactional table right now and hence we do not -// expect a transactional table to have column statistics here. So passing null -// validWriteIds is fine for now. -updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId()); + if (colStats != null) { +// On replica craft a valid snapshot out of the writeId in the table. +long writeId = tbl.getWriteId(); +String validWriteIds = null; +if (writeId > 0) { + ValidWriteIdList vwil = + new ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(), Review comment: Shall add a comment on why the hardcoded validWriteList is used in this flow instead of taking current state of txns.
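The "hardcoded snapshot" being discussed above can be illustrated with a simplified, self-contained stand-in for Hive's `ValidReaderWriteIdList` (the class and method names below are illustrative only, not Hive's actual API): on the replica the high-water mark is pinned to the writeId received from the event and no writeIds are excepted, so everything at or below the replicated writeId reads as valid.

```java
// Simplified stand-in for Hive's ValidReaderWriteIdList (illustrative only):
// the high-water mark is pinned to the writeId replicated from the source,
// and the exception list is empty on the replica, so every writeId at or
// below the replicated one is treated as valid.
class SimpleValidWriteIdList {
    private final String tableName;   // "db.table" the snapshot belongs to
    private final long[] exceptions;  // open/aborted writeIds (empty on replica)
    private final long highWatermark; // the writeId copied from the source event

    SimpleValidWriteIdList(String tableName, long[] exceptions, long highWatermark) {
        this.tableName = tableName;
        this.exceptions = exceptions;
        this.highWatermark = highWatermark;
    }

    boolean isWriteIdValid(long writeId) {
        if (writeId > highWatermark) {
            return false; // allocated after this snapshot was crafted
        }
        for (long e : exceptions) {
            if (e == writeId) {
                return false; // was open or aborted at snapshot time
            }
        }
        return true;
    }
}

public class CraftedSnapshotDemo {
    public static void main(String[] args) {
        // The replicated event carried writeId 7; craft the snapshot from it.
        SimpleValidWriteIdList vwil = new SimpleValidWriteIdList("db.tbl", new long[0], 7L);
        System.out.println(vwil.isWriteIdValid(7)); // true
        System.out.println(vwil.isWriteIdValid(8)); // false
    }
}
```

This mirrors why a comment is requested in the review: the snapshot is deliberately crafted from the event's writeId rather than from the current state of transactions on the replica.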
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269154738 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ## @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId); } + /** + * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a + * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to + * read the files, and thus treats both open and aborted transactions as invalid. + * + * This API is used by Hive replication which may have multiple transactions open at a time. + * + * @param txns open txn list from the metastore + * @param currentTxns Current transactions that the replication has opened. If any of the + *transactions is greater than 0 it will be removed from the exceptions + *list so that the replication sees its own transaction as valid. + * @return a valid txn list. + */ + public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, Review comment: The complete logic of considering all txns opened in a batch by open txn event as current txns is incorrect. Multiple txns are opened by repl task only for replicating Hive Streaming case where we allocate txns batch but use one at a time. Also, we don't update stats in that case. Even if we update stats, it should refer to one txn as current txn and rest of the txns are left open. Shall remove replTxnIds cache in TxnManager as well. All callers shall create a hardcoded ValidWriteIdList using the writeId received from event msg.
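The mechanism under discussion — removing the replication task's own open transactions from the exception list so a replayed write sees itself as valid — can be sketched as follows. This is a hedged, self-contained stand-in: `removeCurrentTxns` is a hypothetical helper, not the actual `createValidReadTxnList` implementation.

```java
import java.util.Arrays;

// Sketch of the exception-list filtering described in the review thread:
// transactions opened by the replication task itself are dropped from the
// open-txn exception list, so the replayed write is considered valid while
// all other open/aborted txns stay invalid.
public class ReplTxnFilter {
    static long[] removeCurrentTxns(long[] openTxns, long[] currentReplTxns) {
        return Arrays.stream(openTxns)
                .filter(t -> Arrays.stream(currentReplTxns).noneMatch(c -> c == t))
                .toArray();
    }

    public static void main(String[] args) {
        long[] open = {3, 5, 9}; // open txns reported by the metastore
        long[] mine = {5};       // txn opened by this replication task
        System.out.println(Arrays.toString(removeCurrentTxns(open, mine))); // [3, 9]
    }
}
```

The reviewer's objection is to treating a whole batch of repl-opened txns as "current"; the suggested alternative is to skip this filtering entirely and have callers build a hardcoded ValidWriteIdList from the event's writeId instead.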
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269081532 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: ReplUtils.REPL_CHECKPOINT_KEY is another prop we set in the repl flow which is not transactional. This check doesn't seem clean, as in future we might add more such alters in the repl flow. Can we check replicationSpec.isReplicationScope instead, or another flag in AlterTableDesc to skip this?
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269103325 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java ## @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath, // Right now, we do not have a way of associating a writeId with statistics for a table // converted to a transactional table if it was non-transactional on the source. So, do not Review comment: Comment needs to be corrected.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269220469 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && +addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = Review comment: In the replication flow, it is fine to use a hardcoded ValidWriteIdList as we want to forcefully set this writeId into table or partition objects. Getting it from the current state might be wrong, as we don't update ValidTxnList in conf for repl-created txns. ValidWriteIdList is just used to check if the writeId in metastore objects was updated by any concurrent inserts. In the repl load flow, that is not possible, as we replicate one event at a time, and in bootstrap no two threads write into the same table.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269136269 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java ## @@ -1247,17 +1244,37 @@ private static void createReplImportTasks( } else if (!replicationSpec.isMetadataOnly() && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) { x.getLOG().debug("adding dependent CopyWork/MoveWork for table"); -t.addDependentTask(loadTable(fromURI, table, replicationSpec.isReplace(), -new Path(tblDesc.getLocation()), replicationSpec, x, writeId, stmtId)); +dependentTasks = new ArrayList<>(1); +dependentTasks.add(loadTable(fromURI, table, replicationSpec.isReplace(), + new Path(tblDesc.getLocation()), replicationSpec, + x, writeId, stmtId)); } - if (dropTblTask != null) { -// Drop first and then create -dropTblTask.addDependentTask(t); -x.getTasks().add(dropTblTask); + // During replication, by the time we replay a commit transaction event, the table should + // have been already created when replaying previous events. So no need to create table + // again. For some reason we need create table task for partitioned table though. Review comment: The comment says a create table task is needed for partitioned tables, but in the code it is always skipped for the commit txn event. Which one is correct?
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269098036 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { +isTxn = false; + } else { +isTxn = true; + } +} +// TODO: Somehow we have to signal alterPartitions that it's part of replication and +// should use replication's valid writeid list instead of creating one. Review comment: What do you mean by replication's valid writeid list in this comment? Even in repl flow, we get validWriteIdList from HMS based on incoming writeId in the event msg. Are you suggesting to cache this ValidWriteIdList somewhere and use it instead of invoking HMS API?
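The two alternatives in this exchange can be sketched side by side. This is a minimal, hedged stand-in (the `isTxnAlter` helper, its parameters, and `inReplicationScope` are illustrative names, not the actual DDLTask code): the current patch keys off individual property names like repl.last.id, while the reviewer suggests a single replication-scope flag.

```java
import java.util.Map;

// Illustrative sketch of the ADDPROPS txn-check decision discussed above.
public class AlterPropsCheck {
    static final String CURR_STATE_ID = "repl.last.id"; // as quoted in the diff

    // Returns whether the ALTER should be treated as transactional.
    static boolean isTxnAlter(Map<String, String> props, boolean hasPartSpec,
                              boolean isAddProps, boolean inReplicationScope) {
        if (!hasPartSpec || !isAddProps) {
            return false; // only partition-level ADDPROPS is txn-checked
        }
        // Reviewer's suggestion: one replication-scope flag instead of
        // enumerating per-key exceptions (repl.last.id, checkpoint key, ...).
        if (inReplicationScope) {
            return false;
        }
        // Current approach under review: non-transactional only when the sole
        // property being set is repl.last.id.
        return !(props.size() <= 1 && props.containsKey(CURR_STATE_ID));
    }

    public static void main(String[] args) {
        System.out.println(isTxnAlter(Map.of(CURR_STATE_ID, "42"), true, true, false)); // false
        System.out.println(isTxnAlter(Map.of("some.other.prop", "v"), true, true, false)); // true
    }
}
```

The scope-flag variant avoids the fragility the reviewer points out: each new non-transactional property added in the repl flow would otherwise require extending the per-key check.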
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269060256 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java ## @@ -118,7 +118,8 @@ List notNullConstraints; List defaultConstraints; List checkConstraints; - private ColumnStatistics colStats; + private ColumnStatistics colStats; // For the sake of replication + private long writeId = -1; // For the sake of replication Review comment: Can we re-use the replWriteId variable that we already have?
[GitHub] [hive] Fokko opened a new pull request #576: FLINK-11992 Update Apache Parquet to 1.10.1
Fokko opened a new pull request #576: FLINK-11992 Update Apache Parquet to 1.10.1 URL: https://github.com/apache/hive/pull/576 Fixes two bugs which were discovered in Apache Spark: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1101
[GitHub] [hive] maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268475619 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.common; + +/** + * A class that defines the constant strings used by the replication implementation. + */ + +public class ReplConst { Review comment: The name can be ReplCommon, as it's in the common folder.
[GitHub] [hive] maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268474981 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode()); } + @Test + public void dynamicallyConvertManagedToExternalTable() throws Throwable { +List dumpWithClause = Collections.singletonList( +"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'" +); +List loadWithClause = externalTableBasePathWithClause(); + +WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName) +.run("create table t1 (id int)") +.run("insert into table t1 values (1)") +.run("create table t2 (id int) partitioned by (key int)") Review comment: The case of a managed ACID table being converted to external does not exist; it should always be via migration that an ACID table gets converted to external.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268483462 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode()); } + @Test + public void dynamicallyConvertManagedToExternalTable() throws Throwable { +List dumpWithClause = Collections.singletonList( +"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'" +); +List loadWithClause = externalTableBasePathWithClause(); + +WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName) +.run("create table t1 (id int)") +.run("insert into table t1 values (1)") +.run("create table t2 (id int) partitioned by (key int)") Review comment: Did you mean, a non-acid table is created as acid table in target (due to migration) and later in source they change to external?
[GitHub] [hive] ashutosh-bapat opened a new pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat opened a new pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579 During bootstrap we use a method similar to non-ACID tables to transfer statistics of an ACID table from source to replica. However, installing statistics of an ACID table requires a valid writeId and writeId list. We use the table/partition's latest writeId and a valid transaction list containing only that writeId to install the statistics in the metastore. During incremental replication, the writeId is obtained from the UpdateStats event, and a valid writeId list with that writeId marked as valid is used to install the column statistics. Table level statistics is replicated by replaying the corresponding ALTER_TABLE/ALTER_PARTITION event. Further, this commit has the following related changes. 1. The table or the partition associated with the commit transaction event should have been created when replaying corresponding events before the commit transaction event. Thus there is no need to add tasks for creating the table or the partition. 2. Maintain a list of open replicated transactions and use that to create a valid transactions list when replaying a replicated event.
[GitHub] [hive] maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268474795 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam dbname = dbname.toLowerCase(); final boolean cascade = environmentContext != null -&& environmentContext.isSetProperties() -&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get( -StatsSetupConst.CASCADE)); +&& environmentContext.isSetProperties() +&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE)); +final boolean replDataLocationChanged = environmentContext != null Review comment: Yes, I think the flag sent from HiveServer to the metastore using the environment context is not required; the code changes are redundant.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268480500 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -400,7 +405,26 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam "Unable to change partition or table. Object " + e.getMessage() + " does not exist." + " Check metastore logs for detailed stack."); } finally { - if (!success) { + if (success) { +// Txn was committed successfully. +// If data location is changed in replication flow, then need to delete the old path. +if (replDataLocationChanged) { + assert(olddb != null); + assert(oldt != null); + Path deleteOldDataLoc = new Path(oldt.getSd().getLocation()); + boolean isAutoPurge = "true".equalsIgnoreCase(oldt.getParameters().get("auto.purge")); + try { +wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb); +LOG.info("Deleted the old data location: {} for the table: {}", +deleteOldDataLoc, dbname + "." + name); + } catch (MetaException ex) { +// Eat the exception as it doesn't affect the state of existing tables. +// Expect, user to manually drop this path when exception and so logging a warning. +LOG.warn("Unable to delete the old data location: {} for the table: {}", Review comment: Doesn't matter, we ignore that exception here. REPL LOAD will succeed. Also, previous run already archived in CM dir.
[GitHub] [hive] sankarh closed pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh closed pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268480328 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam // 2) the table is not an external table, and // 3) the user didn't change the default location (or new location is empty), and // 4) the table was not initially created with a specified location - if (rename - && !oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString()) - && (oldt.getSd().getLocation().compareTo(newt.getSd().getLocation()) == 0 -|| StringUtils.isEmpty(newt.getSd().getLocation())) - && !MetaStoreUtils.isExternalTable(oldt)) { -Database olddb = msdb.getDatabase(catName, dbname); + if (replDataLocationChanged + || (rename Review comment: It cannot be same because the behaviour is different. If location is set for normal flow, we shouldn't update for partitions. Also, it is risky to modify any of current behaviour. It is not in scope of this ticket.
[GitHub] [hive] maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268476396 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -400,7 +405,26 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam "Unable to change partition or table. Object " + e.getMessage() + " does not exist." + " Check metastore logs for detailed stack."); } finally { - if (!success) { + if (success) { +// Txn was committed successfully. +// If data location is changed in replication flow, then need to delete the old path. +if (replDataLocationChanged) { + assert(olddb != null); + assert(oldt != null); + Path deleteOldDataLoc = new Path(oldt.getSd().getLocation()); + boolean isAutoPurge = "true".equalsIgnoreCase(oldt.getParameters().get("auto.purge")); + try { +wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb); +LOG.info("Deleted the old data location: {} for the table: {}", +deleteOldDataLoc, dbname + "." + name); + } catch (MetaException ex) { +// Eat the exception as it doesn't affect the state of existing tables. +// Expect, user to manually drop this path when exception and so logging a warning. +LOG.warn("Unable to delete the old data location: {} for the table: {}", Review comment: If the delete directory succeeds and the event replay fails for some other reason, then the event will be replayed again. In the next replay, if it does not find the directory, the CM copy might fail.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268480242 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.common; + +/** + * A class that defines the constant strings used by the replication implementation. + */ + +public class ReplConst { Review comment: I will keep it ReplConst for now. We have ReplUtils for common methods and if needed we can add another class.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268480382 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name) && !oldtRelativePath.equalsIgnoreCase(name + Path.SEPARATOR); -if (!tableInSpecifiedLoc) { +if (replDataLocationChanged) { + // If data location is changed in replication flow, then new path was already set in + // the newt. Also, it is as good as the data is moved and set dataWasMoved=true so that + // location in partitions are also updated accordingly. + destPath = new Path(newt.getSd().getLocation()); Review comment: Yes. it is already like that.
[GitHub] [hive] maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268474981 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode()); } + @Test + public void dynamicallyConvertManagedToExternalTable() throws Throwable { +List dumpWithClause = Collections.singletonList( +"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'" +); +List loadWithClause = externalTableBasePathWithClause(); + +WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName) +.run("create table t1 (id int)") +.run("insert into table t1 values (1)") +.run("create table t2 (id int) partitioned by (key int)") Review comment: The case where a managed ACID table is converted to external does not exist; an ACID table can be converted to external only via migration.
[GitHub] [hive] maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268475152 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam // 2) the table is not an external table, and // 3) the user didn't change the default location (or new location is empty), and // 4) the table was not initially created with a specified location - if (rename - && !oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString()) - && (oldt.getSd().getLocation().compareTo(newt.getSd().getLocation()) == 0 -|| StringUtils.isEmpty(newt.getSd().getLocation())) - && !MetaStoreUtils.isExternalTable(oldt)) { -Database olddb = msdb.getDatabase(catName, dbname); + if (replDataLocationChanged + || (rename Review comment: What I meant is that the flow can be kept the same for the replication and normal flows, to avoid adding extra complexity.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268483462 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode()); } + @Test + public void dynamicallyConvertManagedToExternalTable() throws Throwable { +List dumpWithClause = Collections.singletonList( +"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'" +); +List loadWithClause = externalTableBasePathWithClause(); + +WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName) +.run("create table t1 (id int)") +.run("insert into table t1 values (1)") +.run("create table t2 (id int) partitioned by (key int)") Review comment: Did you mean that a non-acid table is created as an ACID table at target (due to migration) and later changed to external at source? I think this case needs a lot of changes, to avoid distcp for external tables and so on. I will create another ticket for this, as there might be other cases too.
[GitHub] [hive] maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
maheshk114 commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268475390 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name) && !oldtRelativePath.equalsIgnoreCase(name + Path.SEPARATOR); -if (!tableInSpecifiedLoc) { +if (replDataLocationChanged) { + // If data location is changed in replication flow, then new path was already set in + // the newt. Also, it is as good as the data is moved and set dataWasMoved=true so that + // location in partitions are also updated accordingly. + destPath = new Path(newt.getSd().getLocation()); Review comment: if for a partition the location is not within the table location and that table is altered to a external table ..the partition location need not be changed to base path at target ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268480113 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode()); } + @Test + public void dynamicallyConvertManagedToExternalTable() throws Throwable { +List dumpWithClause = Collections.singletonList( +"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'" +); +List loadWithClause = externalTableBasePathWithClause(); + +WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName) +.run("create table t1 (id int)") +.run("insert into table t1 values (1)") +.run("create table t2 (id int) partitioned by (key int)") Review comment: It is not acid table. It is non-acid managed table. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268479992 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam dbname = dbname.toLowerCase(); final boolean cascade = environmentContext != null -&& environmentContext.isSetProperties() -&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get( -StatsSetupConst.CASCADE)); +&& environmentContext.isSetProperties() +&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE)); +final boolean replDataLocationChanged = environmentContext != null Review comment: I think the flag is needed, as we shouldn't update partition locations when the user sets a location for a table. That is the current behaviour and shouldn't be changed. This is needed only for the repl flow.
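The repl-only flag discussed above is derived from the EnvironmentContext property map. A minimal sketch of that derivation, using a plain Map as a stand-in for Hive's EnvironmentContext and the DATA_LOCATION_CHANGED constant introduced by the patch's ReplConst:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for Hive's EnvironmentContext properties.
public class ReplFlagSketch {
  // Constant name taken from the patch's ReplConst.DATA_LOCATION_CHANGED.
  static final String DATA_LOCATION_CHANGED = "DATA_LOCATION_CHANGED";

  // True only when the replication flow explicitly set the flag; the normal
  // ALTER TABLE flow never sets it, so user-specified table locations do not
  // trigger partition-location rewrites.
  static boolean replDataLocationChanged(Map<String, String> envCtxProps) {
    return envCtxProps != null
        && "true".equalsIgnoreCase(envCtxProps.get(DATA_LOCATION_CHANGED));
  }

  public static void main(String[] args) {
    Map<String, String> replCtx = new HashMap<>();
    replCtx.put(DATA_LOCATION_CHANGED, "true");
    System.out.println(replDataLocationChanged(replCtx)); // repl flow
    System.out.println(replDataLocationChanged(null));    // normal flow
  }
}
```

This keeps the repl-specific behaviour opt-in: any caller that does not set the property gets the unchanged, existing semantics.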
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268480500 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -400,7 +405,26 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam "Unable to change partition or table. Object " + e.getMessage() + " does not exist." + " Check metastore logs for detailed stack."); } finally { - if (!success) { + if (success) { +// Txn was committed successfully. +// If data location is changed in replication flow, then need to delete the old path. +if (replDataLocationChanged) { + assert(olddb != null); + assert(oldt != null); + Path deleteOldDataLoc = new Path(oldt.getSd().getLocation()); + boolean isAutoPurge = "true".equalsIgnoreCase(oldt.getParameters().get("auto.purge")); + try { +wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb); +LOG.info("Deleted the old data location: {} for the table: {}", +deleteOldDataLoc, dbname + "." + name); + } catch (MetaException ex) { +// Eat the exception as it doesn't affect the state of existing tables. +// Expect, user to manually drop this path when exception and so logging a warning. +LOG.warn("Unable to delete the old data location: {} for the table: {}", Review comment: I think if the old dir is deleted, then the metadata update was already successful. In that case, replaying the event in the next cycle would not set the flag, as the table/partition locations were already pointing to the new location under the base dir.
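The ordering agreed on in this thread, delete the old directory only after the transaction committed and swallow any deletion failure, can be sketched independently of Hive. This is a simplified stand-in: java.nio replaces Hive's Warehouse helper, and the method name is illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of post-commit, best-effort cleanup; java.nio replaces Hive's Warehouse.
public class PostCommitCleanupSketch {
  // Delete the old table location only when the metadata txn succeeded.
  // Deletion failures are logged and swallowed so they can never roll back
  // metadata that is already committed.
  static boolean deleteOldDataLocation(Path oldLoc, boolean txnSucceeded) {
    if (!txnSucceeded) {
      return false; // rollback path: leave the old data untouched
    }
    try {
      return Files.deleteIfExists(oldLoc);
    } catch (IOException ex) {
      // Matches the "log a warning and eat the exception" decision above.
      System.err.println("WARN: could not delete old data location " + oldLoc
          + "; please drop it manually: " + ex.getMessage());
      return false;
    }
  }
}
```

The key design point is that filesystem cleanup is strictly downstream of the commit: a failed delete degrades to a leaked directory plus a warning, never to inconsistent metadata.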
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381295 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam // 2) the table is not an external table, and // 3) the user didn't change the default location (or new location is empty), and // 4) the table was not initially created with a specified location - if (rename - && !oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString()) - && (oldt.getSd().getLocation().compareTo(newt.getSd().getLocation()) == 0 -|| StringUtils.isEmpty(newt.getSd().getLocation())) - && !MetaStoreUtils.isExternalTable(oldt)) { -Database olddb = msdb.getDatabase(catName, dbname); + if (replDataLocationChanged + || (rename Review comment: This patch is not dealing with set location at source. It deals with changing table type to external at source via ALTER table EXTERNAL=true in table properties. So, this scenario is not valid and not in scope. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381343 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -357,6 +368,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam } } + // If data location is changed in replication flow, then need to delete the old path. + if (replDataLocationChanged) { +Path deleteOldDataLoc = new Path(oldt.getSd().getLocation()); +boolean isAutoPurge = "true".equalsIgnoreCase(oldt.getParameters().get("auto.purge")); +wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb); Review comment: Good catch. Will move it to finally block only when success=true and eat any exception by DeleteDir and just log a warn instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381321 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.common; + +/** + * A class that defines the constant strings used by the replication implementation. + */ + +public class ReplConst { Review comment: ReplUtils was already taken. I used similar naming as StatsSetupConst. I think ReplConst is good enough for now. Will change if any utility methods comes up for metastore common. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381165 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableOperation.java ## @@ -54,17 +55,24 @@ public int execute() throws HiveException { Table tbl = desc.toTable(context.getConf()); LOG.debug("creating table {} on {}", tbl.getFullyQualifiedName(), tbl.getDataLocation()); -if (desc.getReplicationSpec().isInReplicationScope() && (!desc.getReplaceMode())){ - // if this is a replication spec, then replace-mode semantics might apply. - // if we're already asking for a table replacement, then we can skip this check. - // however, otherwise, if in replication scope, and we've not been explicitly asked - // to replace, we should check if the object we're looking at exists, and if so, +boolean dataLocationChanged = false; +if (desc.getReplicationSpec().isInReplicationScope()) { + // If in replication scope, we should check if the object we're looking at exists, and if so, // trigger replace-mode semantics. Table existingTable = context.getDb().getTable(tbl.getDbName(), tbl.getTableName(), false); - if (existingTable != null){ + if (existingTable != null) { if (desc.getReplicationSpec().allowEventReplacementInto(existingTable.getParameters())) { desc.setReplaceMode(true); // we replace existing table. ReplicationSpec.copyLastReplId(existingTable.getParameters(), tbl.getParameters()); + + // If location of an existing managed table is changed, then need to delete the old location if exists. + // This scenario occurs when a managed table is converted into external table at source. In this case, + // at target, the table data would be moved to different location under base directory for external tables. 
+ if (existingTable.getTableType().equals(TableType.MANAGED_TABLE) + && tbl.getTableType().equals(TableType.EXTERNAL_TABLE) + && (!existingTable.getDataLocation().equals(tbl.getDataLocation()))) { Review comment: It's not possible in the replication flow, but I kept the check explicit for better readability. There is no harm in keeping it. If you insist, let me change it to an assert.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381082 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam dbname = dbname.toLowerCase(); final boolean cascade = environmentContext != null -&& environmentContext.isSetProperties() -&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get( -StatsSetupConst.CASCADE)); +&& environmentContext.isSetProperties() +&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE)); +final boolean replDataLocationChanged = environmentContext != null Review comment: It is not possible in normal flow to have table type changed from managed to external and location set in same ALTER query. Also, it won't reach here as it is CreateTableTask flow. Only repl flow reaches here for this scenario. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r26838 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.common; + +/** + * A class that defines the constant strings used by the replication implementation. + */ + +public class ReplConst { + + /** + * The constant that denotes the table data location is changed to different path. This indicates + * Metastore to update corresponding path in Partitions and also need to delete old path. + */ + public static final String DATA_LOCATION_CHANGED = "DATA_LOCATION_CHANGED"; Review comment: I thought ReplConst is self explanatory that it is Repl specific. Anyways, I will prefix with REPL_ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381147 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode()); } + @Test + public void dynamicallyConvertManagedToExternalTable() throws Throwable { +List dumpWithClause = Collections.singletonList( +"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'" +); +List loadWithClause = externalTableBasePathWithClause(); + +WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName) +.run("create table t1 (id int)") +.run("insert into table t1 values (1)") +.run("create table t2 (id int) partitioned by (key int)") Review comment: Nope. It is not a case of migration. It is external table replication specific where the table is converted to external at source not at target. So, here location is changed only at target not in source. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269880083 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; -if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) { - writeId = tableSnapshot.getWriteId(); - validWriteIdList = tableSnapshot.getValidWriteIdList(); + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && +addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = Review comment: Even the logic that creates the ValidWriteIdList from all repl-opened txns isn't right, as it claims the stats are valid for all those open txns, which isn't true. Also, it sets the high water mark based on the 0th index in the replTxnsList map, which might point to a wrong writeId compared to the current txn's writeId. So I think this logic should anyway be removed.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269879748 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: Done. Instead of last.repl.id, I am explicitly checking if the property is related to stats and then set isTxn only in case of replication. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
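The check described in the reply above ("is the property stats-related, and are we in replication?") can be sketched as follows. The stats key names here are illustrative stand-ins, not necessarily Hive's exact StatsSetupConst values:

```java
import java.util.Map;
import java.util.Set;

// Sketch of deciding isTxn from an ALTER ... ADDPROPS payload.
public class StatsTxnCheckSketch {
  // Illustrative stats-related property keys (not Hive's exact constants).
  static final Set<String> STATS_KEYS =
      Set.of("COLUMN_STATS_ACCURATE", "numRows", "rawDataSize", "totalSize");

  // Transactional only when every property being added is stats-related
  // and the ALTER comes from the replication flow; adding repl.last.id
  // alone therefore never makes the operation transactional.
  static boolean isTxnStatsUpdate(Map<String, String> props, boolean inReplScope) {
    return inReplScope && props != null && !props.isEmpty()
        && STATS_KEYS.containsAll(props.keySet());
  }
}
```

Checking for stats keys positively, rather than special-casing repl.last.id, keeps the predicate correct even if other non-stats properties are added later.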
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269880708 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ## @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId); } + /** + * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a + * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to + * read the files, and thus treats both open and aborted transactions as invalid. + * + * This API is used by Hive replication which may have multiple transactions open at a time. + * + * @param txns open txn list from the metastore + * @param currentTxns Current transactions that the replication has opened. If any of the + *transactions is greater than 0 it will be removed from the exceptions + *list so that the replication sees its own transaction as valid. + * @return a valid txn list. + */ + public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, Review comment: Yes, even I think, for REPL LOAD, we should always hardcode the ValidWriteIdList using current writeId so that stats are always valid while applying current event. Even if it is invalid, the subsequent alterTable/partition event would set it so in the table/partition parameters. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269861070 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: If we unify writeId and replWriteId in CreateTableDesc into one, then it's fine. In fact, they are one and the same. So, no point in having 2 members for same value.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269872622 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java ## @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId); } + /** + * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a + * {@link org.apache.hadoop.hive.common.ValidTxnList}. This assumes that the caller intends to + * read the files, and thus treats both open and aborted transactions as invalid. + * + * This API is used by Hive replication which may have multiple transactions open at a time. + * + * @param txns open txn list from the metastore + * @param currentTxns Current transactions that the replication has opened. If any of the + *transactions is greater than 0 it will be removed from the exceptions + *list so that the replication sees its own transaction as valid. + * @return a valid txn list. + */ + public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, Review comment: If there were multiple transactions running concurrently on the source, there will be that many open transaction events in the dump, which when replicated will leave that many transactions open on the target while replaying those events. So, there could be multiple open transactions on the target even during repl load. The only link between CreateTableOperation#createTableReplaceMode and Hive#alterTable is EnvironmentContext, so we could use this to pass a flag indicating that the valid writeId list should be created using the given writeId. But we are using EnvironmentContext to pass information only to the metastore, not to consume it in between.
We could construct the valid writeId list in the metastore directly, like what we are doing for create table and partition, using that kind of flag. Does that look good?
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269874353 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws int size = addPartitionDesc.getPartitionCount(); List in = new ArrayList(size); -AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); long writeId; String validWriteIdList; + +// In case of replication, get the writeId from the source and use valid write Id list +// for replication. +if (addPartitionDesc.getReplicationSpec() != null && +addPartitionDesc.getReplicationSpec().isInReplicationScope() && +addPartitionDesc.getPartition(0).getWriteId() > 0) { + writeId = addPartitionDesc.getPartition(0).getWriteId(); + validWriteIdList = Review comment: Ok. Underneath, this code is using the valid write id list created from repl's open transaction list, so this isn't wrong. But this may change, subject to the changes arising from your other comment.
[GitHub] [hive] ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
ashutosh-bapat commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269856891 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: AFAIU, the reason we set replWriteId in CreateTableDesc is that it can then be passed everywhere CreateTableDesc is used. It's better not to create two paths for passing the same writeId, with the risk of those going out of sync with each other.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269843944 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ## @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final Table tbl, List checkConstraints) throws AlreadyExistsException, MetaException, InvalidObjectException, NoSuchObjectException, InvalidInputException { + + ColumnStatistics colStats = null; + // If the given table has column statistics, save it here. We will update it later. + // We don't want it to be part of the Table object being created, lest the create table Review comment: I think, it's fine. Ignore this comment.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269257547 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists, tTbl.setPrivileges(principalPrivs); } } - // Set table snapshot to api.Table to make it persistent. - TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true); - if (tableSnapshot != null) { -tbl.getTTable().setWriteId(tableSnapshot.getWriteId()); + // Set table snapshot to api.Table to make it persistent. A transactional table being + // replicated may have a valid write Id copied from the source. Use that instead of + // crafting one on the replica. + if (tTbl.getWriteId() <= 0) { Review comment: The DO_NOT_UPDATE_STATS flag should be set in the create table flow as well. Otherwise, in autogather mode at the target, stats will be updated automatically. Not sure if it is needed, as the table itself is not yet in the metastore. Anyway, please check whether it is needed.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269843519 ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java ## @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) { throw new RuntimeException("Invalid table type : " + getDescType()); } } + + public Long getReplWriteId() { +if (this.createTblDesc != null) { + return this.createTblDesc.getReplWriteId(); Review comment: I meant, prepareImport already takes writeId (which comes from the event message) as an input parameter, which is being set in CreateTableDesc and later read back by getBaseAddPartitionDescFromPartition. Instead, writeId itself can be passed to getBaseAddPartitionDescFromPartition.
[GitHub] [hive] sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables.
sankarh commented on a change in pull request #579: HIVE-21109 : Support stats replication for ACID tables. URL: https://github.com/apache/hive/pull/579#discussion_r269844369 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ## @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException { } else { // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?). // For any other updates, we don't want to do txn check on partitions when altering table. -boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS; +boolean isTxn = false; +if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) { + // ADDPROPS is used to add repl.last.id during replication. That's not a transactional + // change. + Map props = alterTbl.getProps(); + if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) { Review comment: I don't know. I would suggest keeping it non-transactional only in the repl flow to avoid any impact.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381242 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam // 2) the table is not an external table, and Review comment: OK
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381258 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name) && !oldtRelativePath.equalsIgnoreCase(name + Path.SEPARATOR); -if (!tableInSpecifiedLoc) { +if (replDataLocationChanged) { + // If data location is changed in replication flow, then new path was already set in + // the newt. Also, it is as good as the data is moved and set dataWasMoved=true so that + // location in partitions are also updated accordingly. + destPath = new Path(newt.getSd().getLocation()); Review comment: Already handled. If you notice line 279, the partition location is changed only if the old location is a sub-directory of the old table location.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381212 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam dbname = dbname.toLowerCase(); final boolean cascade = environmentContext != null -&& environmentContext.isSetProperties() -&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get( -StatsSetupConst.CASCADE)); +&& environmentContext.isSetProperties() +&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE)); +final boolean replDataLocationChanged = environmentContext != null Review comment: OK.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381204 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableOperation.java ## @@ -108,6 +116,12 @@ private void createTableReplaceMode(Table tbl) throws HiveException { } } +// If table's data location is moved, then set the corresponding flag in environment context to Review comment: Nope. The scenario is that the location is changed at the target, not at the source. At the source, only the table type is changed from managed to external, not the location. I will update the comment to say that it applies in the repl flow.
[GitHub] [hive] sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target.
sankarh commented on a change in pull request #578: HIVE-21471: Replicating conversion of managed to external table leaks HDFS files at target. URL: https://github.com/apache/hive/pull/578#discussion_r268381269 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name) Review comment: OK
[GitHub] [hive] rmsmani commented on issue #576: HIVE-21488 Update Apache Parquet to 1.10.1
rmsmani commented on issue #576: HIVE-21488 Update Apache Parquet to 1.10.1 URL: https://github.com/apache/hive/pull/576#issuecomment-475909207 Thanks, please submit the patch file in Jira.
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259606331 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java ## @@ -426,4 +427,14 @@ public static void copyLastReplId(Map srcParameter, Map
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259606337 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationEx.java ## @@ -0,0 +1,213 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.hadoop.hdfs.DistributedFileSystem; +import org.apache.hadoop.hdfs.MiniDFSCluster; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore; +import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection; +import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId; +import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils; +import org.apache.hadoop.hive.shims.Utils; +import org.junit.*; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.*; + +import static org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION; +import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable; +import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable; +import static org.junit.Assert.*; + +/** + * TestReplicationWithTableMigrationEx - test replication for Hive2 to Hive3 (Strict managed tables) + */ +public class TestReplicationWithTableMigrationEx { + @Rule + public final TestName testName = new TestName(); + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationWithTableMigrationEx.class); + private static WarehouseInstance primary, replica; + private String primaryDbName, replicatedDbName; + + @BeforeClass + public static void classLevelSetup() throws Exception { +HashMap overrideProperties = new HashMap<>(); +internalBeforeClassSetup(overrideProperties); + } + + static void internalBeforeClassSetup(Map overrideConfigs) throws Exception { Review comment: no
[GitHub] ZhangXinJason opened a new pull request #548: HIVE-21313: Use faster method to prevent copying bytes twice
ZhangXinJason opened a new pull request #548: HIVE-21313: Use faster method to prevent copying bytes twice URL: https://github.com/apache/hive/pull/548 In file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAssignRow.java we may find code like this:

```
Text text = (Text) convertTargetWritable;
if (text == null) {
  text = new Text();
}
text.set(string);
((BytesColumnVector) columnVector).setVal(
    batchIndex, text.getBytes(), 0, text.getLength());
```

Using the `setVal` method copies the bytes array generated by `text.getBytes()`. This is entirely unnecessary: since the bytes array is immutable here, we can just use the `setRef` method to point at the same byte array, which will also lower memory usage.
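The copy-vs-reference distinction behind this PR can be illustrated with a minimal sketch. `BytesColumnVector` lives in Hive's storage-api, so the `ByteColumn` class below is only a stand-in modeling the idea: `setVal` makes a private copy of the caller's bytes, while `setRef` merely stores a reference (safe only as long as the caller does not mutate the array afterwards).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative stand-in for the column-vector assignment semantics; not Hive's API.
final class ByteColumn {
    byte[][] vector = new byte[8][];
    int[] start = new int[8];
    int[] length = new int[8];

    // setVal: copies the source bytes into a fresh array (extra allocation + copy).
    void setVal(int row, byte[] src, int off, int len) {
        vector[row] = Arrays.copyOfRange(src, off, off + len);
        start[row] = 0;
        length[row] = len;
    }

    // setRef: stores a reference to the caller's array (no copy at all).
    void setRef(int row, byte[] src, int off, int len) {
        vector[row] = src;
        start[row] = off;
        length[row] = len;
    }
}

public class Main {
    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        ByteColumn col = new ByteColumn();

        col.setVal(0, bytes, 0, bytes.length);
        col.setRef(1, bytes, 0, bytes.length);

        // setVal made a private copy; setRef aliases the caller's array.
        System.out.println(col.vector[0] != bytes); // true
        System.out.println(col.vector[1] == bytes); // true
    }
}
```

The reference-only variant is the win the PR claims: one fewer `byte[]` allocation and copy per assigned row, at the cost of requiring the source buffer to stay immutable for the batch's lifetime.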
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259606409 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java ## @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final FileSystem fs) { } }; } + + public static boolean isFirstIncDone(Map parameter) { +if (parameter == null) { + return true; +} +String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG); +return compFlag == null || compFlag.isEmpty() || "false".equalsIgnoreCase(compFlag); Review comment: I think it's better to have this check, as null and empty have special meaning in this case.
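The flag semantics being defended above can be sketched in isolation: a missing parameter map, an absent flag, an empty value, or an explicit "false" all mean the first incremental load is done, while any other value (e.g. "true") marks it pending. The property key string below is a hypothetical stand-in for `ReplUtils.REPL_FIRST_INC_PENDING_FLAG`.

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Hypothetical key name; the real constant is ReplUtils.REPL_FIRST_INC_PENDING_FLAG.
    static final String FLAG = "repl.first.inc.pending";

    // Mirrors the check under review: null/empty/"false" all mean "done".
    static boolean isFirstIncDone(Map<String, String> params) {
        if (params == null) {
            return true;
        }
        String compFlag = params.get(FLAG);
        return compFlag == null || compFlag.isEmpty() || "false".equalsIgnoreCase(compFlag);
    }

    public static void main(String[] args) {
        Map<String, String> p = new HashMap<>();
        System.out.println(isFirstIncDone(null)); // true: no parameters at all
        System.out.println(isFirstIncDone(p));    // true: flag absent
        p.put(FLAG, "");
        System.out.println(isFirstIncDone(p));    // true: empty value means done
        p.put(FLAG, "TRUE");
        System.out.println(isFirstIncDone(p));    // false: explicitly pending
    }
}
```

Treating absence and emptiness as "done" keeps tables written before this feature existed (which carry no flag at all) behaving as if the first incremental has already completed.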
[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled URL: https://github.com/apache/hive/pull/541#discussion_r259606013 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java ## @@ -271,12 +299,13 @@ public String getName() { LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath); if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){ ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false); - if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) { + if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) { rcwork.setDeleteDestIfExist(true); Review comment: As per discussion, we need to skip the duplicate check for the replace case, as we may otherwise end up skipping the copy of files with the same name but different content. Also, we would need to move those matching files to a new base path, which is complicated. So, we just skip the duplicate check for the replace flow.
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By URL: https://github.com/apache/hive/pull/544#discussion_r259953166 ## File path: ql/src/test/results/clientpositive/distinct_groupby.q.out ## @@ -0,0 +1,1530 @@ +PREHOOK: query: explain select distinct count(*) from src1 where key in (128,146,150) +PREHOOK: type: QUERY +PREHOOK: Input: default@src1 + A masked pattern was here +POSTHOOK: query: explain select distinct count(*) from src1 where key in (128,146,150) +POSTHOOK: type: QUERY +POSTHOOK: Input: default@src1 + A masked pattern was here +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 +Map Reduce + Map Operator Tree: + TableScan +alias: src1 +filterExpr: (UDFToDouble(key)) IN (128.0D, 146.0D, 150.0D) (type: boolean) +Statistics: Num rows: 25 Data size: 2150 Basic stats: COMPLETE Column stats: COMPLETE +Filter Operator + predicate: (UDFToDouble(key)) IN (128.0D, 146.0D, 150.0D) (type: boolean) + Statistics: Num rows: 12 Data size: 1032 Basic stats: COMPLETE Column stats: COMPLETE + Select Operator +Statistics: Num rows: 12 Data size: 1032 Basic stats: COMPLETE Column stats: COMPLETE +Group By Operator + aggregations: count() + mode: hash + outputColumnNames: _col0 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE + Reduce Output Operator +sort order: +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE +value expressions: _col0 (type: bigint) + Execution mode: vectorized + Reduce Operator Tree: +Group By Operator + aggregations: count(VALUE._col0) + mode: mergepartial + outputColumnNames: _col0 + Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE + File Output Operator +compressed: false +Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE +table: +input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
+            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+  Stage: Stage-0
+    Fetch Operator
+      limit: -1
+      Processor Tree:
+        ListSink
+
+PREHOOK: query: select distinct count(*) from src1 where key in (128,146,150)
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src1
+ A masked pattern was here
+POSTHOOK: query: select distinct count(*) from src1 where key in (128,146,150)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src1
+ A masked pattern was here
+3
+PREHOOK: query: explain select distinct * from (select distinct count(*) from src1 where key in (128,146,150)) as T
+PREHOOK: type: QUERY
+PREHOOK: Input: default@src1
+ A masked pattern was here
+POSTHOOK: query: explain select distinct * from (select distinct count(*) from src1 where key in (128,146,150)) as T
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@src1
+ A masked pattern was here
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+    Map Reduce
+      Map Operator Tree:
+          TableScan
+            alias: src1
+            filterExpr: (UDFToDouble(key)) IN (128.0D, 146.0D, 150.0D) (type: boolean)
+            Statistics: Num rows: 25 Data size: 2150 Basic stats: COMPLETE Column stats: COMPLETE
+            Filter Operator
+              predicate: (UDFToDouble(key)) IN (128.0D, 146.0D, 150.0D) (type: boolean)
+              Statistics: Num rows: 12 Data size: 1032 Basic stats: COMPLETE Column stats: COMPLETE
+              Select Operator
+                Statistics: Num rows: 12 Data size: 1032 Basic stats: COMPLETE Column stats: COMPLETE
+                Group By Operator
+                  aggregations: count()
+                  mode: hash
+                  outputColumnNames: _col0
+                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+                  Reduce Output Operator
+                    sort order:
+                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
+                    value expressions: _col0 (type: bigint)
+      Execution mode: vectorized
+      Reduce Operator Tree:
+        Group By Operator
+          aggregations: count(VALUE._col0)
+          mode: mergepartial
+          outputColumnNames: _col0
+          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259953429

File path: ql/src/test/queries/clientpositive/distinct_groupby.q

@@ -0,0 +1,57 @@
+--! qt:dataset:src1

Review comment: Adding q tests for non-CBO as well, good idea!

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
miklosgergely commented on a change in pull request #544: HIVE-16924 Support distinct in presence of Group By
URL: https://github.com/apache/hive/pull/544#discussion_r259958209

File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

@@ -4230,6 +4229,34 @@ public static long unsetBit(long bitmap, int bitIdx) {
     }
   }
+
+  protected boolean isGroupBy(ASTNode expr) {
+    boolean isGroupBy = false;
+    if (expr.getParent() != null && expr.getParent() instanceof Node)
+      for (Node sibling : ((Node) expr.getParent()).getChildren()) {
+        isGroupBy |= sibling instanceof ASTNode && ((ASTNode) sibling).getType() == HiveParser.TOK_GROUPBY;
+      }
+
+    return isGroupBy;
+  }
+
+  protected boolean isSelectDistinct(ASTNode expr) {
+    return expr.getType() == HiveParser.TOK_SELECTDI;
+  }
+
+  protected boolean isAggregateInSelect(Node node, Collection aggregateFunction) {
+    if (node.getChildren() == null) {
+      return false;
+    }
+
+    for (Node child : node.getChildren()) {

Review comment: I doubt there is any. The above example is not valid; it says: "Unsupported SubQuery Expression Invalid subquery. Subquery with DISTINCT clause is not supported!"
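For illustration, the sibling-scan idea behind `isGroupBy` in the patch above can be sketched as a self-contained toy. The `ToyNode` class and its token constants below are hypothetical stand-ins, not Hive's actual `ASTNode`/`Node`/`HiveParser` classes: to decide whether a SELECT node belongs to a grouped query, walk its parent's children and look for a GROUP BY sibling.

```java
import java.util.Arrays;
import java.util.List;

// Toy AST node; a hypothetical stand-in for Hive's ASTNode/Node, for illustration only.
class ToyNode {
    static final int TOK_SELECTDI = 1; // stand-in token type for SELECT DISTINCT
    static final int TOK_GROUPBY = 2;  // stand-in token type for GROUP BY

    final int type;
    ToyNode parent;
    final List<ToyNode> children;

    ToyNode(int type, ToyNode... children) {
        this.type = type;
        this.children = Arrays.asList(children);
        for (ToyNode c : this.children) {
            c.parent = this;
        }
    }

    // Mirrors the patch's idea: a node is "in a group by" if any sibling is a TOK_GROUPBY.
    static boolean isGroupBy(ToyNode expr) {
        boolean isGroupBy = false;
        if (expr.parent != null) {
            for (ToyNode sibling : expr.parent.children) {
                isGroupBy |= sibling.type == TOK_GROUPBY;
            }
        }
        return isGroupBy;
    }

    static boolean isSelectDistinct(ToyNode expr) {
        return expr.type == TOK_SELECTDI;
    }
}

public class SiblingScanDemo {
    public static void main(String[] args) {
        ToyNode select = new ToyNode(ToyNode.TOK_SELECTDI);
        ToyNode groupBy = new ToyNode(ToyNode.TOK_GROUPBY);
        new ToyNode(0, select, groupBy); // a query node holding both clauses

        System.out.println(ToyNode.isGroupBy(select));        // true: has a GROUP BY sibling
        System.out.println(ToyNode.isSelectDistinct(select)); // true
        System.out.println(ToyNode.isGroupBy(new ToyNode(ToyNode.TOK_SELECTDI))); // false: no parent
    }
}
```

The toy keeps only the shape of the check; the real patch additionally guards the parent with `instanceof Node` and filters siblings with `instanceof ASTNode`, which the single-class toy does not need.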
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622274

File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java

@@ -426,4 +427,14 @@ public static void copyLastReplId(Map srcParameter, Map
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622357

File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java

@@ -135,7 +135,8 @@ private boolean isDbEmpty(String dbName) throws HiveException {
   }

   private Task alterDbTask(Database dbObj) {
-    return alterDbTask(dbObj.getName(), updateDbProps(dbObj, context.dumpDirectory), context.hiveConf);
+    return alterDbTask(dbObj.getName(), updateDbProps(dbObj, context.dumpDirectory, false),

Review comment: done
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622413

File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java

@@ -158,6 +159,15 @@ private boolean isDbEmpty(String dbName) throws HiveException {
   }

     // Add the checkpoint key to the Database binding it to current dump directory.
     // So, if retry using same dump, we shall skip Database object update.
     parameters.put(ReplUtils.REPL_CHECKPOINT_KEY, dumpDirectory);
+
+    if (needSetIncFlag) {

Review comment: done
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622407

File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java

@@ -164,6 +165,12 @@ public IncrementalLoadTasksBuilder(String dbName, String tableName, String loadP
         lastEventid);
       }
     }
+
+    ReplSetFirstIncLoadFlagDesc desc = new ReplSetFirstIncLoadFlagDesc(dbName, tableName);

Review comment: A new DDL task is created to keep this simpler, with both table-level and warehouse-level support.
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259617900

File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java

@@ -61,6 +62,21 @@ public ReplCopyTask(){
     super();
   }

+  // If file is already present in base directory, then remove it from the list.
+  // Check HIVE-21197 for more detail
+  private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, List srcFiles,
+      long writeId, int stmtId) throws IOException {
+    ListIterator iter = srcFiles.listIterator();
+    Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, writeId, writeId, stmtId));
+    while (iter.hasNext()) {
+      Path filePath = new Path(basePath, iter.next().getSourcePath().getName());
+      if (dstFs.exists(filePath)) {

Review comment: The I/O exception retry case is handled specifically in only two places; there are many other I/O failure scenarios which are not handled. I think it is not required here.
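Independently of Hadoop's `FileSystem` API, the idea in `updateSrcFileListForDupCopy` above can be sketched in a few lines: before copying, drop every source file whose name is already present under the target base directory, so a retried load does not duplicate data. In this minimal stand-alone sketch the `existsAtTarget` set is a hypothetical stand-in for the `dstFs.exists(...)` check, and the file names are made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class DupCopyFilter {
    // Remove from srcFiles every file name already present at the target,
    // so a retried bootstrap load does not copy (and duplicate) the same data.
    static void dropAlreadyCopied(List<String> srcFiles, Set<String> existsAtTarget) {
        Iterator<String> iter = srcFiles.iterator();
        while (iter.hasNext()) {
            if (existsAtTarget.contains(iter.next())) {
                iter.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> srcFiles = new ArrayList<>(List.of("000000_0", "000001_0", "000002_0"));
        // "000001_0" was already copied on a previous, partially failed attempt.
        Set<String> target = new HashSet<>(Set.of("000001_0"));
        dropAlreadyCopied(srcFiles, target);
        System.out.println(srcFiles); // only the files still missing at the target remain
    }
}
```

The real method iterates `ReplChangeManager`-style file descriptors and resolves the ACID base/delta subdirectory via `AcidUtils.baseOrDeltaSubdir`; the sketch keeps only the list-pruning logic.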
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622991

File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java

@@ -112,6 +118,12 @@ public void run() {
           continue;
         }

+        if (replIsCompactionDisabledForTable(t)) {

Review comment: not done
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622969

File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java

@@ -661,6 +663,10 @@ public int execute(DriverContext driverContext) {
       if (work.getAlterMaterializedViewDesc() != null) {
         return alterMaterializedView(db, work.getAlterMaterializedViewDesc());
       }
+
+      if (work.getReplSetFirstIncLoadFlagDesc() != null) {

Review comment: It is done in a separate task to keep it simpler.
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259618842

File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java

@@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final FileSystem fs) {
       }
     };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
+    if (parameter == null) {
+      return true;
+    }
+    String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG);

Review comment: done
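The null-safe flag check being reviewed above can be sketched as a stand-alone toy. The key string and the "pending means not done" semantics below are assumptions modeled on the `REPL_FIRST_INC_PENDING_FLAG` name, not Hive's actual constants or behavior.

```java
import java.util.Map;

public class IncFlagCheck {
    // Hypothetical property key, standing in for ReplUtils.REPL_FIRST_INC_PENDING_FLAG.
    static final String FIRST_INC_PENDING_FLAG = "hive.repl.first.inc.pending";

    // Assumed semantics: the first incremental load counts as done unless the
    // object explicitly carries the pending flag set to "true". A null parameter
    // map (no repl metadata at all) therefore also counts as done.
    static boolean isFirstIncDone(Map<String, String> parameters) {
        if (parameters == null) {
            return true;
        }
        String flag = parameters.get(FIRST_INC_PENDING_FLAG);
        return flag == null || !flag.equalsIgnoreCase("true");
    }

    public static void main(String[] args) {
        System.out.println(isFirstIncDone(null));                                   // true
        System.out.println(isFirstIncDone(Map.of()));                               // true
        System.out.println(isFirstIncDone(Map.of(FIRST_INC_PENDING_FLAG, "true"))); // false
    }
}
```

Defaulting to "done" when the map or key is absent keeps the check backward compatible: objects replicated before the flag existed are not suddenly treated as pending.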
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259618838

File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java

@@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final FileSystem fs) {
       }
     };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {

Review comment: done
[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622190

File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java

@@ -271,12 +299,13 @@ public String getName() {
     LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
     if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
       ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-      if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+      if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
        rcwork.setDeleteDestIfExist(true);

Review comment: done