[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-26 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r260172472
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -5225,7 +5226,8 @@ private int updateFirstIncPendingFlag(Hive hive, 
ReplSetFirstIncLoadFlagDesc des
   for (String dbName : Utils.matchesDb(hive, dbNameOrPattern)) {
 Database database = hive.getMSC().getDatabase(dbName);
 parameters = database.getParameters();
-if (ReplUtils.isFirstIncPending(parameters)) {
+String incPendPara = parameters != null ? 
parameters.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG) : null;
+if (incPendPara != null && (!flag.equalsIgnoreCase(incPendPara))) {
 
 Review comment:
   changed to ReplRemoveFirstIncLoadPendFlagDesc ..now just remove if the 
property is set 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r260134323
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -5216,7 +5216,8 @@ private int updateFirstIncPendingFlag(Hive hive, 
ReplSetFirstIncLoadFlagDesc des
   for (String tableName : Utils.matchesTbl(hive, dbNameOrPattern, 
tableNameOrPattern)) {
 org.apache.hadoop.hive.metastore.api.Table tbl = 
hive.getMSC().getTable(dbNameOrPattern, tableName);
 parameters = tbl.getParameters();
-if (ReplUtils.isFirstIncPending(parameters)) {
+String incPendPara = parameters != null ? 
parameters.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG) : null;
+if (incPendPara != null && (!flag.equalsIgnoreCase(incPendPara))) {
 
 Review comment:
   no ..as per current code ..we change this value only we are going to change 
it to other one ..else no need to change ..as this is called only after 
incremental is done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259820997
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, 
DumpType dumpType, String dbNa
 return updateReplIdTask;
   }
 
-  private Task dbUpdateReplStateTask(String dbName, 
String replState,
+  private Task dbUpdateReplStateTask(String dbName, 
String replState, String incLoadPendFlag,
  Task preCursor) {
 HashMap mapProp = new HashMap<>();
-mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState);
 
-AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, mapProp, new 
ReplicationSpec(replState, replState));
+// if the update is for incLoadPendFlag, then send replicationSpec as null 
to avoid replacement check.
+ReplicationSpec replicationSpec = null;
+if (incLoadPendFlag == null) {
+  mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState);
+  replicationSpec = new ReplicationSpec(replState, replState);
+} else {
+  assert replState == null;
+  mapProp.put(ReplUtils.REPL_FIRST_INC_PENDING_FLAG, incLoadPendFlag);
 
 Review comment:
   done ..added check same as ckpt flag


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259804317
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -5199,6 +5205,35 @@ public static boolean doesTableNeedLocation(Table tbl) {
 return retval;
   }
 
+  private int updateFirstIncPendingFlag(Hive hive, ReplSetFirstIncLoadFlagDesc 
desc) throws HiveException, TException {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259803151
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1147,6 +1145,12 @@ private static void createReplImportTasks(
   if (!waitOnPrecursor){
 throw new 
SemanticException(ErrorMsg.DATABASE_NOT_EXISTS.getMsg(tblDesc.getDatabaseName()));
   }
+  // For warehouse level replication, if the database itself is getting 
created in this load, then no need to
+  // check for duplicate copy. Check HIVE-21197 for more detail.
+  firstIncPending = false;
+} else {
+  // For database replication, get the flag from database parameter. Check 
HIVE-21197 for more detail.
+  firstIncPending = ReplUtils.isFirstIncPending(parentDb.getParameters());
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259803182
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1164,6 +1168,9 @@ private static void createReplImportTasks(
   if (x.getEventType() == DumpType.EVENT_CREATE_TABLE) {
 dropTblTask = dropTableTask(table, x, replicationSpec);
 table = null;
+  } else if (!firstIncPending) {
+// For table level replication, get the flag from table parameter. 
Check HIVE-21197 for more detail.
+firstIncPending = ReplUtils.isFirstIncPending(table.getParameters());
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259803223
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -48,13 +48,15 @@
 
   private final DatabaseEvent event;
   private final String dbNameToLoadIn;
+  private final boolean isTableLevelLoad;
 
-  public LoadDatabase(Context context, DatabaseEvent event, String 
dbNameToLoadIn,
+  public LoadDatabase(Context context, DatabaseEvent event, String 
dbNameToLoadIn, String tblNameToLoadIn,
 
 Review comment:
   yes


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259802388
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1164,6 +1168,9 @@ private static void createReplImportTasks(
   if (x.getEventType() == DumpType.EVENT_CREATE_TABLE) {
 dropTblTask = dropTableTask(table, x, replicationSpec);
 table = null;
+  } else if (!firstIncPending) {
+// For table level replication, get the flag from table parameter. 
Check HIVE-21197 for more detail.
+firstIncPending = ReplUtils.isFirstIncPending(table.getParameters());
 
 Review comment:
   changed the logic to avoid duplicate check in case number of base directory 
is more than one


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259802015
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -5199,6 +5205,35 @@ public static boolean doesTableNeedLocation(Table tbl) {
 return retval;
   }
 
+  private int updateFirstIncPendingFlag(Hive hive, ReplSetFirstIncLoadFlagDesc 
desc) throws HiveException, TException {
+String dbNameOrPattern = desc.getDatabaseName();
+String tableNameOrPattern = desc.getTableName();
+String flag = desc.getIncLoadPendingFlag() ? "true" : "false";
+Map parameters;
+// For database level load tableNameOrPattern will be null. Flag is set 
only in database for db level load.
+if (tableNameOrPattern != null && !tableNameOrPattern.isEmpty()) {
+  // For table level load, dbNameOrPattern is db name and not a pattern.
+  for (String tableName : Utils.matchesTbl(hive, dbNameOrPattern, 
tableNameOrPattern)) {
+org.apache.hadoop.hive.metastore.api.Table tbl = 
hive.getMSC().getTable(dbNameOrPattern, tableName);
+parameters = tbl.getParameters();
+if (ReplUtils.isFirstIncPending(parameters)) {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259800518
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##
 @@ -274,6 +283,19 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path 
cmRoot, Hive hiveDb) th
 for (String tblName : Utils.matchesTbl(hiveDb, dbName, 
work.tableNameOrPattern)) {
   LOG.debug(
   "analyzeReplDump dumping table: " + tblName + " to db root " + 
dbRoot.toUri());
+  Table table;
+  try {
+table = hiveDb.getTable(dbName, tblName);
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259800145
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map 
srcParameter, Map

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259800197
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,14 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncPending(Map parameter) {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259799886
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/plan/ReplSetFirstIncLoadFlagDesc.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.plan;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
+
+import java.io.Serializable;
+
+/**
+ * ReplSetFirstIncLoadFlagDesc.
+ *
+ */
+@Explain(displayName = "Set First Incr Load Flag", explainLevels = { 
Level.USER, Level.DEFAULT, Level.EXTENDED })
+public class ReplSetFirstIncLoadFlagDesc extends DDLDesc implements 
Serializable {
+
+  private static final long serialVersionUID = 1L;
+  String databaseName;
+  String tableName;
+  boolean incLoadPendingFlag;
+
+  /**
+   * For serialization only.
+   */
+  public ReplSetFirstIncLoadFlagDesc() {
+  }
+
+  public ReplSetFirstIncLoadFlagDesc(String databaseName, String tableName, 
boolean incLoadPendingFlag) {
+super();
+this.databaseName = databaseName;
+this.tableName = tableName;
+this.incLoadPendingFlag = incLoadPendingFlag;
+  }
+
+  @Explain(displayName="db_name", explainLevels = { Level.USER, Level.DEFAULT, 
Level.EXTENDED })
+  public String getDatabaseName() {
+return databaseName;
+  }
+
+  public void setDatabaseName(String databaseName) {
+this.databaseName = databaseName;
+  }
+
+  @Explain(displayName="table_name", explainLevels = { Level.USER, 
Level.DEFAULT, Level.EXTENDED })
+  public String getTableName() {
+return tableName;
+  }
+
+  public void setTableName(String tableName) {
+this.tableName = tableName;
+  }
+
+  @Explain(displayName="inc load pending flag", explainLevels = { Level.USER, 
Level.DEFAULT, Level.EXTENDED })
 
 Review comment:
   no change


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259799813
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
 ##
 @@ -112,6 +118,12 @@ public void run() {
 continue;
   }
 
+  if (replIsCompactionDisabledForTable(t)) {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259778574
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##
 @@ -274,6 +283,19 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path 
cmRoot, Hive hiveDb) th
 for (String tblName : Utils.matchesTbl(hiveDb, dbName, 
work.tableNameOrPattern)) {
   LOG.debug(
   "analyzeReplDump dumping table: " + tblName + " to db root " + 
dbRoot.toUri());
+  Table table;
+  try {
+table = hiveDb.getTable(dbName, tblName);
 
 Review comment:
   "These table level checks can be done only if work.tableNameOrPattern is 
valid." with this change ..the call will be only once 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259778574
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##
 @@ -274,6 +283,19 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path 
cmRoot, Hive hiveDb) th
 for (String tblName : Utils.matchesTbl(hiveDb, dbName, 
work.tableNameOrPattern)) {
   LOG.debug(
   "analyzeReplDump dumping table: " + tblName + " to db root " + 
dbRoot.toUri());
+  Table table;
+  try {
+table = hiveDb.getTable(dbName, tblName);
 
 Review comment:
   "We invoke getTable twice. Can we reuse tableTuple that we get below inside 
try block." with this change ..the call will be only once 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259759960
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationEx.java
 ##
 @@ -0,0 +1,273 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigrationEx - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationEx {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(TestReplicationWithTableMigrationEx.class);
+  private static WarehouseInstance primary, replica;
+  private String primaryDbName, replicatedDbName;
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+HashMap overrideProperties = new HashMap<>();
+internalBeforeClassSetup(overrideProperties);
+  }
+
+  static void internalBeforeClassSetup(Map overrideConfigs) 
throws Exception {
+HiveConf conf = new HiveConf(TestReplicationWithTableMigrationEx.class);
+conf.set("dfs.client.use.datanode.hostname", "true");
+conf.set("hadoop.proxyuser." + Utils.getUGI().getShortUserName() + 
".hosts", "*");
+MiniDFSCluster miniDFSCluster =
+new MiniDFSCluster.Builder(conf).numDataNodes(1).format(true).build();
+final DistributedFileSystem fs = miniDFSCluster.getFileSystem();
+HashMap hiveConfigs = new HashMap() {{
+  put("fs.defaultFS", fs.getUri().toString());
+  put("hive.support.concurrency", "true");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.strict.managed.tables", "true");
+  put("hive.metastore.transactional.event.listeners", "");
+}};
+replica = new WarehouseInstance(LOG, miniDFSCluster, hiveConfigs);
+
+HashMap configsForPrimary = new HashMap() 
{{
+  put("fs.defaultFS", fs.getUri().toString());
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.support.concurrency", "false");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+  put("hive.strict.managed.tables", "false");
+}};
+configsForPrimary.putAll(overrideConfigs);
+primary = new WarehouseInstance(LOG, miniDFSCluster, 

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259759565
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##
 @@ -274,6 +283,19 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path 
cmRoot, Hive hiveDb) th
 for (String tblName : Utils.matchesTbl(hiveDb, dbName, 
work.tableNameOrPattern)) {
   LOG.debug(
   "analyzeReplDump dumping table: " + tblName + " to db root " + 
dbRoot.toUri());
+  Table table;
+  try {
+table = hiveDb.getTable(dbName, tblName);
 
 Review comment:
   anyways just extra check ..so does not impact perf 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259758817
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,14 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncPending(Map parameter) {
 
 Review comment:
   parameter is generic enough ..i think keeping it is not an issue 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259758025
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map 
srcParameter, Map

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259757727
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
 ##
 @@ -112,6 +118,12 @@ public void run() {
 continue;
   }
 
+  if (replIsCompactionDisabledForTable(t)) {
 
 Review comment:
   i think it wont make much diff


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259757650
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/plan/ReplSetFirstIncLoadFlagDesc.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.plan;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
+
+import java.io.Serializable;
+
+/**
+ * ReplSetFirstIncLoadFlagDesc.
+ *
+ */
+@Explain(displayName = "Set First Incr Load Flag", explainLevels = { 
Level.USER, Level.DEFAULT, Level.EXTENDED })
+public class ReplSetFirstIncLoadFlagDesc extends DDLDesc implements 
Serializable {
+
+  private static final long serialVersionUID = 1L;
+  String databaseName;
+  String tableName;
+  boolean incLoadPendingFlag;
+
+  /**
+   * For serialization only.
+   */
+  public ReplSetFirstIncLoadFlagDesc() {
+  }
+
+  public ReplSetFirstIncLoadFlagDesc(String databaseName, String tableName, 
boolean incLoadPendingFlag) {
+super();
+this.databaseName = databaseName;
+this.tableName = tableName;
+this.incLoadPendingFlag = incLoadPendingFlag;
+  }
+
+  @Explain(displayName="db_name", explainLevels = { Level.USER, Level.DEFAULT, 
Level.EXTENDED })
+  public String getDatabaseName() {
+return databaseName;
+  }
+
+  public void setDatabaseName(String databaseName) {
+this.databaseName = databaseName;
+  }
+
+  @Explain(displayName="table_name", explainLevels = { Level.USER, 
Level.DEFAULT, Level.EXTENDED })
+  public String getTableName() {
+return tableName;
+  }
+
+  public void setTableName(String tableName) {
+this.tableName = tableName;
+  }
+
+  @Explain(displayName="inc load pending flag", explainLevels = { Level.USER, 
Level.DEFAULT, Level.EXTENDED })
 
 Review comment:
   can be used for explain repl load ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259756959
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1164,6 +1168,9 @@ private static void createReplImportTasks(
   if (x.getEventType() == DumpType.EVENT_CREATE_TABLE) {
 dropTblTask = dropTableTask(table, x, replicationSpec);
 table = null;
+  } else if (!firstIncPending) {
+// For table level replication, get the flag from table parameter. 
Check HIVE-21197 for more detail.
+firstIncPending = ReplUtils.isFirstIncPending(table.getParameters());
 
 Review comment:
   table level replication ...create table event should not come


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259756728
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1164,6 +1168,9 @@ private static void createReplImportTasks(
   if (x.getEventType() == DumpType.EVENT_CREATE_TABLE) {
 dropTblTask = dropTableTask(table, x, replicationSpec);
 table = null;
+  } else if (!firstIncPending) {
+// For table level replication, get the flag from table parameter. 
Check HIVE-21197 for more detail.
+firstIncPending = ReplUtils.isFirstIncPending(table.getParameters());
 
 Review comment:
   !firstIncPending will make sure that its obtained for table level only 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259756217
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -48,13 +48,15 @@
 
   private final DatabaseEvent event;
   private final String dbNameToLoadIn;
+  private final boolean isTableLevelLoad;
 
-  public LoadDatabase(Context context, DatabaseEvent event, String 
dbNameToLoadIn,
+  public LoadDatabase(Context context, DatabaseEvent event, String 
dbNameToLoadIn, String tblNameToLoadIn,
 
 Review comment:
   its causing null pointer exception if db load is not done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-25 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259756442
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1147,6 +1145,12 @@ private static void createReplImportTasks(
   if (!waitOnPrecursor){
 throw new 
SemanticException(ErrorMsg.DATABASE_NOT_EXISTS.getMsg(tblDesc.getDatabaseName()));
   }
+  // For warehouse level replication, if the database itself is getting 
created in this load, then no need to
+  // check for duplicate copy. Check HIVE-21197 for more detail.
+  firstIncPending = false;
+} else {
+  // For database replication, get the flag from database parameter. Check 
HIVE-21197 for more detail.
+  firstIncPending = ReplUtils.isFirstIncPending(parentDb.getParameters());
 
 Review comment:
   anyways it parent db it will not be set ..so automatically it will fetch 
from table 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622991
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
 ##
 @@ -112,6 +118,12 @@ public void run() {
 continue;
   }
 
+  if (replIsCompactionDisabledForTable(t)) {
 
 Review comment:
   not done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622969
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -661,6 +663,10 @@ public int execute(DriverContext driverContext) {
   if (work.getAlterMaterializedViewDesc() != null) {
 return alterMaterializedView(db, work.getAlterMaterializedViewDesc());
   }
+
+  if (work.getReplSetFirstIncLoadFlagDesc() != null) {
 
 Review comment:
   its done in a separate task to make it simpler


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622407
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -164,6 +165,12 @@ public IncrementalLoadTasksBuilder(String dbName, String 
tableName, String loadP
   lastEventid);
 }
   }
+
+  ReplSetFirstIncLoadFlagDesc desc = new 
ReplSetFirstIncLoadFlagDesc(dbName, tableName);
 
 Review comment:
   new ddl task is created to make it simpler with table level and warehouse 
level support


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622413
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -158,6 +159,15 @@ private boolean isDbEmpty(String dbName) throws 
HiveException {
 // Add the checkpoint key to the Database binding it to current dump 
directory.
 // So, if retry using same dump, we shall skip Database object update.
 parameters.put(ReplUtils.REPL_CHECKPOINT_KEY, dumpDirectory);
+
+if (needSetIncFlag) {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622357
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -135,7 +135,8 @@ private boolean isDbEmpty(String dbName) throws 
HiveException {
   }
 
   private Task alterDbTask(Database dbObj) {
-return alterDbTask(dbObj.getName(), updateDbProps(dbObj, 
context.dumpDirectory), context.hiveConf);
+return alterDbTask(dbObj.getName(), updateDbProps(dbObj, 
context.dumpDirectory, false),
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622274
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map 
srcParameter, Map

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622236
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map 
srcParameter, Map

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622204
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 ##
 @@ -1536,6 +1536,62 @@ public void testCompactionInfoHashCode() {
 Assert.assertEquals("The hash codes must be equal", 
compactionInfo.hashCode(), compactionInfo1.hashCode());
   }
 
+  @Test
+  public void testDisableCompactionDuringReplLoad() throws Exception {
+String tblName = "discomp";
+String database = "discomp_db";
+executeStatementOnDriver("drop database if exists " + database + " 
cascade", driver);
+executeStatementOnDriver("create database " + database, driver);
+executeStatementOnDriver("CREATE TABLE " + database + "." + tblName + "(a 
INT, b STRING) " +
+" PARTITIONED BY(ds string)" +
+" CLUSTERED BY(a) INTO 2 BUCKETS" + //currently ACID requires 
table to be bucketed
+" STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver);
+executeStatementOnDriver("insert into " + database + "." + tblName + " 
partition (ds) values (1, 'fred', " +
+"'today'), (2, 'wilma', 'yesterday')", driver);
+
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'true')", 
driver);
+List compacts = getCompactionList();
+Assert.assertEquals(0, compacts.size());
+
+executeStatementOnDriver("alter database " + database +
+" set dbproperties ('hive.repl.first.inc.pending' = 'true')", 
driver);
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
 
 Review comment:
   table level is taken care of now


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622210
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 ##
 @@ -1536,6 +1536,62 @@ public void testCompactionInfoHashCode() {
 Assert.assertEquals("The hash codes must be equal", 
compactionInfo.hashCode(), compactionInfo1.hashCode());
   }
 
+  @Test
+  public void testDisableCompactionDuringReplLoad() throws Exception {
+String tblName = "discomp";
+String database = "discomp_db";
+executeStatementOnDriver("drop database if exists " + database + " 
cascade", driver);
+executeStatementOnDriver("create database " + database, driver);
+executeStatementOnDriver("CREATE TABLE " + database + "." + tblName + "(a 
INT, b STRING) " +
+" PARTITIONED BY(ds string)" +
+" CLUSTERED BY(a) INTO 2 BUCKETS" + //currently ACID requires 
table to be bucketed
+" STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver);
+executeStatementOnDriver("insert into " + database + "." + tblName + " 
partition (ds) values (1, 'fred', " +
+"'today'), (2, 'wilma', 'yesterday')", driver);
+
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'true')", 
driver);
+List compacts = getCompactionList();
+Assert.assertEquals(0, compacts.size());
+
+executeStatementOnDriver("alter database " + database +
+" set dbproperties ('hive.repl.first.inc.pending' = 'true')", 
driver);
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'false')", 
driver);
+compacts = getCompactionList();
+Assert.assertEquals(0, compacts.size());
+
+executeStatementOnDriver("alter database " + database +
+" set dbproperties ('hive.repl.first.inc.pending' = 'false')", 
driver);
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
 
 Review comment:
   table level is taken care of now


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622190
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -271,12 +299,13 @@ public String getName() {
 LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
 if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
   ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-  if (replicationSpec.isReplace() && 
conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+  if (replicationSpec.isReplace() && 
(conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
 rcwork.setDeleteDestIfExist(true);
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r25960
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
 ##
 @@ -71,6 +74,16 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) 
throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws 
TException {
+try {
+  Database database = rs.getDatabase(getDefaultCatalog(conf), dbName);
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622183
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -370,6 +370,9 @@ private int executeIncrementalLoad(DriverContext 
driverContext) {
 
   // If incremental events are already applied, then check and perform if 
need to bootstrap any tables.
   if (!builder.hasMoreWork() && !work.getPathsToCopyIterator().hasNext()) {
+// No need to set incremental load pending flag for external tables as 
the files will be copied to the same path
 
 Review comment:
   todo not required as table level load is taken care now


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259621882
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, 
DumpType dumpType, String dbNa
 return updateReplIdTask;
   }
 
-  private Task dbUpdateReplStateTask(String dbName, 
String replState,
+  private Task dbUpdateReplStateTask(String dbName, 
String replState, String incLoadPendFlag,
 
 Review comment:
   the code is removed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259621844
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, 
DumpType dumpType, String dbNa
 return updateReplIdTask;
   }
 
-  private Task dbUpdateReplStateTask(String dbName, 
String replState,
+  private Task dbUpdateReplStateTask(String dbName, 
String replState, String incLoadPendFlag,
  Task preCursor) {
 HashMap mapProp = new HashMap<>();
-mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState);
 
-AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, mapProp, new 
ReplicationSpec(replState, replState));
+// if the update is for incLoadPendFlag, then send replicationSpec as null 
to avoid replacement check.
+ReplicationSpec replicationSpec = null;
+if (incLoadPendFlag == null) {
+  mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState);
+  replicationSpec = new ReplicationSpec(replState, replState);
+} else {
+  assert replState == null;
+  mapProp.put(ReplUtils.REPL_FIRST_INC_PENDING_FLAG, incLoadPendFlag);
 
 Review comment:
   done. Dump will fail if the inc pending flag is set to true


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259618838
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259618842
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
+if (parameter == null) {
+  return true;
+}
+String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG);
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259617900
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -61,6 +62,21 @@ public ReplCopyTask(){
 super();
   }
 
+  // If file is already present in base directory, then remove it from the 
list.
+  // Check  HIVE-21197 for more detail
+  private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, 
List srcFiles,
+   long writeId, int stmtId) throws 
IOException {
+ListIterator iter = srcFiles.listIterator();
+Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, 
writeId, writeId, stmtId));
+while (iter.hasNext()) {
+  Path filePath = new Path(basePath, 
iter.next().getSourcePath().getName());
+  if (dstFs.exists(filePath)) {
 
 Review comment:
   the i/o exception retry case is handled specifically at 2 places only. there 
are many other i/o failure scenarios which are not handled. I think its not 
required here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-23 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259606409
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
+if (parameter == null) {
+  return true;
+}
+String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG);
+return compFlag == null  || compFlag.isEmpty() || 
"false".equalsIgnoreCase(compFlag);
 
 Review comment:
   i think better to have this check as for this case null and empty has 
special meaning 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-23 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259606337
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationEx.java
 ##
 @@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigrationEx - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationEx {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(TestReplicationWithTableMigrationEx.class);
+  private static WarehouseInstance primary, replica;
+  private String primaryDbName, replicatedDbName;
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+HashMap overrideProperties = new HashMap<>();
+internalBeforeClassSetup(overrideProperties);
+  }
+
+  static void internalBeforeClassSetup(Map overrideConfigs) 
throws Exception {
 
 Review comment:
   no


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-23 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259606331
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map 
srcParameter, Map

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310998
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/plan/ReplSetFirstIncLoadFlagDesc.java
 ##
 @@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.plan;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
+
+import java.io.Serializable;
+
+/**
+ * ReplSetFirstIncLoadFlagDesc.
+ *
+ */
+@Explain(displayName = "Set First Incr Load Flag", explainLevels = { 
Level.USER, Level.DEFAULT, Level.EXTENDED })
+public class ReplSetFirstIncLoadFlagDesc extends DDLDesc implements 
Serializable {
+
+  private static final long serialVersionUID = 1L;
+  String databaseName;
+  String tableName;
+
+  /**
+   * For serialization only.
+   */
+  public ReplSetFirstIncLoadFlagDesc() {
+  }
+
+  public ReplSetFirstIncLoadFlagDesc(String databaseName, String tableName) {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259314021
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -133,8 +161,11 @@ protected int execute(DriverContext driverContext) {
 return 6;
   }
   long writeId = Long.parseLong(writeIdString);
-  toPath = new Path(toPath, 
AcidUtils.baseOrDeltaSubdir(work.getDeleteDestIfExist(), writeId, writeId,
-  
driverContext.getCtx().getHiveTxnManager().getStmtIdAndIncrement()));
+  // Set stmt id 0 for bootstrap load as the directory needs to be 
searched during incremental load to avoid any
+  // duplicate copy from the source. Check HIVE-21197 for more detail.
+  int stmtId = (writeId == 
ReplUtils.REPL_BOOTSTRAP_MIGRATION_BASE_WRITE_ID) ? 0 :
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259312676
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -158,6 +159,15 @@ private boolean isDbEmpty(String dbName) throws 
HiveException {
 // Add the checkpoint key to the Database binding it to current dump 
directory.
 // So, if retry using same dump, we shall skip Database object update.
 parameters.put(ReplUtils.REPL_CHECKPOINT_KEY, dumpDirectory);
+
+if (needSetIncFlag) {
 
 Review comment:
   in case of alter case need not set the flag ..


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259312460
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
 ##
 @@ -71,6 +74,20 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) 
throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws 
TException {
+try {
+  Database database = rs.getDatabase(getDefaultCatalog(conf), dbName);
+  if (database != null) {
+return !ReplUtils.isFirstIncDone(database.getParameters());
+  }
+  LOG.info("Unable to find database " + dbName);
 
 Review comment:
   same as other method in compactor thread


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310998
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/plan/ReplSetFirstIncLoadFlagDesc.java
 ##
 @@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.plan;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
+
+import java.io.Serializable;
+
+/**
+ * ReplSetFirstIncLoadFlagDesc.
+ *
+ */
+@Explain(displayName = "Set First Incr Load Flag", explainLevels = { 
Level.USER, Level.DEFAULT, Level.EXTENDED })
+public class ReplSetFirstIncLoadFlagDesc extends DDLDesc implements 
Serializable {
+
+  private static final long serialVersionUID = 1L;
+  String databaseName;
+  String tableName;
+
+  /**
+   * For serialization only.
+   */
+  public ReplSetFirstIncLoadFlagDesc() {
+  }
+
+  public ReplSetFirstIncLoadFlagDesc(String databaseName, String tableName) {
 
 Review comment:
   as its just setting it to false always ..changed the name of the method


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310718
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1147,6 +1145,12 @@ private static void createReplImportTasks(
   if (!waitOnPrecursor){
 throw new 
SemanticException(ErrorMsg.DATABASE_NOT_EXISTS.getMsg(tblDesc.getDatabaseName()));
   }
+  // For warehouse level replication, if the database itself is getting 
created in this load, then no need to
+  // check for duplicate copy. Check HIVE-21197 for more detail.
+  firstIncDone = true;
 
 Review comment:
   if check was already thread and ternary will not give any perf advantage


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310336
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1136,6 +1133,7 @@ private static void createReplImportTasks(
 
 Task dropTblTask = null;
 WriteEntity.WriteType lockType = WriteEntity.WriteType.DDL_NO_LOCK;
+Boolean firstIncDone;
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310189
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +188,12 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
+if (parameter == null) {
+  return true;
+}
+String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG);
 
 Review comment:
   its from map and not from conf . Not done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259309835
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -271,12 +302,13 @@ public String getName() {
 LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
 if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
   ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-  if (replicationSpec.isReplace() && 
conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+  if (replicationSpec.isReplace() && 
(conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
 rcwork.setDeleteDestIfExist(true);
 rcwork.setAutoPurge(isAutoPurge);
 rcwork.setNeedRecycle(needRecycle);
   }
   rcwork.setCopyToMigratedTxnTable(copyToMigratedTxnTable);
+  rcwork.setCheckDuplicateCopy(replicationSpec.needDupCopyCheck());
 
 Review comment:
   not done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259309699
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -61,6 +62,25 @@ public ReplCopyTask(){
 super();
   }
 
+  // If file is already present in base directory, then remove it from the 
list.
+  // Check  HIVE-21197 for more detail
+  private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, 
List srcFiles,
+   long writeId, int stmtId) throws 
IOException {
+ListIterator iter = srcFiles.listIterator();
+Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, 
writeId, writeId, stmtId));
+while (iter.hasNext()) {
+  Path filePath = new Path(basePath, 
iter.next().getSourcePath().getName());
+  try {
+if (dstFs.getFileStatus(filePath) != null) {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259309300
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigration - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationMisc {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(TestReplicationWithTableMigrationMisc.class);
+  private static WarehouseInstance primary, replica;
+  private String primaryDbName, replicatedDbName;
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+HashMap overrideProperties = new HashMap<>();
+internalBeforeClassSetup(overrideProperties);
+  }
+
+  static void internalBeforeClassSetup(Map overrideConfigs) 
throws Exception {
+HiveConf conf = new HiveConf(TestReplicationWithTableMigrationMisc.class);
+conf.set("dfs.client.use.datanode.hostname", "true");
+conf.set("hadoop.proxyuser." + Utils.getUGI().getShortUserName() + 
".hosts", "*");
+MiniDFSCluster miniDFSCluster =
+new MiniDFSCluster.Builder(conf).numDataNodes(1).format(true).build();
+final DistributedFileSystem fs = miniDFSCluster.getFileSystem();
+HashMap hiveConfigs = new HashMap() {{
+  put("fs.defaultFS", fs.getUri().toString());
+  put("hive.support.concurrency", "true");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.strict.managed.tables", "true");
+  put("hive.metastore.transactional.event.listeners", "");
+}};
+replica = new WarehouseInstance(LOG, miniDFSCluster, hiveConfigs);
+
+HashMap configsForPrimary = new HashMap() 
{{
+  put("fs.defaultFS", fs.getUri().toString());
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.support.concurrency", "false");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+  put("hive.strict.managed.tables", "false");
+}};
+configsForPrimary.putAll(overrideConfigs);
+primary = new 

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259308757
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigration - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationMisc {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259308804
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
 
 Review comment:
   not done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259308377
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
 ##
 @@ -610,6 +610,11 @@ public void testIncrementalDumpOfWarehouse() throws 
Throwable {
 .run("show tables")
 .verifyResults(new String[] { "t1" });
 
+//default db is already created at replica before load. So the incr flag 
should not be set.
+
assertTrue(ReplUtils.isFirstIncDone(replica.getDatabase("default").getParameters()));
+
assertTrue(!ReplUtils.isFirstIncDone(replica.getDatabase(primaryDbName).getParameters()));
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259283514
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -60,6 +60,7 @@
 
   public static final String LAST_REPL_ID_KEY = "hive.repl.last.repl.id";
   public static final String REPL_CHECKPOINT_KEY = "hive.repl.ckpt.key";
+  public static final String REPL_FIRST_INC_PENDING_FLAG = 
"hive.repl.first.inc.pending";
 
 Review comment:
   add a TODO for external bootstrap during inc load


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259283120
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -60,6 +60,7 @@
 
   public static final String LAST_REPL_ID_KEY = "hive.repl.last.repl.id";
   public static final String REPL_CHECKPOINT_KEY = "hive.repl.ckpt.key";
+  public static final String REPL_FIRST_INC_PENDING_FLAG = 
"hive.repl.first.inc.pending";
 
 Review comment:
   add a comment to explain


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259281934
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java
 ##
 @@ -53,6 +56,20 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) 
throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws 
TException {
 
 Review comment:
   all methods are duplicated so kept it same 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259280119
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -133,8 +161,11 @@ protected int execute(DriverContext driverContext) {
 return 6;
   }
   long writeId = Long.parseLong(writeIdString);
-  toPath = new Path(toPath, 
AcidUtils.baseOrDeltaSubdir(work.getDeleteDestIfExist(), writeId, writeId,
-  
driverContext.getCtx().getHiveTxnManager().getStmtIdAndIncrement()));
+  // Set stmt id 0 for bootstrap load as the directory needs to be 
searched during incremental load to avoid any
+  // duplicate copy from the source. Check HIVE-21197 for more detail.
+  int stmtId = (writeId == 
ReplUtils.REPL_BOOTSTRAP_MIGRATION_BASE_WRITE_ID) ? 0 :
 
 Review comment:
   dont use magic number 0


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259278076
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -661,6 +663,10 @@ public int execute(DriverContext driverContext) {
   if (work.getAlterMaterializedViewDesc() != null) {
 return alterMaterializedView(db, work.getAlterMaterializedViewDesc());
   }
+
+  if (work.getReplSetFirstIncLoadFlagDesc() != null) {
 
 Review comment:
   in repl inc load task ..get the database list and alter the property 
..remove table level property


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259274648
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigration - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationMisc {
 
 Review comment:
   change it to Ex


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services