[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262747
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 19/Jun/19 03:18
Start Date: 19/Jun/19 03:18
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262747)
Time Spent: 2.5h  (was: 2h 20m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch, 
> HIVE-21763.03.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> - REPL DUMP takes two inputs along with the existing FROM and WITH clauses.
> {code}
> - REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM <...> WITH <...>;
> - current_repl_policy and previous_repl_policy can be in any of the formats 
> mentioned in Point-4.
> - The REPLACE clause takes the previous repl policy as input. If the REPLACE 
> clause is absent, the policy remains unchanged.
> - The rest of the format remains the same.
> {code}
> - REPL DUMP on this DB will then replicate the tables based on 
> current_repl_policy.
> - Single-table replication of the format .t1 does not allow changing the 
> policy dynamically, so the REPLACE clause is not allowed if previous_repl_policy 
> is of this format.
> - Any table that is added dynamically, either due to a change in the regular 
> expression or by being added to the include list, should be bootstrapped using 
> an independent table-level replication policy.
> {code}
> - Hive will automatically determine the tables newly included in the policy by 
> comparing the current_repl_policy and previous_repl_policy inputs, and will 
> combine the bootstrap dump for the added tables with the incremental dump. 
> A "_bootstrap" directory can be created in the dump directory to hold all 
> tables to be bootstrapped.
> - If a table is renamed, it may be dynamically added to or removed from 
> replication based on the defined replication policy and include/exclude list. 
> So Hive will perform a bootstrap for a table that becomes included after the rename.
> {code}
> - REPL LOAD should check for changes in the repl policy and drop the tables/views 
> excluded in the new policy compared to the previous policy. This should be done 
> before performing the incremental and bootstrap loads from the current dump.
> - REPL LOAD on an incremental dump should load the event directories first, then 
> check for the "_bootstrap" directory and perform a bootstrap load on it.
> Rename table is not in the scope of this JIRA.
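The policy comparison described above can be sketched in Java. This is a minimal illustration under stated assumptions, not Hive's implementation: the `PolicyDiff` and `PolicyDiff.Policy` names are hypothetical, and a policy is assumed to be a DB name plus include/exclude lists of Java regular expressions matched case-insensitively against whole table names, with an empty include list meaning "all tables". Tables in scope under the current policy but not under the previous one are the bootstrap candidates on the dump side; tables already on the target that fall out of scope under the current policy are what REPL LOAD would drop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of comparing a previous and a current replication policy.
// Not Hive's ReplScope: the regex semantics below are assumptions for illustration.
final class PolicyDiff {

    static final class Policy {
        final String dbName;
        final List<Pattern> includePatterns; // empty list means "include all tables"
        final List<Pattern> excludePatterns;

        Policy(String dbName, List<String> includeRegexes, List<String> excludeRegexes) {
            this.dbName = dbName;
            this.includePatterns = compile(includeRegexes);
            this.excludePatterns = compile(excludeRegexes);
        }

        private static List<Pattern> compile(List<String> regexes) {
            List<Pattern> patterns = new ArrayList<>();
            for (String regex : regexes) {
                patterns.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
            }
            return patterns;
        }

        // In scope: matches an include pattern (or none are given) and no exclude pattern.
        boolean inScope(String table) {
            boolean included = includePatterns.isEmpty()
                    || includePatterns.stream().anyMatch(p -> p.matcher(table).matches());
            boolean excluded = excludePatterns.stream().anyMatch(p -> p.matcher(table).matches());
            return included && !excluded;
        }
    }

    // Newly included tables: candidates for the "_bootstrap" part of the incremental dump.
    static List<String> tablesToBootstrap(Policy previous, Policy current, List<String> sourceTables) {
        List<String> result = new ArrayList<>();
        for (String table : sourceTables) {
            if (current.inScope(table) && !previous.inScope(table)) {
                result.add(table);
            }
        }
        return result;
    }

    // Target tables out of scope under the current policy: candidates for dropping on REPL LOAD.
    static List<String> tablesToDrop(Policy current, List<String> targetTables) {
        List<String> result = new ArrayList<>();
        for (String table : targetTables) {
            if (!current.inScope(table)) {
                result.add(table);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sourceTables = List.of("t1", "t2", "t3");
        List<String> targetTables = List.of("t1", "t3"); // replicated under the previous policy
        Policy previous = new Policy("db", List.of("t1", "t3"), List.of());
        Policy current = new Policy("db", List.of("t[0-9]+"), List.of("t3"));
        System.out.println(tablesToBootstrap(previous, current, sourceTables)); // [t2]
        System.out.println(tablesToDrop(current, targetTables));                // [t3]
    }
}
```

Deriving both sets from the same `inScope` predicate keeps the dump side (what to bootstrap) and the load side (what to drop) consistent with each other.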



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262179
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 09:48
Start Date: 18/Jun/19 09:48
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294704914
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -364,6 +367,35 @@ private void cleanTablesFromBootstrap() throws HiveException, IOException, Inval
     }
   }
 
+  /**
+   * If replication policy is changed between previous and current load, then the excluded tables in
+   * the new replication policy will be dropped.
+   * @throws HiveException Failed to get/drop the tables.
+   */
+  private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveException {
+    // If all tables are included in replication scope, then nothing to be dropped.
+    if ((replScope == null) || replScope.includeAllTables()) {
+      return;
+    }
+
+    Hive db = getHive();
+    String dbName = replScope.getDbName();
+
+    // List all the tables that are excluded in the current repl scope.
+    Iterable<String> tableNames = Collections2.filter(db.getAllTables(dbName),
+        tableName -> {
+          assert(tableName != null);
+          return !tableName.toLowerCase().startsWith(
 
 Review comment:
   I got your point... But this is an Iterable. We list from getAllTables, and 
then the loop below iterates and just drops a table when it finds a match. So 
it is effectively the same as what you expect. I'm also not sure we can handle 
exceptions inside the filter method. Either way, this is equivalent, since we 
don't have any nested loops.
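The single-pass argument in this comment can be illustrated with a small stdlib-only sketch. The PR code uses Guava's `Collections2.filter`, which returns a live filtered view; here a lazy `java.util.stream` pipeline exposed as an `Iterable` plays the same role. The in-memory catalog, `dropTable` and `DropException` are hypothetical stand-ins for the metastore, the drop call and `HiveException`. The predicate runs only as the loop pulls each name, so listing, filtering and dropping stay a single pass, and the checked exception is thrown from the loop body rather than from inside the filter lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: lazily filter table names, then drop matches in a plain loop.
final class DropExcludedSketch {

    // Hypothetical checked exception standing in for HiveException.
    static final class DropException extends Exception {
        DropException(String message) {
            super(message);
        }
    }

    // Hypothetical drop call against an in-memory catalog.
    static void dropTable(List<String> catalog, String name) throws DropException {
        if (!catalog.remove(name)) {
            throw new DropException("No such table: " + name);
        }
    }

    static List<String> dropExcluded(List<String> catalog, Predicate<String> inScope)
            throws DropException {
        // Snapshot of the names, as a metastore listing call would return.
        List<String> allTables = new ArrayList<>(catalog);
        // Lazy view: the predicate is evaluated only as the loop pulls each element.
        Iterable<String> excluded = () -> allTables.stream().filter(inScope.negate()).iterator();
        List<String> dropped = new ArrayList<>();
        for (String name : excluded) {
            dropTable(catalog, name); // a checked exception propagates normally from here
            dropped.add(name);
        }
        return dropped;
    }

    public static void main(String[] args) throws DropException {
        List<String> catalog = new ArrayList<>(List.of("t1", "t2", "tmp_1"));
        List<String> dropped = dropExcluded(catalog, name -> !name.startsWith("tmp"));
        System.out.println(dropped + " " + catalog); // [tmp_1] [t1, t2]
    }
}
```

Iterating a snapshot rather than the catalog itself mirrors the fact that `getAllTables` returns a fresh list, so dropping tables does not disturb the iteration.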
 



Issue Time Tracking
---

Worklog Id: (was: 262179)
Time Spent: 2h 20m  (was: 2h 10m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch, 
> HIVE-21763.03.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262037
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 04:44
Start Date: 18/Jun/19 04:44
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294607685
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##
 @@ -259,8 +263,10 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
         Table table = hiveDb.getTable(dbName, tableName);
 
         // Dump external table locations if required.
-        if (shouldDumpExternalTableLocation() &&
-            TableType.EXTERNAL_TABLE.equals(table.getTableType())) {
+        // Note: If repl policy is replaced, then need to dump external tables if table is getting replicated
+        // for the first time in current dump. So, need to check if table is included in old policy.
+        if ((shouldDumpExternalTableLocation() || !ReplUtils.tableIncludedInReplScope(work.oldReplScope, tableName))
 
 Review comment:
   Good catch. Will fix it and add a test for this combination.
 



Issue Time Tracking
---

Worklog Id: (was: 262037)
Time Spent: 2h 10m  (was: 2h)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262036
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 04:44
Start Date: 18/Jun/19 04:44
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294607671
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##
 @@ -282,6 +288,12 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
     return lastReplId;
   }
 
+  private boolean needBootstrapAcidTablesDuringIncrementalDump() {
+    // If old replication policy is available, then it is possible some of the ACID tables might be
+    // included for bootstrap during incremental dump.
+    return (work.oldReplScope != null) || conf.getBoolVar(HiveConf.ConfVars.REPL_BOOTSTRAP_ACID_TABLES);
 
 Review comment:
   Good catch. Will fix it and add a test for this combination.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262036)
Time Spent: 2h  (was: 1h 50m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262035
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 04:43
Start Date: 18/Jun/19 04:43
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294607425
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
 ##
 @@ -61,12 +63,19 @@
   */
   final LineageState sessionStateLineageState;
 
-  public ReplLoadWork(HiveConf hiveConf, String dumpDirectory, String dbNameToLoadIn,
-      LineageState lineageState, boolean isIncrementalDump, Long eventTo,
-      List pathsToCopyIterator) throws IOException {
+  public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
+      String dbNameToLoadIn, ReplScope changedReplScope,
 
 Review comment:
   It is in fact the changed/new repl policy. For clarity, I will rename it to 
currentReplPolicy. Hope that is fine.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 262035)
Time Spent: 1h 50m  (was: 1h 40m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262034
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 04:42
Start Date: 18/Jun/19 04:42
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294607336
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -364,6 +367,35 @@ private void cleanTablesFromBootstrap() throws HiveException, IOException, Inval
     }
   }
 
+  /**
+   * If replication policy is changed between previous and current load, then the excluded tables in
+   * the new replication policy will be dropped.
+   * @throws HiveException Failed to get/drop the tables.
+   */
+  private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveException {
+    // If all tables are included in replication scope, then nothing to be dropped.
+    if ((replScope == null) || replScope.includeAllTables()) {
+      return;
+    }
+
+    Hive db = getHive();
+    String dbName = replScope.getDbName();
+
+    // List all the tables that are excluded in the current repl scope.
+    Iterable<String> tableNames = Collections2.filter(db.getAllTables(dbName),
+        tableName -> {
+          assert(tableName != null);
+          return !tableName.toLowerCase().startsWith(
 
 Review comment:
   We need to get all tables, right? Otherwise, how do we know which tables to 
drop?
 



Issue Time Tracking
---

Worklog Id: (was: 262034)
Time Spent: 1h 40m  (was: 1.5h)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262033
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 04:41
Start Date: 18/Jun/19 04:41
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294607244
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
 ##
 @@ -151,21 +154,49 @@ private void setReplDumpTablesList(Tree replTablesNode) throws HiveException {
       }
 
       if (listIdx == 0) {
-        LOG.info("ReplScope: Set Included Tables List: {}", tablesList);
+        LOG.info("{} ReplScope: Set Included Tables List: {}", replScopeType, tablesList);
         replScope.setIncludedTablePatterns(tablesList);
       } else {
-        LOG.info("ReplScope: Set Excluded Tables List: {}", tablesList);
+        LOG.info("{} ReplScope: Set Excluded Tables List: {}", replScopeType, tablesList);
         replScope.setExcludedTablePatterns(tablesList);
       }
     }
   }
 
+  private void setOldReplPolicy(Tree oldReplPolicyTree) throws HiveException {
+    oldReplScope = new ReplScope();
+    int childCount = oldReplPolicyTree.getChildCount();
+
+    // First child is DB name and optional second child is tables list.
+    assert(childCount <= 2);
+
+    // First child is always the DB name. So set it.
+    oldReplScope.setDbName(oldReplPolicyTree.getChild(0).getText());
+    LOG.info("Old ReplScope: Set DB Name: {}", oldReplScope.getDbName());
+    if (!oldReplScope.getDbName().equalsIgnoreCase(replScope.getDbName())) {
+      LOG.error("DB name {} cannot be replaced to {} in the replication policy.",
 
 Review comment:
   As discussed yesterday, we can keep dbName for readability. It will also be 
helpful to keep the complete repl policy in case we want to support that for 
some use case in the future.
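The validation performed in `setOldReplPolicy` (a DB name as the first child, an optional tables list as the second, and a DB name that must match the current policy's) can be sketched as follows. This is an illustrative assumption, not the PR's code: `OldPolicySketch` is hypothetical, the parse tree is modelled as a plain list of child strings, and `IllegalArgumentException` stands in for `HiveException`. The diff excerpt ends at the `LOG.error` call, so rejecting the mismatch by throwing is also an assumption.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of validating the REPLACE (old) policy; not Hive's code.
final class OldPolicySketch {
    final String dbName;
    final Optional<String> tablesList; // empty means no tables list was given

    private OldPolicySketch(String dbName, Optional<String> tablesList) {
        this.dbName = dbName;
        this.tablesList = tablesList;
    }

    // children models the parse-tree children: DB name, then an optional tables list.
    static OldPolicySketch fromChildren(List<String> children, String currentDbName) {
        if (children.isEmpty() || children.size() > 2) {
            throw new IllegalArgumentException(
                    "Expected a DB name and an optional tables list, got " + children.size() + " children");
        }
        String oldDbName = children.get(0);
        // The DB name is kept for readability but must match; only the table lists may change.
        if (!oldDbName.equalsIgnoreCase(currentDbName)) {
            throw new IllegalArgumentException("DB name " + oldDbName + " cannot be replaced to "
                    + currentDbName + " in the replication policy.");
        }
        Optional<String> tables = children.size() == 2 ? Optional.of(children.get(1)) : Optional.empty();
        return new OldPolicySketch(oldDbName, tables);
    }

    public static void main(String[] args) {
        OldPolicySketch p = fromChildren(List.of("SrcDb", "'t1|t2'"), "srcdb");
        System.out.println(p.dbName + " " + p.tablesList.orElse("(all tables)")); // SrcDb 't1|t2'
        try {
            fromChildren(List.of("otherdb"), "srcdb");
        } catch (IllegalArgumentException e) {
            // DB name otherdb cannot be replaced to srcdb in the replication policy.
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the DB name in the old policy costs one comparison, and, as the comment notes, it leaves room to accept a complete repl policy later.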
 



Issue Time Tracking
---

Worklog Id: (was: 262033)
Time Spent: 1.5h  (was: 1h 20m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262031
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 04:39
Start Date: 18/Jun/19 04:39
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294606978
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
 ##
 @@ -894,12 +894,20 @@ replDumpStatement
 @after { popMsg(state); }
   : KW_REPL KW_DUMP
 (dbName=identifier) (DOT tablePolicy=replTableLevelPolicy)?
+(KW_REPLACE replacePolicy=replReplacePolicy)?
 (KW_FROM (eventId=Number)
   (KW_TO (rangeEnd=Number))?
   (KW_LIMIT (batchSize=Number))?
 )?
 (KW_WITH replConf=replConfigs)?
--> ^(TOK_REPL_DUMP $dbName $tablePolicy? ^(TOK_FROM $eventId (TOK_TO $rangeEnd)? (TOK_LIMIT $batchSize)?)? $replConf?)
+-> ^(TOK_REPL_DUMP $dbName $tablePolicy? $replacePolicy? ^(TOK_FROM $eventId (TOK_TO $rangeEnd)? (TOK_LIMIT $batchSize)?)? $replConf?)
+;
+
+replReplacePolicy
 
 Review comment:
   It is the same. The only difference is that replReplacePolicy additionally 
takes the token TOK_REPLACE. I think this can be changed to share common code. 
Will make this change.
 



Issue Time Tracking
---

Worklog Id: (was: 262031)
Time Spent: 1h 20m  (was: 1h 10m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> - REPL DUMP takes 2 inputs along with the existing FROM and WITH clauses.
> {code}
> - REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM <eventId>
> WITH <replConfigs>;
> - current_repl_policy and previous_repl_policy can be in any of the formats
> mentioned in Point-4.
> - The REPLACE clause is to be supported to take the previous repl policy as
> input. If the REPLACE clause is absent, the policy remains unchanged.
> - The rest of the format remains the same.
> {code}
> - Now, REPL DUMP on this DB will replicate the tables based on
> current_repl_policy.
> - Single-table replication of the format <db_name>.t1 does not allow changing
> the policy dynamically, so the REPLACE clause is not allowed if
> previous_repl_policy is in this format.
> - Any table that gets added dynamically, either due to a change in the regular
> expression or by being added to the include list, should be bootstrapped
> using an independent table-level replication policy.
> {code}
> - Hive will automatically figure out the list of newly included tables by
> comparing the current_repl_policy and previous_repl_policy inputs, and will
> combine a bootstrap dump of the added tables with the incremental dump. A
> "_bootstrap" directory can be created in the dump dir to hold all tables to
> be bootstrapped.
> - If a table is renamed, it may get dynamically added to or removed from
> replication based on the defined replication policy and include/exclude list.
> So, Hive will perform a bootstrap for a table that becomes included only after
> the rename.
> {code}
> - REPL LOAD should check for changes in the repl policy and drop the
> tables/views excluded in the new policy compared to the previous policy. This
> should be done before performing the incremental and bootstrap loads from the
> current dump.
> - REPL LOAD on an incremental dump should load the events directories first,
> then check for the "_bootstrap" directory and perform a bootstrap load on it.
> Rename table is not in scope of this jira.
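The comparison described above boils down to two set differences over the database's table names: tables in the current policy but not the previous one are bootstrapped during the incremental dump, and tables in the previous policy but not the current one are dropped by REPL LOAD before loading. The sketch assumes a hypothetical `TablePolicy` with a single include regex and an optional exclude regex matched against the whole table name, case-insensitively; the real policy formats ("Point-4") and Hive's `ReplScope` matching rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical policy: one include regex and one optional exclude regex. */
final class TablePolicy {
    private final Pattern include;
    private final Pattern exclude; // null means "exclude nothing"

    TablePolicy(String includeRegex, String excludeRegex) {
        this.include = Pattern.compile(includeRegex, Pattern.CASE_INSENSITIVE);
        this.exclude = (excludeRegex == null)
                ? null : Pattern.compile(excludeRegex, Pattern.CASE_INSENSITIVE);
    }

    /** In scope: the whole name matches include and does not match exclude. */
    boolean includes(String table) {
        return include.matcher(table).matches()
                && (exclude == null || !exclude.matcher(table).matches());
    }
}

final class PolicyDiff {
    /** In the new policy but not the old one: bootstrap during the incremental dump. */
    static List<String> tablesToBootstrap(List<String> tables, TablePolicy previous, TablePolicy current) {
        List<String> result = new ArrayList<>();
        for (String table : tables) {
            if (current.includes(table) && !previous.includes(table)) {
                result.add(table);
            }
        }
        return result;
    }

    /** In the old policy but not the new one: REPL LOAD drops these before loading. */
    static List<String> tablesToDrop(List<String> tables, TablePolicy previous, TablePolicy current) {
        List<String> result = new ArrayList<>();
        for (String table : tables) {
            if (previous.includes(table) && !current.includes(table)) {
                result.add(table);
            }
        }
        return result;
    }
}
```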



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262030
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 04:38
Start Date: 18/Jun/19 04:38
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294606868
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
 ##
 @@ -61,12 +63,19 @@
   */
   final LineageState sessionStateLineageState;
 
-  public ReplLoadWork(HiveConf hiveConf, String dumpDirectory, String dbNameToLoadIn,
-  LineageState lineageState, boolean isIncrementalDump, Long eventTo,
-  List pathsToCopyIterator) throws IOException {
+  public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
+  String dbNameToLoadIn, ReplScope changedReplScope,
+  LineageState lineageState, boolean isIncrementalDump, Long eventTo,
+  List pathsToCopyIterator) throws IOException {
 sessionStateLineageState = lineageState;
 this.dumpDirectory = dumpDirectory;
 this.dbNameToLoadIn = dbNameToLoadIn;
+this.changedReplScope = changedReplScope;
+
+// If DB name is changed during REPL LOAD, then set it instead of referring to source DB name.
+if ((changedReplScope != null) && StringUtils.isNotBlank(dbNameToLoadIn)) {
 
 Review comment:
   Yes, the dump metadata always has the DB name at the source. If the user
explicitly mentions a DB name in REPL LOAD, then dbNameToLoadIn will not be
blank. Also, we do not always pass the actual policy in the dump metadata; it is
passed only if the policy changes. This is an optimization to avoid traversing
all tables. So changedReplScope can be null, and dbNameToLoadIn can be either
blank or non-blank.
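The four combinations described in this reply can be sketched as follows. `LoadScope` and `LoadWorkSketch` are hypothetical stand-ins for `ReplScope` and `ReplLoadWork`, and `isBlank` is a local helper that approximates Commons Lang's `StringUtils.isBlank`; the point is that the DB-name override runs only when a changed scope arrived in the dump metadata and the user named a target DB.

```java
/** Hypothetical minimal stand-in for ReplScope: only the DB name matters here. */
final class LoadScope {
    private String dbName;

    LoadScope(String dbName) { this.dbName = dbName; }

    String getDbName() { return dbName; }

    void setDbName(String dbName) { this.dbName = dbName; }
}

final class LoadWorkSketch {
    final String dbNameToLoadIn;
    final LoadScope changedReplScope; // null unless the policy changed in this dump

    LoadWorkSketch(String dbNameToLoadIn, LoadScope changedReplScope) {
        this.dbNameToLoadIn = dbNameToLoadIn;
        this.changedReplScope = changedReplScope;
        // The dump metadata carries the source DB name; if the user named a
        // target DB in REPL LOAD, point the changed scope at that DB instead.
        if (changedReplScope != null && !isBlank(dbNameToLoadIn)) {
            changedReplScope.setDbName(dbNameToLoadIn);
        }
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```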
 



Issue Time Tracking
---

Worklog Id: (was: 262030)
Time Spent: 1h 10m  (was: 1h)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h

[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=261985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261985
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 02:39
Start Date: 18/Jun/19 02:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294140450
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
 ##
 @@ -894,12 +894,20 @@ replDumpStatement
 @after { popMsg(state); }
   : KW_REPL KW_DUMP
 (dbName=identifier) (DOT tablePolicy=replTableLevelPolicy)?
+(KW_REPLACE replacePolicy=replReplacePolicy)?
 (KW_FROM (eventId=Number)
   (KW_TO (rangeEnd=Number))?
   (KW_LIMIT (batchSize=Number))?
 )?
 (KW_WITH replConf=replConfigs)?
--> ^(TOK_REPL_DUMP $dbName $tablePolicy? ^(TOK_FROM $eventId (TOK_TO $rangeEnd)? (TOK_LIMIT $batchSize)?)? $replConf?)
+-> ^(TOK_REPL_DUMP $dbName $tablePolicy? $replacePolicy? ^(TOK_FROM $eventId (TOK_TO $rangeEnd)? (TOK_LIMIT $batchSize)?)? $replConf?)
+;
+
+replReplacePolicy
 
 Review comment:
   The replace policy format should be the same as the normal policy. Why is a
new format specifier added for the replace policy?
 



Issue Time Tracking
---

Worklog Id: (was: 261985)
Time Spent: 20m  (was: 10m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=261991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261991
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 02:39
Start Date: 18/Jun/19 02:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294589370
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -364,6 +367,35 @@ private void cleanTablesFromBootstrap() throws HiveException, IOException, Inval
 }
   }
 
+  /**
+   * If replication policy is changed between previous and current load, then the excluded tables in
+   * the new replication policy will be dropped.
+   * @throws HiveException Failed to get/drop the tables.
+   */
+  private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveException {
+// If all tables are included in replication scope, then nothing to be dropped.
+if ((replScope == null) || replScope.includeAllTables()) {
+  return;
+}
+
+Hive db = getHive();
+String dbName = replScope.getDbName();
+
+// List all the tables that are excluded in the current repl scope.
+Iterable tableNames = Collections2.filter(db.getAllTables(dbName),
+tableName -> {
+  assert(tableName != null);
+  return !tableName.toLowerCase().startsWith(
 
 Review comment:
   Why prepare the list? Can we not just drop each table directly?
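One reason to build the list first is shown in this sketch over a hypothetical in-memory catalog: removing from a collection while iterating it directly can trip Java's fail-fast iterators, so the excluded names are collected in one pass and dropped in a second. `ExcludedTableDropper` and its parameters are illustrative only; in Hive the table names come from a separate metastore call (`db.getAllTables(dbName)`), so the trade-off there may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class ExcludedTableDropper {
    /**
     * Removes from {@code catalog} every table that {@code inScope} rejects and
     * returns the removed names. A null scope means the policy did not change,
     * and includeAll mirrors the early return for includeAllTables().
     */
    static List<String> dropExcluded(Set<String> catalog, Predicate<String> inScope, boolean includeAll) {
        List<String> toDrop = new ArrayList<>();
        if (inScope == null || includeAll) {
            return toDrop; // nothing to drop
        }
        // Pass 1: collect. Calling catalog.remove() inside this loop can make the
        // fail-fast iterator of HashSet/TreeSet throw ConcurrentModificationException.
        for (String table : catalog) {
            if (!inScope.test(table.toLowerCase())) {
                toDrop.add(table);
            }
        }
        // Pass 2: drop (stand-in for the real metastore drop call).
        for (String table : toDrop) {
            catalog.remove(table);
        }
        return toDrop;
    }
}
```

For a purely in-memory set, `catalog.removeIf(t -> !inScope.test(t.toLowerCase()))` would do the same in a single pass; the two-pass shape mainly helps when each drop is a separate operation, such as a metastore call, or when the dropped names need to be logged.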
 



Issue Time Tracking
---

Worklog Id: (was: 261991)
Time Spent: 1h  (was: 50m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=261988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261988
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 02:39
Start Date: 18/Jun/19 02:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294140658
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
 ##
 @@ -151,21 +154,49 @@ private void setReplDumpTablesList(Tree replTablesNode) throws HiveException {
   }
 
   if (listIdx == 0) {
-LOG.info("ReplScope: Set Included Tables List: {}", tablesList);
+LOG.info("{} ReplScope: Set Included Tables List: {}", replScopeType, tablesList);
 replScope.setIncludedTablePatterns(tablesList);
   } else {
-LOG.info("ReplScope: Set Excluded Tables List: {}", tablesList);
+LOG.info("{} ReplScope: Set Excluded Tables List: {}", replScopeType, tablesList);
 replScope.setExcludedTablePatterns(tablesList);
   }
 }
   }
 
+  private void setOldReplPolicy(Tree oldReplPolicyTree) throws HiveException {
+oldReplScope = new ReplScope();
+int childCount = oldReplPolicyTree.getChildCount();
+
+// First child is DB name and optional second child is tables list.
+assert(childCount <= 2);
+
+// First child is always the DB name. So set it.
+oldReplScope.setDbName(oldReplPolicyTree.getChild(0).getText());
+LOG.info("Old ReplScope: Set DB Name: {}", oldReplScope.getDbName());
+if (!oldReplScope.getDbName().equalsIgnoreCase(replScope.getDbName())) {
+  LOG.error("DB name {} cannot be replaced to {} in the replication policy.",
 
 Review comment:
   I think we should not allow a DB name in the replace policy.
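The patch under review keeps the DB name in the REPLACE policy and rejects a mismatch, while this comment suggests not accepting one at all. A hypothetical middle ground is sketched below: an omitted DB name is inherited from the current policy, and a supplied one must match case-insensitively, as the `equalsIgnoreCase` check in `setOldReplPolicy` does. `ReplacePolicyDb` and the use of `IllegalArgumentException` are assumptions; the message text comes from the truncated `LOG.error` call, with its argument order assumed.

```java
final class ReplacePolicyDb {
    /**
     * Resolves the DB name of a REPLACE (previous) policy against the current one.
     * An omitted DB name is inherited from the current policy; a supplied one
     * must match it case-insensitively.
     */
    static String resolve(String currentDb, String replacedDb) {
        if (replacedDb == null || replacedDb.isEmpty()) {
            return currentDb; // DB name not given in REPLACE: inherit it
        }
        if (!replacedDb.equalsIgnoreCase(currentDb)) {
            throw new IllegalArgumentException(String.format(
                    "DB name %s cannot be replaced to %s in the replication policy.",
                    replacedDb, currentDb));
        }
        return currentDb;
    }
}
```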
 



Issue Time Tracking
---

Worklog Id: (was: 261988)
Time Spent: 40m  (was: 0.5h)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h

[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=261989&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261989
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 02:39
Start Date: 18/Jun/19 02:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294588902
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##
 @@ -259,8 +263,10 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
 Table table = hiveDb.getTable(dbName, tableName);
 
 // Dump external table locations if required.
-if (shouldDumpExternalTableLocation() &&
-TableType.EXTERNAL_TABLE.equals(table.getTableType())) {
+// Note: If repl policy is replaced, then need to dump external tables if table is getting replicated
+// for the first time in current dump. So, need to check if table is included in old policy.
+if ((shouldDumpExternalTableLocation() || !ReplUtils.tableIncludedInReplScope(work.oldReplScope, tableName))
 
 Review comment:
   It should dump the external table only if external table dump is enabled.
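The gating requested here can be written as a pure function, sketched below with hypothetical boolean inputs rather than Hive's actual `HiveConf` lookups: `externalTablesEnabled` stands in for the external-table dump switch (assumed), `shouldDumpLocation` for `shouldDumpExternalTableLocation()`, and `includedInOldScope` for `ReplUtils.tableIncludedInReplScope(work.oldReplScope, tableName)`. The switch is checked first, and only inside it does a table missing from the old scope count as being replicated for the first time.

```java
final class ExternalTableDumpCheck {
    /**
     * Hypothetical gating for dumping an external table during an incremental dump.
     *
     * @param externalTablesEnabled stand-in for the external-table dump switch
     * @param shouldDumpLocation    stand-in for shouldDumpExternalTableLocation()
     * @param includedInOldScope    stand-in for tableIncludedInReplScope(oldScope, table)
     * @param isExternal            whether the table type is EXTERNAL_TABLE
     */
    static boolean shouldDumpExternalTable(boolean externalTablesEnabled, boolean shouldDumpLocation,
                                           boolean includedInOldScope, boolean isExternal) {
        if (!isExternal || !externalTablesEnabled) {
            return false; // never dump when external-table replication is off
        }
        // Within the switch, a table absent from the old scope is treated as
        // being replicated for the first time in this dump.
        return shouldDumpLocation || !includedInOldScope;
    }
}
```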
 



Issue Time Tracking
---

Worklog Id: (was: 261989)
Time Spent: 40m  (was: 0.5h)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=261986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261986
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 02:39
Start Date: 18/Jun/19 02:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294589151
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##
 @@ -282,6 +288,12 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
 return lastReplId;
   }
 
+  private boolean needBootstrapAcidTablesDuringIncrementalDump() {
+// If old replication policy is available, then it is possible some of the ACID tables might be
+// included for bootstrap during incremental dump.
+return (work.oldReplScope != null) || conf.getBoolVar(HiveConf.ConfVars.REPL_BOOTSTRAP_ACID_TABLES);
 
 Review comment:
   And only if ACID table dump is enabled?
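Applied to `needBootstrapAcidTablesDuringIncrementalDump()`, the same kind of gate would look like the sketch below. `acidTablesEnabled` is a hypothetical stand-in for whatever switch enables ACID-table dumps; the other two inputs mirror `work.oldReplScope != null` and `HiveConf.ConfVars.REPL_BOOTSTRAP_ACID_TABLES` from the hunk above.

```java
final class AcidBootstrapCheck {
    /**
     * Hypothetical version of needBootstrapAcidTablesDuringIncrementalDump()
     * with the extra gate the reviewer asks about.
     *
     * @param acidTablesEnabled stand-in for the (assumed) ACID-table dump switch
     * @param oldScopePresent   work.oldReplScope != null, i.e. the policy was replaced
     * @param bootstrapAcidConf stand-in for HiveConf.ConfVars.REPL_BOOTSTRAP_ACID_TABLES
     */
    static boolean needBootstrapAcidTables(boolean acidTablesEnabled,
                                           boolean oldScopePresent,
                                           boolean bootstrapAcidConf) {
        // Without the first operand, a replaced policy alone would trigger ACID bootstrap.
        return acidTablesEnabled && (oldScopePresent || bootstrapAcidConf);
    }
}
```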
 



Issue Time Tracking
---

Worklog Id: (was: 261986)
Time Spent: 0.5h  (was: 20m)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=261987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261987
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 02:39
Start Date: 18/Jun/19 02:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294140129
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
 ##
 @@ -61,12 +63,19 @@
   */
   final LineageState sessionStateLineageState;
 
-  public ReplLoadWork(HiveConf hiveConf, String dumpDirectory, String dbNameToLoadIn,
-  LineageState lineageState, boolean isIncrementalDump, Long eventTo,
-  List pathsToCopyIterator) throws IOException {
+  public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
+  String dbNameToLoadIn, ReplScope changedReplScope,
+  LineageState lineageState, boolean isIncrementalDump, Long eventTo,
+  List pathsToCopyIterator) throws IOException {
 sessionStateLineageState = lineageState;
 this.dumpDirectory = dumpDirectory;
 this.dbNameToLoadIn = dbNameToLoadIn;
+this.changedReplScope = changedReplScope;
+
+// If DB name is changed during REPL LOAD, then set it instead of referring to source DB name.
+if ((changedReplScope != null) && StringUtils.isNotBlank(dbNameToLoadIn)) {
 
 Review comment:
   Is this scenario possible, given that changedReplScope is obtained from the
dump metadata?
 



Issue Time Tracking
---

Worklog Id: (was: 261987)
Time Spent: 40m  (was: 0.5h)

> Incremental replication to allow changing include/exclude tables list in 
> replication policy.
> 
>
> Key: HIVE-21763
> URL: https://issues.apache.org/jira/browse/HIVE-21763
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> - REPL DUMP takes 2 inputs along with existing FROM and WITH clause.
> {code}
> - REPL DUMP  [REPLACE  FROM 
>  WITH ;
> - current_repl_policy and previous_repl_policy can be any format mentioned in 
> Point-4.
> - REPLACE clause to be supported to take previous repl policy as input. If 
> REPLACE clause is not there, then the policy remains unchanged.
> - Rest of the format remains same.
> {code}
> - Now, REPL DUMP on this DB will replicate the tables based on 
> current_repl_policy.
> - Single table replication of format .t1 doesn’t allow changing the 
> policy dynamically. So REPLACE clause is not allowed if previous_repl_policy 
> of this format.
> - If any table is added dynamically either due to change in regular 
> expression or added to include list should be bootstrapped using independant 
> table level replication policy.
> {code}
> - Hive will automatically determine the tables newly included in the policy 
> by comparing the current_repl_policy and previous_repl_policy inputs, and will 
> combine a bootstrap dump of the added tables with the incremental dump. A 
> "_bootstrap" directory can be created in the dump directory to hold all the 
> tables to be bootstrapped.
> - If a table is renamed, it may be dynamically added to or removed from 
> replication based on the defined replication policy and include/exclude list. 
> So, Hive will bootstrap any table that becomes included only after the rename.
> {code}
> - REPL LOAD should check for changes in the repl policy and drop the 
> tables/views excluded by the new policy compared to the previous policy. This 
> should be done before performing the incremental and bootstrap loads from the 
> current dump.
> - REPL LOAD on an incremental dump should load the event directories first, 
> then check for the "_bootstrap" directory and perform a bootstrap load of the 
> tables in it.
> Rename table is not in scope of this jira.
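The policy comparison and load order described in this issue can be sketched in Java. This is a minimal, hypothetical model, not Hive's implementation: `Policy` stands in for a repl policy as an include regex plus an optional exclude regex (the actual formats from Point-4 are not reproduced), and `tablesToBootstrap`, `tablesToDrop` and `loadPlan` are illustrative names whose step strings are placeholders for the real dump and load tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, simplified model of comparing previous and current repl
// policies. Policy is NOT Hive's ReplScope: just an include regex plus an
// optional exclude regex, matched against whole table names.
public class ReplPolicyDiff {

    static final class Policy {
        private final Pattern include;
        private final Pattern exclude; // null means nothing is excluded

        Policy(String includeRegex, String excludeRegex) {
            this.include = Pattern.compile(includeRegex);
            this.exclude = (excludeRegex == null) ? null : Pattern.compile(excludeRegex);
        }

        // In scope if it matches the include list and not the exclude list.
        boolean includes(String table) {
            return include.matcher(table).matches()
                    && (exclude == null || !exclude.matcher(table).matches());
        }
    }

    // In scope now but not before: candidates for the "_bootstrap" part of the dump.
    static Set<String> tablesToBootstrap(List<String> tables, Policy previous, Policy current) {
        Set<String> added = new LinkedHashSet<>();
        for (String table : tables) {
            if (current.includes(table) && !previous.includes(table)) {
                added.add(table);
            }
        }
        return added;
    }

    // In scope before but excluded now: REPL LOAD drops these on the target.
    static Set<String> tablesToDrop(List<String> tables, Policy previous, Policy current) {
        Set<String> removed = new LinkedHashSet<>();
        for (String table : tables) {
            if (previous.includes(table) && !current.includes(table)) {
                removed.add(table);
            }
        }
        return removed;
    }

    // Order from the description: drop excluded tables, then load the
    // incremental event directories, then bootstrap newly included tables.
    static List<String> loadPlan(Set<String> toDrop, Set<String> toBootstrap) {
        List<String> steps = new ArrayList<>();
        for (String table : toDrop) {
            steps.add("DROP " + table);
        }
        steps.add("LOAD events");
        for (String table : toBootstrap) {
            steps.add("BOOTSTRAP _bootstrap/" + table);
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("t1", "t2", "t3", "tmp_t4");
        Policy previous = new Policy("t1|t2", null);
        Policy current = new Policy("t[0-9]+", "t2");
        Set<String> toBootstrap = tablesToBootstrap(tables, previous, current);
        Set<String> toDrop = tablesToDrop(tables, previous, current);
        // prints [DROP t2, LOAD events, BOOTSTRAP _bootstrap/t3]
        System.out.println(loadPlan(toDrop, toBootstrap));
    }
}
```

With the sample policies in `main`, `t3` becomes newly included and is bootstrapped after the events are loaded, while `t2` is newly excluded and is dropped first; `tmp_t4` matches neither policy and is untouched.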



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=261990&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261990
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 18/Jun/19 02:39
Start Date: 18/Jun/19 02:39
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294589774
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
 ##
 @@ -61,12 +63,19 @@
   */
   final LineageState sessionStateLineageState;
 
-  public ReplLoadWork(HiveConf hiveConf, String dumpDirectory, String dbNameToLoadIn,
-  LineageState lineageState, boolean isIncrementalDump, Long eventTo,
-  List pathsToCopyIterator) throws IOException {
+  public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
+  String dbNameToLoadIn, ReplScope changedReplScope,
 
 Review comment:
   The input is not the changed repl scope; it's the original repl scope.
 



Issue Time Tracking
---

Worklog Id: (was: 261990)
Time Spent: 50m  (was: 40m)






[jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.

2019-06-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=260927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-260927
 ]

ASF GitHub Bot logged work on HIVE-21763:
-

Author: ASF GitHub Bot
Created on: 15/Jun/19 18:21
Start Date: 15/Jun/19 18:21
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #673: HIVE-21763: 
Incremental replication to allow changing include/exclude tables list in 
replication policy.
URL: https://github.com/apache/hive/pull/673
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 260927)
Time Spent: 10m
Remaining Estimate: 0h



