[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588954
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 07:06
Start Date: 26/Apr/21 07:06
Worklog Time Spent: 10m 
  Work Description: aasha merged pull request #2197:
URL: https://github.com/apache/hive/pull/2197


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 588954)
Time Spent: 1h  (was: 50m)

> Allow creating single copy tasks for configured paths during external table 
> replication
> ---
>
> Key: HIVE-25035
> URL: https://issues.apache.org/jira/browse/HIVE-25035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As of now, one task per table is created for external table replication. In 
> case there are multiple tables under one common directory, provide a way to 
> create a single task for all those tables.
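
As a rough illustration of the request above, the sketch below plans copy targets so that every table under a configured directory shares one copy target, while any other table keeps its own. The class and method names (`SingleCopyTaskSketch`, `planCopyPaths`, `isUnder`) and the sample paths are hypothetical and are not taken from the Hive code base; paths are treated as plain strings rather than Hadoop `Path` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the implementation from pull request #2197.
public class SingleCopyTaskSketch {

  /** Returns true if location equals dir or lies somewhere beneath it. */
  public static boolean isUnder(String location, String dir) {
    // Drop a trailing slash so "/ext/sales/" and "/ext/sales" behave the same.
    String base = dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
    return location.equals(base) || location.startsWith(base + "/");
  }

  /**
   * Plans one copy target per configured path that covers at least one table,
   * plus one copy target per table that no configured path covers.
   */
  public static List<String> planCopyPaths(List<String> tableLocations,
                                           List<String> configuredPaths) {
    Set<String> plan = new LinkedHashSet<>(); // keeps order, removes duplicates
    for (String location : tableLocations) {
      String target = location; // default: one copy task per table
      for (String configured : configuredPaths) {
        if (isUnder(location, configured)) {
          target = configured; // table is covered by a configured directory
          break;
        }
      }
      plan.add(target);
    }
    return new ArrayList<>(plan);
  }

  public static void main(String[] args) {
    List<String> tables = Arrays.asList("/ext/sales/t1", "/ext/sales/t2", "/ext/hr/t3");
    List<String> configured = Arrays.asList("/ext/sales");
    // Two targets instead of three: /ext/sales and /ext/hr/t3
    System.out.println(planCopyPaths(tables, configured));
  }
}
```

In this sketch the two tables under `/ext/sales` collapse into a single target, and the table under `/ext/hr` still gets its own.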



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588780
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 06:45
Start Date: 26/Apr/21 06:45
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2197:
URL: https://github.com/apache/hive/pull/2197#discussion_r618959413



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -671,10 +672,14 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
 Path dbRootData = new Path(bootstrapRoot, EximUtil.DATA_PATH_NAME + File.separator + dbName);
 boolean dataCopyAtLoad = conf.getBoolVar(HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET);
 ReplExternalTables externalTablesWriter = new ReplExternalTables(conf);
-Path dbPath = null;
 boolean isSingleCopyTaskForExternalTables =
-conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK)
-&& work.replScope.includeAllTables();
+conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK) && work.replScope.includeAllTables();
+ArrayList singleCopyPaths = new ArrayList<>();
+if (db != null && isSingleCopyTaskForExternalTables) {

Review comment:
   can be added as a util

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -114,11 +112,27 @@ void dataLocationDump(Table table, FileList fileList,
 }
   }
 
-  void dbLocationDump(String dbName, Path dbLocation, FileList fileList,
-  HiveConf conf) throws Exception {
-Path fullyQualifiedDataLocation = PathBuilder
-.fullyQualifiedHDFSUri(dbLocation, FileSystem.get(hiveConf));
-dirLocationToCopy(dbName, fileList, fullyQualifiedDataLocation, conf);
+  void singleLocationsDump(List singlePathLocations, FileList fileList, HiveConf conf) throws Exception {

Review comment:
   nit: the method can be renamed to something more intuitive, or comments can be added

##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -1680,4 +1682,104 @@ public void testDataCopyEndLog(boolean runCopyTasksOnTarget) throws Throwable {
 ctx.updateLoggers();
 appender.removeFromLogger(logger.getName());
   }
+
+  @Test
+  public void testSingleCopyTasksAtSource() throws Throwable {
+testDataCopyEndLog(false);
+  }
+
+  @Test
+  public void testSingleCopyTasksAtTarget() throws Throwable {
+testDataCopyEndLog(true);
+  }
+
+  public void testSingleCopyTasks(boolean runCopyTasksOnTarget)

Review comment:
   What happens if there are extra paths apart from the table locations in the parent path?
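
To make the question concrete, here is a minimal sketch that assumes the single copy task copies the configured parent directory as a whole; the thread does not say how the pull request handles this case. Under that assumption, anything under the parent that is not a table location would be copied as well. The class name `ExtraPathsSketch`, the method `extraPaths`, and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; it only illustrates the reviewer's question.
public class ExtraPathsSketch {

  /** Returns the children of the parent directory that are not table locations. */
  public static List<String> extraPaths(List<String> childrenOfParent,
                                        List<String> tableLocations) {
    Set<String> tables = new HashSet<>(tableLocations);
    List<String> extras = new ArrayList<>();
    for (String child : childrenOfParent) {
      if (!tables.contains(child)) {
        extras.add(child); // would ride along with a whole-directory copy
      }
    }
    return extras;
  }

  public static void main(String[] args) {
    List<String> children = Arrays.asList("/ext/sales/t1", "/ext/sales/t2", "/ext/sales/tmp");
    List<String> tables = Arrays.asList("/ext/sales/t1", "/ext/sales/t2");
    // Lists the one child that is not a table location: /ext/sales/tmp
    System.out.println(extraPaths(children, tables));
  }
}
```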






Issue Time Tracking
---

Worklog Id: (was: 588780)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588687
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 04:39
Start Date: 26/Apr/21 04:39
Worklog Time Spent: 10m 
  Work Description: aasha merged pull request #2197:
URL: https://github.com/apache/hive/pull/2197


   




Issue Time Tracking
---

Worklog Id: (was: 588687)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588683
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 04:35
Start Date: 26/Apr/21 04:35
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2197:
URL: https://github.com/apache/hive/pull/2197#discussion_r619966299



##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -1680,4 +1682,104 @@ public void testDataCopyEndLog(boolean runCopyTasksOnTarget) throws Throwable {
 ctx.updateLoggers();
 appender.removeFromLogger(logger.getName());
   }
+
+  @Test
+  public void testSingleCopyTasksAtSource() throws Throwable {
+testDataCopyEndLog(false);
+  }
+
+  @Test
+  public void testSingleCopyTasksAtTarget() throws Throwable {
+testDataCopyEndLog(true);
+  }
+
+  public void testSingleCopyTasks(boolean runCopyTasksOnTarget)

Review comment:
   What happens if there are extra paths apart from the table locations in the parent path?






Issue Time Tracking
---

Worklog Id: (was: 588683)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588540
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 25/Apr/21 15:53
Start Date: 25/Apr/21 15:53
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2197:
URL: https://github.com/apache/hive/pull/2197#discussion_r618959413



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -671,10 +672,14 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
 Path dbRootData = new Path(bootstrapRoot, EximUtil.DATA_PATH_NAME + File.separator + dbName);
 boolean dataCopyAtLoad = conf.getBoolVar(HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET);
 ReplExternalTables externalTablesWriter = new ReplExternalTables(conf);
-Path dbPath = null;
 boolean isSingleCopyTaskForExternalTables =
-conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK)
-&& work.replScope.includeAllTables();
+conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK) && work.replScope.includeAllTables();
+ArrayList singleCopyPaths = new ArrayList<>();
+if (db != null && isSingleCopyTaskForExternalTables) {

Review comment:
   can be added as a util

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -114,11 +112,27 @@ void dataLocationDump(Table table, FileList fileList,
 }
   }
 
-  void dbLocationDump(String dbName, Path dbLocation, FileList fileList,
-  HiveConf conf) throws Exception {
-Path fullyQualifiedDataLocation = PathBuilder
-.fullyQualifiedHDFSUri(dbLocation, FileSystem.get(hiveConf));
-dirLocationToCopy(dbName, fileList, fullyQualifiedDataLocation, conf);
+  void singleLocationsDump(List singlePathLocations, FileList fileList, HiveConf conf) throws Exception {

Review comment:
   nit: the method can be renamed to something more intuitive, or comments can be added






Issue Time Tracking
---

Worklog Id: (was: 588540)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=585802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585802
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 13:39
Start Date: 20/Apr/21 13:39
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #2197:
URL: https://github.com/apache/hive/pull/2197


   




Issue Time Tracking
---

Worklog Id: (was: 585802)
Remaining Estimate: 0h
Time Spent: 10m
