[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication
[ https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588954 ]

ASF GitHub Bot logged work on HIVE-25035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Apr/21 07:06
            Start Date: 26/Apr/21 07:06
    Worklog Time Spent: 10m
      Work Description: aasha merged pull request #2197:
URL: https://github.com/apache/hive/pull/2197

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 588954)
    Time Spent: 1h  (was: 50m)

> Allow creating single copy tasks for configured paths during external table replication
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-25035
>                 URL: https://issues.apache.org/jira/browse/HIVE-25035
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> As of now, one task per table is created for external table replication. In
> case there are multiple tables under one common directory, provide a way to
> create a single task for all those tables.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
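
The idea in the issue description (one copy task per table by default, a single task when several tables share a configured parent directory) can be sketched roughly as below. This is an illustration only, not the code from pull request #2197: the class `SingleCopyTaskSketch`, the methods `isUnder` and `planCopyTasks`, and the example paths are all hypothetical, and plain strings stand in for Hadoop `Path` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from Hive. */
class SingleCopyTaskSketch {

  /** Drops one trailing slash so "/a/b/" and "/a/b" compare equal. */
  static String normalize(String path) {
    return path.length() > 1 && path.endsWith("/")
        ? path.substring(0, path.length() - 1) : path;
  }

  /** True if path equals ancestor or lies beneath it, compared by whole path components. */
  static boolean isUnder(String path, String ancestor) {
    return (normalize(path) + "/").startsWith(normalize(ancestor) + "/");
  }

  /**
   * Maps each copy source to the table locations it covers.
   * A table under a configured path is folded into that path's single task;
   * any other table keeps its own task (the per-table behaviour).
   */
  static Map<String, List<String>> planCopyTasks(List<String> tableLocations,
                                                 List<String> singleCopyPaths) {
    Map<String, List<String>> tasks = new LinkedHashMap<>();
    for (String location : tableLocations) {
      String source = normalize(location); // default: one task per table
      for (String configured : singleCopyPaths) {
        if (isUnder(location, configured)) {
          source = normalize(configured);  // shared task for the common directory
          break;
        }
      }
      tasks.computeIfAbsent(source, k -> new ArrayList<>()).add(location);
    }
    return tasks;
  }

  public static void main(String[] args) {
    // Assumed example layout: two tables share a parent, one lives elsewhere.
    List<String> tables = Arrays.asList(
        "/warehouse/ext/sales/orders",
        "/warehouse/ext/sales/returns",
        "/data/other/events");
    List<String> configured = Arrays.asList("/warehouse/ext/sales");
    planCopyTasks(tables, configured)
        .forEach((source, covered) -> System.out.println(source + " covers " + covered));
  }
}
```

With the assumed layout, the two tables under the configured directory collapse into one planned copy, while the third table still gets its own. The comparison is done on whole path components so that a sibling such as `/warehouse/ext/sales2` is not mistaken for a child of `/warehouse/ext/sales`.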
[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication
[ https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588780 ]

ASF GitHub Bot logged work on HIVE-25035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Apr/21 06:45
            Start Date: 26/Apr/21 06:45
    Worklog Time Spent: 10m
      Work Description: aasha commented on a change in pull request #2197:
URL: https://github.com/apache/hive/pull/2197#discussion_r618959413

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ##
@@ -671,10 +672,14 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
     Path dbRootData = new Path(bootstrapRoot, EximUtil.DATA_PATH_NAME + File.separator + dbName);
     boolean dataCopyAtLoad = conf.getBoolVar(HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET);
     ReplExternalTables externalTablesWriter = new ReplExternalTables(conf);
-    Path dbPath = null;
     boolean isSingleCopyTaskForExternalTables =
-        conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK)
-        && work.replScope.includeAllTables();
+        conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK) && work.replScope.includeAllTables();
+    ArrayList singleCopyPaths = new ArrayList<>();
+    if (db != null && isSingleCopyTaskForExternalTables) {

Review comment:
   can be added as a util

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java ##
@@ -114,11 +112,27 @@ void dataLocationDump(Table table, FileList fileList,
     }
   }

-  void dbLocationDump(String dbName, Path dbLocation, FileList fileList,
-      HiveConf conf) throws Exception {
-    Path fullyQualifiedDataLocation = PathBuilder
-        .fullyQualifiedHDFSUri(dbLocation, FileSystem.get(hiveConf));
-    dirLocationToCopy(dbName, fileList, fullyQualifiedDataLocation, conf);
+  void singleLocationsDump(List singlePathLocations, FileList fileList, HiveConf conf) throws Exception {

Review comment:
   nit: can rename the method to something more intuitive, or add comments

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ##
@@ -1680,4 +1682,104 @@ public void testDataCopyEndLog(boolean runCopyTasksOnTarget) throws Throwable {
     ctx.updateLoggers();
     appender.removeFromLogger(logger.getName());
   }
+
+  @Test
+  public void testSingleCopyTasksAtSource() throws Throwable {
+    testDataCopyEndLog(false);
+  }
+
+  @Test
+  public void testSingleCopyTasksAtTarget() throws Throwable {
+    testDataCopyEndLog(true);
+  }
+
+  public void testSingleCopyTasks(boolean runCopyTasksOnTarget)

Review comment:
   What happens if there are extra paths apart from table location in the parent path?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 588780)
    Time Spent: 50m  (was: 40m)
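
The last review question can be made concrete with a small sketch. It does not show what Hive does in this situation; it only illustrates the general point that a recursive copy of a parent directory would also pick up any child that belongs to no table. The class `ExtraPathsSketch`, the method `extraPaths`, and the example paths are hypothetical, and plain strings stand in for a real directory listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not taken from the pull request under review. */
class ExtraPathsSketch {

  /** True if path equals ancestor or lies beneath it, compared by whole path components. */
  static boolean isUnder(String path, String ancestor) {
    return (path + "/").startsWith(ancestor + "/");
  }

  /**
   * Returns the children of a parent directory that are unrelated to every
   * table location, i.e. neither inside a table nor containing one.
   * A directory-level copy of the parent would carry these along too.
   */
  static List<String> extraPaths(List<String> childrenOfParent, List<String> tableLocations) {
    List<String> extras = new ArrayList<>();
    for (String child : childrenOfParent) {
      boolean related = false;
      for (String table : tableLocations) {
        if (isUnder(child, table) || isUnder(table, child)) {
          related = true;
          break;
        }
      }
      if (!related) {
        extras.add(child);
      }
    }
    return extras;
  }

  public static void main(String[] args) {
    // Assumed listing of the parent directory: two table directories plus one stray directory.
    List<String> children = Arrays.asList(
        "/warehouse/ext/sales/orders",
        "/warehouse/ext/sales/returns",
        "/warehouse/ext/sales/tmp_scratch");
    List<String> tables = Arrays.asList(
        "/warehouse/ext/sales/orders",
        "/warehouse/ext/sales/returns");
    System.out.println("Not owned by any table: " + extraPaths(children, tables));
  }
}
```

A check of this kind could be used in a test to decide whether the parent directory contains anything beyond table data before asserting on what was replicated.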
[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication
[ https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588687 ]

ASF GitHub Bot logged work on HIVE-25035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Apr/21 04:39
            Start Date: 26/Apr/21 04:39
    Worklog Time Spent: 10m
      Work Description: aasha merged pull request #2197:
URL: https://github.com/apache/hive/pull/2197

Issue Time Tracking
-------------------

    Worklog Id:     (was: 588687)
    Time Spent: 40m  (was: 0.5h)
[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication
[ https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588683 ]

ASF GitHub Bot logged work on HIVE-25035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Apr/21 04:35
            Start Date: 26/Apr/21 04:35
    Worklog Time Spent: 10m
      Work Description: aasha commented on a change in pull request #2197:
URL: https://github.com/apache/hive/pull/2197#discussion_r619966299

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ##
@@ -1680,4 +1682,104 @@ public void testDataCopyEndLog(boolean runCopyTasksOnTarget) throws Throwable {
     ctx.updateLoggers();
     appender.removeFromLogger(logger.getName());
   }
+
+  @Test
+  public void testSingleCopyTasksAtSource() throws Throwable {
+    testDataCopyEndLog(false);
+  }
+
+  @Test
+  public void testSingleCopyTasksAtTarget() throws Throwable {
+    testDataCopyEndLog(true);
+  }
+
+  public void testSingleCopyTasks(boolean runCopyTasksOnTarget)

Review comment:
   What happens if there are extra paths apart from table location in the parent path?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 588683)
    Time Spent: 0.5h  (was: 20m)
[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication
[ https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588540 ]

ASF GitHub Bot logged work on HIVE-25035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Apr/21 15:53
            Start Date: 25/Apr/21 15:53
    Worklog Time Spent: 10m
      Work Description: aasha commented on a change in pull request #2197:
URL: https://github.com/apache/hive/pull/2197#discussion_r618959413

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ##
@@ -671,10 +672,14 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
     Path dbRootData = new Path(bootstrapRoot, EximUtil.DATA_PATH_NAME + File.separator + dbName);
     boolean dataCopyAtLoad = conf.getBoolVar(HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET);
     ReplExternalTables externalTablesWriter = new ReplExternalTables(conf);
-    Path dbPath = null;
     boolean isSingleCopyTaskForExternalTables =
-        conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK)
-        && work.replScope.includeAllTables();
+        conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK) && work.replScope.includeAllTables();
+    ArrayList singleCopyPaths = new ArrayList<>();
+    if (db != null && isSingleCopyTaskForExternalTables) {

Review comment:
   can be added as a util

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java ##
@@ -114,11 +112,27 @@ void dataLocationDump(Table table, FileList fileList,
     }
   }

-  void dbLocationDump(String dbName, Path dbLocation, FileList fileList,
-      HiveConf conf) throws Exception {
-    Path fullyQualifiedDataLocation = PathBuilder
-        .fullyQualifiedHDFSUri(dbLocation, FileSystem.get(hiveConf));
-    dirLocationToCopy(dbName, fileList, fullyQualifiedDataLocation, conf);
+  void singleLocationsDump(List singlePathLocations, FileList fileList, HiveConf conf) throws Exception {

Review comment:
   nit: can rename the method to something more intuitive, or add comments

Issue Time Tracking
-------------------

    Worklog Id:     (was: 588540)
    Time Spent: 20m  (was: 10m)
[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication
[ https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=585802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585802 ]

ASF GitHub Bot logged work on HIVE-25035:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Apr/21 13:39
            Start Date: 20/Apr/21 13:39
    Worklog Time Spent: 10m
      Work Description: ayushtkn opened a new pull request #2197:
URL: https://github.com/apache/hive/pull/2197

Issue Time Tracking
-------------------

            Worklog Id:     (was: 585802)
    Remaining Estimate: 0h
            Time Spent: 10m