[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=401673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-401673 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 11/Mar/20 18:49 Start Date: 11/Mar/20 18:49 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 401673) Time Spent: 4h 50m (was: 4h 40m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.10.patch, > HIVE-22865.11.patch, HIVE-22865.12.patch, HIVE-22865.13.patch, > HIVE-22865.14.patch, HIVE-22865.15.patch, HIVE-22865.16.patch, > HIVE-22865.17.patch, HIVE-22865.18.patch, HIVE-22865.19.patch, > HIVE-22865.2.patch, HIVE-22865.20.patch, HIVE-22865.21.patch, > HIVE-22865.22.patch, HIVE-22865.23.patch, HIVE-22865.24.patch, > HIVE-22865.25.patch, HIVE-22865.3.patch, HIVE-22865.4.patch, > HIVE-22865.5.patch, HIVE-22865.6.patch, HIVE-22865.7.patch, > HIVE-22865.8.patch, HIVE-22865.9.patch > > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=398588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398588 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 05/Mar/20 18:28 Start Date: 05/Mar/20 18:28 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r388480128 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -904,8 +901,20 @@ public void replicationWithTableNameContainsKeywords() throws Throwable { return ReplicationTestUtils.externalTableBasePathWithClause(REPLICA_EXTERNAL_BASE, replica); } - private void assertExternalFileInfo(List expected, Path externalTableInfoFile) + private void assertExternalFileInfo(List expected, String dumplocation) throws IOException { +assertExternalFileInfo(expected, dumplocation, null); + } + private void assertExternalFileInfo(List expected, String dumplocation, String dbName) throws IOException { +Path externalTableInfoFile = new Path(dumplocation, relativeExtInfoPath(dbName)); ReplicationTestUtils.assertExternalFileInfo(primary, expected, externalTableInfoFile); } + private String relativeExtInfoPath(String dbName) { + +if (dbName == null) { Review comment: No, the location of external table info file is different in bootstrap and incremental case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 398588) Time Spent: 4.5h (was: 4h 20m)
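The exchange above turns on the external table info file living at a different relative path depending on whether a database name is supplied. A minimal sketch of that idea follows. The constant values `hive` and `_external_tables_info` are placeholders chosen for illustration, not taken from the patch, and treating a null `dbName` as the incremental case is an assumption based on the reviewer's question.

```java
import java.util.Locale;

public class ExtInfoPathSketch {
    // Hypothetical stand-ins for the constants the real test relies on.
    static final String HIVE_BASE_DIR = "hive";
    static final String EXT_INFO_FILE = "_external_tables_info";

    // Returns the path of the info file relative to the dump location.
    static String relativeExtInfoPath(String dbName) {
        if (dbName == null) {
            // Assumed incremental layout: the file sits directly under the base dir.
            return HIVE_BASE_DIR + "/" + EXT_INFO_FILE;
        }
        // Assumed bootstrap layout: the file sits under a per-database directory.
        return HIVE_BASE_DIR + "/" + dbName.toLowerCase(Locale.ROOT) + "/" + EXT_INFO_FILE;
    }

    public static void main(String[] args) {
        System.out.println(relativeExtInfoPath(null));
        System.out.println(relativeExtInfoPath("SalesDb"));
    }
}
```

Keeping the branch in one helper lets both `assertExternalFileInfo` overloads share the same path logic, which matches the shape of the diff under review.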
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=398589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398589 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 05/Mar/20 18:28 Start Date: 05/Mar/20 18:28 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r388480221 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTablesBootstrap.java ## @@ -264,7 +266,7 @@ public void testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites() throw prepareIncAcidData(primaryDbName); // Perform concurrent writes. Bootstrap won't see the written data but the subsequent // incremental repl should see it. We can not inject callerVerifier since an incremental dump -// would not cause an ALTER DATABASE event. Instead we piggy back on +// would not cause an ALTER DATABASE event. Instead we piggy bEHANack on Review comment: Fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 398589) Time Spent: 4h 40m (was: 4.5h)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=398410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398410 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 05/Mar/20 14:56 Start Date: 05/Mar/20 14:56 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r388345992 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ## @@ -904,8 +901,20 @@ public void replicationWithTableNameContainsKeywords() throws Throwable { return ReplicationTestUtils.externalTableBasePathWithClause(REPLICA_EXTERNAL_BASE, replica); } - private void assertExternalFileInfo(List expected, Path externalTableInfoFile) + private void assertExternalFileInfo(List expected, String dumplocation) throws IOException { +assertExternalFileInfo(expected, dumplocation, null); + } + private void assertExternalFileInfo(List expected, String dumplocation, String dbName) throws IOException { +Path externalTableInfoFile = new Path(dumplocation, relativeExtInfoPath(dbName)); ReplicationTestUtils.assertExternalFileInfo(primary, expected, externalTableInfoFile); } + private String relativeExtInfoPath(String dbName) { + +if (dbName == null) { Review comment: For incremental dbname is not needed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 398410) Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=398407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398407 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 05/Mar/20 14:50 Start Date: 05/Mar/20 14:50 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r388341737 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTablesBootstrap.java ## @@ -264,7 +266,7 @@ public void testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites() throw prepareIncAcidData(primaryDbName); // Perform concurrent writes. Bootstrap won't see the written data but the subsequent // incremental repl should see it. We can not inject callerVerifier since an incremental dump -// would not cause an ALTER DATABASE event. Instead we piggy back on +// would not cause an ALTER DATABASE event. Instead we piggy bEHANack on Review comment: typo This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 398407) Time Spent: 4h 10m (was: 4h)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=396068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396068 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 02/Mar/20 11:51 Start Date: 02/Mar/20 11:51 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386348679 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbstractEventHandler.java ## @@ -71,4 +85,31 @@ public long fromEventId() { public long toEventId() { return event.getEventId(); } + + public void writeFileEntry(String dbName, Table table, String file, BufferedWriter fileListWriter, Review comment: Will move out to util. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 396068) Time Spent: 4h (was: 3h 50m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, HIVE-22865.6.patch > > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=396067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396067 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 02/Mar/20 11:51 Start Date: 02/Mar/20 11:51 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386348502 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbstractEventHandler.java ## @@ -71,4 +85,31 @@ public long fromEventId() { public long toEventId() { return event.getEventId(); } + + public void writeFileEntry(String dbName, Table table, String file, BufferedWriter fileListWriter, + Context withinContext) throws IOException, LoginException { +HiveConf hiveConf = withinContext.hiveConf; +String distCpDoAsUser = hiveConf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER); +if (Utils.shouldDumpMetaDataOnly(table, withinContext.hiveConf)) { Review comment: Will fix this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 396067) Time Spent: 3h 50m (was: 3h 40m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, HIVE-22865.6.patch > > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=396065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396065 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 02/Mar/20 11:50 Start Date: 02/Mar/20 11:50 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386348059 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -582,10 +586,19 @@ void dumpTable(String dbName, String tblName, String validTxnList, Path dbRoot, } MmContext mmCtx = MmContext.createIfNeeded(tableSpec.tableHandle); tuple.replicationSpec.setRepl(true); -new TableExport( -exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(); - +List replPathMappings = new TableExport( +exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(false); replLogger.tableLog(tblName, tableSpec.tableHandle.getTableType()); +if (Utils.shouldDumpMetaDataOnly(tuple.object, conf)) { Review comment: No, it is not done at this level. The checks have been done inside some other method calls. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 396065) Time Spent: 3.5h (was: 3h 20m)
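The diff discussed above returns early for a metadata-only dump and otherwise schedules one copy per source-to-staging path mapping. A self-contained sketch of that control flow is below. `PathMapping` and `planCopies` are hypothetical names used only for illustration; the patch itself works with Hive's `ReplPathMapping` and `ReplCopyTask`, which are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CopyPlanSketch {
    // Hypothetical pair of source path and staging-directory target path.
    static final class PathMapping {
        final String src;
        final String target;

        PathMapping(String src, String target) {
            this.src = src;
            this.target = target;
        }
    }

    // Returns a description of each copy that would be scheduled.
    static List<String> planCopies(boolean metadataOnly, List<PathMapping> mappings) {
        List<String> planned = new ArrayList<>();
        if (metadataOnly) {
            // Metadata-only dump: no data files are copied into staging.
            return planned;
        }
        for (PathMapping m : mappings) {
            planned.add(m.src + " -> " + m.target);
        }
        return planned;
    }

    public static void main(String[] args) {
        List<PathMapping> mappings = Arrays.asList(
                new PathMapping("/warehouse/t1", "/staging/t1/data"));
        System.out.println(planCopies(false, mappings));
        System.out.println(planCopies(true, mappings));
    }
}
```

Placing the guard before the loop keeps the metadata export unconditional while making the data copy optional, which is the behavior the author describes in the reply.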
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=396066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396066 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 02/Mar/20 11:50 Start Date: 02/Mar/20 11:50 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386348243 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -582,10 +586,19 @@ void dumpTable(String dbName, String tblName, String validTxnList, Path dbRoot, } MmContext mmCtx = MmContext.createIfNeeded(tableSpec.tableHandle); tuple.replicationSpec.setRepl(true); -new TableExport( -exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(); - +List replPathMappings = new TableExport( +exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(false); replLogger.tableLog(tblName, tableSpec.tableHandle.getTableType()); +if (Utils.shouldDumpMetaDataOnly(tuple.object, conf)) { + return; +} +for (ReplPathMapping replPathMapping: replPathMappings) { + Task copyTask = ReplCopyTask.getLoadCopyTask( + tuple.replicationSpec, replPathMapping.getSrcPath(), replPathMapping.getTargetPath(), conf, false); + this.addDependentTask(copyTask); Review comment: Will track it as part of another JIRA This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 396066) Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395660 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 01/Mar/20 18:38 Start Date: 01/Mar/20 18:38 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386130273 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbstractEventHandler.java ## @@ -71,4 +85,31 @@ public long fromEventId() { public long toEventId() { return event.getEventId(); } + + public void writeFileEntry(String dbName, Table table, String file, BufferedWriter fileListWriter, Review comment: This can be in a util class like before. I don't think it's appropriate in an event handler class. Issue Time Tracking --- Worklog Id: (was: 395660) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395659 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 01/Mar/20 18:38 Start Date: 01/Mar/20 18:38 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386130273 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbstractEventHandler.java ## @@ -71,4 +85,31 @@ public long fromEventId() { public long toEventId() { return event.getEventId(); } + + public void writeFileEntry(String dbName, Table table, String file, BufferedWriter fileListWriter, Review comment: This can be in util class like before. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395659) Time Spent: 3h 10m (was: 3h) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, HIVE-22865.6.patch > > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395658 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 01/Mar/20 18:37 Start Date: 01/Mar/20 18:37 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386130216 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbstractEventHandler.java ## @@ -71,4 +85,31 @@ public long fromEventId() { public long toEventId() { return event.getEventId(); } + + public void writeFileEntry(String dbName, Table table, String file, BufferedWriter fileListWriter, + Context withinContext) throws IOException, LoginException { +HiveConf hiveConf = withinContext.hiveConf; +String distCpDoAsUser = hiveConf.getVar(HiveConf.ConfVars.HIVE_DISTCP_DOAS_USER); +if (Utils.shouldDumpMetaDataOnly(table, withinContext.hiveConf)) { Review comment: If it's metadata only, why are we dumping data? Issue Time Tracking --- Worklog Id: (was: 395658) Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395656 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 01/Mar/20 18:32 Start Date: 01/Mar/20 18:32 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386129803 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -582,10 +586,19 @@ void dumpTable(String dbName, String tblName, String validTxnList, Path dbRoot, } MmContext mmCtx = MmContext.createIfNeeded(tableSpec.tableHandle); tuple.replicationSpec.setRepl(true); -new TableExport( -exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(); - +List replPathMappings = new TableExport( +exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(false); replLogger.tableLog(tblName, tableSpec.tableHandle.getTableType()); +if (Utils.shouldDumpMetaDataOnly(tuple.object, conf)) { Review comment: Why is this check needed here? Is it not done previously This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 395656) Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395657 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 01/Mar/20 18:32 Start Date: 01/Mar/20 18:32 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386129824 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -582,10 +586,19 @@ void dumpTable(String dbName, String tblName, String validTxnList, Path dbRoot, } MmContext mmCtx = MmContext.createIfNeeded(tableSpec.tableHandle); tuple.replicationSpec.setRepl(true); -new TableExport( -exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(); - +List replPathMappings = new TableExport( +exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(false); replLogger.tableLog(tblName, tableSpec.tableHandle.getTableType()); +if (Utils.shouldDumpMetaDataOnly(tuple.object, conf)) { + return; +} +for (ReplPathMapping replPathMapping: replPathMappings) { + Task copyTask = ReplCopyTask.getLoadCopyTask( + tuple.replicationSpec, replPathMapping.getSrcPath(), replPathMapping.getTargetPath(), conf, false); + this.addDependentTask(copyTask); Review comment: Dynamic DAG generation needed here This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 395657) Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395606 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 01/Mar/20 13:56 Start Date: 01/Mar/20 13:56 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386110262 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -305,7 +309,7 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive NotificationEvent ev = evIter.next(); lastReplId = ev.getEventId(); Path evRoot = new Path(dumpRoot, String.valueOf(lastReplId)); - dumpEvent(ev, evRoot, cmRoot, hiveDb); + dumpEvent(ev, evRoot, dumpRoot, cmRoot, hiveDb); Review comment: hiveDumpRoot is received as dumpRoot in the current method. I haven't renamed the existing parameter names. Issue Time Tracking --- Worklog Id: (was: 395606) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395487 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 29/Feb/20 18:08 Start Date: 29/Feb/20 18:08 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386045024 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ExportTask.java ## @@ -53,9 +53,7 @@ public int execute() { work.acidPostProcess(db); TableExport tableExport = new TableExport(exportPaths, work.getTableSpec(), work.getReplicationSpec(), db, null, conf, work.getMmContext()); - if (!tableExport.write()) { Review comment: Have refactored the code and same behavior is achieved in TableExport.export() now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395487) Time Spent: 2h 20m (was: 2h 10m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, HIVE-22865.6.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395480 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 29/Feb/20 16:40 Start Date: 29/Feb/20 16:40 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386039609 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -305,7 +309,7 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive NotificationEvent ev = evIter.next(); lastReplId = ev.getEventId(); Path evRoot = new Path(dumpRoot, String.valueOf(lastReplId)); - dumpEvent(ev, evRoot, cmRoot, hiveDb); + dumpEvent(ev, evRoot, dumpRoot, cmRoot, hiveDb); Review comment: Why is this dumpRoot and not hiveDumpRoot? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395480) Time Spent: 2h 10m (was: 2h) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, HIVE-22865.6.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=395479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395479 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 29/Feb/20 16:35 Start Date: 29/Feb/20 16:35 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r386039239 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ExportTask.java ## @@ -53,9 +53,7 @@ public int execute() { work.acidPostProcess(db); TableExport tableExport = new TableExport(exportPaths, work.getTableSpec(), work.getReplicationSpec(), db, null, conf, work.getMmContext()); - if (!tableExport.write()) { Review comment: Why is the check removed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395479) Time Spent: 2h (was: 1h 50m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch, HIVE-22865.4.patch, HIVE-22865.5.patch, HIVE-22865.6.patch > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=393259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393259 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 26/Feb/20 08:55 Start Date: 26/Feb/20 08:55 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r384348340 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -582,10 +592,20 @@ void dumpTable(String dbName, String tblName, String validTxnList, Path dbRoot, } MmContext mmCtx = MmContext.createIfNeeded(tableSpec.tableHandle); tuple.replicationSpec.setRepl(true); -new TableExport( -exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(); - +Path replDataDir = new Path(dumproot, EximUtil.DATA_PATH_NAME); +new TableExport(exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(); +if (conf.getBoolVar(HiveConf.ConfVars.REPL_DUMP_METADATA_ONLY) || +(TableType.EXTERNAL_TABLE.equals(tuple.object.getTableType()) +&& (!conf.getBoolVar(HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES)))) { + return; +} replLogger.tableLog(tblName, tableSpec.tableHandle.getTableType()); +Path dumpDataDir = new Path(dumproot, EximUtil.DATA_PATH_NAME); +Path tblCopyPath = new Path(dumpDataDir, dbName); +tblCopyPath = new Path(tblCopyPath, tblName); +Task copyTask = ReplCopyTask.getLoadCopyTask( +tuple.replicationSpec, tuple.object.getPath(), tblCopyPath, conf, false); +this.addDependentTask(copyTask); Review comment: We are not using dynamic DAG traversal for this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393259) Time Spent: 1h 50m (was: 1h 40m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
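For readers following the dumpTable diff in the message above, the per-table data path it builds inside the staging directory can be sketched as below. This is only an illustration under stated assumptions, not code from the patch: it joins plain strings instead of using org.apache.hadoop.fs.Path, the names StagingLayout, child and tableCopyPath are hypothetical, and the value "data" for EximUtil.DATA_PATH_NAME is assumed rather than taken from the thread.

```java
// Illustrative sketch only; not part of the HIVE-22865 patch.
public final class StagingLayout {

    // Assumed value of EximUtil.DATA_PATH_NAME (not stated in the thread).
    static final String DATA_PATH_NAME = "data";

    // Hypothetical stand-in for new Path(parent, name): joins with a single '/'.
    static String child(String parent, String name) {
        return parent.endsWith("/") ? parent + name : parent + "/" + name;
    }

    // Mirrors the three Path constructions in the diff:
    // dumpDataDir = dumpRoot/DATA_PATH_NAME, then /dbName, then /tblName.
    static String tableCopyPath(String dumpRoot, String dbName, String tblName) {
        String dumpDataDir = child(dumpRoot, DATA_PATH_NAME);
        String tblCopyPath = child(dumpDataDir, dbName);
        return child(tblCopyPath, tblName);
    }

    public static void main(String[] args) {
        // The dump root, database and table names here are made up for the example.
        System.out.println(tableCopyPath("/staging/1/hive", "sales", "orders"));
    }
}
```

In the diff this path is the target handed to ReplCopyTask.getLoadCopyTask, with the table's own location as the source.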
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=393257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393257 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 26/Feb/20 08:50 Start Date: 26/Feb/20 08:50 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r384345859 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -119,18 +121,19 @@ public String getName() { public int execute() { try { Hive hiveDb = getHive(); - Path dumpRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLDIR), getNextDumpDir()); - DumpMetaData dmd = new DumpMetaData(dumpRoot, conf); + Path dumpBaseDir = new Path(conf.getVar(HiveConf.ConfVars.REPLDIR), getNextDumpDir()); + Path hiveDumpRoot = new Path(dumpBaseDir, ReplUtils.REPL_HIVE_BASE_DIR); + DumpMetaData dmd = new DumpMetaData(hiveDumpRoot, conf); // Initialize ReplChangeManager instance since we will require it to encode file URI. ReplChangeManager.getInstance(conf); Path cmRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLCMDIR)); Long lastReplId; if (work.isBootStrapDump()) { -lastReplId = bootStrapDump(dumpRoot, dmd, cmRoot, hiveDb); +lastReplId = bootStrapDump(hiveDumpRoot, dmd, cmRoot, hiveDb); Review comment: Are we still keeping the cm? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393257) Time Spent: 1h 40m (was: 1.5h) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
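The change to execute() quoted in the message above nests the Hive dump root one level below the per-dump base directory. A minimal sketch of that layout follows, under stated assumptions: plain strings stand in for org.apache.hadoop.fs.Path, the names DumpRootLayout and hiveDumpRoot are hypothetical, the value "hive" for ReplUtils.REPL_HIVE_BASE_DIR is assumed, and the directory names in main are invented for the example.

```java
// Illustrative sketch only; not part of the HIVE-22865 patch.
public final class DumpRootLayout {

    // Assumed value of ReplUtils.REPL_HIVE_BASE_DIR (not stated in the thread).
    static final String REPL_HIVE_BASE_DIR = "hive";

    // Mirrors the diff: dumpBaseDir = REPLDIR/nextDumpDir,
    // hiveDumpRoot = dumpBaseDir/REPL_HIVE_BASE_DIR.
    static String hiveDumpRoot(String replDir, String nextDumpDir) {
        String dumpBaseDir = replDir + "/" + nextDumpDir;
        return dumpBaseDir + "/" + REPL_HIVE_BASE_DIR;
    }

    public static void main(String[] args) {
        // "/user/hive/repl" and "next-dump" are made-up inputs.
        System.out.println(hiveDumpRoot("/user/hive/repl", "next-dump"));
    }
}
```

In the diff, DumpMetaData and bootStrapDump receive hiveDumpRoot instead of the former dumpRoot.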
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=393256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393256 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 26/Feb/20 08:49 Start Date: 26/Feb/20 08:49 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r384345535 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosWithCopyData.java ## @@ -0,0 +1,422 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.parse; + +import org.apache.commons.io.FileUtils; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.cli.CliSessionState; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.HiveMetaStoreClient; +import org.apache.hadoop.hive.metastore.MetaStoreTestUtils; +import org.apache.hadoop.hive.metastore.PersistenceManagerProvider; +import org.apache.hadoop.hive.metastore.api.Database; +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.metastore.messaging.json.gzip.GzipJSONMessageEncoder; +import org.apache.hadoop.hive.ql.DriverFactory; +import org.apache.hadoop.hive.ql.IDriver; +import org.apache.hadoop.hive.ql.exec.Task; +import org.apache.hadoop.hive.ql.exec.repl.ReplDumpWork; +import org.apache.hadoop.hive.ql.metadata.Hive; +import org.apache.hadoop.hive.ql.processors.CommandProcessorException; +import org.apache.hadoop.hive.ql.session.SessionState; +import org.apache.hadoop.hive.shims.Utils; +import org.junit.After; +import org.junit.AfterClass; +import org.junit.Assert; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TestName; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.File; +import java.io.FileWriter; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import static org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION; +import static org.junit.Assert.assertEquals;; + +public class TestReplicationScenariosWithCopyData { + + @Rule + public final TestName testName = new TestName(); + + private final static String DBNOTIF_LISTENER_CLASSNAME = + "org.apache.hive.hcatalog.listener.DbNotificationListener"; + // FIXME : replace with hive copy once that is copied + private final static String 
tid = + TestReplicationScenariosWithCopyData.class.getCanonicalName().toLowerCase().replace('.','_') + "_" + System.currentTimeMillis(); + private final static String TEST_PATH = + System.getProperty("test.warehouse.dir", "/tmp") + Path.SEPARATOR + tid; + + static HiveConf hconf; + static HiveMetaStoreClient metaStoreClient; + private static IDriver driver; + private static String proxySettingName; + private static HiveConf hconfMirror; + private static IDriver driverMirror; + private static HiveMetaStoreClient metaStoreClientMirror; + private static boolean isMigrationTest; + + // Make sure we skip backward-compat checking for those tests that don't generate events + + protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationScenariosWithCopyData.class); + private ArrayList lastResults; + + private final boolean VERIFY_SETUP_STEPS = false; + // if verifySetup is set to true, all the test setup we do will perform additional + // verifications as well, which is useful to verify that our setup occurred + // correctly when developing and debugging tests. These verifications, however + // do not test any new functionality for replication, and thus, are not relevant + // for testing replication itself. For steady state, we want this to be false. + + @BeforeClass + public static void setUpBeforeClass() throws Exception { +Ha
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=392461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392461 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 25/Feb/20 10:30 Start Date: 25/Feb/20 10:30 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r383791504 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/AbstractEventHandler.java ## @@ -71,4 +84,32 @@ public long fromEventId() { public long toEventId() { return event.getEventId(); } + + public void writeFileEntry(String dbName, String tblName, String file, BufferedWriter fileListWriter, + Context withinContext) throws IOException, LoginException { + HiveConf hiveConf = withinContext.hiveConf; + boolean copyActualData = !hiveConf.getBoolVar(HiveConf.ConfVars.REPL_DUMP_METADATA_ONLY); Review comment: check REPL_DUMP_METADATA_EXTERNAL_TABLE This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392461) Time Spent: 1h 20m (was: 1h 10m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=392456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392456 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 25/Feb/20 10:26 Start Date: 25/Feb/20 10:26 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r383789102 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/TableExport.java ## @@ -160,8 +160,10 @@ private void writeData(PartitionIterable partitions) throws SemanticException { } else { List dataPathList = Utils.getDataPathList(tableSpec.tableHandle.getDataLocation(), replicationSpec, conf); -new FileOperations(dataPathList, paths.dataExportDir(), distCpDoAsUser, conf, mmCtx) -.export(replicationSpec); +// this is the data copy +if (!replicationSpec.isLazy()) { Review comment: Is the check needed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392456) Time Spent: 1h 10m (was: 1h) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=392455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392455 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 25/Feb/20 10:26 Start Date: 25/Feb/20 10:26 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r383789019 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/PartitionExport.java ## @@ -115,8 +115,9 @@ void write(final ReplicationSpec forReplicationSpec) throws InterruptedException List dataPathList = Utils.getDataPathList(partition.getDataLocation(), forReplicationSpec, hiveConf); Path rootDataDumpDir = paths.partitionExportDir(partitionName); - new FileOperations(dataPathList, rootDataDumpDir, distCpDoAsUser, hiveConf, mmCtx) - .export(forReplicationSpec); + if (!forReplicationSpec.isLazy()) { Review comment: Is the check needed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392455) Time Spent: 1h (was: 50m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=392441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392441 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 25/Feb/20 10:12 Start Date: 25/Feb/20 10:12 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r383781222 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -582,10 +592,20 @@ void dumpTable(String dbName, String tblName, String validTxnList, Path dbRoot, } MmContext mmCtx = MmContext.createIfNeeded(tableSpec.tableHandle); tuple.replicationSpec.setRepl(true); -new TableExport( -exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(); - +Path replDataDir = new Path(dumproot, EximUtil.DATA_PATH_NAME); +new TableExport(exportPaths, tableSpec, tuple.replicationSpec, hiveDb, distCpDoAsUser, conf, mmCtx).write(); +if (conf.getBoolVar(HiveConf.ConfVars.REPL_DUMP_METADATA_ONLY) || Review comment: This shouldn't reach here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392441) Time Spent: 50m (was: 40m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=392440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392440 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 25/Feb/20 10:04 Start Date: 25/Feb/20 10:04 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r383776695 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -344,8 +347,14 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive // Dump the table to be bootstrapped if required. if (shouldBootstrapDumpTable(table)) { HiveWrapper.Tuple tableTuple = new HiveWrapper(hiveDb, dbName).table(table); - dumpTable(dbName, tableName, validTxnList, dbRoot, bootDumpBeginReplId, hiveDb, + dumpTable(dbName, tableName, validTxnList, dbRoot, dumpRoot, bootDumpBeginReplId, hiveDb, tableTuple); + Path tableRoot = new Path(dbRoot, tableName); + Path dumpDataDir = new Path(dumpRoot, EximUtil.DATA_PATH_NAME); + Path tblCopyPath = new Path(dumpDataDir, dbName); + tblCopyPath = new Path(tblCopyPath, tableName); + Task copyTask = ReplCopyTask.getLoadCopyTask(new ReplicationSpec(), tableRoot, tblCopyPath, conf); Review comment: There is a copytask inside the dumpTable method as well This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392440) Time Spent: 40m (was: 0.5h) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch, > HIVE-22865.3.patch > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=392298&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392298 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 25/Feb/20 05:53 Start Date: 25/Feb/20 05:53 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r383671544 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java ## @@ -278,7 +278,7 @@ a database ( directory ) } this.childTasks = scope.rootTasks; /* - Since there can be multiple rounds of this run all of which will be tied to the same + Since there can be multiple rounds rcof this run all of which will be tied to the same Review comment: typo? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392298) Time Spent: 0.5h (was: 20m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=392292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392292 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 25/Feb/20 05:44 Start Date: 25/Feb/20 05:44 Worklog Time Spent: 10m Work Description: aasha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911#discussion_r383669184 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -475,6 +475,8 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal REPL_DUMPDIR_TTL("hive.repl.dumpdir.ttl", "7d", new TimeValidator(TimeUnit.DAYS), "TTL of dump dirs before cleanup."), +REPL_DUMP_COPY_DATA("hive.repl.dump.copydata", true, Review comment: What happens when both REPL_DUMP_COPY_DATA and REPL_DUMP_METADATA_ONLY are set to true? Which one takes precedence? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392292) Time Spent: 20m (was: 10m) > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22865.1.patch, HIVE-22865.2.patch > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22865) Include data in replication staging directory
[ https://issues.apache.org/jira/browse/HIVE-22865?focusedWorklogId=387096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387096 ] ASF GitHub Bot logged work on HIVE-22865: - Author: ASF GitHub Bot Created on: 14/Feb/20 04:03 Start Date: 14/Feb/20 04:03 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #911: HIVE-22865 Include data in replication staging directory URL: https://github.com/apache/hive/pull/911 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387096) Remaining Estimate: 0h Time Spent: 10m > Include data in replication staging directory > - > > Key: HIVE-22865 > URL: https://issues.apache.org/jira/browse/HIVE-22865 > Project: Hive > Issue Type: Task >Reporter: PRAVIN KUMAR SINHA >Assignee: PRAVIN KUMAR SINHA >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)