[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982590#comment-15982590 ]

Benjamin Huo commented on HDFS-7535:

I have one question regarding the following comment:

"This snapshot diff report represents the delta that should be applied to the backup cluster. For changes like deletion and rename we can directly apply the same operations (following some specific order based on their dependency) in the backup cluster. For changes like creation, append, and other metadata modification we keep using the functionality of the current distcp."

I'm not clear about what "we keep using the functionality of the current distcp" means. After the fix for HDFS-7535, is the list of file changes for creation and modification generated from snapshots s1 and s2 on the source cluster, or from the file differences between the source cluster and the destination cluster?

Thanks
Ben

> Utilize Snapshot diff report for distcp
> ---
>
> Key: HDFS-7535
> URL: https://issues.apache.org/jira/browse/HDFS-7535
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: distcp, snapshots
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Fix For: 2.7.0
>
> Attachments: HDFS-7535.000.patch, HDFS-7535.001.patch,
> HDFS-7535.002.patch, HDFS-7535.003.patch, HDFS-7535.004.patch
>
>
> Currently HDFS snapshot diff report can identify file/directory creation,
> deletion, rename and modification under a snapshottable directory. We can use
> the diff report for distcp between the primary cluster and a backup cluster
> to avoid unnecessary data copy. This is especially useful when there is a big
> directory rename happening in the primary cluster: the current distcp cannot
> detect the rename op thus this rename usually leads to large amounts of real
> data copy.
> More details of the approach will come in the first comment.
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348887#comment-14348887 ]

Hudson commented on HDFS-7535:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #123 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/123/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. (jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348962#comment-14348962 ]

Hudson commented on HDFS-7535:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2073 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2073/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. (jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348564#comment-14348564 ]

Hudson commented on HDFS-7535:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #123 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/123/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. (jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348580#comment-14348580 ]

Hudson commented on HDFS-7535:

FAILURE: Integrated in Hadoop-Yarn-trunk #857 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/857/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. (jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348788#comment-14348788 ]

Hudson commented on HDFS-7535:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2055 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2055/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. (jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348820#comment-14348820 ]

Hudson commented on HDFS-7535:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/114/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. (jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347349#comment-14347349 ]

Hudson commented on HDFS-7535:

FAILURE: Integrated in Hadoop-trunk-Commit #7256 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7256/])
HDFS-7535. Utilize Snapshot diff report for distcp. Contributed by Jing Zhao. (jing9: rev ed70fa142cabdbc1065e4dbbc95e99c8850c4751)
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346603#comment-14346603 ]

Tsz Wo Nicholas Sze commented on HDFS-7535:

+1 the new patch looks good. Thanks for testing it and adding the new tests.
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346550#comment-14346550 ]

Hadoop QA commented on HDFS-7535:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12702393/HDFS-7535.004.patch
against trunk revision 29bb689.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9722//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9722//console

This message is automatically generated.
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338629#comment-14338629 ]

Tsz Wo Nicholas Sze commented on HDFS-7535:

Thanks for working on this. The idea is great! Some comments on the patch:
- In DistCpSync.moveToTmpDir, why move the paths to tmp for the delete operations?
- In DistCpSync.moveToTarget, it calls mkdir if the parent directory does not exist. Would it be able to preserve other attributes for the -p option?
- In DiffInfo.getDiffs(..), there is no need to create the entries array. Just iterate over report.getDiffList().
- Please add javadoc for DiffInfo.
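The review above names two steps, DistCpSync.moveToTmpDir and DistCpSync.moveToTarget. The patch itself is not reproduced in this thread, so the following is only an illustrative sketch of why staging paths through a temporary location can help when renames depend on each other (for example, two paths swapping names). The class `TwoPhaseRenameSketch`, the `/.tmp/` prefix and the in-memory map standing in for a namespace are all hypothetical and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: an in-memory stand-in for a namespace, not DistCpSync. */
class TwoPhaseRenameSketch {

  /**
   * Applies renames (source path -> target path) in two phases.
   * namespace maps a path to an opaque content label.
   */
  static Map<String, String> apply(Map<String, String> namespace,
                                   Map<String, String> renames) {
    Map<String, String> result = new TreeMap<>(namespace);
    // staged: temporary path -> final target path (hypothetical naming scheme)
    Map<String, String> staged = new LinkedHashMap<>();
    int counter = 0;

    // Phase 1: move every rename source out of the way, so no later
    // rename can collide with a path that has not been moved yet.
    for (Map.Entry<String, String> rename : renames.entrySet()) {
      String content = result.remove(rename.getKey());
      if (content == null) {
        throw new IllegalStateException("missing source " + rename.getKey());
      }
      String tmp = "/.tmp/" + (counter++);
      result.put(tmp, content);
      staged.put(tmp, rename.getValue());
    }

    // Phase 2: move each staged path to its final target.
    for (Map.Entry<String, String> move : staged.entrySet()) {
      if (result.containsKey(move.getValue())) {
        throw new IllegalStateException("target exists " + move.getValue());
      }
      result.put(move.getValue(), result.remove(move.getKey()));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> namespace = new TreeMap<>();
    namespace.put("/a", "contents-of-a");
    namespace.put("/b", "contents-of-b");

    // /a and /b swap names: applying either rename directly would hit an
    // existing target, while the two-phase version has no such conflict.
    Map<String, String> renames = new LinkedHashMap<>();
    renames.put("/a", "/b");
    renames.put("/b", "/a");

    System.out.println(apply(namespace, renames));
  }
}
```

Whether the same staging is also needed for deletions, which is the reviewer's question, is not answered by this sketch.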
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338607#comment-14338607 ]

Tsz Wo Nicholas Sze commented on HDFS-7535:

"We verify these assumptions before the sync and we fallback to the default distcp behavior ..."

Would it be better to throw an exception instead, since the user may not want to fall back?
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339274#comment-14339274 ]

Hadoop QA commented on HDFS-7535:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12701171/HDFS-7535.003.patch
against trunk revision 2214dab.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9672//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9672//console

This message is automatically generated.
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326466#comment-14326466 ]

Hadoop QA commented on HDFS-7535:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12699520/HDFS-7535.002.patch
against trunk revision 1714609.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9610//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9610//console

This message is automatically generated.
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325295#comment-14325295 ]

Hadoop QA commented on HDFS-7535:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12699392/HDFS-7535.001.patch
against trunk revision 685af8a.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
-1 javac. The applied patch generated 1156 javac compiler warnings (more than the trunk's current 1155 warnings).
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9606//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9606//artifact/patchprocess/newPatchFindbugsWarningshadoop-distcp.html
Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9606//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9606//console

This message is automatically generated.
[jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
[ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248896#comment-14248896 ]

Jing Zhao commented on HDFS-7535:

A typical scenario using snapshots for distcp looks like this: every time we start distcp between the primary cluster and the backup cluster, a snapshot is first created in the primary cluster. Then the snapshot diff report is computed between the latest snapshot and the snapshot created for the last distcp. This snapshot diff report represents the delta that should be applied to the backup cluster. For changes like deletion and rename we can directly apply the same operations (following some specific order based on their dependency) in the backup cluster. For changes like creation, append, and other metadata modification we keep using the functionality of the current distcp. With this approach, we can avoid unnecessary data copy and also guarantee the source data is immutable, since our snapshot is read-only.

We plan to use this jira to provide the basic functionality of the above approach. More specifically, we can first add extra options to the current distcp tool so that it can compute the delta based on the diff report of two given snapshot names. How to manage snapshots in the source/target clusters can be handled in separate jiras or through separate tools.
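The split Jing Zhao describes, where deletions and renames are replayed directly on the backup cluster and everything else is left to the existing copy logic, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `SnapshotDeltaSketch`, `ChangeType`, `Change` and `select` are hypothetical names made up for this example, the paths are invented, and none of it is the DistCp or HDFS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: hypothetical types, not the DistCp implementation. */
class SnapshotDeltaSketch {

  /** The change kinds the issue says a snapshot diff report can identify. */
  enum ChangeType { CREATE, DELETE, RENAME, MODIFY }

  /** Hypothetical stand-in for one entry of a snapshot diff report. */
  static final class Change {
    final ChangeType type;
    final String path;       // path as of the older snapshot
    final String renamedTo;  // new path, set only for RENAME

    Change(ChangeType type, String path, String renamedTo) {
      this.type = type;
      this.path = path;
      this.renamedTo = renamedTo;
    }

    @Override
    public String toString() {
      return renamedTo == null
          ? type + " " + path
          : type + " " + path + " -> " + renamedTo;
    }
  }

  /** Deletions and renames need no data transfer, only namespace operations. */
  static boolean isNamespaceOnly(ChangeType type) {
    return type == ChangeType.DELETE || type == ChangeType.RENAME;
  }

  /** Returns the namespace-only changes, or the remaining ones, in order. */
  static List<Change> select(List<Change> delta, boolean namespaceOnly) {
    List<Change> selected = new ArrayList<>();
    for (Change change : delta) {
      if (isNamespaceOnly(change.type) == namespaceOnly) {
        selected.add(change);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // Invented example delta between two snapshots of the primary cluster.
    List<Change> delta = Arrays.asList(
        new Change(ChangeType.RENAME, "/data/2014", "/archive/2014"),
        new Change(ChangeType.DELETE, "/data/tmp", null),
        new Change(ChangeType.CREATE, "/data/2015/part-0", null),
        new Change(ChangeType.MODIFY, "/data/log", null));

    System.out.println("replay on backup cluster: " + select(delta, true));
    System.out.println("leave to regular distcp:  " + select(delta, false));
  }
}
```

The sketch keeps the entries in report order and does not attempt the dependency ordering of renames and deletions that the comment mentions.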