[jira] [Commented] (HADOOP-16158) DistCp to support checksum validation when copy blocks in parallel
[ https://issues.apache.org/jira/browse/HADOOP-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910141#comment-16910141 ]

Kai Xie commented on HADOOP-16158:
----------------------------------

Thanks [~jojochuang] for reviewing the patch and merging it!
I'll try to backport this patch in another Jira ticket.

> DistCp to support checksum validation when copy blocks in parallel
> ------------------------------------------------------------------
>
>                 Key: HADOOP-16158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16158
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 3.2.0, 2.9.2, 3.0.3, 3.1.2
>            Reporter: Kai Xie
>            Assignee: Kai Xie
>            Priority: Major
>             Fix For: 3.3.0, 3.2.1, 3.1.3
>
>         Attachments: HADOOP-16158.branch-3.1.patch
>
>
> Copying blocks in parallel (enabled when blocks per chunk > 0) is a great
> DistCp improvement that can hugely speed up copying big files.
> But its checksum validation is skipped, e.g. in
> `RetriableFileCopyCommand.java`
>
> {code:java}
> if (!source.isSplit()) {
>   compareCheckSums(sourceFS, source.getPath(), sourceChecksum,
>       targetFS, targetPath);
> }
> {code}
> and this could result in checksum/data mismatch without notifying
> developers/users (e.g. HADOOP-16049).
> I'd like to provide a patch to add the checksum validation.
>

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
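Editorial illustration of the gap described in the issue above: when a file is copied as several chunks, a per-file checksum comparison guarded by `!source.isSplit()` never runs, so nothing compares the reassembled target with the source. The sketch below shows the general idea of validating after reassembly. It is not the code from the patch, and it does not use any Hadoop API: the class `ChunkedCopyChecksumSketch` and its methods are hypothetical names, byte arrays stand in for files, and `java.util.zip.CRC32` stands in for a file-level checksum.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical, self-contained sketch. None of these names come from Hadoop.
class ChunkedCopyChecksumSketch {

    /** CRC32 over a whole byte array; stands in for a file-level checksum. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Copies the source in fixed-size chunks and reassembles them,
     * loosely mimicking a split copy followed by a concatenation step.
     */
    static byte[] copyInChunks(byte[] source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        for (int offset = 0; offset < source.length; offset += chunkSize) {
            int end = Math.min(source.length, offset + chunkSize);
            byte[] chunk = Arrays.copyOfRange(source, offset, end);
            target.write(chunk, 0, chunk.length);
        }
        return target.toByteArray();
    }

    /** The check a split copy would otherwise skip: compare whole-file checksums. */
    static boolean checksumsMatch(byte[] source, byte[] target) {
        return checksum(source) == checksum(target);
    }

    public static void main(String[] args) {
        byte[] source = "a big file copied as several chunks"
                .getBytes(StandardCharsets.UTF_8);
        byte[] target = copyInChunks(source, 8);
        if (!checksumsMatch(source, target)) {
            throw new IllegalStateException("checksum mismatch after reassembly");
        }
        System.out.println("checksums match");
    }
}
```

The point of the sketch is only the ordering: the comparison happens once, on the reassembled result, rather than being dropped because the individual chunks are not whole files.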
[jira] [Commented] (HADOOP-16158) DistCp to support checksum validation when copy blocks in parallel
[ https://issues.apache.org/jira/browse/HADOOP-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910125#comment-16910125 ]

Wei-Chiu Chuang commented on HADOOP-16158:
------------------------------------------

[~kai33] if you are interested in branch-2 and branch-2.9 (2.10 and 2.9.x releases), please submit a new patch against those branches. There are quite a lot of conflicts.
Thanks!
[jira] [Commented] (HADOOP-16158) DistCp to support checksum validation when copy blocks in parallel
[ https://issues.apache.org/jira/browse/HADOOP-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910122#comment-16910122 ]

Hudson commented on HADOOP-16158:
---------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17146 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17146/])
HADOOP-16158. DistCp to support checksum validation when copy blocks in (weichiu: rev c765584eb231f8482f5b90b7e8f61f9f7a931d09)
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtils.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestDistCpUtilsWithCombineMode.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyCommitter.java
[jira] [Commented] (HADOOP-16158) DistCp to support checksum validation when copy blocks in parallel
[ https://issues.apache.org/jira/browse/HADOOP-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905480#comment-16905480 ]

Kai Xie commented on HADOOP-16158:
----------------------------------

I removed the patch attachments to avoid confusion. The latest patch is available in the GitHub pull request.
[jira] [Commented] (HADOOP-16158) DistCp to support checksum validation when copy blocks in parallel
[ https://issues.apache.org/jira/browse/HADOOP-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905472#comment-16905472 ]

Hadoop QA commented on HADOOP-16158:
------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 6s | HADOOP-16158 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-16158 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965795/HADOOP-16158-005.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16472/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> DistCp to support checksum validation when copy blocks in parallel
> ------------------------------------------------------------------
>
>                 Key: HADOOP-16158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16158
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 3.2.0, 2.9.2, 3.0.3, 3.1.2
>            Reporter: Kai Xie
>            Assignee: Kai Xie
>            Priority: Major
>         Attachments: HADOOP-16158-001.patch, HADOOP-16158-002.patch,
> HADOOP-16158-003.patch, HADOOP-16158-004.patch, HADOOP-16158-005.patch
[jira] [Commented] (HADOOP-16158) DistCp to support checksum validation when copy blocks in parallel
[ https://issues.apache.org/jira/browse/HADOOP-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858323#comment-16858323 ]

Kai Xie commented on HADOOP-16158:
----------------------------------

Moved trunk's patch to GitHub, and Yetus gave +1 overall:
https://github.com/apache/hadoop/pull/919
[jira] [Commented] (HADOOP-16158) DistCp to support checksum validation when copy blocks in parallel
[ https://issues.apache.org/jira/browse/HADOOP-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816858#comment-16816858 ]

Hadoop QA commented on HADOOP-16158:
------------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 22m 46s | trunk passed |
| +1 | compile | 0m 38s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 41s | trunk passed |
| +1 | shadedclient | 13m 50s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 45s | trunk passed |
| +1 | javadoc | 0m 25s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 29s | the patch passed |
| +1 | compile | 0m 30s | the patch passed |
| +1 | javac | 0m 30s | the patch passed |
| +1 | checkstyle | 0m 20s | hadoop-tools/hadoop-distcp: The patch generated 0 new + 218 unchanged - 2 fixed = 218 total (was 220) |
| +1 | mvnsite | 0m 30s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 26s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 46s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 14m 39s | hadoop-distcp in the patch passed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 73m 37s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HADOOP-16158 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965795/HADOOP-16158-005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4e7dc3f1d43c 4.4.0-144-generic #170~14.04.1-Ubuntu SMP Mon Mar 18 15:02:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1943db5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/16151/testReport/ |
| Max. process+thread count | 306 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16151/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-16158) DistCp to support checksum validation when copy blocks in parallel
[ https://issues.apache.org/jira/browse/HADOOP-16158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816848#comment-16816848 ]

Kai Xie commented on HADOOP-16158:
----------------------------------

Thanks Steve for the review comments and backporting tips!
I addressed those comments and submitted patch 005 for trunk. Let's see if Jenkins is happy.