[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391140#comment-16391140 ] Hudson commented on HADOOP-15273: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13797 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13797/]) HADOOP-15273.distcp can't handle remote stores with different checksum (stevel: rev 7ef4d942dd96232b0743a40ed25f77065254f94d) * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java * (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Fix For: 3.1.0 > > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390602#comment-16390602 ] Aaron Fabbri commented on HADOOP-15273: --- LGTM. +1 Testing: I only ran the TestCopyMapper unit test > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390390#comment-16390390 ] genericqa commented on HADOOP-15273: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} hadoop-tools/hadoop-distcp: The patch generated 0 new + 90 unchanged - 4 fixed = 90 total (was 94) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 0s{color} | {color:green} hadoop-distcp in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 59m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15273 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913455/HADOOP-15273-003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9950576a3130 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 46d29e3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14277/testReport/ | | Max. process+thread count | 346 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14277/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > distcp can't handle remote stores with different checksum algorithms >
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390297#comment-16390297 ] genericqa commented on HADOOP-15273: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 1 new + 90 unchanged - 4 fixed = 91 total (was 94) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 59s{color} | {color:green} hadoop-distcp in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 45s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15273 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913445/HADOOP-15273-002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ad3b5c0e074b 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 46d29e3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/testReport/ | | Max. process+thread count | 339 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390291#comment-16390291 ] Steve Loughran commented on HADOOP-15273: - Patch 003 * fixes checkstyle * fixes tests With HADOOP-15297 making the etags => checksum feature in s3a optional, this isn't quite a blocker, but it is when you try to distcp between any two stores with different algorithms, because only -update lets you skip the checks right now. If any other FS offers checksums, things will break > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390179#comment-16390179 ] Steve Loughran commented on HADOOP-15273: - This is what you get now {code} Checksum mismatch between hdfs://localhost:50883/tmp/source/5/6 and hdfs://localhost:50883/tmp/target/.distcp.tmp.attempt___m_00_0. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. You can skip checksum-checks altogether with -skipcrccheck. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.) {code} > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390143#comment-16390143 ] Steve Loughran commented on HADOOP-15273: - copymapper contains test to look for string of (incorrect) -skipCrc message. So not just wrong, tests to make sure it stays wrong :) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390093#comment-16390093 ] genericqa commented on HADOOP-15273: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 12s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 1 new + 8 unchanged - 2 fixed = 9 total (was 10) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 41s{color} | {color:red} hadoop-distcp in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.tools.mapred.TestCopyMapper | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15273 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913420/HADOOP-15273-001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 44786d213856 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d69b31f | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14272/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt | | unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/14272/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt | | Test Results |
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389953#comment-16389953 ] Steve Loughran commented on HADOOP-15273: - An initial distcp always does a checksum check during upload This sequence will fail {code} hadoop fs -rm -R -skipTrash s3a://hwdev-steve-new/\* hadoop distcp /user/steve/data s3a://hwdev-steve-new/data {code} Here {code} 18/03/07 17:08:03 INFO mapreduce.Job: Task Id : attempt_1520388269891_0019_m_04_2, Status : FAILED Error: java.io.IOException: File copy failed: hdfs://mycluster/user/steve/data/example.py --> s3a://hwdev-steve-new/data/example.py at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:217) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/user/steve/data/example.py to s3a://hwdev-steve-new/data/example.py at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101) at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:256) ... 10 more Caused by: java.io.IOException: Check-sum mismatch between hdfs://mycluster/user/steve/data/example.py and s3a://hwdev-steve-new/data-connectors/.distcp.tmp.attempt_1520388269891_0019_m_04_2. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.) at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:223) at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:133) at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99) at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) ... 11 more {code} You cannot use -skipcrccheck as the validator forbids it, yet without it you can't upload to hdfs to s3a now that it serves up its checksums as etags. > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org