[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-16049:
       Resolution: Fixed
    Fix Version/s: 2.9.3
           Status: Resolved  (was: Patch Available)

Fixed in branch-2 + branch-2.9

thanks!

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-16049
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16049
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 2.9.2
>            Reporter: Kai Xie
>            Assignee: Kai Xie
>            Priority: Major
>             Fix For: 2.9.3
>
>         Attachments: HADOOP-16049-branch-2-003.patch,
>                      HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch,
>                      HADOOP-16049-branch-2-005.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
>     sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never
> updated when blocks per chunk is set to > 0 (which always disables append
> action). So for chunk with offset != 0, it will keep copying the first few
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not
> been backported to branch-2 yet

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
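The failure mode described in the issue can be reproduced outside Hadoop with a small, self-contained sketch. Everything below is hypothetical and only for illustration: `PositionedCopySketch`, `readAt`, and `copyChunk` are made-up names, a byte array stands in for the source stream, and the `alwaysAdvance` flag models whether `sourceOffset` is updated after each read. It is not the DistCp code and not the committed patch; it only shows why a positioned read with a stale offset repeats the first bytes of a chunk.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class PositionedCopySketch {

    // Hypothetical stand-in for a positioned read: fills buf from src starting
    // at the given position and returns the byte count, or -1 at end of data.
    static int readAt(byte[] src, long position, byte[] buf) {
        if (position >= src.length) {
            return -1;
        }
        int n = (int) Math.min(buf.length, src.length - position);
        System.arraycopy(src, (int) position, buf, 0, n);
        return n;
    }

    // Copies chunkLength bytes starting at chunkOffset using positioned reads.
    // alwaysAdvance == false models the reported behaviour (offset never moves
    // when the action is not APPEND); true models advancing after every read.
    static byte[] copyChunk(byte[] src, long chunkOffset, int chunkLength,
                            int bufSize, boolean alwaysAdvance) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufSize];
        long sourceOffset = chunkOffset;
        int total = 0;
        int bytesRead = readAt(src, sourceOffset, buf);
        while (bytesRead >= 0 && total < chunkLength) {
            int toWrite = Math.min(bytesRead, chunkLength - total);
            out.write(buf, 0, toWrite); // write to dst
            total += toWrite;
            if (alwaysAdvance) {
                sourceOffset += toWrite;
            }
            bytesRead = readAt(src, sourceOffset, buf);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] src = new byte[16];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        // Chunk of 8 bytes at offset 8, copied through a 4-byte buffer.
        System.out.println(Arrays.toString(copyChunk(src, 8, 8, 4, false)));
        System.out.println(Arrays.toString(copyChunk(src, 8, 8, 4, true)));
    }
}
```

With the stale offset the chunk comes out as `[8, 9, 10, 11, 8, 9, 10, 11]`, while advancing the offset yields bytes 8 through 15. The chunk must be larger than the copy buffer for the repeat to show, which matches the reproduction hint about raising BLOCK_SIZE above the default copy buffer size.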
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Open  (was: Patch Available)

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment: HADOOP-16049-branch-2-005.patch

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch, HADOOP-16049-branch-2-005.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Patch Available  (was: Open)

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch, HADOOP-16049-branch-2-005.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment: HADOOP-16049-branch-2-003.patch

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Open  (was: Patch Available)

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Patch Available  (was: Open)

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment: HADOOP-16049-branch-2-004.patch

> Attachments: HADOOP-16049-branch-2-001.patch, HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment:     (was: HADOOP-16049-branch-2-002.patch)

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment:     (was: HADOOP-16049-branch-2-001.patch)

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Open  (was: Patch Available)

> Attachments: HADOOP-16049-branch-2-001.patch, HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Patch Available  (was: Open)

> Attachments: HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Patch Available  (was: Open)

> Attachments: HADOOP-16049-branch-2-001.patch, HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment: HADOOP-16049-branch-2-003.patch

> Attachments: HADOOP-16049-branch-2-001.patch, HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Open  (was: Patch Available)

> Attachments: HADOOP-16049-branch-2-001.patch, HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Open  (was: Patch Available)

> Attachments: HADOOP-16049-branch-2-001.patch, HADOOP-16049-branch-2-002.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Patch Available  (was: Open)

> Attachments: HADOOP-16049-branch-2-001.patch, HADOOP-16049-branch-2-002.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment: HADOOP-16049-branch-2-002.patch

> Attachments: HADOOP-16049-branch-2-001.patch, HADOOP-16049-branch-2-002.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Open (was: Patch Available)
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment: (was: fake-branch-2-001.patch)
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment: HADOOP-16049-branch-2-001.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Attachment: fake-branch-2-001.patch
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Description:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem and run it.
HADOOP-15292 has resolved the issue reported in this ticket in trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not been backported to branch-2 yet

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
HADOOP-15292 has resolved the issue reported in this ticket in trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not been backported to branch-2 yet
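To make the failure mode in the description concrete, here is a small in-memory sketch of the loop. It is not the DistCp source: the class PositionedReadDemo, the helpers readAt, copyChunk and makeSource, and the advanceAlways flag are all hypothetical names introduced for illustration, and the 8192-byte buffer is an assumed value picked only because it is smaller than the 10240-byte block from the reproduction step. With advanceAlways=false the offset is never moved, as in the reported non-append case; with advanceAlways=true it is advanced after every read, which is one possible way to keep a positioned read correct (the attached patches may do it differently, and HADOOP-15292 avoided positioned reads altogether).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical in-memory model of the copy loop; not the actual DistCp code. */
final class PositionedReadDemo {

  /** Positioned read: fills buf from src starting at position, returns -1 at EOF. */
  static int readAt(byte[] src, long position, byte[] buf) {
    if (position >= src.length) {
      return -1;
    }
    int n = (int) Math.min(buf.length, src.length - position);
    System.arraycopy(src, (int) position, buf, 0, n);
    return n;
  }

  /**
   * Copies chunkLength bytes of src starting at sourceOffset.
   * advanceAlways=false mimics the reported loop when the action is not APPEND
   * (the offset never moves); advanceAlways=true advances it after each read.
   */
  static byte[] copyChunk(byte[] src, long sourceOffset, int chunkLength,
                          int bufferSize, boolean advanceAlways) {
    byte[] buf = new byte[bufferSize];
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int total = 0;
    int bytesRead = readAt(src, sourceOffset, buf);
    while (bytesRead >= 0) {
      boolean finished = false;
      if (total + bytesRead >= chunkLength) {
        bytesRead = chunkLength - total; // trim the last read to the chunk end
        finished = true;
      }
      total += bytesRead;
      if (advanceAlways) {
        sourceOffset += bytesRead;
      }
      out.write(buf, 0, bytesRead); // "write to dst"
      if (finished) {
        break;
      }
      bytesRead = readAt(src, sourceOffset, buf);
    }
    return out.toByteArray();
  }

  /** Builds a source whose content differs at offsets 8192 bytes apart. */
  static byte[] makeSource(int length) {
    byte[] src = new byte[length];
    for (int i = 0; i < length; i++) {
      src[i] = (byte) (i % 251);
    }
    return src;
  }

  public static void main(String[] args) {
    byte[] src = makeSource(3 * 10240);
    // Second chunk of the file: offset 10240, length 10240.
    byte[] expected = Arrays.copyOfRange(src, 10240, 20480);
    byte[] stale = copyChunk(src, 10240, 10240, 8192, false);
    byte[] fixed = copyChunk(src, 10240, 10240, 8192, true);
    System.out.println("stale offset matches source: " + Arrays.equals(expected, stale));
    System.out.println("advanced offset matches source: " + Arrays.equals(expected, fixed));
  }
}
```

Running main prints false for the stale-offset copy and true for the advanced-offset copy. The stale copy has the right length, but its last 2048 bytes repeat the start of the chunk, which matches the "copying the first few bytes again and again" symptom. In this model the chunk must be larger than the buffer for the problem to show, since a chunk that fits in one read never reaches the second, stale read.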
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Description:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
HADOOP-15292 has resolved the issue reported in this ticket in trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not been backported to branch-2 yet

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
HADOOP-15292 has resolved the issue reported in this ticket by not using the positioned read in trunk/branch-3.1/branch-3.2, but has not been backported to branch-2 yet
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Description:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
HADOOP-15292 has resolved the issue reported in this ticket by not using the positioned read in trunk/branch-3.1/branch-3.2, but has not been backported to branch-2 yet

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
HADOOP-15292 has resolved this ticket by not using the positioned read in trunk/branch-3.1/branch-3.2, but has not been backported to branch-2 yet
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Description:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
HADOOP-15292 has resolved this ticket by not using the positioned read in trunk/branch-3.1/branch-3.2, but has not been backported to branch-2 yet

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
HADOOP-15292 has resolved this ticket by not using the positioned read, but has not been backported to branch-2 yet
[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Xie updated HADOOP-16049:
    Description:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
HADOOP-15292 has resolved this ticket by not using the positioned read, but has not been backported to branch-2 yet

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
HADOOP-15292 has resolved this ticket by not using the positioned read, but has not been backported to branch-2 yet