[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-27 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16049:

   Resolution: Fixed
Fix Version/s: 2.9.3
   Status: Resolved  (was: Patch Available)

Fixed in branch-2 + branch-2.9

thanks!

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Fix For: 2.9.3
>
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch, 
> HADOOP-16049-branch-2-005.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-21 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Open  (was: Patch Available)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-21 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: HADOOP-16049-branch-2-005.patch

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch, 
> HADOOP-16049-branch-2-005.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-21 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Patch Available  (was: Open)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch, 
> HADOOP-16049-branch-2-005.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: HADOOP-16049-branch-2-003.patch

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Open  (was: Patch Available)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Patch Available  (was: Open)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-003.patch, HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: HADOOP-16049-branch-2-004.patch

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch, 
> HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: (was: HADOOP-16049-branch-2-002.patch)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: (was: HADOOP-16049-branch-2-001.patch)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Open  (was: Patch Available)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch, 
> HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Patch Available  (was: Open)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch, 
> HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Patch Available  (was: Open)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch, 
> HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: HADOOP-16049-branch-2-003.patch

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch, 
> HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-20 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Open  (was: Patch Available)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch, 
> HADOOP-16049-branch-2-002.patch, HADOOP-16049-branch-2-003.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Open  (was: Patch Available)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch, 
> HADOOP-16049-branch-2-002.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Patch Available  (was: Open)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch, 
> HADOOP-16049-branch-2-002.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: HADOOP-16049-branch-2-002.patch

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch, 
> HADOOP-16049-branch-2-002.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Open  (was: Patch Available)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: (was: fake-branch-2-001.patch)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: HADOOP-16049-branch-2-001.patch

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Patch Available  (was: Open)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16049-branch-2-001.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Status: Patch Available  (was: Open)

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: fake-branch-2-001.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-19 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Attachment: fake-branch-2-001.patch

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: fake-branch-2-001.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-17 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Description: 
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem and run it.

HADOOP-15292 has resolved the issue reported in this ticket in 
trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not been 
backported to branch-2 yet

 

 

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem.

HADOOP-15292 has resolved the issue reported in this ticket in 
trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not been 
backported to branch-2 yet

 

 


> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Priority: Major
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-17 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Description: 
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem.

HADOOP-15292 has resolved the issue reported in this ticket in 
trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not been 
backported to branch-2 yet

 

 

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem.

HADOOP-15292 has resolved the issue reported in this ticket by not using the 
positioned read in trunk/branch-3.1/branch-3.2, but has not been backported to 
branch-2 yet

 

 


> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Priority: Major
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem.
> HADOOP-15292 has resolved the issue reported in this ticket in 
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but has not 
> been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-17 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Description: 
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem.

HADOOP-15292 has resolved the issue reported in this ticket by not using the 
positioned read in trunk/branch-3.1/branch-3.2, but has not been backported to 
branch-2 yet

 

 

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem.

HADOOP-15292 has resolved this ticket by not using the positioned read in 
trunk/branch-3.1/branch-3.2, but has not been backported to branch-2 yet

 

 


> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Priority: Major
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem.
> HADOOP-15292 has resolved the issue reported in this ticket by not using the 
> positioned read in trunk/branch-3.1/branch-3.2, but has not been backported 
> to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-17 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Description: 
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem.

HADOOP-15292 has resolved this ticket by not using the positioned read in 
trunk/branch-3.1/branch-3.2, but has not been backported to branch-2 yet

 

 

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem.

HADOOP-15292 has resolved this ticket by not using the positioned read, but has 
not been backported to branch-2 yet

 

 


> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Priority: Major
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem.
> HADOOP-15292 has resolved this ticket by not using the positioned read in 
> trunk/branch-3.1/branch-3.2, but has not been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

2019-01-17 Thread Kai Xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Xie updated HADOOP-16049:
-
Description: 
In 2.9.2 RetriableFileCopyCommand.copyBytes,
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
copy buffer size) in class TestDistCpSystem.

HADOOP-15292 has resolved this ticket by not using the positioned read, but has 
not been backported to branch-2 yet

 

 

  was:
In 2.9.2 RetriableFileCopyCommand.copyBytes,

 
{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  if (action == FileAction.APPEND) {
sourceOffset += bytesRead;
  }
  ... // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}{code}
it does a positioned read but the position (`sourceOffset` here) is never 
updated when blocks per chunk is set to > 0 (which always disables append 
action). So for chunk with offset != 0, it will keep copying the first few 
bytes again and again, causing result to have data & checksum mismatch.

HADOOP-15292 has resolved this ticket by not using the positioned read, but has 
not been backported to branch-2 yet

 

 


> DistCp result has data and checksum mismatch when blocks per chunk > 0
> --
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2
>Reporter: Kai Xie
>Priority: Major
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
> sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never 
> updated when blocks per chunk is set to > 0 (which always disables append 
> action). So for chunk with offset != 0, it will keep copying the first few 
> bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default 
> copy buffer size) in class TestDistCpSystem.
> HADOOP-15292 has resolved this ticket by not using the positioned read, but 
> has not been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org