Kai Xie created HADOOP-16049:
--------------------------------

             Summary: DistCp result has data and checksum mismatch when blocks per chunk > 0
                 Key: HADOOP-16049
                 URL: https://issues.apache.org/jira/browse/HADOOP-16049
             Project: Hadoop Common
          Issue Type: Bug
          Components: tools/distcp
    Affects Versions: 2.9.2
            Reporter: Kai Xie
In 2.9.2, RetriableFileCopyCommand.copyBytes contains the following loop:

{code:java}
int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  // ...
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead; // the only place the read position is advanced
  }
  // ...
  // write to dst
  bytesRead = readBytes(inStream, buf, sourceOffset);
}
{code}

It does a positioned read, but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0, because that setting always disables the APPEND action. So for a chunk with offset != 0, every iteration re-reads the same bytes starting at `sourceOffset`, and the copied result ends up with mismatched data and checksum.

HADOOP-15292 has already fixed this by not using the positioned read, but that fix has not been backported to branch-2 yet.
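To make the failure mode concrete, here is a minimal, self-contained Java model of the copy loop. It is not the Hadoop code: `SourceFile`, `copyChunk`, and the `advanceOffset` flag are hypothetical names, and `SourceFile.read(position, buf, off, len)` only imitates a positioned read (it reads at the given position and moves no stream cursor). The model always uses positioned reads, so it only illustrates the non-zero-offset case described above. With `advanceOffset` set to false the loop behaves like the 2.9.2 code when APPEND is disabled; with it set to true the position is advanced after each read, which is one way to get correct output (HADOOP-15292 instead stops using the positioned read).

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical model of the DistCp chunk copy loop; not Hadoop code. */
public class ChunkCopyDemo {

    /** Hypothetical source supporting positioned reads over a byte array. */
    static final class SourceFile {
        private final byte[] data;

        SourceFile(byte[] data) { this.data = data; }

        /** Reads up to len bytes at position; returns -1 at EOF. Moves no cursor. */
        int read(long position, byte[] buf, int off, int len) {
            if (position >= data.length) return -1;
            int n = (int) Math.min(len, data.length - position);
            System.arraycopy(data, (int) position, buf, off, n);
            return n;
        }
    }

    /**
     * Copies chunkLength bytes starting at sourceOffset using positioned reads.
     * With advanceOffset == false the read position never moves (the reported bug),
     * so the same bytes are read over and over until the chunk length is reached.
     */
    static byte[] copyChunk(SourceFile src, long sourceOffset, long chunkLength,
                            int bufferSize, boolean advanceOffset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufferSize];
        long totalBytesRead = 0;
        int bytesRead = src.read(sourceOffset, buf, 0, buf.length);
        while (bytesRead >= 0 && totalBytesRead < chunkLength) {
            // Never write past the end of this chunk.
            int toWrite = (int) Math.min(bytesRead, chunkLength - totalBytesRead);
            out.write(buf, 0, toWrite);
            totalBytesRead += toWrite;
            if (advanceOffset) {
                sourceOffset += toWrite; // move the read position forward
            }
            bytesRead = src.read(sourceOffset, buf, 0, buf.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = new byte[32];
        for (int i = 0; i < file.length; i++) file[i] = (byte) i;
        SourceFile src = new SourceFile(file);

        // Second chunk of a 32-byte file split into two 16-byte chunks, 4-byte buffer.
        byte[] expected = Arrays.copyOfRange(file, 16, 32);
        byte[] buggy = copyChunk(src, 16, 16, 4, false);
        byte[] fixed = copyChunk(src, 16, 16, 4, true);

        System.out.println("buggy matches: " + Arrays.equals(expected, buggy)); // false
        System.out.println("fixed matches: " + Arrays.equals(expected, fixed)); // true
        // [16, 17, 18, 19, 16, 17, 18, 19, 16, 17, 18, 19, 16, 17, 18, 19]
        System.out.println("buggy bytes:   " + Arrays.toString(buggy));
    }
}
{code}

In this model the buggy chunk still has the right length, because the chunk-length cap stops the loop, but its content is the first buffer of the chunk repeated, so the problem shows up as a data/checksum mismatch rather than a size mismatch.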