[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115145#comment-14115145 ] Hudson commented on HDFS-6865:
FAILURE: Integrated in Hadoop-Yarn-trunk #663 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/663/])
HDFS-6865. Byte array native checksumming on client side. Contributed by James Thomas. (todd: rev ab638e77b811d9592470f7d342cd11a66efbbf0d)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/security/token/block/TestBlockToken.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeCrc32.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockUnderConstruction.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFs.java
Byte array native checksumming on client side (HDFS changes)
Key: HDFS-6865
URL: https://issues.apache.org/jira/browse/HDFS-6865
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs-client, performance
Reporter: James Thomas
Assignee: James Thomas
Fix For: 2.6.0
Attachments: HDFS-6865.2.patch, HDFS-6865.3.patch, HDFS-6865.4.patch, HDFS-6865.5.patch, HDFS-6865.6.patch, HDFS-6865.7.patch, HDFS-6865.8.patch, HDFS-6865.patch
Refactor FSOutputSummer to buffer data and use the native checksum calculation functionality introduced in HADOOP-10975.
--
This message was sent by Atlassian JIRA (v6.2#6252)
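The issue summary above is about buffering data so that checksums can be computed for many chunks in one call instead of one chunk at a time. As a rough pure-Java illustration of that chunked calculation (not the HADOOP-10975 native code), the sketch below computes one checksum per {{bytesPerChecksum}}-sized chunk and packs the values back-to-back. The class and method names are hypothetical, {{java.util.zip.CRC32}} stands in for whatever CRC the stream is configured with, and the 4-byte big-endian layout is an assumption.

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper: one CRC32 per fixed-size chunk, packed back-to-back. */
final class ChunkedChecksums {
  static final int CHECKSUM_SIZE = 4; // bytes per CRC32 value

  private ChunkedChecksums() {}

  /**
   * Checksums data[off, off+len) in chunks of bytesPerChecksum bytes (the last
   * chunk may be shorter) and returns the 4-byte big-endian values in order.
   */
  static byte[] calculateChunkedSums(byte[] data, int off, int len, int bytesPerChecksum) {
    if (bytesPerChecksum <= 0) {
      throw new IllegalArgumentException("bytesPerChecksum must be positive");
    }
    int numChunks = (len + bytesPerChecksum - 1) / bytesPerChecksum; // ceiling division
    ByteBuffer sums = ByteBuffer.allocate(numChunks * CHECKSUM_SIZE); // big-endian by default
    CRC32 crc = new CRC32();
    for (int i = 0; i < len; i += bytesPerChecksum) {
      crc.reset();
      crc.update(data, off + i, Math.min(bytesPerChecksum, len - i));
      sums.putInt((int) crc.getValue());
    }
    return sums.array();
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    byte[] sums = calculateChunkedSums(data, 0, data.length, 4); // chunks of 4, 4 and 1 bytes
    System.out.println(sums.length); // prints 12
    // One chunk covering all nine bytes yields the standard CRC-32 check value.
    int whole = ByteBuffer.wrap(calculateChunkedSums(data, 0, 9, 512)).getInt();
    System.out.printf("%08x%n", whole); // prints cbf43926
  }
}
{code}

Doing the whole batch in one method call is what gives a native implementation room to amortize call overhead across chunks; the pure-Java loop here only shows the data layout.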
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115258#comment-14115258 ] Hudson commented on HDFS-6865:
FAILURE: Integrated in Hadoop-Hdfs-trunk #1854 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/])
HDFS-6865. Byte array native checksumming on client side. Contributed by James Thomas. (todd: rev ab638e77b811d9592470f7d342cd11a66efbbf0d)
(Same eleven changed files as listed in the Hadoop-Yarn-trunk #663 report.)
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115430#comment-14115430 ] Hudson commented on HDFS-6865:
FAILURE: Integrated in Hadoop-Mapreduce-trunk #1880 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1880/])
HDFS-6865. Byte array native checksumming on client side. Contributed by James Thomas. (todd: rev ab638e77b811d9592470f7d342cd11a66efbbf0d)
(Same eleven changed files as listed in the Hadoop-Yarn-trunk #663 report.)
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114221#comment-14114221 ] stack commented on HDFS-6865:
bq. Stack, by any chance did you run any HBase correctness or system tests with the patch enabled?
I ran a short test.IntegrationTestBigLinkedList job on a small cluster.
{code}
2014-08-28 12:18:30,126 INFO [main] test.IntegrationTestBigLinkedList$Loop: Verify finished with succees. Total nodes=200
{code}
I'd expect it to fail if we were misreading data.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114602#comment-14114602 ] Todd Lipcon commented on HDFS-6865:
Sweet. You the man, Stack! I'll commit this momentarily.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112100#comment-14112100 ] Hadoop QA commented on HDFS-6865:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664578/HDFS-6865.6.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.ha.TestZKFailoverControllerStress
org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.security.token.block.TestBlockToken
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7780//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7780//console
This message is automatically generated.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112511#comment-14112511 ] Hadoop QA commented on HDFS-6865:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664676/HDFS-6865.7.patch against trunk revision .
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7785//console
This message is automatically generated.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112573#comment-14112573 ] stack commented on HDFS-6865:
I tried this patch on a small cluster running TestHLogPE. Everything seems to work the same as without the patch; no obvious regression. This test is not a good one for seeing the benefit of this patch, since it is a single-file write contended over by many threads appending and trying to sync as fast as they can. That said, comparing perf stats from tests with no native library available against the tip of branch-2 as of last night, and then against branch-2 plus this patch, there is no discernible gain or loss with this patch in place.
Indirectly related: as a user, how would I know this improvement is in effect? It's 'on' all the time, but if native is not available, how do I as a user get a clue that I'm missing out on the nice checksum speedup? I hacked logging into DFSClient so I could confirm this patch was in effect. I could make a new issue to add this formally. It might help, especially in the client case.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112595#comment-14112595 ] James Thomas commented on HDFS-6865:
[~stack], thanks for taking a look. So you saw no effect when native was not available, right? And a new issue for the logging would be great. We can throw the log message into a constructor or something so it's only printed once.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112599#comment-14112599 ] stack commented on HDFS-6865:
bq. So you saw no effect when native was not available, right?
Not in my case.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112801#comment-14112801 ] Hadoop QA commented on HDFS-6865:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664682/HDFS-6865.8.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7786//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7786//console
This message is automatically generated.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112857#comment-14112857 ] Todd Lipcon commented on HDFS-6865:
For the logging, this sounds like a good thing to put in HDFS-4486's PerformanceAdvisory log category. Let me take a look at this latest patch.
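Stack's question about how a user can tell the fast path is missing, James's idea of printing the message only once, and the PerformanceAdvisory category suggested above could fit together roughly as in this sketch. It is a hypothetical helper built on {{java.util.logging}}, not Hadoop's logging API: the class name, the logger name "PerformanceAdvisory", and the message text are illustrative assumptions. In Hadoop the condition would presumably come from a native-availability check such as {{NativeCrc32.isAvailable()}}.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Logger;

/** Hypothetical helper: logs a message at most once per instance, the first time a condition holds. */
final class OneTimeAdvisory {
  private final AtomicBoolean logged = new AtomicBoolean(false);
  private final Logger log;
  private final String message;

  OneTimeAdvisory(Logger log, String message) {
    this.log = log;
    this.message = message;
  }

  /** Returns true only for the single call that actually emitted the message. */
  boolean adviseIf(boolean condition) {
    // compareAndSet keeps this to one message even if several streams
    // are constructed concurrently.
    if (condition && logged.compareAndSet(false, true)) {
      log.info(message);
      return true;
    }
    return false;
  }

  // Hypothetical process-wide instance; "PerformanceAdvisory" is only an
  // illustrative logger name echoing the HDFS-4486 category.
  static final OneTimeAdvisory NO_NATIVE_CRC = new OneTimeAdvisory(
      Logger.getLogger("PerformanceAdvisory"),
      "Native checksumming unavailable; using the slower pure-Java CRC path");

  public static void main(String[] args) {
    boolean nativeAvailable = false; // stand-in for a real availability check
    for (int i = 0; i < 3; i++) {
      // e.g. called from each output stream constructor
      System.out.println(NO_NATIVE_CRC.adviseIf(!nativeAvailable)); // true, then false, false
    }
  }
}
{code}

Keeping the flag per instance rather than in a bare static makes the helper easy to test, while a single static instance still gives the "printed once per process" behavior.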
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112879#comment-14112879 ] Todd Lipcon commented on HDFS-6865:
+1, looks good to me. Stack, by any chance did you run any HBase correctness or system tests with the patch enabled? If that's easy to do, it would be great to verify with another heavy user of HDFS that this doesn't break anything. I'll commit this to trunk and branch-2 tomorrow if I don't hear anything to the contrary.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111085#comment-14111085 ] Todd Lipcon commented on HDFS-6865:
Thanks for doing the diligence on the performance tests. Looks like this will be a good speedup across the board. A few comments:
- In the FSOutputSummer constructor, aren't checksumSize and maxChunkSize now redundant with the DataChecksum object that's passed in? {{checksumSize}} should be the same as {{sum.getChecksumSize()}} and {{maxChunkSize}} should be the same as {{sum.getBytesPerChecksum()}}, no?
- Similarly, in the FSOutputSummer class, it seems like the member variables of the same names are redundant with the {{sum}} member variable.
- Can you mark {{sum}} as {{final}} in FSOutputSummer?
- Shouldn't BUFFER_NUM_CHUNKS be a multiple of 3, since we calculate three chunks' worth in parallel in the native code? (Worth a comment explaining the choice, too.)
{code}
private int write1(byte b[], int off, int len) throws IOException {
  if(count==0 && len>=buf.length) {
    // local buffer is empty and user data has one chunk
    // checksum and output data
{code}
This comment is no longer accurate, right? The condition is now that the user has provided data at least as long as our internal buffer.
- {{writeChecksumChunk}} should probably be renamed to {{writeChecksumChunks}} and its javadoc updated.
- It's a little weird that you loop over {{writeChunk}} and pass a single chunk per call, even though you actually have data ready for multiple chunks and the API itself seems perfectly suitable for passing all of the chunks at once. Did you want to leave this as a later potential optimization?
{code}
writeChunk(b, off + i, Math.min(maxChunkSize, len - i), checksum,
    i / maxChunkSize * checksumSize, checksumSize);
{code}
This code might be a little easier to read if you made some local variables:
{code}
int rem = Math.min(maxChunkSize, len - i);
int ckOffset = i / maxChunkSize * checksumSize;
writeChunk(b, off + i, rem, checksum, ckOffset, checksumSize);
{code}
{code}
/* Forces any buffered output bytes to be checksumed and written out to
 * the underlying output stream.  If keep is true, then the state of
 * this object remains intact.
{code}
This comment is now inaccurate. If {{keep}} is true, then it retains only the last partial chunk's worth of buffered data.
- The {{setNumChunksToBuffer}} static thing is kind of sketchy. What if, instead, you implemented flush() in FSOutputSummer such that it always flushed all completed chunks (and not any partial last chunk)? Then you could make those tests call flush() before checkFile(), and not have to break any abstractions.
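Several of the review points above can be seen together in the simplified stream below: one stored chunk size instead of duplicated fields, a buffer of a multiple-of-3 number of chunks, the {{count==0 && len>=buf.length}} fast path, a {{writeChecksumChunks}} loop written with {{rem}}/{{ckOffset}} locals, and a {{flush()}} that emits only completed chunks. This is a hypothetical sketch, not the committed FSOutputSummer: {{BufferingSummer}} is an invented name, {{java.util.zip.CRC32}} replaces the bulk native checksum call, and the chunk count of 9 and the 4-byte checksums are assumptions.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;

abstract class BufferingSummer extends OutputStream { // hypothetical sketch, not FSOutputSummer
  static final int CHECKSUM_SIZE = 4;     // one CRC32 per chunk (assumed checksum type)
  static final int BUFFER_NUM_CHUNKS = 9; // assumed value, chosen as a multiple of 3
  private final int bytesPerChecksum;     // chunk size, stored in exactly one place
  private final byte[] buf;               // room for BUFFER_NUM_CHUNKS chunks
  private final byte[] checksums;         // one CHECKSUM_SIZE slot per chunk in buf
  private int count;                      // valid bytes currently in buf

  BufferingSummer(int bytesPerChecksum) {
    this.bytesPerChecksum = bytesPerChecksum;
    this.buf = new byte[bytesPerChecksum * BUFFER_NUM_CHUNKS];
    this.checksums = new byte[CHECKSUM_SIZE * BUFFER_NUM_CHUNKS];
  }

  /** Sink for one chunk plus its checksum; subclasses decide where it goes. */
  protected abstract void writeChunk(byte[] b, int off, int len,
      byte[] checksum, int ckOff, int ckLen) throws IOException;

  @Override
  public void write(int b) throws IOException { write(new byte[] {(byte) b}, 0, 1); }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    while (len > 0) {
      int n;
      if (count == 0 && len >= buf.length) {
        // Empty buffer and at least a full buffer of input: checksum in place, no copy.
        n = buf.length;
        writeChecksumChunks(b, off, n);
      } else {
        n = Math.min(len, buf.length - count);
        System.arraycopy(b, off, buf, count, n);
        count += n;
        if (count == buf.length) {
          writeChecksumChunks(buf, 0, count);
          count = 0;
        }
      }
      off += n;
      len -= n;
    }
  }

  /** Emits every complete buffered chunk but keeps a trailing partial chunk. */
  @Override
  public void flush() throws IOException {
    int complete = count - count % bytesPerChecksum;
    if (complete > 0) {
      writeChecksumChunks(buf, 0, complete);
      System.arraycopy(buf, complete, buf, 0, count - complete);
      count -= complete;
    }
  }

  @Override
  public void close() throws IOException {
    if (count > 0) { // unlike flush(), also emits a final partial chunk
      writeChecksumChunks(buf, 0, count);
      count = 0;
    }
  }

  /** Checksums every chunk of b[off, off+len) first, then emits them one by one. */
  private void writeChecksumChunks(byte[] b, int off, int len) throws IOException {
    CRC32 crc = new CRC32(); // stand-in for a single bulk (e.g. native) checksum call
    for (int i = 0; i < len; i += bytesPerChecksum) {
      crc.reset();
      crc.update(b, off + i, Math.min(bytesPerChecksum, len - i));
      int v = (int) crc.getValue();
      int ckOffset = i / bytesPerChecksum * CHECKSUM_SIZE;
      for (int k = 0; k < CHECKSUM_SIZE; k++) {
        checksums[ckOffset + k] = (byte) (v >>> (24 - 8 * k)); // big-endian
      }
    }
    for (int i = 0; i < len; i += bytesPerChecksum) {
      int rem = Math.min(bytesPerChecksum, len - i);
      int ckOffset = i / bytesPerChecksum * CHECKSUM_SIZE;
      writeChunk(b, off + i, rem, checksums, ckOffset, CHECKSUM_SIZE);
    }
  }

  public static void main(String[] args) throws IOException {
    StringBuilder emitted = new StringBuilder();
    BufferingSummer s = new BufferingSummer(512) {
      @Override
      protected void writeChunk(byte[] b, int off, int len, byte[] ck, int ckOff, int ckLen) {
        emitted.append(len).append(' ');
      }
    };
    s.write(new byte[1300]); // fits in the 4608-byte buffer, so nothing is emitted yet
    s.flush();               // emits two 512-byte chunks and keeps 276 bytes
    s.close();               // emits the remaining 276-byte partial chunk
    System.out.println(emitted.toString().trim()); // prints "512 512 276"
  }
}
{code}

Checksums for the whole batch are computed first and only then handed to {{writeChunk}} one chunk at a time, mirroring the structure under review; passing all chunks to the sink in a single call would be the "later potential optimization" mentioned in the review.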
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107404#comment-14107404 ] James Thomas commented on HDFS-6865:
I ran the tests that [~tlipcon] suggested and have some results. I created buffers of various sizes and repeatedly wrote them using FSDataOutputStream.write(). For each buffer size, I also wrapped FSDataOutputStream with a BufferedOutputStream. I made sure the packet and block sizes were large enough that no actual writes to DataNodes occurred, so the times shown here primarily cover data buffering, checksumming, and packet construction on the client side. The following times are all in milliseconds. Each test involved writing 8 MB of data to the stream. I only did one run for each of these data points, so there are a few unreproducible outliers (e.g. the 130 ms in the 2^8 row), but the results are generally good enough that I didn't think averaging over a large number of runs was necessary.
Some interpretation of the results: naturally the time goes down with bigger buffers, since we execute fewer instructions (less method call overhead) per byte. At smaller buffer sizes the checksum time becomes more and more negligible compared to the other per-byte overheads (after all, the checksum is a handful of instructions per byte even in the Java code), so we don't see much of a difference between the pre- and post-change code. The main case I was worried about was input buffers (in the non-BufferedOutputStream case) larger than the original FSOutputSummer buffer (512 bytes) and smaller than the current FSOutputSummer buffer (5120 bytes). These incur a buffer copy in the new FSOutputSummer (since there is now space for them in its buffer), but were sent directly to the DFSOutputStream (to be copied into a packet) by the old FSOutputSummer. But the data shows that this case (rows 2^9 and 2^10) is not problematic; clearly the extra buffer copies are offset by the time saved by faster checksumming.
Time to write 8 MB, in ms (buffer size = 2^n bytes):
||n = log2(buffer size)||pre-change||pre-change w/ BufferedOutputStream||post-change||post-change w/ BufferedOutputStream||
|0|463|258|449|261|
|1|249|125|213|118|
|2|133|61|112|62|
|3|42|16|56|22|
|4|32|21|22|8|
|5|15|14|18|8|
|6|19|9|7|6|
|7|18|28|11|5|
|8|14|15|5|130|
|9|12|12|4|4|
|10|15|8|5|4|
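For anyone wanting to repeat measurements like those in the table, a minimal timing loop in the same spirit is sketched below: it writes 8 MB in 2^n-byte calls, with and without a {{BufferedOutputStream}} wrapper. It is not the harness behind these numbers. The {{WriteBench}} and {{CountingSink}} names are hypothetical, "8 MB" is taken as 8 * 2^20 bytes, and the sink simply discards bytes; to measure HDFS you would pass an {{FSDataOutputStream}} instead. Because single runs produce outliers such as the 130 ms entry above, averaging several runs or using a benchmark harness such as JMH would give steadier numbers.

{code:java}
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical timing loop; not the harness behind the numbers above. */
final class WriteBench {
  static final int TOTAL_BYTES = 8 * 1024 * 1024; // "8 MB", taken as 8 MiB

  /** Writes TOTAL_BYTES to out in bufSize-byte calls and returns the bytes written. */
  static long writeAll(OutputStream out, int bufSize, boolean wrapInBuffered) throws IOException {
    OutputStream target = wrapInBuffered ? new BufferedOutputStream(out) : out;
    byte[] data = new byte[bufSize];
    long written = 0;
    while (written < TOTAL_BYTES) {
      int n = (int) Math.min(bufSize, TOTAL_BYTES - written);
      target.write(data, 0, n);
      written += n;
    }
    target.flush(); // push anything still held by the BufferedOutputStream
    return written;
  }

  /** Stand-in sink that only counts bytes; swap in a real stream to measure it. */
  static final class CountingSink extends OutputStream {
    long count;
    @Override public void write(int b) { count++; }
    @Override public void write(byte[] b, int off, int len) { count += len; }
  }

  public static void main(String[] args) throws IOException {
    for (int k = 0; k <= 10; k++) {
      for (boolean buffered : new boolean[] {false, true}) {
        long start = System.nanoTime();
        writeAll(new CountingSink(), 1 << k, buffered);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("2^%d bytes, buffered=%b: %d ms%n", k, buffered, ms);
      }
    }
  }
}
{code}

With a discarding sink the loop mostly measures per-call overhead, which is the component the interpretation above says dominates at small buffer sizes.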
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107661#comment-14107661 ] Hadoop QA commented on HDFS-6865:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663708/HDFS-6865.5.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
org.apache.hadoop.hdfs.web.TestWebHDFSAcl
The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.security.token.block.TestBlockToken
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7723//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7723//console
This message is automatically generated.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106344#comment-14106344 ] Hadoop QA commented on HDFS-6865:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663419/HDFS-6865.4.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:red}-1 release audit{color}. The applied patch generated 3 release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.balancer.TestBalancer
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
org.apache.hadoop.hdfs.server.namenode.TestBlockUnderConstruction
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7707//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7707//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7707//console
This message is automatically generated.
[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)
[ https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102146#comment-14102146 ] Hadoop QA commented on HDFS-6865:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662528/HDFS-6865.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.ha.TestActiveStandbyElector
org.apache.hadoop.ha.TestZKFailoverController
org.apache.hadoop.hdfs.TestDFSClientRetries
org.apache.hadoop.hdfs.TestFSOutputSummer
org.apache.hadoop.hdfs.TestFileConcurrentReader
org.apache.hadoop.hdfs.server.datanode.TestHSync
org.apache.hadoop.hdfs.TestBlocksScheduledCounter
org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
org.apache.hadoop.hdfs.TestCrcCorruption
org.apache.hadoop.hdfs.server.datanode.TestTransferRbw
org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot
org.apache.hadoop.hdfs.server.namenode.TestFileLimit
org.apache.hadoop.hdfs.server.namenode.snapshot.TestINodeFileUnderConstructionWithSnapshot
org.apache.hadoop.hdfs.TestFileAppend3
org.apache.hadoop.hdfs.TestFileCreation
org.apache.hadoop.hdfs.TestFileAppend
org.apache.hadoop.hdfs.TestWriteRead
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
org.apache.hadoop.hdfs.TestHFlush
org.apache.hadoop.hdfs.TestGetBlocks
org.apache.hadoop.hdfs.TestBlockReaderLocal
org.apache.hadoop.hdfs.TestMultiThreadedHflush
The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.TestBlockUnderConstruction
org.apache.hadoop.hdfs.security.token.block.TestBlockToken
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7675//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7675//console
This message is automatically generated.