[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115145#comment-14115145
 ] 

Hudson commented on HDFS-6865:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #663 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/663/])
HDFS-6865. Byte array native checksumming on client side. Contributed by James 
Thomas. (todd: rev ab638e77b811d9592470f7d342cd11a66efbbf0d)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/security/token/block/TestBlockToken.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeCrc32.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockUnderConstruction.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFs.java


 Byte array native checksumming on client side (HDFS changes)
 

 Key: HDFS-6865
 URL: https://issues.apache.org/jira/browse/HDFS-6865
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, performance
Reporter: James Thomas
Assignee: James Thomas
 Fix For: 2.6.0

 Attachments: HDFS-6865.2.patch, HDFS-6865.3.patch, HDFS-6865.4.patch, 
 HDFS-6865.5.patch, HDFS-6865.6.patch, HDFS-6865.7.patch, HDFS-6865.8.patch, 
 HDFS-6865.patch


 Refactor FSOutputSummer to buffer data and use the native checksum 
 calculation functionality introduced in HADOOP-10975.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115258#comment-14115258
 ] 

Hudson commented on HDFS-6865:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1854 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/])
HDFS-6865. Byte array native checksumming on client side. Contributed by James 
Thomas. (todd: rev ab638e77b811d9592470f7d342cd11a66efbbf0d)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/security/token/block/TestBlockToken.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockUnderConstruction.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFs.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeCrc32.java


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115430#comment-14115430
 ] 

Hudson commented on HDFS-6865:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1880 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1880/])
HDFS-6865. Byte array native checksumming on client side. Contributed by James 
Thomas. (todd: rev ab638e77b811d9592470f7d342cd11a66efbbf0d)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockUnderConstruction.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NativeCrc32.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DataChecksum.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/security/token/block/TestBlockToken.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFs.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-28 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114221#comment-14114221
 ] 

stack commented on HDFS-6865:
-

bq. Stack, by any chance did you run any HBase correctness or system tests with 
the patch enabled? 

Ran a short test.IntegrationTestBigLinkedList on a small cluster. 

{code}
2014-08-28 12:18:30,126 INFO  [main] test.IntegrationTestBigLinkedList$Loop: 
Verify finished with succees. Total nodes=200
{code}

I'd imagine it would fail if we were misreading.


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114602#comment-14114602
 ] 

Todd Lipcon commented on HDFS-6865:
---

Sweet. You the man, Stack! I'll commit this momentarily


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112100#comment-14112100
 ] 

Hadoop QA commented on HDFS-6865:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664578/HDFS-6865.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.ha.TestZKFailoverControllerStress
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.security.token.block.TestBlockToken

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7780//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7780//console

This message is automatically generated.


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112511#comment-14112511
 ] 

Hadoop QA commented on HDFS-6865:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664676/HDFS-6865.7.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7785//console

This message is automatically generated.


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112573#comment-14112573
 ] 

stack commented on HDFS-6865:
-

I tried this patch on a small cluster running TestHLogPE. Everything seems to 
work the same as without the patch; no obvious regression. This test is not a 
good one for seeing the benefit of this patch, since it is a single file write 
contended over by many threads appending and trying to sync as fast as they 
can. That said, comparing perf stats of tests where there was no native 
available to the tip of branch-2 as of last night, and then to branch-2 plus 
this patch, there is no discernible gain or loss with this patch in place.

Indirectly related: as a user, how would I know this improvement is in effect? 
It's 'on' all the time, but if, say, native is not available, how do I as a 
user get a clue that I'm missing out on a nice checksum speedup? For myself, I 
hacked logging into DFSClient so I could confirm this patch was in effect. I 
could make a new issue to add this formally. It might help, especially in the 
client case.



[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-27 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112595#comment-14112595
 ] 

James Thomas commented on HDFS-6865:


[~stack], thanks for taking a look. So you saw no effect when native was not 
available, right? And a new issue for the logging would be great. We can throw 
the log message into a constructor or something so it's only printed once.
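
A minimal sketch of the "print it only once" idea, for illustration only. The 
class name, logger name, and message text are hypothetical and are not taken 
from the patch; the availability flag is passed in by the caller rather than 
read from any real Hadoop API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Logger;

/** Hypothetical helper: reports a missing native checksum library at most once. */
final class NativeChecksumAdvisory {
  // Logger name is an assumption made for this sketch.
  private static final Logger LOG = Logger.getLogger("hdfs.client.checksum");

  // Flips to true the first time the fallback message is emitted.
  private final AtomicBoolean warned = new AtomicBoolean(false);

  /** Returns true only for the single call that actually logged the message. */
  boolean warnOnceIfUnavailable(boolean nativeAvailable) {
    if (nativeAvailable) {
      return false; // nothing to report, the fast path is in use
    }
    if (warned.compareAndSet(false, true)) {
      LOG.info("Native checksum code not available; using the Java implementation");
      return true;
    }
    return false; // already reported earlier
  }

  public static void main(String[] args) {
    NativeChecksumAdvisory advisory = new NativeChecksumAdvisory();
    System.out.println(advisory.warnOnceIfUnavailable(false));
    System.out.println(advisory.warnOnceIfUnavailable(false));
  }
}
```

Holding one such object per client (or in a static field) would keep the 
message out of the per-write path.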


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112599#comment-14112599
 ] 

stack commented on HDFS-6865:
-

bq. So you saw no effect when native was not available, right? 

Not in my case.



[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112801#comment-14112801
 ] 

Hadoop QA commented on HDFS-6865:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664682/HDFS-6865.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7786//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7786//console

This message is automatically generated.


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112857#comment-14112857
 ] 

Todd Lipcon commented on HDFS-6865:
---

For the logging, this sounds like a good thing to put in HDFS-4486's 
PerformanceAdvisory log category.

Let me take a look at this latest patch.


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112879#comment-14112879
 ] 

Todd Lipcon commented on HDFS-6865:
---

+1, looks good to me.

Stack, by any chance did you run any HBase correctness or system tests with the 
patch enabled? If that's easy to do, would be great to verify with another 
heavy user of HDFS that this doesn't break anything.

I'll commit this to trunk and branch-2 tomorrow if I don't hear anything to the 
contrary.


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-26 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111085#comment-14111085
 ] 

Todd Lipcon commented on HDFS-6865:
---

Thanks for doing the diligence on the performance tests. Looks like this will 
be a good speedup across the board. A few comments:

- In the FSOutputSummer constructor, aren't checksumSize and maxChunkSize now 
redundant with the DataChecksum object that's passed in? {{checksumSize}} 
should be the same as {{sum.getChecksumSize()}} and {{maxChunkSize}} should be 
the same as {{sum.getBytesPerChecksum()}}, no?

- Similarly, in the FSOutputSummer class, it seems like the member variables of 
the same names are redundant with the {{sum}} member variable.

- Can you mark {{sum}} as {{final}} in FSOutputSummer?

- Shouldn't BUFFER_NUM_CHUNKS be a multiple of 3, since we calculate three 
chunks worth in parallel in the native code? (worth a comment explaining the 
choice, too)
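
To make the first four points concrete, here is a rough sketch of a summer 
that keeps only a final {{sum}} and derives both sizes from it. This is not 
the patch: {{ChecksumSpec}} is a hypothetical stand-in exposing just the two 
accessors named above, {{SummerSketch}} is a made-up class, and the value 9 
for BUFFER_NUM_CHUNKS is an assumption picked only because it is a multiple 
of 3.

```java
final class SummerSketch {
  // Assumed value: a multiple of 3, since the native code is said to
  // checksum three chunks in parallel.
  static final int BUFFER_NUM_CHUNKS = 9;

  private final ChecksumSpec sum;  // single source of truth for both sizes
  private final byte[] buf;        // buffered user data
  private final byte[] checksum;   // one checksum slot per buffered chunk

  SummerSketch(ChecksumSpec sum) {
    this.sum = sum;
    this.buf = new byte[sum.getBytesPerChecksum() * BUFFER_NUM_CHUNKS];
    this.checksum = new byte[sum.getChecksumSize() * BUFFER_NUM_CHUNKS];
  }

  int dataBufferSize() { return buf.length; }
  int checksumBufferSize() { return checksum.length; }
  int maxChunkSize() { return sum.getBytesPerChecksum(); }

  /** Example spec: a 4-byte checksum over 512-byte chunks. */
  static ChecksumSpec crc32Over512() {
    return new ChecksumSpec() {
      @Override public int getChecksumSize() { return 4; }
      @Override public int getBytesPerChecksum() { return 512; }
    };
  }

  public static void main(String[] args) {
    SummerSketch s = new SummerSketch(crc32Over512());
    System.out.println(s.dataBufferSize() + " data bytes, "
        + s.checksumBufferSize() + " checksum bytes");
  }
}

/** Hypothetical stand-in for the two DataChecksum accessors named above. */
interface ChecksumSpec {
  int getChecksumSize();      // checksum bytes per chunk
  int getBytesPerChecksum();  // data bytes per chunk
}
```

With no separate {{checksumSize}} or {{maxChunkSize}} fields, the sizes 
cannot drift out of sync with the checksum object.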



{code}
  private int write1(byte b[], int off, int len) throws IOException {
    if(count==0 && len>=buf.length) {
      // local buffer is empty and user data has one chunk
      // checksum and output data
{code}

This comment is no longer accurate, right? The condition is now that the user 
data has provided data at least as long as our internal buffer.
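
For illustration, a simplified model of that condition. It is an assumption 
about the shape of the logic, not the patch itself: {{BypassSketch}} and its 
counters are hypothetical, and checksumming and output are replaced by 
counters so that only the buffering decision is visible.

```java
final class BypassSketch {
  private final byte[] buf;  // local buffer spanning several chunks
  private int count;         // bytes currently held in buf
  int directWrites;          // writes that skipped the local buffer
  int flushes;               // times a full local buffer was drained

  BypassSketch(int bufferSize) {
    this.buf = new byte[bufferSize];
  }

  /** Consumes part of the input and returns how many bytes were taken. */
  int write1(byte[] b, int off, int len) {
    if (count == 0 && len >= buf.length) {
      // Local buffer is empty and the caller supplied at least a full
      // buffer's worth of data: use the caller's array directly, no copy.
      directWrites++;
      return buf.length;
    }
    // Otherwise copy as much as fits into the local buffer.
    int n = Math.min(len, buf.length - count);
    System.arraycopy(b, off, buf, count, n);
    count += n;
    if (count == buf.length) {
      flushes++;   // buffer is full: checksum and output would happen here
      count = 0;
    }
    return n;
  }

  /** Repeats write1 until all len bytes have been consumed. */
  void write(byte[] b, int off, int len) {
    int n = 0;
    while (n < len) {
      n += write1(b, off + n, len - n);
    }
  }

  int buffered() { return count; }

  public static void main(String[] args) {
    BypassSketch s = new BypassSketch(8);
    s.write(new byte[20], 0, 20);
    System.out.println("direct=" + s.directWrites + " buffered=" + s.buffered());
  }
}
```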



- {{writeChecksumChunk}} should probably be renamed to {{writeChecksumChunks}} 
and its javadoc get updated.

- It's a little weird that you loop over {{writeChunk}} and pass a single chunk 
per call, though you actually have data ready for multiple chunks, and the API 
itself seems to be perfectly suitable to pass all of the chunks at once. Did 
you want to leave this as a later potential optimization?



{code}
  writeChunk(b, off + i, Math.min(maxChunkSize, len - i), checksum,
  i / maxChunkSize * checksumSize, checksumSize);
{code}

This code might be a little easier to read if you made some local variables:

{code}
  int rem = Math.min(maxChunkSize, len - i);
  int ckOffset = i / maxChunkSize * checksumSize;
  writeChunk(b, off + i, rem, checksum, ckOffset, checksumSize);
{code}
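
For reference, the same offset arithmetic in a self-contained form. This is 
only a sketch: it uses {{java.util.zip.CRC32}} in place of the Hadoop 
checksum classes, fixes the checksum size at 4 bytes, and stores the sums 
into an array instead of calling {{writeChunk}}. The class and method names 
are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class ChunkLoopSketch {
  /** Returns one big-endian CRC32 per chunk of at most maxChunkSize bytes. */
  static byte[] checksumChunks(byte[] b, int off, int len, int maxChunkSize) {
    final int checksumSize = 4; // a CRC32 value is 32 bits
    int numChunks = (len + maxChunkSize - 1) / maxChunkSize;
    byte[] checksum = new byte[numChunks * checksumSize];
    CRC32 crc = new CRC32();
    for (int i = 0; i < len; i += maxChunkSize) {
      int rem = Math.min(maxChunkSize, len - i);        // last chunk may be short
      int ckOffset = i / maxChunkSize * checksumSize;   // slot for this chunk's sum
      crc.reset();
      crc.update(b, off + i, rem);
      ByteBuffer.wrap(checksum, ckOffset, checksumSize).putInt((int) crc.getValue());
    }
    return checksum;
  }

  public static void main(String[] args) {
    byte[] data = new byte[1100];
    System.out.println(checksumChunks(data, 0, data.length, 512).length + " checksum bytes");
  }
}
```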



{code}
  /* Forces any buffered output bytes to be checksumed and written out to
   * the underlying output stream.  If keep is true, then the state of 
   * this object remains intact.
{code}

This comment is now inaccurate. If {{keep}} is true, then it retains only the 
last partial chunk worth of buffered data.



- The {{setNumChunksToBuffer}} static thing is kind of sketchy. What if, 
instead, you implemented flush() in FSOutputSummer such that it always flushed 
all completed chunks? (and not any partial last chunk). Then you could make 
those tests call flush() before checkFile(), and not have to break any 
abstractions?
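
Roughly what I have in mind for that flush(), as a sketch under simplifying 
assumptions: the class below is hypothetical, checksumming is left out, and a 
{{ByteArrayOutputStream}} stands in for the underlying stream. Only the 
"write completed chunks, keep the partial one" behavior is shown.

```java
import java.io.ByteArrayOutputStream;

final class PartialChunkFlushSketch {
  private final int bytesPerChecksum;
  private final byte[] buf;
  private int count; // bytes currently buffered
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  PartialChunkFlushSketch(int bytesPerChecksum, int numChunks) {
    this.bytesPerChecksum = bytesPerChecksum;
    this.buf = new byte[bytesPerChecksum * numChunks];
  }

  /** Appends data to the local buffer; the caller must not overfill it. */
  void buffer(byte[] b, int off, int len) {
    if (len > buf.length - count) {
      throw new IllegalArgumentException("data does not fit in the buffer");
    }
    System.arraycopy(b, off, buf, count, len);
    count += len;
  }

  /** Writes every completed chunk; only the trailing partial chunk stays buffered. */
  void flush() {
    int partial = count % bytesPerChecksum;
    int complete = count - partial;
    out.write(buf, 0, complete);
    // Slide the partial chunk to the front so later writes can extend it.
    System.arraycopy(buf, complete, buf, 0, partial);
    count = partial;
  }

  int buffered() { return count; }
  int written() { return out.size(); }

  public static void main(String[] args) {
    PartialChunkFlushSketch s = new PartialChunkFlushSketch(512, 3);
    s.buffer(new byte[1100], 0, 1100);
    s.flush();
    System.out.println("written=" + s.written() + " buffered=" + s.buffered());
  }
}
```

A test could then call flush() before checkFile() and see every completed 
chunk without reaching into the buffer size.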


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-22 Thread James Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107404#comment-14107404
 ] 

James Thomas commented on HDFS-6865:


I ran the tests that [~tlipcon] suggested and have some results. I created 
buffers of various sizes and repeatedly wrote them using 
FSDataOutputStream.write(). For each buffer size, I also wrapped 
FSDataOutputStream with a BufferedOutputStream. I made sure the packet size and 
block sizes were large enough that no actual writes to DataNodes occurred, so 
the times shown here primarily cover data buffering and checksumming and packet 
construction on the client side.

The following times are all in milliseconds. Each test involved writing 8 MB of 
data to the stream. I only did one run for each of these data points, so there 
are a few unreproducible outliers (e.g. the 130ms in the 2^8 row), but the 
results are generally good enough that I didn't think averaging over a large 
number of runs was necessary.
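
For anyone who wants to repeat the measurement, the harness has roughly the 
following shape. This is a sketch and not the code I ran: {{ChecksumSink}} is 
a hypothetical stand-in for the HDFS stream that only checksums and discards 
the data, so absolute numbers from it will not match the table.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;

final class WriteBenchSketch {
  static final int TOTAL_BYTES = 8 * 1024 * 1024; // 8 MB per test

  /** Hypothetical sink: checksums everything it receives and discards it. */
  static final class ChecksumSink extends OutputStream {
    final CRC32 crc = new CRC32();
    long bytes;

    @Override public void write(int b) {
      crc.update(b);
      bytes++;
    }

    @Override public void write(byte[] b, int off, int len) {
      crc.update(b, off, len);
      bytes += len;
    }
  }

  /** Writes TOTAL_BYTES in bufferSize pieces and returns elapsed milliseconds. */
  static long timeWritesMillis(OutputStream out, int bufferSize) {
    byte[] buffer = new byte[bufferSize];
    long start = System.nanoTime();
    try {
      // bufferSize is a power of two here, so it divides TOTAL_BYTES evenly.
      for (int written = 0; written < TOTAL_BYTES; written += bufferSize) {
        out.write(buffer, 0, bufferSize);
      }
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) {
    for (int log2 = 0; log2 <= 10; log2++) {
      int size = 1 << log2;
      long plain = timeWritesMillis(new ChecksumSink(), size);
      long buffered = timeWritesMillis(new BufferedOutputStream(new ChecksumSink()), size);
      System.out.println(log2 + "\t" + plain + "\t" + buffered);
    }
  }
}
```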

Some interpretation of the results: Naturally the time goes down with bigger 
buffers since we have fewer instructions (less method call overhead) per byte. 
At smaller buffer sizes the time for the checksum becomes more and more 
negligible compared to the other overheads per byte (after all, the checksum is 
a handful of instructions per byte even for the Java code), so we don't see 
much of a difference between the pre- and post-change code. The main case I was 
worried about was for input buffers (in the non-BufferedOutputStream case) 
larger than the original FSOutputSummer buffer (512 bytes) and smaller than the 
current FSOutputSummer buffer (5120 bytes), because these incur a buffer copy 
in the new FSOutputSummer (since there is now space for them in the 
FSOutputSummer's buffer) but were sent directly to the DFSOutputStream (to be 
copied into a packet) in the old FSOutputSummer. But the data shows that this 
case (rows 2^9 and 2^10) is not problematic -- clearly the extra buffer copies 
are offset by the time saved by faster checksumming.

||log2(Buffer Size)||pre-change||pre-change w/ BufferedStream||post-change||post-change w/ BufferedStream||
|0|463|258|449|261|
|1|249|125|213|118|
|2|133|61|112|62|
|3|42|16|56|22|
|4|32|21|22|8|
|5|15|14|18|8|
|6|19|9|7|6|
|7|18|28|11|5|
|8|14|15|5|130|
|9|12|12|4|4|
|10|15|8|5|4|


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107661#comment-14107661
 ] 

Hadoop QA commented on HDFS-6865:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663708/HDFS-6865.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.web.TestWebHDFSAcl

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.security.token.block.TestBlockToken

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7723//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7723//console

This message is automatically generated.


[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106344#comment-14106344
 ] 

Hadoop QA commented on HDFS-6865:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663419/HDFS-6865.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 3 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.balancer.TestBalancer
  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
  org.apache.hadoop.hdfs.server.namenode.TestBlockUnderConstruction

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7707//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7707//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7707//console


 Byte array native checksumming on client side (HDFS changes)
 

 Key: HDFS-6865
 URL: https://issues.apache.org/jira/browse/HDFS-6865
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, performance
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6865.2.patch, HDFS-6865.3.patch, HDFS-6865.4.patch, 
 HDFS-6865.patch


 Refactor FSOutputSummer to buffer data and use the native checksum 
 calculation functionality introduced in HADOOP-10975.





[jira] [Commented] (HDFS-6865) Byte array native checksumming on client side (HDFS changes)

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102146#comment-14102146
 ] 

Hadoop QA commented on HDFS-6865:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12662528/HDFS-6865.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.ha.TestActiveStandbyElector
  org.apache.hadoop.ha.TestZKFailoverController
  org.apache.hadoop.hdfs.TestDFSClientRetries
  org.apache.hadoop.hdfs.TestFSOutputSummer
  org.apache.hadoop.hdfs.TestFileConcurrentReader
  org.apache.hadoop.hdfs.server.datanode.TestHSync
  org.apache.hadoop.hdfs.TestBlocksScheduledCounter
  org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
  org.apache.hadoop.hdfs.TestCrcCorruption
  org.apache.hadoop.hdfs.server.datanode.TestTransferRbw
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot
  org.apache.hadoop.hdfs.server.namenode.TestFileLimit
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestINodeFileUnderConstructionWithSnapshot
  org.apache.hadoop.hdfs.TestFileAppend3
  org.apache.hadoop.hdfs.TestFileCreation
  org.apache.hadoop.hdfs.TestFileAppend
  org.apache.hadoop.hdfs.TestWriteRead
  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.TestHFlush
  org.apache.hadoop.hdfs.TestGetBlocks
  org.apache.hadoop.hdfs.TestBlockReaderLocal
  org.apache.hadoop.hdfs.TestMultiThreadedHflush

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestBlockUnderConstruction
  org.apache.hadoop.hdfs.security.token.block.TestBlockToken

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7675//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7675//console


 Byte array native checksumming on client side (HDFS changes)
 

 Key: HDFS-6865
 URL: https://issues.apache.org/jira/browse/HDFS-6865
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, performance
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-6865.patch


 Refactor FSOutputSummer to buffer data and use the native checksum 
 calculation functionality introduced in HADOOP-10975.


