[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-11-30 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031939#comment-15031939
 ] 

Daryn Sharp commented on HDFS-7435:
---

[~liuml07], good catch!  Not sure how I accidentally did that and I don't know 
why the test still passes.  I think your proposed fix is correct.  I know 
extremely little of the benchmark tool so I'd be unable to add a test (for the 
test!) in a timely manner, so if you'd like the credit for the bug then I'm 
fine with you doing the jira.

> PB encoding of block reports is very inefficient
> 
>
> Key: HDFS-7435
> URL: https://issues.apache.org/jira/browse/HDFS-7435
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousand of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-11-30 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032285#comment-15032285
 ] 

Mingliang Liu commented on HDFS-7435:
-

Thanks for your comment, [~daryn] and [~shv]. I started to work on 
{{NNThroughputBenchmark}} just recently and knew nothing about it before that. 
If the empty block report list in this patch was not intensional, I think I 
don't need more context of this patch. Sure I'd like to make the change as we 
discussed above. The jira is [HDFS-9484]. Let's continue further discussion on 
the fix there.

The reason why the unit tests could pass may be that the 
{{TestNNThroughputBenchmark}} is rather a driver to run the benchmark with 
default parameters than a real unit test that asserts expected behavior for 
different scenarios. If we need a sophisticated unit test, perhaps we can 
address it separately.

> PB encoding of block reports is very inefficient
> 
>
> Key: HDFS-7435
> URL: https://issues.apache.org/jira/browse/HDFS-7435
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousand of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-11-25 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027910#comment-15027910
 ] 

Konstantin Shvachko commented on HDFS-7435:
---

I agree this broke {NNThroughputBenchmark.BlockReportStats}} operation as it 
makes DNs always send an empty block report no matter what the parameters of 
the benchmark are.
Good catch Mingliang. Let's file a jira to fix this. [~daryn] are you available 
to fix this?

> PB encoding of block reports is very inefficient
> 
>
> Key: HDFS-7435
> URL: https://issues.apache.org/jira/browse/HDFS-7435
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousand of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-11-24 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025772#comment-15025772
 ] 

Mingliang Liu commented on HDFS-7435:
-

In {{NNThroughputBenchmark}} class, the change
{code}
 void formBlockReport() {
   // fill remaining slots with blocks that do not exist
-  for(int idx = blocks.size()-1; idx >= nrBlocks; idx--)
-blocks.set(idx, new Block(blocks.size() - idx, 0, 0));
-  blockReportList = new BlockListAsLongs(blocks).getBlockListAsLongs();
+  for (int idx = blocks.size()-1; idx >= nrBlocks; idx--) {
+Block block = new Block(blocks.size() - idx, 0, 0);
+blocks.set(idx, new BlockReportReplica(block));
+  }
+  blockReportList = BlockListAsLongs.EMPTY;
{code}

Why the last line is not building the {{blockReportList}} using {{blocks}} 
field as following?
{code}
  blockReportList = BlockListAsLongs.encode(blocks);
{code}

I may need more context to understand this patch, while my goal is to work on 
{{NNThroughputBenchmark}}. Thanks.

> PB encoding of block reports is very inefficient
> 
>
> Key: HDFS-7435
> URL: https://issues.apache.org/jira/browse/HDFS-7435
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousand of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361722#comment-14361722
 ] 

Hudson commented on HDFS-7435:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #132 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/132/])
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by 
Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.7.0

 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361743#comment-14361743
 ] 

Hudson commented on HDFS-7435:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #866 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/866/])
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by 
Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.7.0

 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361818#comment-14361818
 ] 

Hudson commented on HDFS-7435:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #123 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/123/])
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by 
Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.7.0

 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361829#comment-14361829
 ] 

Hudson commented on HDFS-7435:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #132 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/132/])
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by 
Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.7.0

 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361850#comment-14361850
 ] 

Hudson commented on HDFS-7435:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2082 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2082/])
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by 
Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.7.0

 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361804#comment-14361804
 ] 

Hudson commented on HDFS-7435:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2064 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2064/])
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by 
Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.7.0

 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360800#comment-14360800
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12704194/HDFS-7435.patch
  against trunk revision 387f271.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9874//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9874//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360878#comment-14360878
 ] 

Kihwal Lee commented on HDFS-7435:
--

+1 I also ran TestDatanodeManager multiple times on my machine with this patch 
without any failure.


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-13 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360854#comment-14360854
 ] 

Daryn Sharp commented on HDFS-7435:
---

The retry cache test failure is HDFS-7524.  I cannot reproduce the DN manager 
timeout, and it's impossible to tell from pre-commit why it failed (no logs).  
The test normally runs in ~10s, but takes 3min during a prior successful run so 
maybe the build machine is overwhelmed and on the edge of timing out.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360982#comment-14360982
 ] 

Hudson commented on HDFS-7435:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7318 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7318/])
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by 
Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359031#comment-14359031
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12704194/HDFS-7435.patch
  against trunk revision 06ce1d9.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9853//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-11 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357652#comment-14357652
 ] 

Daryn Sharp commented on HDFS-7435:
---

Tests are passing for me.  The grumbling about edit log corruption should have 
nothing to do with this patch.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357666#comment-14357666
 ] 

Jing Zhao commented on HDFS-7435:
-

Thanks for updating the patch, Daryn. The latest patch looks good to me, and I 
think the new capabilities field in NamespaceInfo is a good idea. Some minors:
# Need to fix javadoc for {{decodeBuffer}}, {{encode}}, and {{getBlocksBuffer}} 
in BlockListAsLongs.
# Here older I guess you mean the current version compared with a future 
version. Maybe we should make it more clear in the comment.
{code}
// reserve upper bits for future use. decoding masks off these bits to
// allow compatibility for older namenodes
private static long NUM_BYTES_MASK = (-1L)  (64 - 48);
private static long REPLICA_STATE_MASK = (-1L)  (64 - 4);
{code}
# Looks like the following code can be simplified and we actually do not need 
isSupported?
{code}
+Capability(boolean isSupported) {
+  int bits = ordinal() - 1;
+  mask = (bits  0) ? 0 : (1L  bits);
+  if (isSupported) {
+CAPABILITIES_SUPPORTED |= mask;
+  }
+}
{code}

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357484#comment-14357484
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703930/HDFS-7435.patch
  against trunk revision 30c428a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9838//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9838//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357719#comment-14357719
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12704033/HDFS-7435.patch
  against trunk revision 7a346bc.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9843//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357734#comment-14357734
 ] 

Jing Zhao commented on HDFS-7435:
-

nit-pick:
# s/blocksBuffer/blocksBuf/ in the javadoc
{code}
  /**
   * Prepare an instance to in-place decode the given ByteString buffer
   * @param numBlocks - blocks in the buffer
   * @param blocksBuffer - ByteString encoded varints
   * @return BlockListAsLongs
   */
  public static BlockListAsLongs decodeBuffer(final int numBlocks,
  final ByteString blocksBuf) {
return new BufferDecoder(numBlocks, blocksBuf);
  }
{code}

+1 after fixing it.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350769#comment-14350769
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702905/HDFS-7435.patch
  against trunk revision 95bfd08.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer
  org.apache.hadoop.hdfs.TestSetrepIncreasing
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
  org.apache.hadoop.hdfs.TestDecommission
  org.apache.hadoop.hdfs.server.balancer.TestBalancer
  org.apache.hadoop.hdfs.server.datanode.TestSimulatedFSDataset

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9776//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9776//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-04 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347165#comment-14347165
 ] 

Daryn Sharp commented on HDFS-7435:
---

Thanks Jing  Charles.  Addressing comments, will post later today.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-03 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345311#comment-14345311
 ] 

Charles Lamb commented on HDFS-7435:


Hi @daryn,

The new patch looks pretty good to me. Just a few nits.

FsDatasetImpl still has one line that exceeds the 80 chars, and there are a 
couple of unused imports in the new test TestBlockListAsLongs. Also in that 
test, IWBNI you could use the specific Mockito static imports that are needed 
rather than the * import.


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-03-03 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345859#comment-14345859
 ] 

Jing Zhao commented on HDFS-7435:
-

Thanks for working on this, Daryn. The patch looks good to me in general. Some 
comments below:
# Please add javadoc for BlockListAsLongs.
# I think we'd better not include the replica state for finalized replicas. 
Though it is encoded using a varint, considering most of the blocks are 
finalized in the report, this can still increase the total size of the report. 
How about not including the finalized state and write an extra number in the 
header to indicate the total number of finalized replicas?
# For {{BlockReportLongIterator}}, it can be more clean to only keep either 
{{iter}} or {{currentBlockIndex}}.
# Now the iterator becomes an internal field of a BlockListAsLongs. I guess the 
main motivation here is to support both new and old encoding but avoid having 
both a ByteString and a long[] inside of BlockListsAsLongs? But in the 
meanwhile, this means there can be only one user of {{iterator()}} in the whole 
life time of BlockListAsLongs? This usage pattern is true in the current code, 
but is it good for BlockListAsLongs as a pure data structure?
# Also in {{getBlockReportIterator}}, if {{iter}} is null, we're actually 
creating a new {{BlockListAsLongs}} instance and returning its iterator. Though 
this may not be hit by normal code path, this may not be very clean.
# To address the above two comments, maybe what we can do is:
1) declare both the ByteString and the long[] in BlockListAsLongs and build 
iterator on top
2) or keep the old BlockListAsLongs class (but annotate it as deprecated) and 
use it for handling the old style encoding. In this way we can have a more 
clean new BlockListAsLongs, and we can delete the old one in the future.





 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340309#comment-14340309
 ] 

Daryn Sharp commented on HDFS-7435:
---

[~sureshms], [~jingzhao], could you please review this morning so we may 
hopefully incorporate this into our builds?

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340664#comment-14340664
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701376/HDFS-7435.patch
  against trunk revision a979f3b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9683//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9683//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9683//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340952#comment-14340952
 ] 

Jing Zhao commented on HDFS-7435:
-

Thanks for updating the patch, Daryn. I will review the patch.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341012#comment-14341012
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701436/HDFS-7435.patch
  against trunk revision 01a1621.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9686//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9686//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
 HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-26 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338768#comment-14338768
 ] 

Charles Lamb commented on HDFS-7435:


@daryn,

This looks really good. I like the new approach and your current patch does a 
pre-emptive strike on several of the comments I was going to make on the .002 
patch.

I really only have nits.

The patch needs to be rebased. There was one .rej when I applied it (obviously 
I worked past that for my review).

BlockListAsLongs.java:

BlockListAsLongs(Collection) needs an @param for the javadoc.

In #BlockListAsLongs(Collection), the ReplicaState is being written as a 
varint64. I realize it's a varint, but since it's really only a single byte in 
the implementation, it seems a little heavy handed to write it to the cos as a 
varint64. I also realize that it will need to be a long on the way back out for 
the uc long[]. If you don't want to change it from being a varint64 in the cos, 
then perhaps just add a comment saying that you know it's a byte (actually int) 
in the impl but for consistency you're using a varint64?

Since you throw UnsupportedOperationException from multiple #remove methods, 
you might want to add the class name to the message. e.g. Sorry. remove not 
implemented for BlockReportListIterator. Along a similar vein, would it be 
appropriate to add a message to BlockReportReplica.getVisibleLength, 
getStorageUuid, and isOnTransientStorage's UnsupportedOperationException?

BlockReportTestBase.java:

getBlockReports has one line that exceeds the 80 char width.

DatanodeProtocolClientSideTranslatorPB.java:

the import of NameNodeLayoutVersion is unused.

the decl of useBlocksBuffer busts the 80 char width.

DatanodeProtocolServerSideTranslatorPB.java:

import ByteString is unused.

FsDatasetImpl.java:

in #getBlockReports, the line under case RUR busts the 80 char limit.

NameNodeLayoutVersion.java:

Perhaps s/Protobuf optimized/Optimized protobuf/

NNThroughputBenchmark.java:

Thanks for fixing the formatting in here.

TestBlockHasMultipleReplicasOnSameDN.java:

blocks.add(... busts the 80 char limit.



 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-26 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339390#comment-14339390
 ] 

Charles Lamb commented on HDFS-7435:


bq. I mildly disagree will overly verbose message for 
UnsupportedOperationExceptions since the JDK rarely uses message and the class 
 method is in the stack trace.

Yeah, I figured there was a reason for not having a message in the UOE, so it 
won't cause me any heartburn if you don't put them in.

bq. I think the whole fragmented over-the-write buffer implementation is over 
engineered.

Oh, I kind of liked it.

I'll try to take a look at the new patch soon. Thanks for the reply.


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339623#comment-14339623
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701196/HDFS-7435.patch
  against trunk revision bfbf076.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
  org.apache.hadoop.hdfs.server.namenode.TestFSImage

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9673//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9673//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9673//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337507#comment-14337507
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700906/HDFS-7435.patch
  against trunk revision caa42ad.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9668//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335212#comment-14335212
 ] 

Jing Zhao commented on HDFS-7435:
-

Thanks for sharing the thoughts, [~daryn]. Why not posting your current patch 
first so that we can also have a better understanding about why bumping the 
DN's min NN version is necessary?

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335128#comment-14335128
 ] 

Daryn Sharp commented on HDFS-7435:
---

Do I have the luxury of bumping the DN's minimum NN version?  That would 
greatly simplify the implementation.  It's easy for the NN to use the presence 
of the protobuf fields to determine if old/new.  However the prior patches 
illustrate it's not so easy for the DN to auto-detect.

I believe the standard upgrade procedure is upgrade the NN, then rolling 
upgrade the DNs.  Per above, upgraded NN supports old/new reports from DNs.  
The only scenario in which a problem can occur is the cluster is fully or 
partially upgraded, and the NN is downgraded.  The new DNs won't be able to 
communicate with the old NN, hence why I'd like to bump the minimum version so 
the DN doesn't continue to send block reports that appear to be empty to the 
old NN.  I'd argue that if the NN is downgraded, there's going to be downtime, 
so you might as well rollback the DNs too.

Thoughts?

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-23 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14333580#comment-14333580
 ] 

Daryn Sharp commented on HDFS-7435:
---

Almost about to post a zero-copy block report impl via iterators so we don't 
need to worry about segmenting at all.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-20 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329090#comment-14329090
 ] 

Daryn Sharp commented on HDFS-7435:
---

Let me get back to this patch this afternoon.  I'll post perf numbers too.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-20 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329107#comment-14329107
 ] 

Suresh Srinivas commented on HDFS-7435:
---

[~daryn], can you please comment on [~jingzhao]'s patch and why it does not 
address the issue? It would be great to get this done early. 

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328566#comment-14328566
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699754/HDFS-7435.002.patch
  against trunk revision c0d9b93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHDFS
  org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9625//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9625//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-19 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328256#comment-14328256
 ] 

Suresh Srinivas commented on HDFS-7435:
---

[~jingzhao], if you have a patch, can you please post it? In clusters with lot 
of small files and clusters with high density nodes, we have already seen 
issues related to large block reports. We can improve upon your patch in 
subsequent jiras. 

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-19 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328269#comment-14328269
 ] 

Jing Zhao commented on HDFS-7435:
-

Yeah, the 002 patch is the new rebased patch.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328443#comment-14328443
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699754/HDFS-7435.002.patch
  against trunk revision d49ae72.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestMiniDFSCluster
  org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
  org.apache.hadoop.hdfs.TestBalancerBandwidth
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9622//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9622//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247407#comment-14247407
 ] 

Daryn Sharp commented on HDFS-7435:
---

Apologies for delay, I've been dealing with production issues, including block 
reports which are becoming a very big issue.

I've studied the way PB is implemented and I'm not sure fragmenting the buffer 
will add value.  Proper encoding (mine is buggy) will not allocate a full 
buffer but uses a tree to hold fragments of the byte string.  Decoding the PB 
will use the full byte array of the PB as the backing store for slices (ref to 
full array + offset/length).  We've already paid the price for a large 
allocation of the full PB, carried it around in the Call object, etc, so 
extracting the field is essentially free.  Whether it's one or more is 
irrelevant.

I'm trying to performance test a patch that internally segments the 
BlockListAsLongs and correctly outputs the byte buffer.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-12-03 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233210#comment-14233210
 ] 

Kihwal Lee commented on HDFS-7435:
--

bq. This is important because the problems of large contiguous array is also a 
problem for DataNode, since there are deployments that use 60 disks in a single 
node with more than 10 million blocks in a single DataNode.

Just to be clear, the datanode and namenode require the same size of contiguous 
memory for the backing arrays today for the ArrayLists used internally by 
protobuf.  So, this patch is not making the situation any worse.  But I agree 
that chunking is a good idea and is better to be done along with this feature.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-12-03 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233242#comment-14233242
 ] 

Daryn Sharp commented on HDFS-7435:
---

I agree that it makes sense for both the DN and NN to chunk the block list for 
their internal data structures.  All I'm saying is that the wire protocol may 
not need to be built around that implementation detail - but I think a middle 
ground is { block-count, chunk-count, blocks[], chunk-count, blocks[], ... }.  
However I think it can be done in a simpler and less invasive fashion.  I'll 
toss another patch up this afternoon.

On a side note: we're having some DN OOM issues too during block reports + 
heavy load.  In your use case, I'm presuming each disk is a storage.   If yes, 
each unencoded storage report will be ~4MB, ~1.5MB encoded.  The DN heap must 
be 64GB, easily more for HA, so a handful of MB appears to pales in comparison 
to how much memory the DN wastes building the reports.  Again, there are 
default capacity {{ArrayLists}} to build the finalized  UC lists!  I'm going 
to make minimal changes to reduce the gross inefficiency here too.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-12-03 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233280#comment-14233280
 ] 

Suresh Srinivas commented on HDFS-7435:
---

[~kihwal], agree with you. We are on the same page.

[~daryn], some comments:
bq. All I'm saying is that the wire protocol may not need to be built around 
that implementation detail - but I think a middle ground is
{ block-count, chunk-count, blocks[], chunk-count, blocks[], ... }
Have you looked at Jing's patch? Can you post comments on why it cannot be 
enhanced to add chunk-count? I believe its close to being ready with additional 
changes? Is there a need to do a branch new patch?

bq. . In your use case, I'm presuming each disk is a storage. If yes, each 
unencoded storage report will be ~4MB, ~1.5MB encoded. The DN heap must be 
64GB, easily more for HA, so a handful of MB appears to pales in comparison to 
how much memory the DN wastes building the reports. 
I do not follow these numbers well. BTW for very large nodes ( 200 TB), 
Datanodes can be run with just 8G heap space. Not sure about the  64GB number 
you are mentioning.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-12-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230068#comment-14230068
 ] 

Daryn Sharp commented on HDFS-7435:
---

Apologies if I wasn't clear.  My original patch added a blocksBuffer PB field 
with encoding of size : long[].   The demo patch for chunking changes this new 
field into a repeating field.  The resulting encoding is chunk-size : long[], 
chunk-size long[], etc.

I'm fine with chunking, but I don't think the tail should wag the dog.  The 
in-memory representation of the blocks list is merely an implementation detail 
that should not influence the PB encoding.  Ie. Leave blocksBuffer as a 
non-repeating field, and let the NN chunk the block list however it sees fit.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-12-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230191#comment-14230191
 ] 

Suresh Srinivas commented on HDFS-7435:
---

bq. The in-memory representation of the blocks list is merely an implementation 
detail that should not influence the PB encoding. Ie. Leave blocksBuffer as a 
non-repeating field, and let the NN chunk the block list however it sees fit.
I think the chunked buffers should be sent all the way from DataNode to the 
NameNode. NameNode can store it in a different way that fits it needs (though 
if it is chunked, I do not see NameNode storing it differently). This is 
important because the problems of large contiguous array is also a problem for 
DataNode, since there are deployments that use 60 disks in a single node with 
more than 10 million blocks in a single DataNode.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-26 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226076#comment-14226076
 ] 

Yi Liu commented on HDFS-7435:
--

I open HADOOP-11339 to reuse buffer for Hadoop RPC as stated in my above 
comment. 

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-26 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226276#comment-14226276
 ] 

Daryn Sharp commented on HDFS-7435:
---

Upon quick scan, it appears the the chunk size is encoded in the buffer?  If 
yes, I think each side should be able chose an appropriate chunk size - as an 
implementation detail - which is not leaked into the PB encoding.  If we can 
agree on that point, can we move the segmentation to a separate jira?

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-26 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226567#comment-14226567
 ] 

Jing Zhao commented on HDFS-7435:
-

In the current demo patch the max chunk size is not encoded in the buffer. It 
is currently simply determined by the DataNode based on configuration. The 
segmentation is done by the DataNode and NameNode does not need to know the max 
chunk size. For simplicity each chunk still follows the same format with the 
original long[] in BlockListAsLongs (i.e., it still encodes the number of 
finalized blocks and number of uc replicas in the first two elements).

I guess to let DataNode be the only side doing segmentation can avoid NameNode 
still allocating a big contiguous array before chunking. Thus I have to change 
the {{optional bytes blocksBuffer}} into {{repeated bytes blocksBuffers}}. 
Maybe we can use {{repeated bytes blocksBuffers}} here but assume the number of 
buffer is always 1, then move the real segmentation change into a separate jira?

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226975#comment-14226975
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683918/HDFS-7435.001.patch
  against trunk revision 8f1454c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestDFSClientRetries

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8851//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8851//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-25 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224638#comment-14224638
 ] 

Daryn Sharp commented on HDFS-7435:
---

We're definitely on the same page.  I meant questionably in the sense that the 
garbage generation rate is so high that CMS (you mention fragmentation) will be 
slow, bloated with free lists, and not keeping up.  Now granted, a 20M 
allocation is likely to be prematurely tenured - along with the IPC protobuf 
containing the large report.  One solution/workaround to both is reducing the 
size of the individual BR via multiple storages.  The storages don't have to be 
individual drives but just subdirs.

Segmenting shouldn't be an outright replacement.  The decode will emit a 
long[][] which requires updating {{StorageBlockReport}} and 
{{BlockListAsLongs}}.  Similar changes to the datanode, although not required 
for the namenode changes, will be more complex.  I already considered 
segmenting.  :)  I found the complexity vs time vs benefit to be much lower 
than improvements to other subsystems.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225666#comment-14225666
 ] 

Yi Liu commented on HDFS-7435:
--

Hi guys, I think this is a good improvement.
I also find a similar issue related to this when I'm doing Hadoop RPC related 
optimization in my local branch.
As we all known that the blocks report from DNs may become very large in big 
cluster, and there is chance to cause full GC if there is no enough contiguous 
space in old generation.

We know that we will reuse the connection for RPC calls, but when we process 
each rpc in the same connection, we will allocate a fresh heap byte buffer to 
store the rpc bytes data. The rpc message may be very large. So it will cause 
the same issue. 
My thought is to reuse the data buffer in the connection, I will open a new 
JIRA to track it.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225821#comment-14225821
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683743/HDFS-7435.000.patch
  against trunk revision a655973.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestStartup
  org.apache.hadoop.hdfs.server.namenode.TestFileLimit
  
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
  
org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
  org.apache.hadoop.hdfs.TestDecommission
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
  org.apache.hadoop.fs.TestSymlinkHdfsFileContext
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
  org.apache.hadoop.hdfs.TestSetrepDecreasing
  
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement
  
org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks
  org.apache.hadoop.hdfs.TestReplication
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
  org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
  
org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
  org.apache.hadoop.hdfs.server.namenode.TestHostsFiles
  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
  org.apache.hadoop.hdfs.server.namenode.TestINodeFile
  org.apache.hadoop.hdfs.TestEncryptedTransfer

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing
org.apache.hadoop.hdfs.server.namenode.ha.TestHAAppend
org.apache.hadoop.hdfs.server.namenode.TestFsck

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8842//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8842//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223314#comment-14223314
 ] 

Jing Zhao commented on HDFS-7435:
-

Can we also use some blocked arrays (and each block with a size limit) instead 
of a contiguous array for the full block report? In practice we've seen big 
full block reports cause NameNode gc. This can be similar idea with HDFS-4879.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223337#comment-14223337
 ] 

Hadoop QA commented on HDFS-7435:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683375/HDFS-7435.patch
  against trunk revision 555fa2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8821//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8821//console

This message is automatically generated.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223437#comment-14223437
 ] 

Daryn Sharp commented on HDFS-7435:
---

We've also seen large block reports aggravate GC, not due to decoded size, but 
due to the spew of garbage from the re-allocs and (un)boxing.  This patch is 
basically taking us back to pre-PB performance  behavior.

The block report is ~44,000 blocks/MB which at our scale is a few megs per 
report.  HDFS-4879 chunked a list of blocks to delete because of 25M entries 
consuming ~400MB.  I don't foresee blocks/node approaching that level anytime 
soon.  In that light, it's probably premature to introduce chunking?


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223560#comment-14223560
 ] 

Jing Zhao commented on HDFS-7435:
-

I've seen some cluster generating block report each containing 1000,000 
blocks, i.e., 20~30MB. Considering we're handling block reports from multiple 
hundred DataNodes I guess this can still cause issue but I'm not sure if this 
is the main cause of the gc in that cluster. But I agree we can make the 
current optimization first and do chunking separately if necessary.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-24 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223574#comment-14223574
 ] 

Suresh Srinivas commented on HDFS-7435:
---

[~jingzhao] and [~daryn], I think chunking the block reports into a set of 
arrays is a good change to avoid possible GC promotion failures due to not 
finding contiguous memory in old generation heap. We should consider doing that 
along with this change to avoid unnecessary layout change.


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223757#comment-14223757
 ] 

Daryn Sharp commented on HDFS-7435:
---

While I agree it's a questionably nice to have feature - see below - decoding 
the block report into a contiguous or non-contiguous array is an implementation 
detail that can be changed in another jira w/o updating the layout version.

This patch simply undoes the damage caused by moving from Writables to PBs.   I 
changed it internally from long[] - dynamically growing ArrayListLong - PB 
- dynamically growing ArrayListLong - long[]; to simply fixed long[] - PB 
- fixed long[].   Further changes to segment the list in ways not previously 
done are feature creep.

If 20-30MB is going to cause a promotion failure in a namenode servicing a 
hundred millions of blocks - it's already game over.  2.x easily generates over 
1GB garbage/sec at a mere ~20k ops/sec.  That's an average of 54MB/request.  
This is why I'm submitting GC friendly changes.

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2014-11-24 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223815#comment-14223815
 ] 

Suresh Srinivas commented on HDFS-7435:
---

bq. While I agree it's a questionably nice to have feature
Not sure why you think it is questionably nice to have feature...

bq. If 20-30MB is going to cause a promotion failure in a namenode servicing a 
hundred millions of blocks - it's already game over. 2.x easily generates over 
1GB garbage/sec at a mere ~20k ops/sec.
Just so that we are on the same page, java arrays requires contiguous arrays in 
memory. In many installs, when namenode becomes unresponsive and datanodes end 
up sending block reports, these block reports get promoted to older generation 
(because namenode is processing them slowly). Since old generation may be 
fragmented, promotions can fail when large arrays need to be promoted.

That said, I am fine doing it in a subsequent jira. It will end up touching the 
same parts or replacing the code that you are adding.


 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)