[jira] [Commented] (HDFS-6581) Write to single replica in memory

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144432#comment-14144432
 ] 

Hadoop QA commented on HDFS-6581:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12670595/HDFS-6581.merge.10.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-httpfs:

  org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8159//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8159//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8159//console

This message is automatically generated.

> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, 
> HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.
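For readers following the feature, here is a minimal sketch of how a client
might request an in-memory replica, assuming the work exposes a LAZY_PERSIST
create flag as outlined in the attached design doc; the final API may differ
from this:

{code}
// Minimal sketch, assuming the feature is exposed via CreateFlag.LAZY_PERSIST
// (per the attached design doc); API details may differ in the final patch.
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LazyPersistWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/in-memory-replica");
    // Ask the DataNode to keep the single replica in memory (RAM disk) and
    // lazily persist it to disk later.
    try (FSDataOutputStream out = fs.create(path, FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
        4096, (short) 1, fs.getDefaultBlockSize(path), null)) {
      out.write("hello".getBytes("UTF-8"));
    }
  }
}
{code}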



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6606) Optimize HDFS Encrypted Transport performance

2014-09-22 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-6606:
-
Attachment: HDFS-6606.006.patch

Rebase the patch for latest trunk again.

> Optimize HDFS Encrypted Transport performance
> -
>
> Key: HDFS-6606
> URL: https://issues.apache.org/jira/browse/HDFS-6606
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client, security
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, 
> HDFS-6606.003.patch, HDFS-6606.004.patch, HDFS-6606.005.patch, 
> HDFS-6606.006.patch, OptimizeHdfsEncryptedTransportperformance.pdf
>
>
> In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; 
> it was great work.
> It uses the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and supports 
> three security strengths:
> * high: 3des or rc4 (128 bits)
> * medium: des or rc4 (56 bits)
> * low: rc4 (40 bits)
> 3des and rc4 are slow, only *tens of MB/s*: 
> http://www.javamex.com/tutorials/cryptography/ciphers.shtml
> http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/
> I will provide more detailed performance data in the future. This is clearly a 
> bottleneck and will vastly affect end-to-end performance. 
> AES (Advanced Encryption Standard) is recommended as a replacement for DES and 
> is more secure; with AES-NI support, throughput can reach nearly *2GB/s*, so it 
> would no longer be the bottleneck. The AES and CryptoCodec work is covered by 
> HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add support for a 
> new AES mode). 
> This JIRA will use AES with AES-NI support as the encryption algorithm for 
> DataTransferProtocol.
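To put the cipher numbers above in perspective, here is a small, self-contained
JCE micro-benchmark sketch comparing RC4 with AES/CTR. It is illustrative only:
the transforms and buffer sizes are my own choices, this is not the
DataTransferProtocol wiring, and the ~2GB/s figure above assumes the
AES-NI-backed OpenSSL codec from HADOOP-10693 rather than plain JCE.

{code}
// Rough throughput comparison of RC4 vs AES/CTR using plain JCE.
// Illustrative only; not the actual DataTransferProtocol integration.
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CipherThroughputSketch {

  static double mbPerSec(String transform, SecretKeySpec key, IvParameterSpec iv)
      throws Exception {
    Cipher cipher = Cipher.getInstance(transform);
    if (iv != null) {
      cipher.init(Cipher.ENCRYPT_MODE, key, iv);
    } else {
      cipher.init(Cipher.ENCRYPT_MODE, key);
    }
    byte[] buf = new byte[1 << 20];         // encrypt 1 MB per update() call
    int iterations = 256;                   // 256 MB in total
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      cipher.update(buf);
    }
    double seconds = (System.nanoTime() - start) / 1e9;
    return iterations / seconds;            // MB per second
  }

  public static void main(String[] args) throws Exception {
    byte[] keyBytes = new byte[16];         // 128-bit key for both ciphers
    System.out.printf("RC4:     %.0f MB/s%n",
        mbPerSec("RC4", new SecretKeySpec(keyBytes, "RC4"), null));
    System.out.printf("AES/CTR: %.0f MB/s%n",
        mbPerSec("AES/CTR/NoPadding", new SecretKeySpec(keyBytes, "AES"),
            new IvParameterSpec(new byte[16])));
  }
}
{code}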



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7132) hdfs namenode -metadataVersion command does not honor configured name dirs

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144355#comment-14144355
 ] 

Hadoop QA commented on HDFS-7132:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670582/HDFS-7132.001.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8158//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8158//console

This message is automatically generated.

> hdfs namenode -metadataVersion command does not honor configured name dirs
> --
>
> Key: HDFS-7132
> URL: https://issues.apache.org/jira/browse/HDFS-7132
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7132.001.patch
>
>
> The hdfs namenode -metadataVersion command does not honor 
> dfs.namenode.name.dir.. configuration parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6881) The DFSClient should use the sampler to determine whether to initiate trace spans when making RPCv9 calls to the NN and DN

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144333#comment-14144333
 ] 

Hadoop QA commented on HDFS-6881:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670565/HDFS-6881.003.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1270 javac 
compiler warnings (more than the trunk's current 1264 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.ipc.TestProtoBufRpc
  org.apache.hadoop.ipc.TestMultipleProtocolServer
  org.apache.hadoop.ha.TestZKFailoverControllerStress
  org.apache.hadoop.hdfs.TestHDFSFileSystemContract
  
org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager
  org.apache.hadoop.hdfs.TestEncryptedTransfer
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
  org.apache.hadoop.hdfs.TestBlockStoragePolicy
  org.apache.hadoop.hdfs.TestRollingUpgrade
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
  
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
  
org.apache.hadoop.hdfs.server.namenode.TestEditLogJournalFailures
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestXAttrWithSnapshot
  org.apache.hadoop.hdfs.TestFileCreationClient
  org.apache.hadoop.hdfs.TestStoragePolicyCommands
  org.apache.hadoop.hdfs.server.namenode.TestAddBlock
  org.apache.hadoop.fs.TestSymlinkHdfsDisable
  org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream
  org.apache.hadoop.hdfs.TestDecommission
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithAcl
  org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
  org.apache.hadoop.hdfs.TestFileAppend4
  org.apache.hadoop.hdfs.qjournal.server.TestJournalNode
  org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing
  org.apache.hadoop.hdfs.TestRollingUpgradeDowngrade
  org.apache.hadoop.security.TestPermission
  org.apache.hadoop.hdfs.TestDFSShell
  org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
  
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
  org.apache.hadoop.hdfs.web.TestWebHDFSXAttr
  org.apache.hadoop.hdfs.TestDFSClientFailover
  org.apache.hadoop.hdfs.TestModTime
  org.apache.hadoop.hdfs.TestDistributedFileSystem
  org.apache.hadoop.hdfs.web.TestWebHDFS
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot
  org.apache.hadoop.hdfs.TestDFSUpgrade
  org.apache.hadoop.security.TestPermissionSymlinks
  org.apache.hadoop.hdfs.TestAbandonBlock
  org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
  org.apache.hadoop.fs.contract.hdfs.TestHDFSContractOpen
  org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
  org.apache.hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA
  org.apache.hadoop.hdfs.TestLeaseRecovery2
  org.apache.hadoop.hdfs.TestSnapshotCommands
  org.apache.hadoop.hdfs.server.namenode.TestFileContextAcl
  org.apache.hadoop.hdfs.server.namenode.ha.TestHAFsck
  

[jira] [Updated] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-09-22 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7128:
--
Status: Patch Available  (was: Open)

> Decommission slows way down when it gets towards the end
> 
>
> Key: HDFS-7128
> URL: https://issues.apache.org/jira/browse/HDFS-7128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7128.patch
>
>
> When we decommission nodes across different racks, the decommission process 
> becomes really slow towards the end, hardly making any progress. The problem is 
> that some blocks are on 3 decomm-in-progress DNs and the way replications are 
> scheduled causes unnecessary delay. Here is the analysis.
> When BlockManager schedules the replication work from neededReplication, it 
> first needs to pick the source node for replication via chooseSourceDatanode. 
> The core policies for picking the source node are:
> 1. Prefer a decomm-in-progress node.
> 2. Only pick nodes whose outstanding replication counts are below the 
> thresholds dfs.namenode.replication.max-streams or 
> dfs.namenode.replication.max-streams-hard-limit, based on the replication 
> priority.
> When we decommission nodes:
> 1. All the decommissioning nodes' blocks are added to neededReplication.
> 2. BM picks X blocks from neededReplication in each iteration. X is based on 
> the cluster size and a configurable multiplier. So if the cluster has 2000 
> nodes, X will be around 4000.
> 3. Given that these 4000 blocks are on the same decomm-in-progress node A, A 
> ends up being chosen as the source node for all 4000 blocks. The reason the 
> outstanding replication thresholds don't kick in lies in the implementation of 
> BlockManager.computeReplicationWorkForBlocks: 
> node.getNumberOfBlocksToBeReplicated() remains zero because 
> node.addBlockToBeReplicated is only called after the source node iteration.
> {noformat}
> ...
>   synchronized (neededReplications) {
>     for (int priority = 0; priority < blocksToReplicate.size(); priority++) {
>       ...
>       chooseSourceDatanode
>       ...
>     }
>   ...
>   for (ReplicationWork rw : work) {
>     ...
>     rw.srcNode.addBlockToBeReplicated(block, targets);
>     ...
>   }
> {noformat}
> 4. So several decomm-in-progress nodes A, B, C end up with 4000 in 
> node.getNumberOfBlocksToBeReplicated().
> 5. If we assume each node can replicate 5 blocks per minute, it will take 800 
> minutes to finish replicating these blocks.
> 6. The pending replication timeout kicks in after 5 minutes. The items are 
> removed from the pending replication queue and added back to 
> neededReplication. The replications are then handled by other source nodes of 
> these blocks. But the blocks still remain in nodes A, B, C's pending 
> replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue 
> replicating these blocks, although they might already have been replicated by 
> other DNs after the replication timeout.
> 7. Some blocks' replicas exist only on A, B, C and sit at the end of A's 
> pending replication queue. Even though such a block's replication times out, 
> no source node can be chosen, since A, B, C all have high pending replication 
> counts. So we have to wait until A drains its pending replication queue, even 
> though the items in that queue have already been taken care of by other nodes 
> and are no longer under-replicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-09-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7128:

Assignee: Ming Ma

> Decommission slows way down when it gets towards the end
> 
>
> Key: HDFS-7128
> URL: https://issues.apache.org/jira/browse/HDFS-7128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7128.patch
>
>
> When we decommission nodes across different racks, the decommission process 
> becomes really slow towards the end, hardly making any progress. The problem is 
> that some blocks are on 3 decomm-in-progress DNs and the way replications are 
> scheduled causes unnecessary delay. Here is the analysis.
> When BlockManager schedules the replication work from neededReplication, it 
> first needs to pick the source node for replication via chooseSourceDatanode. 
> The core policies for picking the source node are:
> 1. Prefer a decomm-in-progress node.
> 2. Only pick nodes whose outstanding replication counts are below the 
> thresholds dfs.namenode.replication.max-streams or 
> dfs.namenode.replication.max-streams-hard-limit, based on the replication 
> priority.
> When we decommission nodes:
> 1. All the decommissioning nodes' blocks are added to neededReplication.
> 2. BM picks X blocks from neededReplication in each iteration. X is based on 
> the cluster size and a configurable multiplier. So if the cluster has 2000 
> nodes, X will be around 4000.
> 3. Given that these 4000 blocks are on the same decomm-in-progress node A, A 
> ends up being chosen as the source node for all 4000 blocks. The reason the 
> outstanding replication thresholds don't kick in lies in the implementation of 
> BlockManager.computeReplicationWorkForBlocks: 
> node.getNumberOfBlocksToBeReplicated() remains zero because 
> node.addBlockToBeReplicated is only called after the source node iteration.
> {noformat}
> ...
>   synchronized (neededReplications) {
>     for (int priority = 0; priority < blocksToReplicate.size(); priority++) {
>       ...
>       chooseSourceDatanode
>       ...
>     }
>   ...
>   for (ReplicationWork rw : work) {
>     ...
>     rw.srcNode.addBlockToBeReplicated(block, targets);
>     ...
>   }
> {noformat}
> 4. So several decomm-in-progress nodes A, B, C end up with 4000 in 
> node.getNumberOfBlocksToBeReplicated().
> 5. If we assume each node can replicate 5 blocks per minute, it will take 800 
> minutes to finish replicating these blocks.
> 6. The pending replication timeout kicks in after 5 minutes. The items are 
> removed from the pending replication queue and added back to 
> neededReplication. The replications are then handled by other source nodes of 
> these blocks. But the blocks still remain in nodes A, B, C's pending 
> replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue 
> replicating these blocks, although they might already have been replicated by 
> other DNs after the replication timeout.
> 7. Some blocks' replicas exist only on A, B, C and sit at the end of A's 
> pending replication queue. Even though such a block's replication times out, 
> no source node can be chosen, since A, B, C all have high pending replication 
> counts. So we have to wait until A drains its pending replication queue, even 
> though the items in that queue have already been taken care of by other nodes 
> and are no longer under-replicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-09-22 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7128:
--
Attachment: HDFS-7128.patch

First of all, we need to clarify the replication policy for decommissioning: 
finish the decommission ASAP by spreading the source-node role across all 
replicas, or let the decommissioning nodes be the only source nodes for 
replication?

BlockManager's current replication policy roughly distributes the replication 
load across all replicas. We can argue that is the expected behavior: treat 
decommissioning like the dead-node scenario and replicate blocks ASAP.

The initial patch addresses the issue where a decommissioning node's pending 
replication queue can grow quite large and delay certain blocks' replication. 
It doesn't change the current ASAP replication policy for the decommission 
scenario.

Appreciate any input on this.
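To make the effect in steps 3-4 of the description concrete, below is a small,
self-contained toy simulation (all names are made up; this is not BlockManager
code) showing why updating the per-node pending count only after source
selection lets one decommissioning node absorb the whole iteration, while
counting during selection lets the max-streams limit kick in:

{code}
// Toy simulation of the scheduling issue; not Hadoop code.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DecommSchedulingSketch {
  // Stand-in for dfs.namenode.replication.max-streams.
  static final int MAX_STREAMS = 2;

  static class Node {
    final String name;
    int pending;                 // replications already scheduled from this node
    Node(String name) { this.name = name; }
  }

  // Policy 2 above: skip nodes whose outstanding count is at the limit.
  // Policy 1 (prefer the decomm-in-progress node) is modeled by putting it
  // first in the candidate list.
  static Node chooseSource(List<Node> candidates) {
    for (Node n : candidates) {
      if (n.pending < MAX_STREAMS) {
        return n;
      }
    }
    return null;
  }

  static int schedule(int blocks, boolean countDuringSelection) {
    Node decommNode = new Node("A");
    List<Node> candidates = Arrays.asList(decommNode, new Node("other"));
    List<Node> chosen = new ArrayList<Node>();
    int assignedToA = 0;
    for (int i = 0; i < blocks; i++) {
      Node src = chooseSource(candidates);
      if (src == null) {
        break;                   // every candidate is above max-streams
      }
      if (countDuringSelection) {
        src.pending++;           // threshold is visible to the next choice
      }
      chosen.add(src);
      if (src == decommNode) {
        assignedToA++;
      }
    }
    if (!countDuringSelection) {
      for (Node src : chosen) {
        src.pending++;           // too late: all choices are already made
      }
    }
    return assignedToA;
  }

  public static void main(String[] args) {
    System.out.println("count updated after selection:  "
        + schedule(4000, false) + " blocks assigned to node A");
    System.out.println("count updated during selection: "
        + schedule(4000, true) + " blocks assigned to node A");
  }
}
{code}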


> Decommission slows way down when it gets towards the end
> 
>
> Key: HDFS-7128
> URL: https://issues.apache.org/jira/browse/HDFS-7128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: HDFS-7128.patch
>
>
> When we decommission nodes across different racks, the decommission process 
> becomes really slow towards the end, hardly making any progress. The problem is 
> that some blocks are on 3 decomm-in-progress DNs and the way replications are 
> scheduled causes unnecessary delay. Here is the analysis.
> When BlockManager schedules the replication work from neededReplication, it 
> first needs to pick the source node for replication via chooseSourceDatanode. 
> The core policies for picking the source node are:
> 1. Prefer a decomm-in-progress node.
> 2. Only pick nodes whose outstanding replication counts are below the 
> thresholds dfs.namenode.replication.max-streams or 
> dfs.namenode.replication.max-streams-hard-limit, based on the replication 
> priority.
> When we decommission nodes:
> 1. All the decommissioning nodes' blocks are added to neededReplication.
> 2. BM picks X blocks from neededReplication in each iteration. X is based on 
> the cluster size and a configurable multiplier. So if the cluster has 2000 
> nodes, X will be around 4000.
> 3. Given that these 4000 blocks are on the same decomm-in-progress node A, A 
> ends up being chosen as the source node for all 4000 blocks. The reason the 
> outstanding replication thresholds don't kick in lies in the implementation of 
> BlockManager.computeReplicationWorkForBlocks: 
> node.getNumberOfBlocksToBeReplicated() remains zero because 
> node.addBlockToBeReplicated is only called after the source node iteration.
> {noformat}
> ...
>   synchronized (neededReplications) {
>     for (int priority = 0; priority < blocksToReplicate.size(); priority++) {
>       ...
>       chooseSourceDatanode
>       ...
>     }
>   ...
>   for (ReplicationWork rw : work) {
>     ...
>     rw.srcNode.addBlockToBeReplicated(block, targets);
>     ...
>   }
> {noformat}
> 4. So several decomm-in-progress nodes A, B, C end up with 4000 in 
> node.getNumberOfBlocksToBeReplicated().
> 5. If we assume each node can replicate 5 blocks per minute, it will take 800 
> minutes to finish replicating these blocks.
> 6. The pending replication timeout kicks in after 5 minutes. The items are 
> removed from the pending replication queue and added back to 
> neededReplication. The replications are then handled by other source nodes of 
> these blocks. But the blocks still remain in nodes A, B, C's pending 
> replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue 
> replicating these blocks, although they might already have been replicated by 
> other DNs after the replication timeout.
> 7. Some blocks' replicas exist only on A, B, C and sit at the end of A's 
> pending replication queue. Even though such a block's replication times out, 
> no source node can be chosen, since A, B, C all have high pending replication 
> counts. So we have to wait until A drains its pending replication queue, even 
> though the items in that queue have already been taken care of by other nodes 
> and are no longer under-replicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6956) Allow dynamically changing the tracing level in Hadoop servers

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144294#comment-14144294
 ] 

Hadoop QA commented on HDFS-6956:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670549/HDFS-6956.004.patch
  against trunk revision 43efdd3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1280 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.crypto.random.TestOsSecureRandom
  org.apache.hadoop.ha.TestZKFailoverControllerStress
  org.apache.hadoop.ipc.TestMultipleProtocolServer
  org.apache.hadoop.ha.TestActiveStandbyElectorRealZK
  org.apache.hadoop.ipc.TestProtoBufRpc
  org.apache.hadoop.hdfs.TestReservedRawPaths
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots
  org.apache.hadoop.hdfs.TestModTime
  org.apache.hadoop.hdfs.security.TestDelegationToken
  org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
  org.apache.hadoop.hdfs.server.namenode.TestFileLimit
  
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
  org.apache.hadoop.cli.TestCryptoAdminCLI
  org.apache.hadoop.hdfs.TestDFSClientRetries
  org.apache.hadoop.fs.contract.hdfs.TestHDFSContractDelete
  org.apache.hadoop.hdfs.server.namenode.TestFileContextAcl
  org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
  org.apache.hadoop.hdfs.server.namenode.TestFSDirectory
  org.apache.hadoop.fs.contract.hdfs.TestHDFSContractOpen
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing
  org.apache.hadoop.hdfs.TestFileStatus
  org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
  org.apache.hadoop.hdfs.TestReadWhileWriting
  org.apache.hadoop.fs.contract.hdfs.TestHDFSContractMkdir
  
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
  org.apache.hadoop.hdfs.server.namenode.TestAuditLogger
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat
  
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer
  org.apache.hadoop.hdfs.server.namenode.TestAddBlockRetry
  org.apache.hadoop.fs.TestSymlinkHdfsFileSystem
  org.apache.hadoop.fs.TestSymlinkHdfsFileContext
  org.apache.hadoop.hdfs.web.TestWebHDFSXAttr
  org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol
  org.apache.hadoop.cli.TestAclCLI
  org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
  org.apache.hadoop.hdfs.TestFileAppend
  org.apache.hadoop.hdfs.TestEncryptedTransfer
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.web.TestWebHDFSAcl
  org.apache.hadoop.hdfs.server.namenode.TestFileContextXAttr
  org.apache.hadoop.hdfs.TestAbandonBlock
  org.apache.hadoop.hdfs.server.namenode.TestAclConfigFlag
  org.apache.hadoop.hdfs.TestFileCreationClient
  org.apache.hadoop.hdfs.qjournal.server.TestJournalNode
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithXAttr
  org.apache.hadoop.hdfs.TestDFSMkdirs
  org.apache.hadoop.hdfs.TestFileAppendRestart
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestXAttrWithSnapshot

[jira] [Commented] (HDFS-6881) The DFSClient should use the sampler to determine whether to initiate trace spans when making RPCv9 calls to the NN and DN

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144293#comment-14144293
 ] 

Hadoop QA commented on HDFS-6881:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670368/HDFS-6881.002.patch
  against trunk revision 43efdd3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1291 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.crypto.random.TestOsSecureRandom
  org.apache.hadoop.ha.TestZKFailoverControllerStress
  org.apache.hadoop.ipc.TestMultipleProtocolServer
  org.apache.hadoop.ipc.TestProtoBufRpc
  org.apache.hadoop.hdfs.TestReservedRawPaths
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots
  org.apache.hadoop.hdfs.TestModTime
  org.apache.hadoop.fs.TestUrlStreamHandler
  org.apache.hadoop.hdfs.security.TestDelegationToken
  org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
  org.apache.hadoop.hdfs.server.namenode.TestFileLimit
  
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
  org.apache.hadoop.cli.TestCryptoAdminCLI
  org.apache.hadoop.hdfs.TestDFSClientRetries
  org.apache.hadoop.fs.contract.hdfs.TestHDFSContractDelete
  org.apache.hadoop.hdfs.server.namenode.TestFileContextAcl
  org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
  org.apache.hadoop.hdfs.server.namenode.TestFSDirectory
  org.apache.hadoop.fs.contract.hdfs.TestHDFSContractOpen
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing
  org.apache.hadoop.hdfs.TestFileStatus
  org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
  org.apache.hadoop.hdfs.TestReadWhileWriting
  org.apache.hadoop.fs.contract.hdfs.TestHDFSContractMkdir
  
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
  org.apache.hadoop.hdfs.server.namenode.TestAuditLogger
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat
  
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer
  org.apache.hadoop.hdfs.server.namenode.TestAddBlockRetry
  org.apache.hadoop.fs.TestSymlinkHdfsFileSystem
  org.apache.hadoop.fs.TestSymlinkHdfsFileContext
  org.apache.hadoop.hdfs.web.TestWebHDFSXAttr
  org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol
  org.apache.hadoop.cli.TestAclCLI
  org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
  org.apache.hadoop.hdfs.TestFileAppend
  org.apache.hadoop.hdfs.TestEncryptedTransfer
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.web.TestWebHDFSAcl
  org.apache.hadoop.hdfs.server.namenode.TestFileContextXAttr
  org.apache.hadoop.hdfs.TestAbandonBlock
  org.apache.hadoop.hdfs.server.namenode.TestAclConfigFlag
  org.apache.hadoop.hdfs.TestFileCreationClient
  org.apache.hadoop.hdfs.qjournal.server.TestJournalNode
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithXAttr
  org.apache.hadoop.hdfs.TestDFSMkdirs
  org.apache.hadoop.hdfs.TestFileAppendRestart
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestXAttrWithSnapshot
  org.apac

[jira] [Commented] (HDFS-1258) Clearing namespace quota on "/" corrupts FS image

2014-09-22 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144289#comment-14144289
 ] 

Guo Ruijing commented on HDFS-1258:
---

Created HDFS-7133 to support clearing the namespace quota on "/".

> Clearing namespace quota on "/" corrupts FS image
> -
>
> Key: HDFS-1258
> URL: https://issues.apache.org/jira/browse/HDFS-1258
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 0.20.3, 0.20-append, 0.20.204.0, 0.21.0, 0.22.0
>
> Attachments: clear-quota-0.20.patch, clear-quota-0.21.patch, 
> clear-quota.patch, clear-quota.patch
>
>
> The HDFS root directory starts out with a default namespace quota of 
> Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota 
> /"), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, 
> and the NN will not come back up from a restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7133) Support clearing namespace quota on "/"

2014-09-22 Thread Guo Ruijing (JIRA)
Guo Ruijing created HDFS-7133:
-

 Summary: Support clearing namespace quota on "/"
 Key: HDFS-7133
 URL: https://issues.apache.org/jira/browse/HDFS-7133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Guo Ruijing


Existing implementation:

1. Setting a namespace quota on "/" is supported.
2. Clearing the namespace quota on "/" is not supported, due to HDFS-1258.

Expected implementation:

Support clearing the namespace quota on "/".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5689) FsDatasetImpl registers mbean using uninitialized DataNode UUID

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144263#comment-14144263
 ] 

Hadoop QA commented on HDFS-5689:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12619608/HDFS-5689.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8154//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8154//console

This message is automatically generated.

> FsDatasetImpl registers mbean using uninitialized DataNode UUID
> ---
>
> Key: HDFS-5689
> URL: https://issues.apache.org/jira/browse/HDFS-5689
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5689.patch
>
>
> FsDatasetImpl's constructor attempts to include the datanode UUID in its 
> mbean's ObjectName:
>   registerMBean(datanode.getDatanodeUuid());
> Unfortunately this doesn't work because the provided DataNode's UUID isn't 
> set until bpRegistrationSucceeded() is called... after the FsDatasetImpl has 
> been created.  The result is the mbean is always registered with a bogus 
> (though valid) ObjectName:
>   Hadoop:name=FSDatasetState-null,service=DataNode
> Prior to HDFS-2832 and the storageID -> datanodeUuid rename, this was 
> initialized using the DataStorage:
>   registerMBean(storage.getStorageID());
> With the fix for HDFS-5454 in place, doing the equivalent thing (as already 
> done by SimulatedFSDataset):
>   registerMBean(storage.getDatanodeUuid());
> ...fixes the problem:
>   
> Hadoop:name=FSDatasetState-24aed86a-fee6-4b88-868e-285e09ea2766,service=DataNode
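A tiny standalone illustration of the symptom (not FsDatasetImpl; the bean and
names below are made up): whatever UUID string is available at construction
time gets baked into the ObjectName, so a null UUID yields the bogus ...-null
name.

{code}
// Standalone illustration of the naming problem; not Hadoop code.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanNameSketch {
  // Minimal standard MBean: an interface plus an implementation.
  public interface DummyStateMBean { int getNumBlocks(); }
  public static class DummyState implements DummyStateMBean {
    public int getNumBlocks() { return 0; }
  }

  static ObjectName register(String datanodeUuid) throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    // The UUID is interpolated into the name at registration time, so whatever
    // value it holds *now* is what shows up in JMX from then on.
    ObjectName name = new ObjectName(
        "Hadoop:name=FSDatasetState-" + datanodeUuid + ",service=DataNode");
    mbs.registerMBean(new DummyState(), name);
    return name;
  }

  public static void main(String[] args) throws Exception {
    // UUID not yet initialized -> Hadoop:name=FSDatasetState-null,service=DataNode
    System.out.println(register(null));
  }
}
{code}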



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6581) Write to single replica in memory

2014-09-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6581:

Attachment: HDFS-6581.merge.10.patch

> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, 
> HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6581) Write to single replica in memory

2014-09-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6581:

Attachment: HDFS-6581.merge.10.patch

> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6581) Write to single replica in memory

2014-09-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6581:

Attachment: (was: HDFS-6581.merge.10.patch)

> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold

2014-09-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6988:

Attachment: HDFS-6988.02.patch

Thanks Xiaoyu, trivial update to patch.

> Add configurable limit for percentage-based eviction threshold
> --
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-6581
>
> Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold

2014-09-22 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144252#comment-14144252
 ] 

Xiaoyu Yao commented on HDFS-6988:
--

Looks good to me.
+1 (Non-binding)

> Add configurable limit for percentage-based eviction threshold
> --
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-6581
>
> Attachments: HDFS-6988.01.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7123) Run legacy fsimage checkpoint in parallel with PB fsimage checkpoint

2014-09-22 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144253#comment-14144253
 ] 

Ming Ma commented on HDFS-7123:
---

An alternative approach is to release the lock between the PB fsimage 
checkpoint and the legacy fsimage checkpoint so that edit log replay can catch 
up. The two formats will then have different contents, but that might be 
acceptable for certain scenarios. Whether to release the lock between the two 
checkpoints can be made configurable.
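A rough, hypothetical sketch of that alternative, just to show the shape of the
trade-off; every name below is made up and this is not the real
FSImage/checkpointer code path:

{code}
// Hypothetical sketch of the alternative above; not the actual checkpoint code.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DualFormatCheckpointSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final boolean releaseLockBetweenFormats;   // would come from config
  private volatile long lastAppliedTxId;             // advanced by edit log replay

  DualFormatCheckpointSketch(boolean releaseLockBetweenFormats) {
    this.releaseLockBetweenFormats = releaseLockBetweenFormats;
  }

  void checkpoint() {
    lock.readLock().lock();
    try {
      savePBFsImage(lastAppliedTxId);                // protobuf fsimage
      if (!releaseLockBetweenFormats) {
        saveLegacyFsImage(lastAppliedTxId);          // same txid, one long hold
      }
    } finally {
      lock.readLock().unlock();
    }
    if (releaseLockBetweenFormats) {
      // Edit log replay can catch up here, so the legacy image ends up at a
      // later txid than the protobuf one -- the "different contents" trade-off.
      lock.readLock().lock();
      try {
        saveLegacyFsImage(lastAppliedTxId);
      } finally {
        lock.readLock().unlock();
      }
    }
  }

  private void savePBFsImage(long txid)     { /* write protobuf-format image */ }
  private void saveLegacyFsImage(long txid) { /* write legacy-format image */ }
}
{code}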

> Run legacy fsimage checkpoint in parallel with PB fsimage checkpoint
> 
>
> Key: HDFS-7123
> URL: https://issues.apache.org/jira/browse/HDFS-7123
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7123.patch
>
>
> HDFS-7097 will address the checkpoint and BR issue. In addition, it might 
> still be useful to reduce the overall checkpoint duration, given that it 
> blocks edit log replay. If there is a large volume of edit log to catch up on 
> and the NN fails over, availability will be impacted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144249#comment-14144249
 ] 

Hadoop QA commented on HDFS-7055:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670363/HDFS-7055.002.patch
  against trunk revision 43efdd3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1266 javac 
compiler warnings (more than the trunk's current 1264 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.server.balancer.TestBalancer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8150//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8150//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8150//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8150//console

This message is automatically generated.

> Add tracing to DFSInputStream
> -
>
> Key: HDFS-7055
> URL: https://issues.apache.org/jira/browse/HDFS-7055
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: 2.6.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7055.002.patch
>
>
> Add tracing to DFSInputStream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7131) During HA upgrade, JournalNode should create a new committedTxnId file in the current directory

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144250#comment-14144250
 ] 

Hadoop QA commented on HDFS-7131:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670573/HDFS-7131.000.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.fs.TestUrlStreamHandler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8157//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8157//console

This message is automatically generated.

> During HA upgrade, JournalNode should create a new committedTxnId file in the 
> current directory
> ---
>
> Key: HDFS-7131
> URL: https://issues.apache.org/jira/browse/HDFS-7131
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-7131.000.patch
>
>
> Currently, while doing an HA upgrade, we do not create a new committedTxnId 
> file in the new current directory of the JournalNode. And before the fix in 
> HDFS-7042, since the file channel is never closed, for any new journal we are 
> actually updating the committedTxnId file in the previous directory. This can 
> cause the NN to fail to start during rollback.
> HDFS-7042 fixes the main part of the issue: the file channel inside the 
> committedTxnId object gets closed, so a new file can later be created in the 
> current directory. But it may still be better to copy the file's content 
> during the upgrade so that we can always use it for a sanity check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7036) HDFS-6776 fix requires to upgrade insecure cluster, which means quite some user pain

2014-09-22 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144213#comment-14144213
 ] 

Yongjun Zhang commented on HDFS-7036:
-

Hi [~wheat9] and [~jingzhao], 

It has been quite a while since I created this jira as a follow-up of 
HDFS-6776, as we agreed in the discussion there. Would you please comment here?

Thanks a lot.


> HDFS-6776 fix requires to upgrade insecure cluster, which means quite some 
> user pain
> 
>
> Key: HDFS-7036
> URL: https://issues.apache.org/jira/browse/HDFS-7036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.5.1
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-7036.001.patch
>
>
> Issuing the command
> {code}
>  hadoop fs -lsr webhdfs://
> {code}
> from the secure cluster side fails with the message "Failed to get the token 
> ...", a similar symptom to the one reported in HDFS-6776.
> If the fix for HDFS-6776 is applied to only the secure cluster, running
> {code}
> distcp webhdfs:// 
> {code}
> fails the same way.
> Basically, running any application in the secure cluster that accesses the 
> insecure cluster via webhdfs fails the same way if the HDFS-6776 fix is not 
> applied to the insecure cluster.
> This could cause quite a bit of user pain. Filing this jira to find a solution 
> that makes users' lives easier.
> One proposed solution was to add a message-parsing mechanism in webhdfs, which 
> is a bit hacky. The other is to do the same kind of hack on the application 
> side, which means the same hack needs to be applied in every application.
> Thanks [~daryn], [~wheat9], [~jingzhao], [~tucu00] and [~atm] for the 
> discussion in HDFS-6776.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold

2014-09-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6988:

Attachment: HDFS-6988.01.patch

Add upper bound for percentage.

Slight refactoring for test case.

> Add configurable limit for percentage-based eviction threshold
> --
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-6581
>
> Attachments: HDFS-6988.01.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7132) hdfs namenode -metadataVersion command does not honor configured name dirs

2014-09-22 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7132:
---
Status: Patch Available  (was: Open)

> hdfs namenode -metadataVersion command does not honor configured name dirs
> --
>
> Key: HDFS-7132
> URL: https://issues.apache.org/jira/browse/HDFS-7132
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7132.001.patch
>
>
> The hdfs namenode -metadataVersion command does not honor 
> dfs.namenode.name.dir.. configuration parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7132) hdfs namenode -metadataVersion command does not honor configured name dirs

2014-09-22 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7132:
---
Attachment: HDFS-7132.001.patch

The -metadataVersion subcommand does not actually create a NameNode instance. 
This means that suffixed config params like DFS_NAMENODE_NAME_DIR_KEY do not 
get initialized.

This patch makes the -metadataVersion subcommand call 
NameNode.initializeGenericKeys. It also modifies TestMetadataVersion to cover 
this case.
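Roughly, the idea is to resolve the suffixed keys before the name dirs are
read. A simplified sketch (the helper calls for looking up the
nameservice/namenode ids are assumptions; see the patch for the actual change):

{code}
// Simplified sketch of the idea, not the actual diff: fold per-namenode
// suffixed keys (e.g. dfs.namenode.name.dir.<nsId>.<nnId>) into the base keys
// before the configured name dirs are consulted.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSUtil;
import org.apache.hadoop.hdfs.HAUtil;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.server.namenode.NameNode;

public class MetadataVersionKeysSketch {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // Assumed helpers for resolving the local nameservice/namenode ids; the
    // actual patch may obtain them differently.
    String nsId = DFSUtil.getNamenodeNameServiceId(conf);
    String nnId = HAUtil.getNameNodeId(conf, nsId);
    NameNode.initializeGenericKeys(conf, nsId, nnId);
    // Only now does the base key reflect the configured directories.
    System.out.println(conf.get("dfs.namenode.name.dir"));
  }
}
{code}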

> hdfs namenode -metadataVersion command does not honor configured name dirs
> --
>
> Key: HDFS-7132
> URL: https://issues.apache.org/jira/browse/HDFS-7132
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7132.001.patch
>
>
> The hdfs namenode -metadataVersion command does not honor 
> dfs.namenode.name.dir.. configuration parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7132) hdfs namenode -metadataVersion command does not honor configured name dirs

2014-09-22 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7132:
--

 Summary: hdfs namenode -metadataVersion command does not honor 
configured name dirs
 Key: HDFS-7132
 URL: https://issues.apache.org/jira/browse/HDFS-7132
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor


The hdfs namenode -metadataVersion command does not honor 
dfs.namenode.name.dir.. configuration parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7131) During HA upgrade, JournalNode should create a new committedTxnId file in the current directory

2014-09-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7131:

Attachment: HDFS-7131.000.patch

> During HA upgrade, JournalNode should create a new committedTxnId file in the 
> current directory
> ---
>
> Key: HDFS-7131
> URL: https://issues.apache.org/jira/browse/HDFS-7131
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-7131.000.patch
>
>
> Currently, while doing an HA upgrade, we do not create a new committedTxnId 
> file in the new current directory of the JournalNode. And before the fix in 
> HDFS-7042, since the file channel is never closed, for any new journal we are 
> actually updating the committedTxnId file in the previous directory. This can 
> cause the NN to fail to start during rollback.
> HDFS-7042 fixes the main part of the issue: the file channel inside the 
> committedTxnId object gets closed, so a new file can later be created in the 
> current directory. But it may still be better to copy the file's content 
> during the upgrade so that we can always use it for a sanity check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7131) During HA upgrade, JournalNode should create a new committedTxnId file in the current directory

2014-09-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7131:

Status: Patch Available  (was: Open)

> During HA upgrade, JournalNode should create a new committedTxnId file in the 
> current directory
> ---
>
> Key: HDFS-7131
> URL: https://issues.apache.org/jira/browse/HDFS-7131
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-7131.000.patch
>
>
> Currently, while doing an HA upgrade, we do not create a new committedTxnId 
> file in the new current directory of the JournalNode. And before the fix in 
> HDFS-7042, since the file channel is never closed, for any new journal we are 
> actually updating the committedTxnId file in the previous directory. This can 
> cause the NN to fail to start during rollback.
> HDFS-7042 fixes the main part of the issue: the file channel inside the 
> committedTxnId object gets closed, so a new file can later be created in the 
> current directory. But it may still be better to copy the file's content 
> during the upgrade so that we can always use it for a sanity check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7131) During HA upgrade, JournalNode should also copy the committedTxnId file into the current directory

2014-09-22 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-7131:
---

 Summary: During HA upgrade, JournalNode should also copy the 
committedTxnId file into the current directory
 Key: HDFS-7131
 URL: https://issues.apache.org/jira/browse/HDFS-7131
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jing Zhao
Assignee: Jing Zhao


Currently, while doing an HA upgrade, we do not create a new committedTxnId file 
in the new current directory of the JournalNode. And before we have the fix in 
HDFS-7042, since the file channel is never closed, for any new journal we're 
actually updating the committedTxnId file in the previous directory. This can 
cause the NN to fail to start during rollback.

HDFS-7042 fixes the main part of the issue: the file channel inside the 
committedTxnId object gets closed, so a new file can later be created in the 
current directory. But maybe it is still better to copy the file's content during 
the upgrade so that we can always use it for a sanity check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7131) During HA upgrade, JournalNode should create a new committedTxnId file in the current directory

2014-09-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7131:

Summary: During HA upgrade, JournalNode should create a new committedTxnId 
file in the current directory  (was: During HA upgrade, JournalNode should also 
copy the committedTxnId file into the current directory)

> During HA upgrade, JournalNode should create a new committedTxnId file in the 
> current directory
> ---
>
> Key: HDFS-7131
> URL: https://issues.apache.org/jira/browse/HDFS-7131
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> Currently, while doing an HA upgrade, we do not create a new committedTxnId file 
> in the new current directory of the JournalNode. And before we have the fix in 
> HDFS-7042, since the file channel is never closed, for any new journal we're 
> actually updating the committedTxnId file in the previous directory. This can 
> cause the NN to fail to start during rollback.
> HDFS-7042 fixes the main part of the issue: the file channel inside the 
> committedTxnId object gets closed, so a new file can later be created in the 
> current directory. But maybe it is still better to copy the file's content 
> during the upgrade so that we can always use it for a sanity check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6881) The DFSClient should use the sampler to determine whether to initiate trace spans when making RPCv9 calls to the NN and DN

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6881:
---
Attachment: HDFS-6881.003.patch

This is a somewhat simpler approach: it uses the Configuration object that is 
passed into the Invoker to determine which Sampler to use.
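
For illustration, a minimal sketch of that idea, with an invented config key and a 
placeholder sampler check rather than the actual HTrace API:

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch, not the HDFS-6881 patch itself: decide whether to start
// a trace span for an outgoing RPC based on the Configuration that is already
// passed to the Invoker, instead of a process-wide setting.
class RpcTraceSamplingSketch {
  // Invented key for illustration only.
  static final String SAMPLER_KEY = "sketch.rpc.trace.sampler";

  static boolean shouldStartSpan(Configuration conf) {
    String sampler = conf.get(SAMPLER_KEY, "never");
    // "always" traces every call; anything else (including "never") does not.
    return "always".equalsIgnoreCase(sampler);
  }
}
{code}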

> The DFSClient should use the sampler to determine whether to initiate trace 
> spans when making RPCv9 calls to the NN and DN
> --
>
> Key: HDFS-6881
> URL: https://issues.apache.org/jira/browse/HDFS-6881
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Masatake Iwasaki
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6881.002.patch, HDFS-6881.003.patch
>
>
> The DFSClient should use the configured HTrace sampler to determine whether to 
> initiate trace spans when making RPCv9 calls to the NN and DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7011) Implement basic utilities for libhdfs3

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reassigned HDFS-7011:
--

Assignee: Colin Patrick McCabe

> Implement basic utilities for libhdfs3
> --
>
> Key: HDFS-7011
> URL: https://issues.apache.org/jira/browse/HDFS-7011
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7011-pnative.001.patch, HDFS-7011.patch
>
>
> Implement basic utilities such as hash, exception handling, logger, configure 
> parser, checksum calculate and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7011) Implement basic utilities for libhdfs3

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7011:
---
Attachment: HDFS-7011-pnative.001.patch

> Implement basic utilities for libhdfs3
> --
>
> Key: HDFS-7011
> URL: https://issues.apache.org/jira/browse/HDFS-7011
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
> Attachments: HDFS-7011-pnative.001.patch, HDFS-7011.patch
>
>
> Implement basic utilities such as hash, exception handling, logger, configure 
> parser, checksum calculate and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold

2014-09-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6988:

Summary: Add configurable limit for percentage-based eviction threshold  
(was: Make RAM disk eviction thresholds configurable)

> Add configurable limit for percentage-based eviction threshold
> --
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-6581
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Fix For: HDFS-6581
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction 
> thresholds configurable. The hard-coded thresholds may not be appropriate for 
> very large RAM disks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-6990) Add unit test for evict/delete RAM_DISK block with open handle

2014-09-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDFS-6990.
-
   Resolution: Fixed
Fix Version/s: HDFS-6581
 Hadoop Flags: Reviewed

Committed to the feature branch. Thanks for the contribution [~xyao]!

> Add unit test for evict/delete RAM_DISK block with open handle
> --
>
> Key: HDFS-6990
> URL: https://issues.apache.org/jira/browse/HDFS-6990
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Fix For: HDFS-6581
>
> Attachments: HDFS-6990.0.patch, HDFS-6990.1.patch, HDFS-6990.2.patch, 
> HDFS-6990.3.patch
>
>
> This is to verify:
> * Evict RAM_DISK block with open handle should fall back to DISK.
> * Delete RAM_DISK block (persisted) with open handle should mark the block to 
> be deleted upon handle close. 
> Simply opening a handle to a file in the DFS namespace won't work as expected. We 
> need a local FS file handle to the block file. The only meaningful case is 
> Short Circuit Read. This JIRA is to validate/enable the two cases with an 
> SCR-enabled MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7122) Very poor distribution of replication copies

2014-09-22 Thread Jeff Buell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144149#comment-14144149
 ] 

Jeff Buell commented on HDFS-7122:
--

One physical rack, but I'm using HVE to create multiple "virtual racks", that 
is, arbitrarily assigning groups of hosts to different racks.  I've tried 
splitting the 32 hosts into 2 racks and 16 racks.  I'm using the latter now 
since it gave me a more uniform distribution, but for testing we should use the 
minimum number of racks to enhance the non-uniformity.

> Very poor distribution of replication copies
> 
>
> Key: HDFS-7122
> URL: https://issues.apache.org/jira/browse/HDFS-7122
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
> Environment: medium-large environments with 100's to 1000's of DNs 
> will be most affected, but potentially all environments.
>Reporter: Jeff Buell
>Assignee: Andrew Wang
>Priority: Critical
>  Labels: performance
>
> Summary:
> Since HDFS-6268, the distribution of replica block copies across the 
> DataNodes (replicas 2,3,... as distinguished from the first "primary" 
> replica) is extremely poor, to the point that TeraGen slows down by as much 
> as 3X for certain configurations.  This is almost certainly due to the 
> introduction of Thread Local Random in HDFS-6268.  The mechanism appears to 
> be that this change causes all the random numbers in the threads to be 
> correlated, thus preventing a truly random choice of DN for each replica copy.
> Testing details:
> 1 TB TeraGen on 638 slave nodes (virtual machines on 32 physical hosts), 
> 256MB block size.  This results in 6 "primary" blocks on each DN.  With 
> replication=3, there will be on average 12 more copies on each DN that are 
> copies of blocks from other DNs.  Because of the random selection of DNs, 
> exactly 12 copies are not expected, but I found that about 160 DNs (1/4 of 
> all DNs!) received absolutely no copies, while one DN received over 100 
> copies, and the elapsed time increased by about 3X from a pre-HDFS-6268 
> distro.  There was no pattern to which DNs didn't receive copies, nor was the 
> set of such DNs repeatable run-to-run. In addition to the performance 
> problem, there could be capacity problems due to one or a few DNs running out 
> of space. Testing was done on CDH 5.0.0 (before) and CDH 5.1.2 (after), but I 
> don't see a significant difference from the Apache Hadoop source in this 
> regard. The workaround to recover the previous behavior is to set 
> dfs.namenode.handler.count=1 but of course this has scaling implications for 
> large clusters.
> I recommend that the ThreadLocal Random part of HDFS-6268 be reverted until a 
> better algorithm can be implemented and tested.  Testing should include a 
> case with many DNs and a small number of blocks on each.
> It should also be noted that even pre-HDFS-6268, the random choice of DN 
> algorithm produces a rather non-uniform distribution of copies.  This is not 
> due to any bug, but purely a case of random distributions being much less 
> uniform than one might intuitively expect. In the above case, pre-HDFS-6268 
> yields something like a range of 3 to 25 block copies on each DN. 
> Surprisingly, the performance penalty of this non-uniformity is not as big as 
> might be expected (maybe only 10-20%), but HDFS should do better, and in any 
> case the capacity issue remains.  Round-robin choice of DN?  Better awareness 
> of which DNs currently store fewer blocks? It's not sufficient that the total 
> number of blocks be similar on each DN at the end; at each point in time, no 
> individual DN should receive a disproportionate number of blocks at once 
> (which could be a danger of an RR algorithm).
> Probably should limit this jira to tracking the ThreadLocal issue, and track 
> the random choice issue in another one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144147#comment-14144147
 ] 

Hadoop QA commented on HDFS-5782:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12623235/HDFS-5782.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8155//console

This message is automatically generated.

> BlockListAsLongs should take lists of Replicas rather than concrete classes
> ---
>
> Key: HDFS-5782
> URL: https://issues.apache.org/jira/browse/HDFS-5782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5782.patch
>
>
> From HDFS-5194:
> {quote}
> BlockListAsLongs's constructor takes a list of Blocks and a list of 
> ReplicaInfos.  On the surface, the former is mildly irritating because it is 
> a concrete class, while the latter is a greater concern due to being a 
> File-based implementation of Replica.
> On deeper inspection, BlockListAsLongs passes members of both to an internal 
> method that accepts just Blocks, which conditionally casts them *back* to 
> ReplicaInfos (this cast only happens to the latter, though this isn't 
> immediately obvious to the reader).
> Conveniently, all methods called on these objects are found in the Replica 
> interface, and all functional (i.e. non-test) consumers of this interface 
> pass in Replica subclasses.  If this constructor took Lists of Replicas 
> instead, it would be more generally useful and its implementation would be 
> cleaner as well.
> {quote}
> Fixing this indeed makes the business end of BlockListAsLongs cleaner while 
> requiring no changes to FsDatasetImpl.  As suggested by the above 
> description, though, the HDFS tests use BlockListAsLongs differently from the 
> production code -- they pretty much universally provide a list of actual 
> Blocks.  To handle this:
> - In the case of SimulatedFSDataset, providing a list of Replicas is actually 
> less work.
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly 
> invasive.  Instead, the patch creates a second constructor in 
> BlockListAsLongs specifically for the use of NNThroughputBenchmark.  It turns 
> the stomach a little, but is clearer and requires less code than the 
> alternatives (and isn't without precedent).  
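
A minimal sketch of the shape of the change being described, with simplified, 
illustrative types rather than the actual HDFS classes:

{code}
import java.util.List;

// Illustrative only: accept a Replica abstraction instead of the concrete
// Block/ReplicaInfo classes, so any FsDatasetSpi implementation can pass its
// own replica type.  The interface here is a simplified stand-in.
interface ReplicaView {
  long getBlockId();
  long getNumBytes();
  long getGenerationStamp();
}

class BlockListAsLongsSketch {
  BlockListAsLongsSketch(List<? extends ReplicaView> finalized,
                         List<? extends ReplicaView> underConstruction) {
    // Encode block id, length, and generation stamp for each replica into the
    // long array, without ever casting back to a concrete class.
  }
}
{code}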



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7122) Very poor distribution of replication copies

2014-09-22 Thread Jeff Buell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144141#comment-14144141
 ] 

Jeff Buell commented on HDFS-7122:
--

I mainly use WithNodeGroup, but I did one test without it for the same reason 
you're thinking of, by setting the topology to host=rack (places the 2nd and 
3rd replicas in different VMs on one host).  This didn't make any difference.  
What does make a difference is the number of slave nodes.  It might be hard to 
see the skew if you have only a small number of DNs. Virtualization makes 
scale-out tests much easier! (Containers could also be used.)
JDK 7u55.
The handler count was 20.
If you can build me a tarball with the MR1 stuff in it, I'll test it.  I should 
have mentioned I'm using MR1; could that make a difference?

> Very poor distribution of replication copies
> 
>
> Key: HDFS-7122
> URL: https://issues.apache.org/jira/browse/HDFS-7122
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
> Environment: medium-large environments with 100's to 1000's of DNs 
> will be most affected, but potentially all environments.
>Reporter: Jeff Buell
>Assignee: Andrew Wang
>Priority: Critical
>  Labels: performance
>
> Summary:
> Since HDFS-6268, the distribution of replica block copies across the 
> DataNodes (replicas 2,3,... as distinguished from the first "primary" 
> replica) is extremely poor, to the point that TeraGen slows down by as much 
> as 3X for certain configurations.  This is almost certainly due to the 
> introduction of Thread Local Random in HDFS-6268.  The mechanism appears to 
> be that this change causes all the random numbers in the threads to be 
> correlated, thus preventing a truly random choice of DN for each replica copy.
> Testing details:
> 1 TB TeraGen on 638 slave nodes (virtual machines on 32 physical hosts), 
> 256MB block size.  This results in 6 "primary" blocks on each DN.  With 
> replication=3, there will be on average 12 more copies on each DN that are 
> copies of blocks from other DNs.  Because of the random selection of DNs, 
> exactly 12 copies are not expected, but I found that about 160 DNs (1/4 of 
> all DNs!) received absolutely no copies, while one DN received over 100 
> copies, and the elapsed time increased by about 3X from a pre-HDFS-6268 
> distro.  There was no pattern to which DNs didn't receive copies, nor was the 
> set of such DNs repeatable run-to-run. In addition to the performance 
> problem, there could be capacity problems due to one or a few DNs running out 
> of space. Testing was done on CDH 5.0.0 (before) and CDH 5.1.2 (after), but I 
> don't see a significant difference from the Apache Hadoop source in this 
> regard. The workaround to recover the previous behavior is to set 
> dfs.namenode.handler.count=1 but of course this has scaling implications for 
> large clusters.
> I recommend that the ThreadLocal Random part of HDFS-6268 be reverted until a 
> better algorithm can be implemented and tested.  Testing should include a 
> case with many DNs and a small number of blocks on each.
> It should also be noted that even pre-HDFS-6268, the random choice of DN 
> algorithm produces a rather non-uniform distribution of copies.  This is not 
> due to any bug, but purely a case of random distributions being much less 
> uniform than one might intuitively expect. In the above case, pre-HDFS-6268 
> yields something like a range of 3 to 25 block copies on each DN. 
> Surprisingly, the performance penalty of this non-uniformity is not as big as 
> might be expected (maybe only 10-20%), but HDFS should do better, and in any 
> case the capacity issue remains.  Round-robin choice of DN?  Better awareness 
> of which DNs currently store fewer blocks? It's not sufficient that the total 
> number of blocks be similar on each DN at the end; at each point in time, no 
> individual DN should receive a disproportionate number of blocks at once 
> (which could be a danger of an RR algorithm).
> Probably should limit this jira to tracking the ThreadLocal issue, and track 
> the random choice issue in another one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6990) Add unit test for evict/delete RAM_DISK block with open handle

2014-09-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144136#comment-14144136
 ] 

Arpit Agarwal commented on HDFS-6990:
-

FTR, here is the offline feedback I had on this patch:

{code}
+  // Ensure path1 is still readable from the open SCR handle.
+  fis.read(fis.getPos(), buf, 0, BUFFER_LENGTH);
+  assertThat(verifyReadRandomFile(path1, BLOCK_SIZE, SEED), is(true));
{code}

This is not reading from the same handle as before. We should read from the fis 
handle we opened earlier, and then also check the short circuit read counters 
to ensure the read was short-circuited.

These tests should be skipped on Windows or when native IO is not enabled. You 
could add something like:
{code}
assumeTrue(NativeCodeLoader.isNativeCodeLoaded() && !Path.WINDOWS);
{code}

See Chris's recent fix to HDFS-7110.

Also, it is worth noting (maybe in a comment) that we are not really testing SCR 
from RAM_DISK, since the test fakes RAM_DISK on physical disk.

+1 with these updates.
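
A rough sketch of how the suggestions above could fit together in the test; the 
stream, buffer, and length names are taken from the quoted diff, and the 
read-statistics check is just one plausible way to verify the read stayed 
short-circuited:

{code}
import static org.junit.Assert.assertTrue;
import static org.junit.Assume.assumeTrue;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;
import org.apache.hadoop.util.NativeCodeLoader;

class ScrReReadSketch {
  // Sketch only: fis, buf, and bufferLength correspond to the names in the
  // quoted diff (BUFFER_LENGTH).
  static void verifyStillReadableOverScr(HdfsDataInputStream fis, byte[] buf,
      int bufferLength) throws Exception {
    // Skip on Windows or when native IO is not enabled (see HDFS-7110).
    assumeTrue(NativeCodeLoader.isNativeCodeLoaded() && !Path.WINDOWS);

    long before = fis.getReadStatistics().getTotalShortCircuitBytesRead();
    // Re-read through the handle opened before the evict/delete, rather than
    // opening a fresh stream.
    fis.read(fis.getPos(), buf, 0, bufferLength);
    long after = fis.getReadStatistics().getTotalShortCircuitBytesRead();

    assertTrue("re-read was not short-circuited", after > before);
  }
}
{code}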

> Add unit test for evict/delete RAM_DISK block with open handle
> --
>
> Key: HDFS-6990
> URL: https://issues.apache.org/jira/browse/HDFS-6990
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-6990.0.patch, HDFS-6990.1.patch, HDFS-6990.2.patch, 
> HDFS-6990.3.patch
>
>
> This is to verify:
> * Evict RAM_DISK block with open handle should fall back to DISK.
> * Delete RAM_DISK block (persisted) with open handle should mark the block to 
> be deleted upon handle close. 
> Simply opening a handle to a file in the DFS namespace won't work as expected. We 
> need a local FS file handle to the block file. The only meaningful case is 
> Short Circuit Read. This JIRA is to validate/enable the two cases with an 
> SCR-enabled MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144133#comment-14144133
 ] 

Hadoop QA commented on HDFS-5631:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625158/HDFS-5631.patch
  against trunk revision 7b8df93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8153//console

This message is automatically generated.

> Expose interfaces required by FsDatasetSpi implementations
> --
>
> Key: HDFS-5631
> URL: https://issues.apache.org/jira/browse/HDFS-5631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5631.patch, HDFS-5631.patch
>
>
> This sub-task addresses section 4.1 of the document attached to HDFS-5194,
> the exposure of interfaces needed by a FsDatasetSpi implementation.
> Specifically it makes ChunkChecksum public and BlockMetadataHeader's
> readHeader() and writeHeader() methods public.
> The changes to BlockReaderUtil (and related classes) discussed by section
> 4.1 are only needed if supporting short-circuit, and should be addressed
> as part of an effort to provide such support rather than this JIRA.
> To help ensure these changes are complete and are not regressed in the
> future, tests that gauge the accessibility (though *not* behavior)
> of interfaces needed by a FsDatasetSpi subclass are also included.
> These take the form of a dummy FsDatasetSpi subclass -- a successful
> compilation is effectively a pass.  Trivial unit tests are included so
> that there is something tangible to track.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6990) Add unit test for evict/delete RAM_DISK block with open handle

2014-09-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144132#comment-14144132
 ] 

Arpit Agarwal commented on HDFS-6990:
-

+1 for the patch. Thanks for addressing all the rounds of feedback [~xyao]. I 
will commit it shortly.

> Add unit test for evict/delete RAM_DISK block with open handle
> --
>
> Key: HDFS-6990
> URL: https://issues.apache.org/jira/browse/HDFS-6990
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-6990.0.patch, HDFS-6990.1.patch, HDFS-6990.2.patch, 
> HDFS-6990.3.patch
>
>
> This is to verify:
> * Evict RAM_DISK block with open handle should fall back to DISK.
> * Delete RAM_DISK block (persisted) with open handle should mark the block to 
> be deleted upon handle close. 
> Simply opening a handle to a file in the DFS namespace won't work as expected. We 
> need a local FS file handle to the block file. The only meaningful case is 
> Short Circuit Read. This JIRA is to validate/enable the two cases with an 
> SCR-enabled MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7010) boot up libhdfs3 project

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HDFS-7010.

Resolution: Fixed

Committed.  Thanks, Abe.

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, 
> HDFS-7010-pnative.004.patch, HDFS-7010-pnative.004.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7122) Very poor distribution of replication copies

2014-09-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144123#comment-14144123
 ] 

Andrew Wang commented on HDFS-7122:
---

Another q, how was your rack topology set up? I'd like to mirror as much as 
possible in my unit test to try for that repro.

> Very poor distribution of replication copies
> 
>
> Key: HDFS-7122
> URL: https://issues.apache.org/jira/browse/HDFS-7122
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
> Environment: medium-large environments with 100's to 1000's of DNs 
> will be most affected, but potentially all environments.
>Reporter: Jeff Buell
>Assignee: Andrew Wang
>Priority: Critical
>  Labels: performance
>
> Summary:
> Since HDFS-6268, the distribution of replica block copies across the 
> DataNodes (replicas 2,3,... as distinguished from the first "primary" 
> replica) is extremely poor, to the point that TeraGen slows down by as much 
> as 3X for certain configurations.  This is almost certainly due to the 
> introduction of Thread Local Random in HDFS-6268.  The mechanism appears to 
> be that this change causes all the random numbers in the threads to be 
> correlated, thus preventing a truly random choice of DN for each replica copy.
> Testing details:
> 1 TB TeraGen on 638 slave nodes (virtual machines on 32 physical hosts), 
> 256MB block size.  This results in 6 "primary" blocks on each DN.  With 
> replication=3, there will be on average 12 more copies on each DN that are 
> copies of blocks from other DNs.  Because of the random selection of DNs, 
> exactly 12 copies are not expected, but I found that about 160 DNs (1/4 of 
> all DNs!) received absolutely no copies, while one DN received over 100 
> copies, and the elapsed time increased by about 3X from a pre-HDFS-6268 
> distro.  There was no pattern to which DNs didn't receive copies, nor was the 
> set of such DNs repeatable run-to-run. In addition to the performance 
> problem, there could be capacity problems due to one or a few DNs running out 
> of space. Testing was done on CDH 5.0.0 (before) and CDH 5.1.2 (after), but I 
> don't see a significant difference from the Apache Hadoop source in this 
> regard. The workaround to recover the previous behavior is to set 
> dfs.namenode.handler.count=1 but of course this has scaling implications for 
> large clusters.
> I recommend that the ThreadLocal Random part of HDFS-6268 be reverted until a 
> better algorithm can be implemented and tested.  Testing should include a 
> case with many DNs and a small number of blocks on each.
> It should also be noted that even pre-HDFS-6268, the random choice of DN 
> algorithm produces a rather non-uniform distribution of copies.  This is not 
> due to any bug, but purely a case of random distributions being much less 
> uniform than one might intuitively expect. In the above case, pre-HDFS-6268 
> yields something like a range of 3 to 25 block copies on each DN. 
> Surprisingly, the performance penalty of this non-uniformity is not as big as 
> might be expected (maybe only 10-20%), but HDFS should do better, and in any 
> case the capacity issue remains.  Round-robin choice of DN?  Better awareness 
> of which DNs currently store fewer blocks? It's not sufficient that the total 
> number of blocks be similar on each DN at the end; at each point in time, no 
> individual DN should receive a disproportionate number of blocks at once 
> (which could be a danger of an RR algorithm).
> Probably should limit this jira to tracking the ThreadLocal issue, and track 
> the random choice issue in another one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-5868) Make hsync implementation pluggable

2014-09-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-5868:


Assignee: Taylor, Buddy

> Make hsync implementation pluggable
> ---
>
> Key: HDFS-5868
> URL: https://issues.apache.org/jira/browse/HDFS-5868
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: Taylor, Buddy
>Assignee: Taylor, Buddy
> Fix For: 2.4.0
>
> Attachments: HDFS-5868-branch-2.patch, HDFS-5868a-branch-2.patch, 
> HDFS-5868b-branch-2.patch
>
>
> The current implementation of hsync in BlockReceiver only works if the output 
> streams are instances of FileOutputStream. Therefore, there is currently no 
> way for an FSDatasetSpi plugin to implement hsync if it is not using standard 
> OS files.
> One possible solution is to push the implementation of hsync into the 
> ReplicaOutputStreams class. This class is constructed by the 
> ReplicaInPipeline, which is constructed by the FSDatasetSpi plugin, so 
> it can be extended. Instead of directly calling sync on the output stream, 
> BlockReceiver would call ReplicaOutputStreams.sync.  The default 
> implementation of sync in ReplicaOutputStreams would be the same as the 
> current implementation in BlockReceiver.
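
An illustrative sketch of the proposal above, not the committed HDFS-5868 code: the 
default sync mirrors today's FileOutputStream-only behavior, while a plugin with 
non-file-backed replicas can override it.

{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch only: class and method names are simplified stand-ins.
class ReplicaOutputStreamsSketch {
  private final OutputStream dataOut;

  ReplicaOutputStreamsSketch(OutputStream dataOut) {
    this.dataOut = dataOut;
  }

  /** Flush the replica data durably; subclasses may override. */
  public void syncDataOut() throws IOException {
    if (dataOut instanceof FileOutputStream) {
      ((FileOutputStream) dataOut).getChannel().force(true);
    }
    // A plugin whose stream is not a FileOutputStream overrides this method
    // instead of being silently skipped, which is the gap described above.
  }
}
{code}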



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes

2014-09-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-5782:


Assignee: David Powell

> BlockListAsLongs should take lists of Replicas rather than concrete classes
> ---
>
> Key: HDFS-5782
> URL: https://issues.apache.org/jira/browse/HDFS-5782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5782.patch
>
>
> From HDFS-5194:
> {quote}
> BlockListAsLongs's constructor takes a list of Blocks and a list of 
> ReplicaInfos.  On the surface, the former is mildly irritating because it is 
> a concrete class, while the latter is a greater concern due to being a 
> File-based implementation of Replica.
> On deeper inspection, BlockListAsLongs passes members of both to an internal 
> method that accepts just Blocks, which conditionally casts them *back* to 
> ReplicaInfos (this cast only happens to the latter, though this isn't 
> immediately obvious to the reader).
> Conveniently, all methods called on these objects are found in the Replica 
> interface, and all functional (i.e. non-test) consumers of this interface 
> pass in Replica subclasses.  If this constructor took Lists of Replicas 
> instead, it would be more generally useful and its implementation would be 
> cleaner as well.
> {quote}
> Fixing this indeed makes the business end of BlockListAsLongs cleaner while 
> requiring no changes to FsDatasetImpl.  As suggested by the above 
> description, though, the HDFS tests use BlockListAsLongs differently from the 
> production code -- they pretty much universally provide a list of actual 
> Blocks.  To handle this:
> - In the case of SimulatedFSDataset, providing a list of Replicas is actually 
> less work.
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly 
> invasive.  Instead, the patch creates a second constructor in 
> BlockListAsLongs specifically for the use of NNThroughputBenchmark.  It turns 
> the stomach a little, but is clearer and requires less code than the 
> alternatives (and isn't without precedent).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-5689) FsDatasetImpl registers mbean using uninitialized DataNode UUID

2014-09-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-5689:


Assignee: David Powell

> FsDatasetImpl registers mbean using uninitialized DataNode UUID
> ---
>
> Key: HDFS-5689
> URL: https://issues.apache.org/jira/browse/HDFS-5689
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5689.patch
>
>
> FsDatasetImpl's constructor attempts to include the datanode UUID in its 
> mbean's ObjectName:
>   registerMBean(datanode.getDatanodeUuid());
> Unfortunately this doesn't work because the provided DataNode's UUID isn't 
> set until bpRegistrationSucceeded() is called... after the FsDatasetImpl has 
> been created.  The result is the mbean is always registered with a bogus 
> (though valid) ObjectName:
>   Hadoop:name=FSDatasetState-null,service=DataNode
> Prior to HDFS-2832 and the storageID -> datanodeUuid rename, this was 
> initialized using the DataStorage:
>   registerMBean(storage.getStorageID());
> With the fix for HDFS-5454 in place, doing equivalent thing (already done by 
> SimulatedFSDataset):
>   registerMBean(storage.getDatanodeUuid());
> ...fixes the problem:
>   
> Hadoop:name=FSDatasetState-24aed86a-fee6-4b88-868e-285e09ea2766,service=DataNode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-5194) Robust support for alternate FsDatasetSpi implementations

2014-09-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-5194:


Assignee: David Powell

> Robust support for alternate FsDatasetSpi implementations
> -
>
> Key: HDFS-5194
> URL: https://issues.apache.org/jira/browse/HDFS-5194
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5194.design.01222014.pdf, 
> HDFS-5194.design.09112013.pdf, HDFS-5194.patch.09112013
>
>
> The existing FsDatasetSpi interface is well-positioned to permit extending 
> Hadoop to run natively on non-traditional storage architectures.  Before this 
> can be done, however, a number of gaps need to be addressed.  This JIRA 
> documents those gaps, suggests some solutions, and puts forth a sample 
> implementation of some of the key changes needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations

2014-09-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-5631:


Assignee: David Powell

> Expose interfaces required by FsDatasetSpi implementations
> --
>
> Key: HDFS-5631
> URL: https://issues.apache.org/jira/browse/HDFS-5631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: David Powell
>Assignee: David Powell
>Priority: Minor
> Attachments: HDFS-5631.patch, HDFS-5631.patch
>
>
> This sub-task addresses section 4.1 of the document attached to HDFS-5194,
> the exposure of interfaces needed by a FsDatasetSpi implementation.
> Specifically it makes ChunkChecksum public and BlockMetadataHeader's
> readHeader() and writeHeader() methods public.
> The changes to BlockReaderUtil (and related classes) discussed by section
> 4.1 are only needed if supporting short-circuit, and should be addressed
> as part of an effort to provide such support rather than this JIRA.
> To help ensure these changes are complete and are not regressed in the
> future, tests that gauge the accessibility (though *not* behavior)
> of interfaces needed by a FsDatasetSpi subclass are also included.
> These take the form of a dummy FsDatasetSpi subclass -- a successful
> compilation is effectively a pass.  Trivial unit tests are included so
> that there is something tangible to track.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-09-22 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov reassigned HDFS-7128:
---

Assignee: Gera Shegalov

> Decommission slows way down when it gets towards the end
> 
>
> Key: HDFS-7128
> URL: https://issues.apache.org/jira/browse/HDFS-7128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Gera Shegalov
>
> When we decommission nodes across different racks, the decommission process 
> becomes really slow at the end, hardly making any progress. The problem is 
> that some blocks are on 3 decomm-in-progress DNs, and the way replications are 
> scheduled causes unnecessary delay. Here is the analysis.
> When BlockManager schedules the replication work from neededReplication, it 
> first needs to pick the source node for replication via chooseSourceDatanode. 
> The core policies to pick the source node are:
> 1. Prefer decomm-in-progress node.
> 2. Only pick the nodes whose outstanding replication counts are below 
> thresholds dfs.namenode.replication.max-streams or 
> dfs.namenode.replication.max-streams-hard-limit, based on the replication 
> priority.
> When we decommission nodes,
> 1. All the decommissioning nodes' blocks will be added to neededReplication.
> 2. BM will pick X number of blocks from neededReplication in each iteration. 
> X is based on cluster size and some configurable multiplier. So if the 
> cluster has 2000 nodes, X will be around 4000.
> 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends 
> up being chosen as the source node for all these 4000 blocks. The reason the 
> outstanding replication thresholds don't kick in is the implementation of 
> BlockManager.computeReplicationWorkForBlocks; 
> node.getNumberOfBlocksToBeReplicated() remains zero because 
> node.addBlockToBeReplicated is called after the source node iteration.
> {noformat}
> ...
>   synchronized (neededReplications) {
> for (int priority = 0; priority < blocksToReplicate.size(); 
> priority++) {
> ...
> chooseSourceDatanode
> ...
> }
>   for(ReplicationWork rw : work){
> ...
>   rw.srcNode.addBlockToBeReplicated(block, targets);
> ...
>   }
> {noformat}
>  
> 4. So several decomm-in-progress nodes A, B, and C end up with 
> node.getNumberOfBlocksToBeReplicated() around 4000.
> 5. If we assume each node can replicate 5 blocks per minute, it is going to 
> take 800 minutes to finish replication of these blocks.
> 6. The pending replication timeout kicks in after 5 minutes. The items will be 
> removed from the pending replication queue and added back to 
> neededReplication. The replications will then be handled by other source 
> nodes of these blocks. But the blocks still remain in nodes A, B, C's pending 
> replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue 
> the replications of these blocks, although these blocks might have been 
> replicated by other DNs after replication timeout.
> 7. Some block's replicas exist on A, B, and C, and the block is at the end of A's 
> pending replication queue. Even though the block's replication times out, no source 
> node can be chosen given A, B, and C all have high pending replication counts. So 
> we have to wait until A drains its pending replication queue. Meanwhile, the 
> items in A's pending replication queue have been taken care of by other nodes 
> and are no longer under-replicated.
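
For illustration, a self-contained toy model of the ordering described above 
(simplified names, not the real BlockManager code): the per-node pending counter 
is only incremented after every source node has already been chosen, so the 
max-streams check never sees it grow within one iteration.

{code}
import java.util.ArrayList;
import java.util.List;

class ReplicationSchedulingSketch {
  static class Node {
    int blocksToBeReplicated;   // models node.getNumberOfBlocksToBeReplicated()
    boolean underStreamLimit(int maxStreams) {
      return blocksToBeReplicated < maxStreams;
    }
  }

  public static void main(String[] args) {
    Node nodeA = new Node();            // "node A" in the description above
    int maxStreams = 2;
    int blocksThisIteration = 4000;

    List<Node> chosenSources = new ArrayList<>();
    // Phase 1: choose a source for every block. The counter is still 0 each
    // time, so the max-streams threshold never rejects node A.
    for (int i = 0; i < blocksThisIteration; i++) {
      if (nodeA.underStreamLimit(maxStreams)) {
        chosenSources.add(nodeA);
      }
    }
    // Phase 2: only now is the per-node counter incremented.
    for (Node src : chosenSources) {
      src.blocksToBeReplicated++;
    }
    System.out.println("blocks queued on one node: "
        + nodeA.blocksToBeReplicated);  // prints 4000
  }
}
{code}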



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-09-22 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HDFS-7128:

Assignee: (was: Gera Shegalov)

> Decommission slows way down when it gets towards the end
> 
>
> Key: HDFS-7128
> URL: https://issues.apache.org/jira/browse/HDFS-7128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> When we decommission nodes across different racks, the decommission process 
> becomes really slow at the end, hardly making any progress. The problem is 
> that some blocks are on 3 decomm-in-progress DNs, and the way replications are 
> scheduled causes unnecessary delay. Here is the analysis.
> When BlockManager schedules the replication work from neededReplication, it 
> first needs to pick the source node for replication via chooseSourceDatanode. 
> The core policies to pick the source node are:
> 1. Prefer decomm-in-progress node.
> 2. Only pick the nodes whose outstanding replication counts are below 
> thresholds dfs.namenode.replication.max-streams or 
> dfs.namenode.replication.max-streams-hard-limit, based on the replication 
> priority.
> When we decommission nodes,
> 1. All the decommissioning nodes' blocks will be added to neededReplication.
> 2. BM will pick X number of blocks from neededReplication in each iteration. 
> X is based on cluster size and some configurable multiplier. So if the 
> cluster has 2000 nodes, X will be around 4000.
> 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends 
> up being chosen as the source node for all these 4000 blocks. The reason the 
> outstanding replication thresholds don't kick in is the implementation of 
> BlockManager.computeReplicationWorkForBlocks; 
> node.getNumberOfBlocksToBeReplicated() remains zero because 
> node.addBlockToBeReplicated is called after the source node iteration.
> {noformat}
> ...
>   synchronized (neededReplications) {
> for (int priority = 0; priority < blocksToReplicate.size(); 
> priority++) {
> ...
> chooseSourceDatanode
> ...
> }
>   for(ReplicationWork rw : work){
> ...
>   rw.srcNode.addBlockToBeReplicated(block, targets);
> ...
>   }
> {noformat}
>  
> 4. So several decomm-in-progress nodes A, B, and C end up with 
> node.getNumberOfBlocksToBeReplicated() around 4000.
> 5. If we assume each node can replicate 5 blocks per minute, it is going to 
> take 800 minutes to finish replication of these blocks.
> 6. The pending replication timeout kicks in after 5 minutes. The items will be 
> removed from the pending replication queue and added back to 
> neededReplication. The replications will then be handled by other source 
> nodes of these blocks. But the blocks still remain in nodes A, B, C's pending 
> replication queue, DatanodeDescriptor.replicateBlocks, so A, B, C continue 
> the replications of these blocks, although these blocks might have been 
> replicated by other DNs after replication timeout.
> 7. Some block's replicas exist on A, B, and C, and the block is at the end of A's 
> pending replication queue. Even though the block's replication times out, no source 
> node can be chosen given A, B, and C all have high pending replication counts. So 
> we have to wait until A drains its pending replication queue. Meanwhile, the 
> items in A's pending replication queue have been taken care of by other nodes 
> and are no longer under-replicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7122) Very poor distribution of replication copies

2014-09-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144103#comment-14144103
 ] 

Andrew Wang commented on HDFS-7122:
---

I poked around with a unit test and was unable to reproduce skew that was quite 
this extreme. I'll admit that the workaround of only using one handler 
definitely does point at the thread-local Random being the issue, but since I'm 
unable to repro, it's going to be hard to test a fix.
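
For reference, a standalone sketch of the kind of skew check being attempted here; 
it only models handler threads picking random target indices, not actual block 
placement, and the thread and DN counts are illustrative:

{code}
import java.util.Random;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class RandomSkewSketch {
  public static void main(String[] args) throws InterruptedException {
    final int datanodes = 638, threads = 20, picksPerThread = 1000;
    final AtomicIntegerArray copiesPerDn = new AtomicIntegerArray(datanodes);

    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        // Swap in ThreadLocalRandom.current() here to compare distributions.
        Random r = new Random();
        for (int i = 0; i < picksPerThread; i++) {
          copiesPerDn.incrementAndGet(r.nextInt(datanodes));
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      w.join();
    }

    int min = Integer.MAX_VALUE, max = 0;
    for (int i = 0; i < datanodes; i++) {
      min = Math.min(min, copiesPerDn.get(i));
      max = Math.max(max, copiesPerDn.get(i));
    }
    // A healthy generator should give a fairly tight min/max spread; the
    // reported skew would show up as many zero counts and a few huge ones.
    System.out.println("min copies on a DN: " + min + ", max: " + max);
  }
}
{code}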

Jeff, a couple questions for you in the meantime:

- Were you using the default block placement policy, or WithNodeGroup? I 
noticed you said you were using VMs.
- What JDK version were you using? I looked in JDK7u40's source, and it looks 
like the bare Random() constructor generates a unique seed in a thread-safe 
manner.
- What was your handler count before you changed it to 1?
- Is it possible you can test WIP work on your setup if I'm still unable to 
repro?

> Very poor distribution of replication copies
> 
>
> Key: HDFS-7122
> URL: https://issues.apache.org/jira/browse/HDFS-7122
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
> Environment: medium-large environments with 100's to 1000's of DNs 
> will be most affected, but potentially all environments.
>Reporter: Jeff Buell
>Assignee: Andrew Wang
>Priority: Critical
>  Labels: performance
>
> Summary:
> Since HDFS-6268, the distribution of replica block copies across the 
> DataNodes (replicas 2,3,... as distinguished from the first "primary" 
> replica) is extremely poor, to the point that TeraGen slows down by as much 
> as 3X for certain configurations.  This is almost certainly due to the 
> introduction of Thread Local Random in HDFS-6268.  The mechanism appears to 
> be that this change causes all the random numbers in the threads to be 
> correlated, thus preventing a truly random choice of DN for each replica copy.
> Testing details:
> 1 TB TeraGen on 638 slave nodes (virtual machines on 32 physical hosts), 
> 256MB block size.  This results in 6 "primary" blocks on each DN.  With 
> replication=3, there will be on average 12 more copies on each DN that are 
> copies of blocks from other DNs.  Because of the random selection of DNs, 
> exactly 12 copies are not expected, but I found that about 160 DNs (1/4 of 
> all DNs!) received absolutely no copies, while one DN received over 100 
> copies, and the elapsed time increased by about 3X from a pre-HDFS-6268 
> distro.  There was no pattern to which DNs didn't receive copies, nor was the 
> set of such DNs repeatable run-to-run. In addition to the performance 
> problem, there could be capacity problems due to one or a few DNs running out 
> of space. Testing was done on CDH 5.0.0 (before) and CDH 5.1.2 (after), but I 
> don't see a significant difference from the Apache Hadoop source in this 
> regard. The workaround to recover the previous behavior is to set 
> dfs.namenode.handler.count=1 but of course this has scaling implications for 
> large clusters.
> I recommend that the ThreadLocal Random part of HDFS-6268 be reverted until a 
> better algorithm can be implemented and tested.  Testing should include a 
> case with many DNs and a small number of blocks on each.
> It should also be noted that even pre-HDFS-6268, the random choice of DN 
> algorithm produces a rather non-uniform distribution of copies.  This is not 
> due to any bug, but purely a case of random distributions being much less 
> uniform than one might intuitively expect. In the above case, pre-HDFS-6268 
> yields something like a range of 3 to 25 block copies on each DN. 
> Surprisingly, the performance penalty of this non-uniformity is not as big as 
> might be expected (maybe only 10-20%), but HDFS should do better, and in any 
> case the capacity issue remains.  Round-robin choice of DN?  Better awareness 
> of which DNs currently store fewer blocks? It's not sufficient that the total 
> number of blocks be similar on each DN at the end; at each point in time, no 
> individual DN should receive a disproportionate number of blocks at once 
> (which could be a danger of an RR algorithm).
> Probably should limit this jira to tracking the ThreadLocal issue, and track 
> the random choice issue in another one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7001) Tests in TestTracing should not depend on the order of execution

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7001:
---
   Resolution: Fixed
Fix Version/s: 2.6.0
   Status: Resolved  (was: Patch Available)

> Tests in TestTracing should not depend on the order of execution
> 
>
> Key: HDFS-7001
> URL: https://issues.apache.org/jira/browse/HDFS-7001
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: HDFS-7001-0.patch, HDFS-7001-1.patch
>
>
> o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed 
> first. It should be done in BeforeClass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7001) Tests in TestTracing should not depend on the order of execution

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7001:
---
Summary: Tests in TestTracing should not depend on the order of execution  
(was: Tests in TestTracing depends on the order of execution)

> Tests in TestTracing should not depend on the order of execution
> 
>
> Key: HDFS-7001
> URL: https://issues.apache.org/jira/browse/HDFS-7001
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7001-0.patch, HDFS-7001-1.patch
>
>
> o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed 
> first. It should be done in BeforeClass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7001) Tests in TestTracing depends on the order of execution

2014-09-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144098#comment-14144098
 ] 

Colin Patrick McCabe commented on HDFS-7001:


+1.  Thanks, [~iwasakims].

> Tests in TestTracing depends on the order of execution
> --
>
> Key: HDFS-7001
> URL: https://issues.apache.org/jira/browse/HDFS-7001
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-7001-0.patch, HDFS-7001-1.patch
>
>
> o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed 
> first. It should be done in BeforeClass.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6881) The DFSClient should use the sampler to determine whether to initiate trace spans when making RPCv9 calls to the NN and DN

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6881:
---
Status: Patch Available  (was: In Progress)

> The DFSClient should use the sampler to determine whether to initiate trace 
> spans when making RPCv9 calls to the NN and DN
> --
>
> Key: HDFS-6881
> URL: https://issues.apache.org/jira/browse/HDFS-6881
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.6.0
>Reporter: Masatake Iwasaki
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6881.002.patch
>
>
> The DFSClient should use the configured HTrace sampler to determine whether to 
> initiate trace spans when making RPCv9 calls to the NN and DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6956) Allow dynamically changing the tracing level in Hadoop servers

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-6956:
---
Attachment: HDFS-6956.004.patch

* Add findbugs suppression for protobuf-generated java files.

* sort trace in the case statement

* sort output of traceadmin \-help

> Allow dynamically changing the tracing level in Hadoop servers
> --
>
> Key: HDFS-6956
> URL: https://issues.apache.org/jira/browse/HDFS-6956
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6956.002.patch, HDFS-6956.003.patch, 
> HDFS-6956.004.patch
>
>
> We should allow users to dynamically change the tracing level in Hadoop 
> servers.  The easiest way to do this is probably to have an RPC accessible 
> only to the superuser that changes tracing settings.  This would allow us to 
> turn on and off tracing on the NameNode, DataNode, etc. at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7010) boot up libhdfs3 project

2014-09-22 Thread Abraham Elmahrek (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144088#comment-14144088
 ] 

Abraham Elmahrek commented on HDFS-7010:


Works for me now! +1 from me!

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, 
> HDFS-7010-pnative.004.patch, HDFS-7010-pnative.004.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7010) boot up libhdfs3 project

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7010:
---
Attachment: HDFS-7010-pnative.004.patch

added

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, 
> HDFS-7010-pnative.004.patch, HDFS-7010-pnative.004.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144068#comment-14144068
 ] 

Hadoop QA commented on HDFS-7130:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670544/HDFS-7130.1.patch
  against trunk revision 43efdd3.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8149//console

This message is automatically generated.

> TestDataTransferKeepalive fails intermittently on Windows.
> --
>
> Key: HDFS-7130
> URL: https://issues.apache.org/jira/browse/HDFS-7130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-7130.1.patch
>
>
> {{TestDataTransferKeepalive}} has failed intermittently on Windows.  These 
> tests rely on a 1 ms thread sleep to wait for a cache expiration.  This is 
> likely too short on Windows, which has been observed to have a less granular 
> clock interrupt period compared to typical Linux machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7124) Remove EncryptionZoneManager.NULL_EZ

2014-09-22 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144066#comment-14144066
 ] 

Charles Lamb commented on HDFS-7124:


I did not add any new tests because TestEncryptionZones already tests this.

TestPipelinesFailover and TestEncryptionZonesWithKMS both pass on my local 
machine with the patch applied.

TestWebHdfsFileSystemContract fails with and without the patch on my local 
machine.


> Remove EncryptionZoneManager.NULL_EZ
> 
>
> Key: HDFS-7124
> URL: https://issues.apache.org/jira/browse/HDFS-7124
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7124.001.patch
>
>
> Remove EncryptionZoneManager.NULL_EZ so that null can be used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7010) boot up libhdfs3 project

2014-09-22 Thread Abraham Elmahrek (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144064#comment-14144064
 ] 

Abraham Elmahrek commented on HDFS-7010:


{code}PROJECT(libhdfs3 C CXX){code}
didn't do much. Could you try adding the following after all the subdirectories 
are included:
{code}
SET_TARGET_PROPERTIES(libhdfs3-static PROPERTIES LINKER_LANGUAGE CXX)
SET_TARGET_PROPERTIES(libhdfs3-shared PROPERTIES LINKER_LANGUAGE CXX)
{code}

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, 
> HDFS-7010-pnative.004.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7126) TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144060#comment-14144060
 ] 

Hadoop QA commented on HDFS-7126:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670491/HDFS-7126.0.patch
  against trunk revision 912ad32.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8148//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8148//console

This message is automatically generated.

> TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path
> 
>
> Key: HDFS-7126
> URL: https://issues.apache.org/jira/browse/HDFS-7126
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDFS-7126.0.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7130:

Status: Patch Available  (was: Open)

> TestDataTransferKeepalive fails intermittently on Windows.
> --
>
> Key: HDFS-7130
> URL: https://issues.apache.org/jira/browse/HDFS-7130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-7130.1.patch
>
>
> {{TestDataTransferKeepalive}} has failed intermittently on Windows.  These 
> tests rely on a 1 ms thread sleep to wait for a cache expiration.  This is 
> likely too short on Windows, which has been observed to have a less granular 
> clock interrupt period compared to typical Linux machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7130:

Attachment: HDFS-7130.1.patch

The attached patch increases the durations of the relevant thread sleep calls.  
This is passing consistently for me on Mac and Windows.
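
For readers following along, here is a minimal, self-contained sketch of the timing 
problem being fixed; the class, constants, and expiry logic below are illustrative 
assumptions rather than code from the patch:

{code}
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch only (not the HDFS-7130 patch itself): why a 1 ms sleep is an
 * unreliable way to wait out a cache expiry on platforms with a coarse clock
 * interrupt period. ExpiringEntry and all timing constants are assumptions.
 */
public class SleepGranularityDemo {
  static final long EXPIRY_MS = 10;    // assumed entry lifetime
  static final long SHORT_WAIT_MS = 1; // the flaky wait: below Windows' ~15 ms timer tick
  static final long SAFE_WAIT_MS = 50; // comfortably above the timer granularity

  static class ExpiringEntry {
    final long createdAt = System.currentTimeMillis();
    boolean expired() {
      return System.currentTimeMillis() - createdAt >= EXPIRY_MS;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ExpiringEntry entry = new ExpiringEntry();
    TimeUnit.MILLISECONDS.sleep(SAFE_WAIT_MS); // was SHORT_WAIT_MS in the flaky version
    System.out.println("expired after wait? " + entry.expired()); // reliably true now
  }
}
{code}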

> TestDataTransferKeepalive fails intermittently on Windows.
> --
>
> Key: HDFS-7130
> URL: https://issues.apache.org/jira/browse/HDFS-7130
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-7130.1.patch
>
>
> {{TestDataTransferKeepalive}} has failed intermittently on Windows.  These 
> tests rely on a 1 ms thread sleep to wait for a cache expiration.  This is 
> likely too short on Windows, which has been observed to have a less granular 
> clock interrupt period compared to typical Linux machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.

2014-09-22 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7130:
---

 Summary: TestDataTransferKeepalive fails intermittently on Windows.
 Key: HDFS-7130
 URL: https://issues.apache.org/jira/browse/HDFS-7130
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth


{{TestDataTransferKeepalive}} has failed intermittently on Windows.  These 
tests rely on a 1 ms thread sleep to wait for a cache expiration.  This is 
likely too short on Windows, which has been observed to have a less granular 
clock interrupt period compared to typical Linux machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7124) Remove EncryptionZoneManager.NULL_EZ

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144044#comment-14144044
 ] 

Hadoop QA commented on HDFS-7124:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670487/HDFS-7124.001.patch
  against trunk revision 912ad32.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8146//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8146//console

This message is automatically generated.

> Remove EncryptionZoneManager.NULL_EZ
> 
>
> Key: HDFS-7124
> URL: https://issues.apache.org/jira/browse/HDFS-7124
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7124.001.patch
>
>
> Remove EncryptionZoneManager.NULL_EZ so that null can be used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6956) Allow dynamically changing the tracing level in Hadoop servers

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144023#comment-14144023
 ] 

Hadoop QA commented on HDFS-6956:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670481/HDFS-6956.003.patch
  against trunk revision 912ad32.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8145//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8145//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8145//console

This message is automatically generated.

> Allow dynamically changing the tracing level in Hadoop servers
> --
>
> Key: HDFS-6956
> URL: https://issues.apache.org/jira/browse/HDFS-6956
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6956.002.patch, HDFS-6956.003.patch
>
>
> We should allow users to dynamically change the tracing level in Hadoop 
> servers.  The easiest way to do this is probably to have an RPC accessible 
> only to the superuser that changes tracing settings.  This would allow us to 
> turn on and off tracing on the NameNode, DataNode, etc. at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6990) Add unit test for evict/delete RAM_DISK block with open handle

2014-09-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-6990:
-
Attachment: HDFS-6990.3.patch

Thanks [~arpitagarwal] for reviewing the test. I've updated the patch to skip 
these tests on Windows or when native IO is not enabled. 
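
As a rough illustration of this kind of platform guard (not the actual HDFS-6990 
change; the helper and property names are assumptions), a JUnit assumption can mark 
the tests as skipped rather than failed:

{code}
import static org.junit.Assume.assumeTrue;

import org.junit.Before;

public class LazyPersistPlatformCheckSketch {
  // Stand-in for whatever native IO availability check the real patch performs.
  static boolean nativeIoAvailable() {
    return Boolean.getBoolean("test.native.io.available"); // placeholder probe
  }

  @Before
  public void skipOnUnsupportedPlatforms() {
    boolean isWindows =
        System.getProperty("os.name").toLowerCase().startsWith("windows");
    // assumeTrue makes JUnit report the test as skipped rather than failed.
    assumeTrue(!isWindows && nativeIoAvailable());
  }
}
{code}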


> Add unit test for evict/delete RAM_DISK block with open handle
> --
>
> Key: HDFS-6990
> URL: https://issues.apache.org/jira/browse/HDFS-6990
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-6990.0.patch, HDFS-6990.1.patch, HDFS-6990.2.patch, 
> HDFS-6990.3.patch
>
>
> This is to verify:
> * Evict RAM_DISK block with open handle should fall back to DISK.
> * Delete RAM_DISK block (persisted) with open handle should mark the block to 
> be deleted upon handle close. 
> Simply open handle to file in DFS name space won't work as expected. We need 
> a local FS file handle to the block file. The only meaningful case is for 
> Short Circuit Read. This JIRA is to validate/enable the two cases with SCR 
> enabled MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7010) boot up libhdfs3 project

2014-09-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7010:
---
Attachment: HDFS-7010-pnative.004.patch

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, 
> HDFS-7010-pnative.004.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7010) boot up libhdfs3 project

2014-09-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144001#comment-14144001
 ] 

Colin Patrick McCabe commented on HDFS-7010:


Added license headers to cmake files, doxygen.in, platform.h.in.  Fixed 
platform.h.in to have a header guard.

bq. I'd also take a close look at doxygen inclusion in the future if this is 
something the Hadoop project wants in general. My sense of things is that the 
current configuration is more of a stub?

Hadoop doesn't currently build doxygen for our native projects, but we should.  
It's nice, basically JavaDoc for C/C++.

bq. \[Linker language comments\]

Hmm, that's odd.  Hopefully using {{PROJECT(libhdfs3 C CXX)}} will fix it.  I 
will make that change...

bq. I'm not really aware of gmock or gtest... but it seems fine to me!

Yeah, they are BSD-licensed libraries being bundled.  gtest doesn't have a 
stable API (it's mostly implemented in a header file), so it's not really 
practical to link against the system version (and they explicitly tell you not 
to do this).  I'm not as sure about gmock, but it seems fine to bundle it for 
now.  It's only used for testing anyway.

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, 
> HDFS-7010-pnative.004.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7129) Metrics to track usage of memory for writes

2014-09-22 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-7129:
---

 Summary: Metrics to track usage of memory for writes
 Key: HDFS-7129
 URL: https://issues.apache.org/jira/browse/HDFS-7129
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal


A few metrics to evaluate feature usage and suggest improvements. Thanks to 
[~sureshms] for some of these suggestions.

# Number of times a block in memory was read (before being ejected)
# Average block size for data written to memory tier
# Time the block was in memory before being ejected
# Number of blocks written to memory
# Number of memory writes requested but not satisfied (failed-over to disk)
# Number of blocks evicted without ever being read from memory
# Average delay between memory write and disk write (window where a node 
restart could cause data loss).
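
A hedged sketch of how a few of these counters might be declared with Hadoop's 
metrics2 annotations; the class and metric names are illustrative assumptions, not 
the names that would eventually be committed:

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative only: names and groupings are assumptions, not the committed design.
@Metrics(about = "Memory tier write metrics (illustrative)", context = "dfs")
public class RamDiskWriteMetricsSketch {
  @Metric("Blocks written to memory")
  MutableCounterLong blocksWrittenToMemory;

  @Metric("Memory writes requested but satisfied from disk instead")
  MutableCounterLong memoryWriteFallbacks;

  @Metric("Blocks evicted without ever being read from memory")
  MutableCounterLong blocksEvictedWithoutRead;

  @Metric("Delay between memory write and disk write (ms)")
  MutableRate lazyPersistDelay;
}
{code}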



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7123) Run legacy fsimage checkpoint in parallel with PB fsimage checkpoint

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143984#comment-14143984
 ] 

Hadoop QA commented on HDFS-7123:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670479/HDFS-7123.patch
  against trunk revision 912ad32.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
112 warning messages.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/8143//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8143//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8143//console

This message is automatically generated.

> Run legacy fsimage checkpoint in parallel with PB fsimage checkpoint
> 
>
> Key: HDFS-7123
> URL: https://issues.apache.org/jira/browse/HDFS-7123
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-7123.patch
>
>
> HDFS-7097 will address the checkpoint and BR issue. In addition, it might 
> still be useful to reduce the overall checkpoint duration, given it blocks 
> edit log replay. If there is large volume of edit log to catch up and NN fail 
> overs, it will impact the availability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7081) Add new DistributedFileSystem API for getting all the existing storage policies

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143983#comment-14143983
 ] 

Hadoop QA commented on HDFS-7081:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670480/HDFS-7081.003.patch
  against trunk revision 912ad32.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8144//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8144//console

This message is automatically generated.

> Add new DistributedFileSystem API for getting all the existing storage 
> policies
> ---
>
> Key: HDFS-7081
> URL: https://issues.apache.org/jira/browse/HDFS-7081
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-7081.000.patch, HDFS-7081.001.patch, 
> HDFS-7081.002.patch, HDFS-7081.003.patch
>
>
> Instead of loading all the policies from a client side configuration file, it 
> may be better to provide Mover with a new RPC call for getting all the 
> storage policies from the namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7010) boot up libhdfs3 project

2014-09-22 Thread Abraham Elmahrek (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143978#comment-14143978
 ] 

Abraham Elmahrek commented on HDFS-7010:


LGTM, with the exception of a couple of missing ASLv2 license headers in:
* doxygen.in
* platform.h.in
* pretty much all of the cmake files

I'd also take a close look at doxygen inclusion in the future if this is 
something the Hadoop project wants in general. My sense of things is that the 
current configuration is more of a stub?

Also, on Mac, I had some difficulty building it:
{code}
CMake Error: CMake can not determine linker language for target:libhdfs3-shared
CMake Error: Cannot determine link language for target "libhdfs3-shared".
CMake Error: Cannot determine link language for target "libhdfs3-static".
CMake Error: CMake can not determine linker language for target:libhdfs3-static
{code}
I did, however, simply remove a check in order to build from this directory, so 
that could be my fault. I worked around this by specifying LINKER_LANGUAGE 
(http://www.cmake.org/cmake/help/v3.0/prop_tgt/LINKER_LANGUAGE.html).

I'm not really aware of gmock or gtest... but it seems fine to me!

Thanks for working on this Colin!

> boot up libhdfs3 project
> 
>
> Key: HDFS-7010
> URL: https://issues.apache.org/jira/browse/HDFS-7010
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Zhanwei Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-7010-pnative.003.patch, HDFS-7010.patch
>
>
> boot up libhdfs3 project with CMake, Readme and license file.
> Integrate google mock and google test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7127) TestLeaseRecovery leaks MiniDFSCluster instances.

2014-09-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143950#comment-14143950
 ] 

Chris Nauroth commented on HDFS-7127:
-

The test failures are unrelated.  {{TestUrlStreamHandler}} has turned up in a 
couple of recent test runs, but I haven't been able to repro the failure.

> TestLeaseRecovery leaks MiniDFSCluster instances.
> -
>
> Key: HDFS-7127
> URL: https://issues.apache.org/jira/browse/HDFS-7127
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-7127.1.patch
>
>
> {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}} starts a 
> {{MiniDFSCluster}} but never shuts it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6990) Add unit test for evict/delete RAM_DISK block with open handle

2014-09-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-6990:
-
Attachment: HDFS-6990.2.patch

Updated the patch with the SCR reader counter check and the eviction validation. 

> Add unit test for evict/delete RAM_DISK block with open handle
> --
>
> Key: HDFS-6990
> URL: https://issues.apache.org/jira/browse/HDFS-6990
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-6990.0.patch, HDFS-6990.1.patch, HDFS-6990.2.patch
>
>
> This is to verify:
> * Evict RAM_DISK block with open handle should fall back to DISK.
> * Delete RAM_DISK block (persisted) with open handle should mark the block to 
> be deleted upon handle close. 
> Simply open handle to file in DFS name space won't work as expected. We need 
> a local FS file handle to the block file. The only meaningful case is for 
> Short Circuit Read. This JIRA is to validate/enable the two cases with SCR 
> enabled MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7128) Decommission slows way down when it gets towards the end

2014-09-22 Thread Ming Ma (JIRA)
Ming Ma created HDFS-7128:
-

 Summary: Decommission slows way down when it gets towards the end
 Key: HDFS-7128
 URL: https://issues.apache.org/jira/browse/HDFS-7128
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma


When we decommission nodes across different racks, the decommission process 
becomes really slow at the end, hardly making any progress. The problem is some 
blocks are on 3 decomm-in-progress DNs, and the way replications are 
scheduled causes unnecessary delay. Here is the analysis.

When BlockManager schedules the replication work from neededReplication, it 
first needs to pick the source node for replication via chooseSourceDatanode. 
The core policies to pick the source node are:

1. Prefer decomm-in-progress node.

2. Only pick the nodes whose outstanding replication counts are below 
thresholds dfs.namenode.replication.max-streams or 
dfs.namenode.replication.max-streams-hard-limit, based on the replication 
priority.


When we decommission nodes,

1. All the decommission nodes' blocks will be added to neededReplication.

2. BM will pick X number of blocks from neededReplication in each iteration. X 
is based on cluster size and some configurable multiplier. So if the cluster 
has 2000 nodes, X will be around 4000.

3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up 
being chosen as the source node for all 4000 of them. The reason the 
outstanding replication thresholds don't kick in is the implementation of 
BlockManager.computeReplicationWorkForBlocks: 
node.getNumberOfBlocksToBeReplicated() remains zero because 
node.addBlockToBeReplicated is called only after the source node iteration (see 
the sketch at the end of this description).

{noformat}
...
  synchronized (neededReplications) {
    for (int priority = 0; priority < blocksToReplicate.size(); priority++) {
...
chooseSourceDatanode
...
}


  for(ReplicationWork rw : work){
...
  rw.srcNode.addBlockToBeReplicated(block, targets);
...
  }
{noformat}
 
4. So several decomm-in-progress nodes A, B, and C each end up with a 
node.getNumberOfBlocksToBeReplicated() count of around 4000.

5. If we assume each node can replicate 5 blocks per minute, it is going to 
take 800 minutes to finish replicating these blocks.

6. The pending replication timeout kicks in after 5 minutes. The items are 
removed from the pending replication queue and added back to neededReplication, 
and the replications are then handled by other source nodes of these blocks. 
But the blocks still remain in nodes A, B, and C's pending replication queues 
(DatanodeDescriptor.replicateBlocks), so A, B, and C continue replicating these 
blocks, even though they might already have been replicated by other DNs after 
the timeout.

7. Some blocks' replicas exist on A, B, and C, and such a block sits at the end 
of A's pending replication queue. Even though the block's replication times 
out, no source node can be chosen because A, B, and C all have high pending 
replication counts. So we have to wait until A drains its pending replication 
queue, even though the items in that queue have already been taken care of by 
other nodes and are no longer under-replicated.
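
A simplified, hedged sketch (not BlockManager code) of the ordering problem 
described in point 3; all names and thresholds below are stand-ins:

{code}
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of the ordering issue: the per-node pending-replication counter
 * is incremented only after every source-node choice has been made, so the
 * max-streams check never sees the work already assigned in the same iteration.
 */
public class SourceSelectionOrderingDemo {
  static final int MAX_STREAMS = 2; // assumed dfs.namenode.replication.max-streams

  public static void main(String[] args) {
    int blocksToReplicate = 5;   // blocks whose preferred source is decommissioning node A
    int pendingOnNodeA = 0;      // models node.getNumberOfBlocksToBeReplicated()
    List<Integer> work = new ArrayList<>();

    // Phase 1: choose a source for every block; the threshold check sees 0 each time.
    for (int block = 0; block < blocksToReplicate; block++) {
      if (pendingOnNodeA < MAX_STREAMS) { // always true here: counter not yet updated
        work.add(block);                  // node A is picked as the source again
      }
    }

    // Phase 2: only now is the per-node counter incremented, once per scheduled block.
    for (int ignored : work) {
      pendingOnNodeA++;
    }

    System.out.println("blocks queued on node A: " + pendingOnNodeA); // prints 5, not 2
  }
}
{code}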



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.

2014-09-22 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143939#comment-14143939
 ] 

Lei (Eddy) Xu commented on HDFS-6877:
-

These test failures are long-standing failures in trunk and are not related. 

> Avoid calling checkDisk when an HDFS volume is removed during a write.
> --
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, 
> HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, 
> HDFS-6877.004.patch
>
>
> Avoid calling checkDisk when an HDFS volume is removed during a write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7127) TestLeaseRecovery leaks MiniDFSCluster instances.

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143936#comment-14143936
 ] 

Hadoop QA commented on HDFS-7127:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670485/HDFS-7127.1.patch
  against trunk revision 912ad32.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  org.apache.hadoop.fs.TestUrlStreamHandler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8147//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8147//console

This message is automatically generated.

> TestLeaseRecovery leaks MiniDFSCluster instances.
> -
>
> Key: HDFS-7127
> URL: https://issues.apache.org/jira/browse/HDFS-7127
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-7127.1.patch
>
>
> {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}} starts a 
> {{MiniDFSCluster}} but never shuts it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.

2014-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143935#comment-14143935
 ] 

Hadoop QA commented on HDFS-6877:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670476/HDFS-6877.004.patch
  against trunk revision 912ad32.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
  org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8142//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8142//console

This message is automatically generated.

> Avoid calling checkDisk when an HDFS volume is removed during a write.
> --
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, 
> HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, 
> HDFS-6877.004.patch
>
>
> Avoid calling checkDisk when an HDFS volume is removed during a write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7120) When aborting NameNode or JournalNode due to metadata file problems, write the contents of the metadata directories and permissions to logs.

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7120:

Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-6185

> When aborting NameNode or JournalNode due to metadata file problems, write 
> the contents of the metadata directories and permissions to logs.
> 
>
> Key: HDFS-7120
> URL: https://issues.apache.org/jira/browse/HDFS-7120
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>
> If the NameNode or JournalNode aborts due to an unexpected error in the 
> metadata directories, often the root cause is that the metadata files are in 
> an unexpected state, or permissions are broken on the directories.  This 
> issue proposes that during abort, we write additional information about the 
> directory state and permissions to the logs.  This can help speed up 
> diagnosis, and ultimately recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7121) For JournalNode operations that must succeed on all nodes, attempt to undo the operation on all nodes if it fails on one node.

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7121:

Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-6185

> For JournalNode operations that must succeed on all nodes, attempt to undo 
> the operation on all nodes if it fails on one node.
> --
>
> Key: HDFS-7121
> URL: https://issues.apache.org/jira/browse/HDFS-7121
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node
>Reporter: Chris Nauroth
>
> Several JournalNode operations are not satisfied by a quorum.  They must 
> succeed on every JournalNode in the cluster.  If the operation succeeds on 
> some nodes, but fails on others, then this may leave the nodes in an 
> inconsistent state and require operations to do manual recovery steps.  For 
> example, if {{doPreUpgrade}} succeeds on 2 nodes and fails on 1 node, then 
> the operator will need to correct the problem on the failed node and also 
> manually restore the previous.tmp directory to current on the 2 successful 
> nodes before reattempting the upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7118) Improve diagnostics on storage directory rename operations by using NativeIO#renameTo in Storage#rename.

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7118:

Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-6185

> Improve diagnostics on storage directory rename operations by using 
> NativeIO#renameTo in Storage#rename.
> 
>
> Key: HDFS-7118
> URL: https://issues.apache.org/jira/browse/HDFS-7118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>
> If a file rename fails, {{NativeIO#renameTo}} includes more information about 
> the root cause than a plain {{java.io.File#renameTo}}.  The native code can 
> throw an exception with a detailed error message and the {{errno}} on *nix or 
> the value of {{GetLastError}} on Windows.  This issue proposes to use 
> {{NativeIO#renameTo}} inside or in place of {{Storage#rename}} to help 
> improve diagnostics.  The method falls back to {{java.io.File#renameTo}} if 
> native code is not loaded, so this change would not introduce a compatibility 
> problem for deployments running without native code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7119) Split error checks in AtomicFileOutputStream#close into separate conditions to improve diagnostics.

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7119:

Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-6185

> Split error checks in AtomicFileOutputStream#close into separate conditions 
> to improve diagnostics.
> ---
>
> Key: HDFS-7119
> URL: https://issues.apache.org/jira/browse/HDFS-7119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>
> {{AtomicFileOutputStream#close}} throws an exception if either deleting the 
> original file or renaming the temp file fails, but the exception isn't 
> specific about which step failed.  Splitting these into separate conditions 
> with different error messages could help diagnose a permissions problem more 
> quickly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6581) Write to single replica in memory

2014-09-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143872#comment-14143872
 ] 

Arpit Agarwal commented on HDFS-6581:
-

Suresh, good suggestions. Let me file a task for metrics. A few other metrics I 
have been thinking of:
# Number of blocks written to memory
# Number of memory writes requested but not satisfied (failed-over to disk)
# Number of blocks evicted without ever being read from memory
# Average delay between memory write and disk write (window where a node 
restart could cause data loss).

It might also be useful to track how often a block is requested to be read 
shortly after eviction; however, that requires more overhead to track.

We can do this work in trunk.

> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6581) Write to single replica in memory

2014-09-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143865#comment-14143865
 ] 

Suresh Srinivas commented on HDFS-6581:
---

bq. I think system-level testing will be needed. I think it's fine to merge 
without this system-level testing being done
To understand how effective the memory tier is, and also to be able to tell the 
difference between different pluggable implementations, we may need a good set 
of metrics. What should those metrics be? Some early thoughts:
- Number of times a block in memory was read (before being ejected)
- Average block size for data written to memory tier
- Time the block was in memory before being ejected

We probably will continue to add metrics based on use cases. We can start by 
adding some metrics we think are useful. This work can happen in trunk.

> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7126) TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7126:

  Component/s: test
 Target Version/s: 2.6.0
Affects Version/s: (was: 2.5.1)
 Hadoop Flags: Reviewed

+1 for the patch, pending Jenkins run.  I confirmed the test on Mac and Windows.

> TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path
> 
>
> Key: HDFS-7126
> URL: https://issues.apache.org/jira/browse/HDFS-7126
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDFS-7126.0.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-6581) Write to single replica in memory

2014-09-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143773#comment-14143773
 ] 

Arpit Agarwal edited comment on HDFS-6581 at 9/22/14 9:12 PM:
--

bq. My fear here is that we will try to implement a better eviction strategy, 
but find that the pluggable API introduced in HDFS-7100 is too inflexible to do 
so. I'm hoping that this fear is not justified, but until there is an actual 
LFU or cold/warm/hot scheme implemented, we won't know for sure. As you said, 
this isn't much code, so maybe I'll do it if it remains to be done later.
Colin, LFU may work better for a general purpose cache, but this feature is 
targeting a specific use case of smaller intermediate data. Intermediate data 
is likely to be read once or very few times and is very likely to not fit the 
typical LFU use case and in fact MFU may be better. IMO without real world 
evaluation there is no data to support one over the other. Let's help HDFS 
clients evaluate it.

bq. My fear here is that we will try to implement a better eviction strategy, 
but find that the pluggable API introduced in HDFS-7100 is too inflexible to do 
so.
I don't see any reason to fear. The interface is tagged private and the 
interactions with DN are in limited portions of the FsDataset code. It will be 
easy to update if needed.

bq. to get a benchmark that makes you look better  Clearly the lazy-persist 
file will still be in RAM after caches are dropped, whereas the non-lazy one 
will not. I always repeat experiments 3 times and average, I left that out for 
brevity
Thanks for the idea, might be useful for future testing. For now I trigger the 
best case scenario for non-lazy persist (data already in buffer cache) just to 
demonstrate performance is at par. As we'd expect it to be since we're doing 
SCR from RAM in either case. The numbers are means over 1000 runs discarding 
the initial sacrificial read fetching block data to buffer cache.


was (Author: arpitagarwal):
bq. My fear here is that we will try to implement a better eviction strategy, 
but find that the pluggable API introduced in HDFS-7100 is too inflexible to do 
so. I'm hoping that this fear is not justified, but until there is an actual 
LFU or cold/warm/hot scheme implemented, we won't know for sure. As you said, 
this isn't much code, so maybe I'll do it if it remains to be done later.
Colin, LFU may work better for a general purpose cache, but this feature is 
targeting a specific use case of smaller intermediate data. Intermediate data 
is likely to be read once or very few times and is very likely to not fit the 
typical LFU use case and in fact NFU may be better. IMO without real world 
evaluation there is no data to support one over the other. Let's help HDFS 
clients evaluate it.

bq. My fear here is that we will try to implement a better eviction strategy, 
but find that the pluggable API introduced in HDFS-7100 is too inflexible to do 
so.
I don't see any reason to fear. The interface is tagged private and the 
interactions with DN are in limited portions of the FsDataset code. It will be 
easy to update if needed.

bq. to get a benchmark that makes you look better  Clearly the lazy-persist 
file will still be in RAM after caches are dropped, whereas the non-lazy one 
will not. I always repeat experiments 3 times and average, I left that out for 
brevity
Thanks for the idea, might be useful for future testing. For now I trigger the 
best case scenario for non-lazy persist (data already in buffer cache) just to 
demonstrate performance is at par. As we'd expect it to be since we're doing 
SCR from RAM in either case. The numbers are means over 1000 runs discarding 
the initial sacrificial read fetching block data to buffer cache.

> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7126) TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path

2014-09-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7126:
-
Status: Patch Available  (was: Open)

> TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path
> 
>
> Key: HDFS-7126
> URL: https://issues.apache.org/jira/browse/HDFS-7126
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security
>Affects Versions: 2.5.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDFS-7126.0.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7126) TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path

2014-09-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7126:
-
Attachment: HDFS-7126.0.patch

> TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path
> 
>
> Key: HDFS-7126
> URL: https://issues.apache.org/jira/browse/HDFS-7126
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security
>Affects Versions: 2.5.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDFS-7126.0.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6956) Allow dynamically changing the tracing level in Hadoop servers

2014-09-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143791#comment-14143791
 ] 

Allen Wittenauer commented on HDFS-6956:


trace should be sorted in the case statement and in the usage output.



> Allow dynamically changing the tracing level in Hadoop servers
> --
>
> Key: HDFS-6956
> URL: https://issues.apache.org/jira/browse/HDFS-6956
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-6956.002.patch, HDFS-6956.003.patch
>
>
> We should allow users to dynamically change the tracing level in Hadoop 
> servers.  The easiest way to do this is probably to have an RPC accessible 
> only to the superuser that changes tracing settings.  This would allow us to 
> turn on and off tracing on the NameNode, DataNode, etc. at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7124) Remove EncryptionZoneManager.NULL_EZ

2014-09-22 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7124:
---
Attachment: HDFS-7124.001.patch

[~andrew.wang],

Attached is a diff which removes EncryptionZoneManager.NULL_EZ. As you may 
recall, the original reason for having this unique instance was so that PB 
would have something to pass across and not have to deal with nulls. As you'll 
see, this patch changes the PBHelper code so that it uses id=-1 as the flag for 
null.
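
For illustration only (this is not the actual PBHelper diff; the types and names 
below are simplified stand-ins), the -1 sentinel pattern looks roughly like this:

{code}
/**
 * Hedged sketch, not the HDFS-7124 change itself: encoding "no encryption zone" as
 * id = -1 in the wire message so the Java API can return null instead of a NULL_EZ
 * singleton. EncryptionZone and ZoneProto are simplified stand-ins.
 */
public class NullEzSentinelSketch {
  static final long NO_ZONE_ID = -1;

  static class EncryptionZone {
    final long id;
    final String path;
    EncryptionZone(long id, String path) { this.id = id; this.path = path; }
  }

  static class ZoneProto { // protobuf-style message: fields cannot be null
    final long id;
    final String path;
    ZoneProto(long id, String path) { this.id = id; this.path = path; }
  }

  static ZoneProto convert(EncryptionZone zone) {
    // Encode "no zone" as id = -1 because the message cannot carry null.
    return zone == null ? new ZoneProto(NO_ZONE_ID, "")
                        : new ZoneProto(zone.id, zone.path);
  }

  static EncryptionZone convert(ZoneProto proto) {
    // Decode the sentinel back into a plain null on the Java side.
    return proto.id == NO_ZONE_ID ? null : new EncryptionZone(proto.id, proto.path);
  }

  public static void main(String[] args) {
    System.out.println(convert(convert((EncryptionZone) null)));              // null
    System.out.println(convert(convert(new EncryptionZone(7, "/zone"))).path); // /zone
  }
}
{code}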


> Remove EncryptionZoneManager.NULL_EZ
> 
>
> Key: HDFS-7124
> URL: https://issues.apache.org/jira/browse/HDFS-7124
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7124.001.patch
>
>
> Remove EncryptionZoneManager.NULL_EZ so that null can be used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7124) Remove EncryptionZoneManager.NULL_EZ

2014-09-22 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7124:
---
Status: Patch Available  (was: Open)

> Remove EncryptionZoneManager.NULL_EZ
> 
>
> Key: HDFS-7124
> URL: https://issues.apache.org/jira/browse/HDFS-7124
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Charles Lamb
>Assignee: Charles Lamb
>Priority: Minor
> Attachments: HDFS-7124.001.patch
>
>
> Remove EncryptionZoneManager.NULL_EZ so that null can be used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7126) TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7126:

Assignee: Xiaoyu Yao

> TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path
> 
>
> Key: HDFS-7126
> URL: https://issues.apache.org/jira/browse/HDFS-7126
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security
>Affects Versions: 2.5.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7127) TestLeaseRecovery leaks MiniDFSCluster instances.

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7127:

Status: Patch Available  (was: Open)

> TestLeaseRecovery leaks MiniDFSCluster instances.
> -
>
> Key: HDFS-7127
> URL: https://issues.apache.org/jira/browse/HDFS-7127
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-7127.1.patch
>
>
> {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}} starts a 
> {{MiniDFSCluster}} but never shuts it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6581) Write to single replica in memory

2014-09-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143773#comment-14143773
 ] 

Arpit Agarwal commented on HDFS-6581:
-

bq. My fear here is that we will try to implement a better eviction strategy, 
but find that the pluggable API introduced in HDFS-7100 is too inflexible to do 
so. I'm hoping that this fear is not justified, but until there is an actual 
LFU or cold/warm/hot scheme implemented, we won't know for sure. As you said, 
this isn't much code, so maybe I'll do it if it remains to be done later.
Colin, LFU may work better for a general-purpose cache, but this feature targets 
a specific use case: smaller intermediate data. Intermediate data is likely to 
be read once or only a few times, so it probably does not fit the typical LFU 
use case; in fact, NFU may work better. IMO, without real-world evaluation there 
is no data to support one approach over the other. Let's help HDFS clients 
evaluate it.

bq. My fear here is that we will try to implement a better eviction strategy, 
but find that the pluggable API introduced in HDFS-7100 is too inflexible to do 
so.
I don't see any reason for concern. The interface is tagged private, and the 
interactions with the DN are confined to limited portions of the FsDataset code. 
It will be easy to update if needed.
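For illustration only, here is a toy sketch of how an LFU policy and a read-once-oriented policy would differ for this workload; this is not the HDFS-7100 interface, just hypothetical names:

{code:java}
// Toy illustration only: not the pluggable interface from HDFS-7100, just
// hypothetical names to show how the two strategies would behave.
import java.util.*;

interface EvictionPolicySketch {
  void recordRead(String blockId);
  String chooseVictim(List<String> candidates);
}

// LFU: evict the block with the fewest recorded reads.
class LfuSketch implements EvictionPolicySketch {
  private final Map<String, Integer> readCounts = new HashMap<>();
  public void recordRead(String blockId) {
    readCounts.merge(blockId, 1, Integer::sum);
  }
  public String chooseVictim(List<String> candidates) {
    String victim = null;
    int best = Integer.MAX_VALUE;
    for (String b : candidates) {
      int c = readCounts.getOrDefault(b, 0);
      if (c < best) { best = c; victim = b; }
    }
    return victim;
  }
}

// Read-once bias: intermediate data is usually consumed once, so prefer to
// evict any block that has already been read at least once.
class ReadOnceSketch implements EvictionPolicySketch {
  private final Set<String> alreadyRead = new HashSet<>();
  public void recordRead(String blockId) {
    alreadyRead.add(blockId);
  }
  public String chooseVictim(List<String> candidates) {
    for (String b : candidates) {
      if (alreadyRead.contains(b)) {
        return b;
      }
    }
    return candidates.isEmpty() ? null : candidates.get(0);
  }
}
{code}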

bq. to get a benchmark that makes you look better  Clearly the lazy-persist 
file will still be in RAM after caches are dropped, whereas the non-lazy one 
will not. I always repeat experiments 3 times and average, I left that out for 
brevity
Thanks for the idea; it might be useful for future testing. For now I trigger the 
best-case scenario for non-lazy-persist (data already in the buffer cache) just to 
demonstrate that performance is on par, as we'd expect since we're doing SCR from 
RAM in either case. The numbers are means over 1,000 runs, discarding the initial 
sacrificial read that fetches the block data into the buffer cache.
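Roughly, the measurement loop is the following sketch ({{readBlock()}} is a placeholder, not the actual test harness from the patch):

{code:java}
// Sketch of the benchmarking approach described above; readBlock() is a
// placeholder, not the actual test harness from the patch.
import java.util.concurrent.TimeUnit;

public class ReadLatencySketch {
  // Stand-in for a short-circuit read of the block under test.
  static void readBlock() {
    // ... perform the read ...
  }

  public static void main(String[] args) {
    final int runs = 1000;

    // Sacrificial read: pulls the block into the buffer cache (or RAM disk)
    // so every timed run starts from the same warm state.
    readBlock();

    long totalNanos = 0L;
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      readBlock();
      totalNanos += System.nanoTime() - start;
    }
    double meanMillis = (totalNanos / (double) runs) / TimeUnit.MILLISECONDS.toNanos(1);
    System.out.printf("mean read latency over %d runs: %.3f ms%n", runs, meanMillis);
  }
}
{code}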

> Write to single replica in memory
> -
>
> Key: HDFS-6581
> URL: https://issues.apache.org/jira/browse/HDFS-6581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, 
> HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, 
> HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, 
> HDFS-6581.merge.09.patch, HDFSWriteableReplicasInMemory.pdf, 
> Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>
>
> Per discussion with the community on HDFS-5851, we will implement writing to 
> a single replica in DN memory via DataTransferProtocol.
> This avoids some of the issues with short-circuit writes, which we can 
> revisit at a later time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7127) TestLeaseRecovery leaks MiniDFSCluster instances.

2014-09-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7127:

Attachment: HDFS-7127.1.patch

I'm attaching a patch that guarantees the cluster is shut down using a JUnit 
{{After}} method.  The patch looks bigger because of an indentation change, but 
the logic of the tests hasn't changed.
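A minimal sketch of the pattern, with illustrative field and method names rather than the exact test code in the patch:

{code:java}
// Minimal sketch of the @After teardown pattern; names are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.After;
import org.junit.Test;

public class TestLeaseRecoverySketch {
  private MiniDFSCluster cluster;

  @Test
  public void testBlockRecoveryWithLessMetafile() throws Exception {
    cluster = new MiniDFSCluster.Builder(new Configuration())
        .numDataNodes(1)
        .build();
    // ... exercise block recovery against the cluster ...
  }

  @After
  public void tearDown() {
    // Runs even if the test fails, so the cluster can no longer leak.
    if (cluster != null) {
      cluster.shutdown();
      cluster = null;
    }
  }
}
{code}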

> TestLeaseRecovery leaks MiniDFSCluster instances.
> -
>
> Key: HDFS-7127
> URL: https://issues.apache.org/jira/browse/HDFS-7127
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-7127.1.patch
>
>
> {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}} starts a 
> {{MiniDFSCluster}} but never shuts it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7127) TestLeaseRecovery leaks MiniDFSCluster instances.

2014-09-22 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7127:
---

 Summary: TestLeaseRecovery leaks MiniDFSCluster instances.
 Key: HDFS-7127
 URL: https://issues.apache.org/jira/browse/HDFS-7127
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth


{{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}} starts a 
{{MiniDFSCluster}} but never shuts it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7126) TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path

2014-09-22 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-7126:


 Summary: TestEncryptionZonesWithHA assumes Unix path separator for 
KMS key store path
 Key: HDFS-7126
 URL: https://issues.apache.org/jira/browse/HDFS-7126
 Project: Hadoop HDFS
  Issue Type: Test
  Components: security
Affects Versions: 2.5.1
Reporter: Xiaoyu Yao
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7125) Report failures during adding or removing volumes

2014-09-22 Thread Lei (Eddy) Xu (JIRA)
Lei (Eddy) Xu created HDFS-7125:
---

 Summary: Report failures during adding or removing volumes
 Key: HDFS-7125
 URL: https://issues.apache.org/jira/browse/HDFS-7125
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.5.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu


The details of the failures during hot swapping volumes should be reported 
through RPC to the user who issues the reconfiguration CLI command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7081) Add new DistributedFileSystem API for getting all the existing storage policies

2014-09-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143723#comment-14143723
 ] 

Jing Zhao edited comment on HDFS-7081 at 9/22/14 8:22 PM:
--

Thanks for the comments, Andrew. I've uploaded a new patch to address them. 
Since some of your comments on BlockStoragePolicySuite/BlockStoragePolicy are 
easy to address, I've included those fixes in this patch. Maybe we can create a 
separate jira to address the remaining comments.

bq. Any reason we don't also expose java APIs for setting/getting the policy in 
HdfsAdmin? Need to have public classes then too.

Initially we planned to allow normal users to set storage policies in the next 
step of heterogeneous storage. For now I've followed your comments and added 
the policy-setting API to HdfsAdmin. We can remove it in the future if 
necessary. (Getting the policy does not require superuser privileges, so we do 
not need to add it to HdfsAdmin.)

bq. Does DFSAdmin have a way of getting the list of storage policies? Not sure 
how you'd know what policy to set if you don't know what's available.

Getting the list of storage policies does not require superuser permission, so 
we should actually add a separate tool for this (similar to 
listSnapshottableDir, which also allows a normal user to list the snapshottable 
dirs belonging to them). I plan to add this in a separate jira since the current 
patch is already big.

bq. Also a bit surprised that the API is string based rather than ID based. 
Making it ID based would be more efficient and also allow renaming policies if 
desired.

I guess you're referring to {{setStoragePolicy}} here? Internally (inside the 
NameNode) we attach the ID to files/directories (thus a policy can easily be 
renamed), while end users specify the policy as a String, since I think using 
the policy name is much more user-friendly. And since it is not common to set a 
storage policy on a large number of files/directories and the name is usually 
short, efficiency should not be an issue here.
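As a sketch of the name-based usage under discussion (the exact API surface was still being settled in this jira):

{code:java}
// Sketch only: the exact API location (DistributedFileSystem vs. HdfsAdmin)
// was still under discussion in this jira.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetPolicyByNameSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      // Specify the policy by its human-readable name; the NameNode resolves
      // the name to an internal policy ID and stores the ID on the inode.
      dfs.setStoragePolicy(new Path("/data/warm"), "WARM");
    }
  }
}
{code}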

bq. FSNamesystem: Need javadoc on new getter

Done. Thanks for pointing this out.

bq. The handling of directory storage policies seems inconsistent with rename. 
If you rename an UNSPECIFIED file, it will pick up a potentially different 
storage ID from its destination. However, if it already has one set (e.g. it 
was created under a dir with a set policy), it'll keep that policy. Have you 
considered always setting the file's storage policy to the default? Then rename 
would be consistent. The other alternative is doing away with per-file storage 
policies and only specifying them on a directory. This also seems reasonable to 
me from a management perspective.
bq. This seems worth documenting at the least.

I think having a policy on both files and directories is necessary, since in the 
future we can have tools/services that keep scanning the namespace and marking 
files/directories based on their access/modification time, etc. Currently we do 
not keep the UNSPECIFIED policy mainly because we want to allow users to set up 
a cold/warm directory and then change data's temperature by moving 
files/directories into it. We will add more details to the document.

bq. We should also consider having tooling that can do a recursive 
setStoragePolicy for easier administration.

If we can set a storage policy directly on a directory, why do we still need to 
do it recursively? That said, providing a tool for easier administration (not 
just for setting storage policies) is always good.

bq. BlockStoragePolicySuite: In the class javadoc, "Suite" by itself doesn't 
mean much to me. Could we mention "collection of storage policies"?

Done.

bq. The xattr name "bsp" is not very descriptive. Maybe 
"hsm.block.storage.policy.id" instead? We dedupe the names in memory, so the 
size of the name only really matters when it's serialized to the fsimage. I 
think we could dedupe them there too with a little work.

Done. This is a very good suggestion.

bq. We also shouldn't be using the trusted namespace, since it pollutes a 
namespace that's meant for use only by the root user. I'd recommend system 
instead.

For this one I have a question. According to the current documentation, "TRUSTED 
namespace attributes are only visible and accessible to privileged users." 
Currently the storage policy is actually set by the superuser, and in HDFS we do 
not have a root user. So does that mean we should use trusted here?

bq. BlockStoragePolicy#readBlockStorageSuite(conf) doesn't seem to be used, 
could remove some other helper functions then too
bq. Would prefer if we didn't initialize DEFAULT_SUITE until it's needed, avoid 
some static init cost. Can be straight up removed if you remove 
readBlockStorageSuite(conf).

Actually, I kept this part of the code because I planned to reuse/move it in 
DFSAdmin so that an admin can set new policies based on the configuration.
