[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902070#comment-14902070
 ] 

Rui Li commented on HDFS-8920:
--

Thanks guys for the review.

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: HDFS-7285
>
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and 
> effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}
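A common way to keep a warning like this from dominating a hot read path is to rate-limit it. Demoting it to debug level behind a level check is another option. The sketch below is a hypothetical, dependency-free illustration, not the attached patch. `ThrottledWarning`, its one-second interval and the `Consumer<String>` sink are all assumptions. It lets at most one message through per interval, counts the rest, and builds the message string only when the message is actually emitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper, not an HDFS class: lets at most one warning through
// per interval and counts how many were suppressed in between.
final class ThrottledWarning {
  private final long intervalMs;
  private final AtomicLong nextAllowedMs = new AtomicLong(Long.MIN_VALUE);
  private final AtomicLong suppressed = new AtomicLong();

  ThrottledWarning(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  // True if the caller may log at time nowMs; safe to call from many threads.
  boolean tryAcquire(long nowMs) {
    long next = nextAllowedMs.get();
    if (nowMs >= next && nextAllowedMs.compareAndSet(next, nowMs + intervalMs)) {
      return true;
    }
    suppressed.incrementAndGet();
    return false;
  }

  long suppressedCount() {
    return suppressed.get();
  }

  // The message is built lazily, so suppressed calls never pay for
  // Arrays.toString(nodes) or the string concatenation.
  void warn(long nowMs, Supplier<String> message, Consumer<String> sink) {
    if (tryAcquire(nowMs)) {
      sink.accept(message.get());
    }
  }
}

class ThrottledWarningDemo {
  public static void main(String[] args) {
    ThrottledWarning warning = new ThrottledWarning(1000);
    List<String> emitted = new ArrayList<>();
    // Simulate 30 failed node lookups, one every 100 ms.
    for (long t = 0; t < 3000; t += 100) {
      warning.warn(t, () -> "No live nodes contain block ...", emitted::add);
    }
    System.out.println(emitted.size() + " logged, " + warning.suppressedCount() + " suppressed");
  }
}
```

Running `ThrottledWarningDemo` simulates 30 failed lookups 100 ms apart and prints `3 logged, 27 suppressed`.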



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8780) Fetching live/dead datanode list with arg true for removeDecommissionNode,returns list with decom node.

2015-09-21 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-8780:

Priority: Major  (was: Critical)

> Fetching live/dead datanode list with arg true for 
> removeDecommissionNode,returns list with decom node.
> ---
>
> Key: HDFS-8780
> URL: https://issues.apache.org/jira/browse/HDFS-8780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: J.Andreina
> Attachments: HDFS-8780.1.patch, HDFS-8780.2.patch, HDFS-8780.3.patch
>
>
> Current implementation: 
> ==
> In DatanodeManager#removeDecomNodeFromList(), a decommissioned node is removed
> from the dead/live node list only if both of the following conditions are met:
>  I.  The include list is not empty.
>  II. Neither the include list nor the exclude list contains the node, and the
> node state is decommissioned.
> {code}
>   if (!hostFileManager.hasIncludes()) {
>     return;
>   }
>   // For each node in the list (it is the list iterator, node = it.next()):
>   if ((!hostFileManager.isIncluded(node)) && (!hostFileManager.isExcluded(node))
>       && node.isDecommissioned()) {
>     // Include list is not empty, an existing datanode does not appear
>     // in both include or exclude lists and it has been decommissioned.
>     // Remove it from the node list.
>     it.remove();
>   }
> {code}
> As mentioned in the javadoc, a datanode cannot be in the "already decommissioned
> datanode state". Following the steps described in the javadoc, the datanode state
> is "dead", not decommissioned.
> *Can we avoid the unnecessary checks and simply remove the node from the list
> whenever it is in the decommissioned state?*
> Please provide your feedback.
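To make the proposal concrete, the sketch below contrasts the current condition with the suggested one. It is a hypothetical simplification: `FakeDatanode` stands in for `DatanodeDescriptor`, and plain host-name sets stand in for `HostFileManager`'s include/exclude lists, so it ignores how the real manager matches hosts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for DatanodeDescriptor with only what the check needs.
final class FakeDatanode {
  final String host;
  final boolean decommissioned;

  FakeDatanode(String host, boolean decommissioned) {
    this.host = host;
    this.decommissioned = decommissioned;
  }

  boolean isDecommissioned() {
    return decommissioned;
  }
}

final class DecomFilter {
  // Mirrors the current logic, with plain sets standing in for HostFileManager:
  // no-op when the include list is empty, otherwise drop decommissioned nodes
  // that appear in neither list.
  static void removeCurrent(List<FakeDatanode> nodes, Set<String> includes, Set<String> excludes) {
    if (includes.isEmpty()) {
      return;
    }
    for (Iterator<FakeDatanode> it = nodes.iterator(); it.hasNext();) {
      FakeDatanode node = it.next();
      if (!includes.contains(node.host) && !excludes.contains(node.host)
          && node.isDecommissioned()) {
        it.remove();
      }
    }
  }

  // The simplification asked about in this issue: drop every decommissioned node.
  static void removeProposed(List<FakeDatanode> nodes) {
    nodes.removeIf(FakeDatanode::isDecommissioned);
  }
}

class DecomFilterDemo {
  public static void main(String[] args) {
    List<FakeDatanode> current = new ArrayList<>(Arrays.asList(
        new FakeDatanode("dn1", false), new FakeDatanode("dn2", true)));
    List<FakeDatanode> proposed = new ArrayList<>(current);
    // Empty include list: the current logic keeps the decommissioned dn2.
    DecomFilter.removeCurrent(current, Collections.<String>emptySet(), Collections.singleton("dn2"));
    DecomFilter.removeProposed(proposed);
    System.out.println(current.size() + " vs " + proposed.size());
  }
}
```

With an empty include list, `removeCurrent` keeps a decommissioned node that `removeProposed` drops, so `DecomFilterDemo` prints `2 vs 1`.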





[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902027#comment-14902027
 ] 

Zhe Zhang commented on HDFS-8920:
-

Thanks Rui for the work and Kai for the final review. Moving this back to the 
HDFS-7285 umbrella JIRA.

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: HDFS-7285
>
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and 
> effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}





[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8920:

Parent Issue: HDFS-7285  (was: HDFS-8031)

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: HDFS-7285
>
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and 
> effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}





[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8920:

Fix Version/s: HDFS-7285

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: HDFS-7285
>
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and 
> effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}





[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8920:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

It was committed to the HDFS-7285 branch. Thanks Rui for the contribution, and Colin and
Zhe for the suggestions!

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and 
> effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}





[jira] [Updated] (HDFS-9013) Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk

2015-09-21 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9013:
-
Status: Patch Available  (was: Open)

> Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
> --
>
> Key: HDFS-9013
> URL: https://issues.apache.org/jira/browse/HDFS-9013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9013-branch-2.003.patch, 
> HDFS-9013.001-branch-2.patch, HDFS-9013.001.patch, 
> HDFS-9013.002-branch-2.patch
>
>
> HDFS-8388 added a new metric, {{NNStartedTimeInMillis}}, which reports the NN start
> time in milliseconds.
> Based on suggestions from [~wheat9] and [~ajisakaa], we should now deprecate
> {{NameNodeMXBean#getNNStarted}} in branch-2 and remove it from trunk.
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14709614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14709614
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14726746&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14726746
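The usual deprecation pattern for this kind of change is sketched below under assumptions: `NameNodeStartInfo` is a hypothetical, trimmed-down stand-in for `NameNodeMXBean`, and the string format returned by the deprecated getter is invented for illustration. On branch-2 the old getter stays, marked `@Deprecated` and derived from the millisecond value; on trunk it would simply be deleted.

```java
import java.time.Instant;

// Simplified, hypothetical stand-in for the two NameNodeMXBean getters.
interface NameNodeStartInfo {
  /**
   * Start time as a human-readable string.
   *
   * @deprecated use {@link #getNNStartedTimeInMillis()} instead; on trunk this
   *     method would be deleted rather than deprecated.
   */
  @Deprecated
  String getNNStarted();

  /** Start time in milliseconds since the epoch. */
  long getNNStartedTimeInMillis();
}

final class NameNodeStartInfoImpl implements NameNodeStartInfo {
  private final long startTimeMs;

  NameNodeStartInfoImpl(long startTimeMs) {
    this.startTimeMs = startTimeMs;
  }

  // Derived from the numeric value so the two attributes never disagree.
  // The ISO-8601 format used here is an assumption, not the real format.
  @Deprecated
  @Override
  public String getNNStarted() {
    return Instant.ofEpochMilli(startTimeMs).toString();
  }

  @Override
  public long getNNStartedTimeInMillis() {
    return startTimeMs;
  }
}

class NameNodeStartInfoDemo {
  public static void main(String[] args) {
    NameNodeStartInfo info = new NameNodeStartInfoImpl(System.currentTimeMillis());
    // New callers should read the numeric attribute.
    System.out.println("NN started at " + info.getNNStartedTimeInMillis() + " ms");
  }
}
```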





[jira] [Updated] (HDFS-9013) Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk

2015-09-21 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9013:
-
Attachment: HDFS-9013-branch-2.003.patch

Renamed the patch so that it applies to branch-2.

> Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
> --
>
> Key: HDFS-9013
> URL: https://issues.apache.org/jira/browse/HDFS-9013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9013-branch-2.003.patch, 
> HDFS-9013.001-branch-2.patch, HDFS-9013.001.patch, 
> HDFS-9013.002-branch-2.patch
>
>
> HDFS-8388 added a new metric, {{NNStartedTimeInMillis}}, which reports the NN start
> time in milliseconds.
> Based on suggestions from [~wheat9] and [~ajisakaa], we should now deprecate
> {{NameNodeMXBean#getNNStarted}} in branch-2 and remove it from trunk.
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14709614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14709614
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14726746&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14726746





[jira] [Updated] (HDFS-9013) Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk

2015-09-21 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9013:
-
Status: Open  (was: Patch Available)

> Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
> --
>
> Key: HDFS-9013
> URL: https://issues.apache.org/jira/browse/HDFS-9013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9013.001-branch-2.patch, HDFS-9013.001.patch, 
> HDFS-9013.002-branch-2.patch
>
>
> HDFS-8388 added a new metric, {{NNStartedTimeInMillis}}, which reports the NN start
> time in milliseconds.
> Based on suggestions from [~wheat9] and [~ajisakaa], we should now deprecate
> {{NameNodeMXBean#getNNStarted}} in branch-2 and remove it from trunk.
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14709614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14709614
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14726746&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14726746





[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902013#comment-14902013
 ] 

Hudson commented on HDFS-9111:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #429 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/429/])
HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to 
PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 
06022b8fdc40e50eaac63758246353058e8cfa6d)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java


> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which
> convert client-side data structures to and from protobuf, to the
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both
> client-side and server-side data structures from/to protobuf. As we move client
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and
> [HDFS-9039]), we also need to move the client-related PB converters to the
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with converters
> for server-side data structures.
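The intended dependency direction can be illustrated with a small, hypothetical sketch. None of these classes exist in HDFS: `ClientBlock`/`ClientBlockMsg` stand in for a client-side type and its generated protobuf message, and `ClientConverters`/`ServerConverters` play the roles of `PBHelperClient` and `PBHelper`. The point is that server-side converters may call the client-side ones, but never the reverse, so the client module compiles without the server module.

```java
// ---- would live in the hadoop-hdfs-client module ----

// Hypothetical client-side data structure.
final class ClientBlock {
  final String poolId;
  final long blockId;

  ClientBlock(String poolId, long blockId) {
    this.poolId = poolId;
    this.blockId = blockId;
  }
}

// Stand-in for a protoc-generated message; real code would use its builder.
final class ClientBlockMsg {
  final String poolId;
  final long blockId;

  ClientBlockMsg(String poolId, long blockId) {
    this.poolId = poolId;
    this.blockId = blockId;
  }
}

// Client-side converters only, in the spirit of PBHelperClient.
final class ClientConverters {
  private ClientConverters() {}

  static ClientBlockMsg convert(ClientBlock b) {
    return new ClientBlockMsg(b.poolId, b.blockId);
  }

  static ClientBlock convert(ClientBlockMsg m) {
    return new ClientBlock(m.poolId, m.blockId);
  }
}

// ---- would stay in the hadoop-hdfs (server) module ----

// Hypothetical server-only structure that embeds a client type.
final class RecoveryCommand {
  final ClientBlock block;
  final long newGenStamp;

  RecoveryCommand(ClientBlock block, long newGenStamp) {
    this.block = block;
    this.newGenStamp = newGenStamp;
  }
}

final class RecoveryCommandMsg {
  final ClientBlockMsg block;
  final long newGenStamp;

  RecoveryCommandMsg(ClientBlockMsg block, long newGenStamp) {
    this.block = block;
    this.newGenStamp = newGenStamp;
  }
}

// Server-side converters reuse the client ones; the client module never
// references anything defined here.
final class ServerConverters {
  private ServerConverters() {}

  static RecoveryCommandMsg convert(RecoveryCommand c) {
    return new RecoveryCommandMsg(ClientConverters.convert(c.block), c.newGenStamp);
  }

  static RecoveryCommand convert(RecoveryCommandMsg m) {
    return new RecoveryCommand(ClientConverters.convert(m.block), m.newGenStamp);
  }
}

class ConverterDemo {
  public static void main(String[] args) {
    RecoveryCommand cmd = new RecoveryCommand(new ClientBlock("BP-1", 42L), 7L);
    RecoveryCommand back = ServerConverters.convert(ServerConverters.convert(cmd));
    System.out.println(back.block.poolId + " " + back.block.blockId + " " + back.newGenStamp);
  }
}
```

`ConverterDemo` round-trips a command through both layers and prints `BP-1 42 7`.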





[jira] [Updated] (HDFS-9013) Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk

2015-09-21 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9013:
-
Attachment: (was: HDFS-9013.003-branch-2.patch)

> Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
> --
>
> Key: HDFS-9013
> URL: https://issues.apache.org/jira/browse/HDFS-9013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9013.001-branch-2.patch, HDFS-9013.001.patch, 
> HDFS-9013.002-branch-2.patch
>
>
> HDFS-8388 added a new metric, {{NNStartedTimeInMillis}}, which reports the NN start
> time in milliseconds.
> Based on suggestions from [~wheat9] and [~ajisakaa], we should now deprecate
> {{NameNodeMXBean#getNNStarted}} in branch-2 and remove it from trunk.
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14709614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14709614
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14726746&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14726746



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902008#comment-14902008
 ] 

Hudson commented on HDFS-9111:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1161 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1161/])
HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to 
PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 
06022b8fdc40e50eaac63758246353058e8cfa6d)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java


> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which
> convert client-side data structures to and from protobuf, to the
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both
> client-side and server-side data structures from/to protobuf. As we move client
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and
> [HDFS-9039]), we also need to move the client-related PB converters to the
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with converters
> for server-side data structures.





[jira] [Commented] (HDFS-8968) New benchmark throughput tool for striping erasure coding

2015-09-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901998#comment-14901998
 ] 

Kai Zheng commented on HDFS-8968:
-

Hi Andrew, 

It looks like a good idea to have a new module such as *hadoop-benchmark*
in *hadoop-tools* for benchmark tools. Such tools would help in a
production system to identify and verify performance metrics for a given
cluster environment. This is particularly useful once HDFS-EC is
completed: in addition to the existing storage policies for storage
types, we'll have various file forms (replication, striping, non-striping EC),
erasure coding policies using different codec algorithms, striping settings and
coder implementations, and a benchmark tool will let users measure and make
trade-offs among these options. The tool implemented in this issue isn't
perfect yet, but it would be a good beginning; our ongoing perf testing found that
it works fine. It would be great if you could review it further and confirm how
we should proceed.

Thanks.

> New benchmark throughput tool for striping erasure coding
> -
>
> Key: HDFS-8968
> URL: https://issues.apache.org/jira/browse/HDFS-8968
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-8968-HDFS-7285.1.patch, HDFS-8968-HDFS-7285.2.patch
>
>
> We need a new benchmark tool to measure the throughput of client writing and
> reading, considering the following cases or factors:
> * 3-replica or striping;
> * write or read, stateful read or positional read;
> * which erasure coder;
> * striping cell size;
> * concurrent readers/writers using processes or threads.
> The tool should be easy to use and should preferably avoid unnecessary impact
> from the local environment, such as the local disk.
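A minimal sketch of how such a tool might describe a case and measure aggregate throughput, assuming Java threads for concurrency. `BenchCase`, `ThroughputBench` and the coder name `"rs"` are hypothetical. The discarding `OutputStream` stands in for an HDFS output stream, which also keeps the local disk out of the measurement as the description asks; a real tool would open streams on the cluster for each case instead.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical description of one benchmark case, one field per factor above.
final class BenchCase {
  enum Layout { REPLICATION_3, STRIPING }
  enum Op { WRITE, STATEFUL_READ, POSITIONAL_READ }

  final Layout layout;
  final Op op;
  final String coder;  // hypothetical coder name; unused for replication
  final int cellSize;  // bytes; unused for replication
  final int clients;   // concurrent writers/readers (threads here)

  BenchCase(Layout layout, Op op, String coder, int cellSize, int clients) {
    this.layout = layout;
    this.op = op;
    this.coder = coder;
    this.cellSize = cellSize;
    this.clients = clients;
  }
}

final class ThroughputBench {
  private static final double MIB = 1024.0 * 1024.0;

  // Aggregate throughput in MiB/s for totalBytes moved in elapsedNanos.
  static double mibPerSecond(long totalBytes, long elapsedNanos) {
    return (totalBytes / MIB) / (elapsedNanos / 1e9);
  }

  // Runs `clients` concurrent writers, each writing bytesPerClient in
  // bufSize chunks into a sink that discards everything.
  static double runWriters(int clients, long bytesPerClient, int bufSize) {
    ExecutorService pool = Executors.newFixedThreadPool(clients);
    try {
      List<Future<Long>> results = new ArrayList<>();
      long start = System.nanoTime();
      for (int i = 0; i < clients; i++) {
        Callable<Long> writer = () -> {
          OutputStream sink = new OutputStream() {
            @Override public void write(int b) {}
            @Override public void write(byte[] b, int off, int len) {}
          };
          byte[] buf = new byte[bufSize];
          long written = 0;
          while (written < bytesPerClient) {
            int n = (int) Math.min(buf.length, bytesPerClient - written);
            sink.write(buf, 0, n);
            written += n;
          }
          return written;
        };
        results.add(pool.submit(writer));
      }
      long total = 0;
      for (Future<Long> f : results) {
        total += f.get();
      }
      return mibPerSecond(total, System.nanoTime() - start);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}

class BenchDemo {
  public static void main(String[] args) {
    BenchCase c = new BenchCase(BenchCase.Layout.STRIPING, BenchCase.Op.WRITE, "rs", 64 * 1024, 4);
    double mibs = ThroughputBench.runWriters(c.clients, 64L * 1024 * 1024, c.cellSize);
    System.out.printf("%s %s clients=%d: %.1f MiB/s%n", c.layout, c.op, c.clients, mibs);
  }
}
```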





[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901975#comment-14901975
 ] 

Kai Zheng commented on HDFS-8920:
-

Thanks Rui for the update. The new patch LGTM. +1 and will commit it soon.

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and 
> effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}





[jira] [Assigned] (HDFS-9064) NN old UI (block_info_xml) not available in 2.7.x

2015-09-21 Thread Kanaka Kumar Avvaru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanaka Kumar Avvaru reassigned HDFS-9064:
-

Assignee: Kanaka Kumar Avvaru

> NN old UI (block_info_xml) not available in 2.7.x
> -
>
> Key: HDFS-9064
> URL: https://issues.apache.org/jira/browse/HDFS-9064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Kanaka Kumar Avvaru
>Priority: Critical
>
> In 2.6.x Hadoop deployments, given a blockId it was very easy to find the
> file name and the locations of its replicas (including whether they are
> corrupt).
> This was the REST call:
> {noformat}
>  http://:/block_info_xml.jsp?blockId=xxx
> {noformat}
> But this page was removed by HDFS-6252 in 2.7 builds.
> This jira was created to restore that functionality.
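For reference, a request like the one above can be built and issued with plain JDK classes. This is a sketch under assumptions: the host and port in the report were elided, so `nn.example.com` and `50070` below are placeholders, a numeric block ID is assumed, and the page only responds on 2.6.x-era NameNodes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class BlockInfoXml {
  private BlockInfoXml() {}

  // host and port are placeholders for the NameNode HTTP address, which the
  // report leaves out; a numeric block ID is assumed.
  static URI blockInfoUri(String host, int port, long blockId) {
    return URI.create("http://" + host + ":" + port + "/block_info_xml.jsp?blockId=" + blockId);
  }

  // Fetches the XML body. Only meaningful against 2.6.x; 2.7 no longer serves this page.
  static String fetch(URI uri) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
    try (InputStream in = conn.getInputStream()) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } finally {
      conn.disconnect();
    }
  }
}

class BlockInfoXmlDemo {
  public static void main(String[] args) {
    // Placeholder host, port and block ID; substitute real values.
    System.out.println(BlockInfoXml.blockInfoUri("nn.example.com", 50070, 1073741825L));
  }
}
```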





[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901945#comment-14901945
 ] 

Hadoop QA commented on HDFS-8920:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m  7s | Findbugs (version ) appears to be broken on HDFS-7285. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  12m 38s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   2m  5s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 59s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 38s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 114m 34s | Tests failed in hadoop-hdfs. |
| | | 162m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestFileAppend4 |
|   | hadoop.hdfs.TestRead |
|   | hadoop.hdfs.server.namenode.TestNNStorageRetentionFunctional |
|   | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd |
|   | hadoop.hdfs.server.datanode.TestRefreshNamenodes |
|   | hadoop.hdfs.TestHdfsAdmin |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN |
|   | hadoop.hdfs.TestClientReportBadBlock |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
|   | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy |
|   | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.TestAppendSnapshotTruncate |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
|   | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
|   | hadoop.hdfs.TestWriteStripedFileWithFailure |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement |
|   | hadoop.hdfs.server.namenode.TestNameNodeRpcServer |
|   | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | hadoop.hdfs.TestFileAppendRestart |
|   | hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade |
|   | hadoop.cli.TestErasureCodingCLI |
|   | hadoop.hdfs.server.namenode.TestEditLogFileInputStream |
|   | hadoop.hdfs.protocol.TestBlockListAsLongs |
|   | hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant |
|   | hadoop.hdfs.TestFileStatusWithECPolicy |
|   | hadoop.hdfs.server.namenode.TestHDFSConcat |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
|   | hadoop.TestRefreshCallQueue |
|   | hadoop.hdfs.TestListFilesInDFS |
|   | hadoop.hdfs.server.datanode.TestDnRespectsBlockReportSplitThreshold |
|   | hadoop.hdfs.server.namenode.TestNameEditsConfigs |
|   | hadoop.hdfs.TestMiniDFSCluster |
|   | hadoop.hdfs.server.mover.TestMover |
|   | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.security.TestPermissionSymlinks |
|   | hadoop.hdfs.TestDFSRollback |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
|   | hadoop.hdfs.TestFileConcurrentReader |
|   | hadoop.hdfs.TestFileAppend2 |
|   | hadoop.hdfs.server.datanode.TestDataNodeExit |
|   | hadoop.hdfs.server.blockmanagement.TestSequentialBlockGroupId |
|   | hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.security.TestRefreshUserMappings |
|   | hadoop.hdfs.server.namenode.TestNameNodeRespectsBindHostKeys |
|   | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
|   | hadoop.hdfs.server.namenode.ha.TestHAMetrics |
|   | hadoop.hdfs.TestRecoverStripedFile |
|   | hadoop.hdfs.server.namenode.TestAllowFormat |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | hadoop.hdfs.server.namenode.TestDeadDatanode |
|   | hadoop.hdfs.crypto.TestHdfsCryptoStreams |
|   | hadoop.hdfs.server.blockmanagement.TestAvailableSpaceBlockPlac

[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901935#comment-14901935
 ] 

Anu Engineer commented on HDFS-9112:


The test failure is not related to the patch.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch
>
>
> In HDFS-6376 we added support in distcp for specifying multiple 
> NameService IDs, so that data can be copied between two HA-enabled 
> clusters.
> That confuses the haadmin command, since DFSUtil#getNamenodeServiceAddr 
> contains a check that fails if it finds more than one name in 
> that property.
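To make the failure mode concrete, here is a minimal, self-contained Java sketch. It assumes the property holds a comma-separated list of NameService IDs (as {{dfs.nameservices}} does). {{NameServiceIds}}, {{resolveOnly}} and {{resolve}} are hypothetical names: {{resolveOnly}} models a strict check that rejects more than one ID, and {{resolve}} models a tolerant alternative in which the caller names the ID it wants. Neither is the actual {{DFSUtil}} code or the HDFS-9112 patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the real DFSUtil implementation.
final class NameServiceIds {
  private NameServiceIds() {}

  // Split a comma-separated property value into trimmed, non-empty IDs.
  static List<String> parse(String value) {
    List<String> ids = new ArrayList<>();
    if (value == null) {
      return ids;
    }
    for (String part : value.split(",")) {
      String id = part.trim();
      if (!id.isEmpty()) {
        ids.add(id);
      }
    }
    return ids;
  }

  // Strict check: fails as soon as more than one ID is configured, which is
  // what breaks a tool once the config lists a second (remote) cluster.
  static String resolveOnly(String value) {
    List<String> ids = parse(value);
    if (ids.size() > 1) {
      throw new IllegalArgumentException(
          "Multiple NameService IDs configured: " + ids);
    }
    return ids.isEmpty() ? null : ids.get(0);
  }

  // Tolerant variant: the caller names the ID it wants, and we only verify
  // that it is actually configured; otherwise fall back to the strict rule.
  static String resolve(String value, String requestedId) {
    List<String> ids = parse(value);
    if (requestedId != null) {
      if (!ids.contains(requestedId)) {
        throw new IllegalArgumentException(
            "Unknown NameService ID: " + requestedId);
      }
      return requestedId;
    }
    return resolveOnly(value);
  }
}
```

With {{"ns1,ns2"}} configured, {{resolveOnly}} throws while {{resolve(value, "ns2")}} returns {{"ns2"}}; the design choice is simply to make the caller disambiguate instead of assuming a single nameservice.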





[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901928#comment-14901928
 ] 

Hudson commented on HDFS-9111:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #421 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/421/])
HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to 
PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 
06022b8fdc40e50eaac63758246353058e8cfa6d)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java


> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving the PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client-side and server-side data structures to and from protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-related PB converters to the 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with the 
> converters for server-side data structures.
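The split described above is essentially a delegation pattern. In the sketch below, {{ClientLocation}}, {{ServerReport}} and their {{*Proto}} counterparts are hypothetical stand-ins (real code would use HDFS classes and protobuf-generated messages), and {{ClientPbHelper}}/{{ServerPbHelper}} only model the roles described for {{PBHelperClient}} and {{PBHelper}}. The point is that the server-side helper reuses the client-side converter for embedded client types instead of duplicating it.

```java
// Hypothetical stand-ins: these tiny classes only model the shape of a
// type <-> protobuf-message conversion, not real HDFS or protobuf classes.
final class ClientLocation {
  final String host;
  final int port;
  ClientLocation(String host, int port) { this.host = host; this.port = port; }
}

final class ClientLocationProto {
  final String hostPort;
  ClientLocationProto(String hostPort) { this.hostPort = hostPort; }
}

final class ServerReport {
  final String storageId;
  final ClientLocation location;
  ServerReport(String storageId, ClientLocation location) {
    this.storageId = storageId;
    this.location = location;
  }
}

final class ServerReportProto {
  final String storageId;
  final ClientLocationProto location;
  ServerReportProto(String storageId, ClientLocationProto location) {
    this.storageId = storageId;
    this.location = location;
  }
}

// "Client module": converters for client-side types only; no server types.
final class ClientPbHelper {
  private ClientPbHelper() {}

  static ClientLocationProto convert(ClientLocation loc) {
    return new ClientLocationProto(loc.host + ":" + loc.port);
  }

  static ClientLocation convert(ClientLocationProto proto) {
    int idx = proto.hostPort.lastIndexOf(':');
    return new ClientLocation(proto.hostPort.substring(0, idx),
        Integer.parseInt(proto.hostPort.substring(idx + 1)));
  }
}

// "Server module": keeps server-side converters and delegates to the
// client helper for any embedded client-side type.
final class ServerPbHelper {
  private ServerPbHelper() {}

  static ServerReportProto convert(ServerReport report) {
    return new ServerReportProto(report.storageId,
        ClientPbHelper.convert(report.location));
  }

  static ServerReport convert(ServerReportProto proto) {
    return new ServerReport(proto.storageId,
        ClientPbHelper.convert(proto.location));
  }
}
```

Because the dependency only points from the server helper to the client helper, the client module can be built and shipped without any server-side classes.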





[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901910#comment-14901910
 ] 

Hadoop QA commented on HDFS-9112:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 40s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   9m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  8s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 37s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  9s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 163m 19s | Tests failed in hadoop-hdfs. |
| | | 215m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761529/HDFS-9112.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12579/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12579/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12579/console |


This message was automatically generated.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch
>
>
> In HDFS-6376 we added support in distcp for specifying multiple 
> NameService IDs, so that data can be copied between two HA-enabled 
> clusters.
> That confuses the haadmin command, since DFSUtil#getNamenodeServiceAddr 
> contains a check that fails if it finds more than one name in 
> that property.





[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901886#comment-14901886
 ] 

Hudson commented on HDFS-9111:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8497 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8497/])
HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to 
PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 
06022b8fdc40e50eaac63758246353058e8cfa6d)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java


> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving the PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client-side and server-side data structures to and from protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-related PB converters to the 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with the 
> converters for server-side data structures.





[jira] [Updated] (HDFS-9039) Split o.a.h.hdfs.NameNodeProxies class into two classes in hadoop-hdfs-client and hadoop-hdfs modules respectively

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9039:

Attachment: HDFS-9039.001.patch

The v1 patch is rebased on the {{trunk}} branch.

Since we moved the client-side protobuf conversion methods from {{PBHelper}} to the 
{{hadoop-hdfs-client}} module in [HDFS-9111], the v1 patch is much smaller 
than before.

> Split o.a.h.hdfs.NameNodeProxies class into two classes in hadoop-hdfs-client 
> and hadoop-hdfs modules respectively
> --
>
> Key: HDFS-9039
> URL: https://issues.apache.org/jira/browse/HDFS-9039
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9039.000.patch, HDFS-9039.001.patch
>
>
> Currently the {{org.apache.hadoop.hdfs.NameNodeProxies}} class is used by 
> both the {{org.apache.hadoop.hdfs.server}} package (for server-side protocols) 
> and the {{DFSClient}} class (for {{ClientProtocol}}). The {{DFSClient}} class 
> should be moved to the {{hadoop-hdfs-client}} module (see 
> [HDFS-8053|https://issues.apache.org/jira/browse/HDFS-8053]). As the 
> {{org.apache.hadoop.hdfs.NameNodeProxies}} class also depends on server-side 
> protocols (e.g. {{JournalProtocol}} and {{NamenodeProtocol}}), we can't 
> simply move this class to the {{hadoop-hdfs-client}} module as well.
> This jira tracks the effort of moving the {{ClientProtocol}}-related static 
> methods in the {{org.apache.hadoop.hdfs.NameNodeProxies}} class to the 
> {{hadoop-hdfs-client}} module. A good place to put these static methods is a 
> new class named {{NameNodeProxiesClient}}.
> The checkstyle warnings can be addressed in [HDFS-8979], and removing the 
> _slf4j_ logger guards when calling {{LOG.debug()}} and {{LOG.trace()}} can be 
> addressed in [HDFS-8971].
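The proposed split can be sketched as follows, under stated assumptions: {{ClientProto}} and {{JournalProto}} are hypothetical stand-ins for {{ClientProtocol}} and a server-side protocol such as {{JournalProtocol}}, and lambdas stand in for real RPC proxies. {{ProxiesClient}} models the role proposed for {{NameNodeProxiesClient}}: it knows only the client protocol, while the server-side factory handles server protocols itself and delegates the client one. None of these names or signatures come from the actual patch.

```java
// Hypothetical stand-ins for RPC protocol interfaces (not the real HDFS ones).
interface ClientProto {
  String getFileInfo(String path);
}

interface JournalProto {
  String journal(long txid);
}

// "Client module": only knows how to build client-protocol proxies.
final class ProxiesClient {
  private ProxiesClient() {}

  static ClientProto createClientProxy(String nnAddress) {
    // Placeholder for real RPC proxy construction: answers locally.
    return path -> "getFileInfo(" + path + ")@" + nnAddress;
  }
}

// "Server module": builds server-side protocol proxies itself and delegates
// the client protocol to ProxiesClient, so client code never depends on it.
final class ProxiesServer {
  private ProxiesServer() {}

  static <T> T createProxy(Class<T> iface, String nnAddress) {
    Object proxy;
    if (iface == ClientProto.class) {
      proxy = ProxiesClient.createClientProxy(nnAddress);
    } else if (iface == JournalProto.class) {
      proxy = (JournalProto) txid -> "journal(" + txid + ")@" + nnAddress;
    } else {
      throw new IllegalArgumentException("Unsupported protocol: " + iface.getName());
    }
    return iface.cast(proxy);
  }
}
```

{{Class#cast}} keeps the generic factory type-safe without an unchecked cast; server code keeps calling one entry point, while the client module only ever sees {{ProxiesClient}}.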





[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8733:

Component/s: (was: build)

> Keep server related definition in hdfs.proto on server side
> ---
>
> Key: HDFS-8733
> URL: https://issues.apache.org/jira/browse/HDFS-8733
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yi Liu
>Assignee: Mingliang Liu
> Attachments: HFDS-8733.000.patch
>
>
> In [HDFS-8726], we moved the protobuf files that define the client-server 
> protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are 
> some server-related definitions. This jira tracks the effort of moving those 
> server-related definitions back to the {{hadoop-hdfs}} module. A good place may be 
> a new file named {{HdfsServer.proto}}.





[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8733:

Status: Patch Available  (was: Open)

> Keep server related definition in hdfs.proto on server side
> ---
>
> Key: HDFS-8733
> URL: https://issues.apache.org/jira/browse/HDFS-8733
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Yi Liu
>Assignee: Mingliang Liu
> Attachments: HFDS-8733.000.patch
>
>
> In [HDFS-8726], we moved the protobuf files that define the client-server 
> protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are 
> some server-related definitions. This jira tracks the effort of moving those 
> server-related definitions back to the {{hadoop-hdfs}} module. A good place may be 
> a new file named {{HdfsServer.proto}}.





[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8733:

Attachment: HFDS-8733.000.patch

> Keep server related definition in hdfs.proto on server side
> ---
>
> Key: HDFS-8733
> URL: https://issues.apache.org/jira/browse/HDFS-8733
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Yi Liu
>Assignee: Mingliang Liu
> Attachments: HFDS-8733.000.patch
>
>
> In [HDFS-8726], we moved the protobuf files that define the client-server 
> protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are 
> some server-related definitions. This jira tracks the effort of moving those 
> server-related definitions back to the {{hadoop-hdfs}} module. A good place may be 
> a new file named {{HdfsServer.proto}}.





[jira] [Updated] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9111:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks [~liuml07] for the 
contribution.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving the PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client-side and server-side data structures to and from protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-related PB converters to the 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with the 
> converters for server-side data structures.





[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901808#comment-14901808
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks a lot for your review and for spending so much time on this, Jing! 
I will update the B-Tree part to address your comments later.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long-standing issue that we have tried to improve in the past. 
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept sorted in the list, so finding a child by binary search is 
> O(log n), but insertion/deletion shifts elements and causes re-allocations and 
> copies of big arrays, so these operations are costly. For example, if a 
> directory grows to 1M children, the ArrayList will resize to a capacity of more 
> than 1M, which needs more than 1M * 4 bytes = 4 MB of contiguous heap memory; 
> this easily causes a full GC in an HDFS cluster where NameNode heap memory is 
> already highly used. To recap, there are 3 main issues:
> # Insertion/deletion operations in large directories are expensive because of 
> re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous heap memory that will be 
> long-lived can easily cause full GC problems.
> # Even if most children are removed later, the directory INode still 
> occupies the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to 
> solve the problem, as suggested by [~shv]. 
> So the target of this JIRA is to implement a B-Tree with a low memory footprint 
> and use it to replace the ArrayList. 
> If the number of elements is small (less than the maximum degree of a B-Tree 
> node), the B-Tree has only one root node, which contains an array of the 
> elements. When the size grows large enough, nodes split automatically, 
> and when elements are removed, B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree). This solves the above 3 
> issues.
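The split-on-overflow behaviour described above can be sketched with a minimal, textbook (CLRS-style) B-Tree of minimum degree {{t}}: each node holds a small sorted array of at most {{2t-1}} elements, and a full node is split around its median before an insertion descends into it, so no single large contiguous array is ever allocated. This is an illustrative sketch under those assumptions, not the HDFS-9053 implementation; {{SimpleBTree}} is a hypothetical name, and deletion/merging is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal B-Tree (CLRS-style, minimum degree t): insert, lookup
// and in-order listing only. Deletion/merging is omitted for brevity.
final class SimpleBTree<E extends Comparable<E>> {
  private static final class Node {
    final Object[] elems;   // sorted elements, at most 2t-1 of them
    final Node[] children;  // size + 1 children when not a leaf
    int size;
    boolean leaf = true;
    Node(int t) { elems = new Object[2 * t - 1]; children = new Node[2 * t]; }
  }

  private final int t;
  private Node root;
  private int count;

  SimpleBTree(int minDegree) {
    if (minDegree < 2) throw new IllegalArgumentException("minDegree must be >= 2");
    t = minDegree;
    root = new Node(t);
  }

  int size() { return count; }

  @SuppressWarnings("unchecked")
  private E elem(Node n, int i) { return (E) n.elems[i]; }

  boolean contains(E e) {
    Node n = root;
    while (true) {
      int i = 0;
      while (i < n.size && e.compareTo(elem(n, i)) > 0) i++;
      if (i < n.size && e.compareTo(elem(n, i)) == 0) return true;
      if (n.leaf) return false;
      n = n.children[i];
    }
  }

  // Adds e unless already present (directory children have unique names).
  void add(E e) {
    if (contains(e)) return;
    if (root.size == 2 * t - 1) {  // full root: the tree grows by one level
      Node newRoot = new Node(t);
      newRoot.leaf = false;
      newRoot.children[0] = root;
      splitChild(newRoot, 0);
      root = newRoot;
    }
    Node n = root;
    while (!n.leaf) {
      int i = n.size - 1;
      while (i >= 0 && e.compareTo(elem(n, i)) < 0) i--;
      i++;
      if (n.children[i].size == 2 * t - 1) {  // split a full child before descending
        splitChild(n, i);
        if (e.compareTo(elem(n, i)) > 0) i++;
      }
      n = n.children[i];
    }
    int i = n.size - 1;  // shift larger elements right within the (non-full) leaf
    while (i >= 0 && e.compareTo(elem(n, i)) < 0) {
      n.elems[i + 1] = n.elems[i];
      i--;
    }
    n.elems[i + 1] = e;
    n.size++;
    count++;
  }

  // Moves the upper half of the full child parent.children[i] into a new right
  // sibling and lifts the median element into the (non-full) parent.
  private void splitChild(Node parent, int i) {
    Node full = parent.children[i];
    Node right = new Node(t);
    right.leaf = full.leaf;
    right.size = t - 1;
    System.arraycopy(full.elems, t, right.elems, 0, t - 1);
    if (!full.leaf) System.arraycopy(full.children, t, right.children, 0, t);
    Object median = full.elems[t - 1];
    Arrays.fill(full.elems, t - 1, 2 * t - 1, null);  // release moved slots
    Arrays.fill(full.children, t, 2 * t, null);
    full.size = t - 1;
    System.arraycopy(parent.children, i + 1, parent.children, i + 2, parent.size - i);
    parent.children[i + 1] = right;
    System.arraycopy(parent.elems, i, parent.elems, i + 1, parent.size - i);
    parent.elems[i] = median;
    parent.size++;
  }

  List<E> toList() {
    List<E> out = new ArrayList<>(count);
    collect(root, out);
    return out;
  }

  private void collect(Node n, List<E> out) {
    for (int i = 0; i < n.size; i++) {
      if (!n.leaf) collect(n.children[i], out);
      out.add(elem(n, i));
    }
    if (!n.leaf) collect(n.children[n.size], out);
  }
}
```

Every allocation is bounded by the node size ({{2t-1}} elements), which is the property the description relies on to avoid multi-MB contiguous arrays; a real implementation would also need deletion with node merging.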





[jira] [Commented] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901799#comment-14901799
 ] 

Hadoop QA commented on HDFS-9109:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 48s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 12s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 24s | The applied patch generated 1 new checkstyle issues (total was 61, now 61). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  21m 44s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests |  69m 58s | Tests failed in hadoop-hdfs. |
| | | 136m 27s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestEncryptionZonesWithKMS |
|   | hadoop.hdfs.TestClientBlockVerification |
| Timed out tests | org.apache.hadoop.ipc.TestIPC |
|   | org.apache.hadoop.ha.TestZKFailoverControllerStress |
|   | org.apache.hadoop.crypto.key.TestKeyProviderFactory |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761519/HDFS-9109.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/console |


This message was automatically generated.

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch, HDFS-9109.02.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.
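To illustrate the distinction, the sketch below resolves names for the addresses bound to a network interface using {{java.net.InetAddress#getCanonicalHostName}}, which goes through the JVM's default name service. By default that delegates to the operating system's resolver, which on typical Linux hosts also consults {{/etc/hosts}} (subject to {{nsswitch.conf}}), unlike a lookup that only queries DNS servers. {{InterfaceHostnames.hostnamesFor}} is a hypothetical helper, not the HDFS-9109 patch or Hadoop's {{DNS}} class.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: hostnames for every address bound to an interface.
final class InterfaceHostnames {
  private InterfaceHostnames() {}

  // getCanonicalHostName() uses the JVM's configured name service, which by
  // default delegates to the OS resolver (and therefore usually /etc/hosts).
  // If no name can be found it returns the textual IP address instead.
  static List<String> hostnamesFor(String ifaceName) throws SocketException {
    NetworkInterface iface = NetworkInterface.getByName(ifaceName);
    if (iface == null) {
      throw new IllegalArgumentException("No such interface: " + ifaceName);
    }
    List<String> names = new ArrayList<>();
    for (InetAddress addr : Collections.list(iface.getInetAddresses())) {
      names.add(addr.getCanonicalHostName());
    }
    return names;
  }
}
```

For example, {{hostnamesFor("eth0")}} on a host whose {{/etc/hosts}} maps the eth0 address to an alternate name would typically return that name, whereas a DNS-only reverse lookup would not see it.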





[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance

2015-09-21 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HDFS-8920:
-
Attachment: HDFS-8920-HDFS-7285.2.patch

Addressed Kai's offline comments.

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt 
> performance
> -
>
> Key: HDFS-8920
> URL: https://issues.apache.org/jira/browse/HDFS-8920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, 
> {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hotspot method and 
> effectively blocks the client JVM. This log statement seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>   " after checking nodes = " + Arrays.toString(nodes) +
>   ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}
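One general way to keep a warning like this from dominating a hot path is to rate-limit it and defer building the expensive message. The sketch below is hypothetical: it uses {{java.util.logging}} instead of the client's real logger, and {{RateLimitedWarner}} and its interval are illustrative assumptions, not what the HDFS-8920 patch does.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Emits at most one warning per interval; other calls are counted and dropped.
final class RateLimitedWarner {
  private final Logger log;
  private final long intervalNanos;
  private final LongSupplier clock;  // e.g. System::nanoTime
  private final AtomicLong nextAllowed;
  private final AtomicLong suppressed = new AtomicLong();

  RateLimitedWarner(Logger log, long intervalNanos, LongSupplier clock) {
    this.log = log;
    this.intervalNanos = intervalNanos;
    this.clock = clock;
    this.nextAllowed = new AtomicLong(clock.getAsLong());
  }

  // Returns true if the message was logged. The supplier is only invoked when
  // a line is actually emitted, so expensive formatting is skipped otherwise.
  boolean warn(Supplier<String> message) {
    long now = clock.getAsLong();
    long allowed = nextAllowed.get();
    // Overflow-safe time comparison; only one racing thread wins the slot.
    if (now - allowed < 0 || !nextAllowed.compareAndSet(allowed, now + intervalNanos)) {
      suppressed.incrementAndGet();
      return false;
    }
    long dropped = suppressed.getAndSet(0);
    log.warning(message.get()
        + (dropped > 0 ? " (" + dropped + " similar messages suppressed)" : ""));
    return true;
  }
}
```

A caller would wrap the message in a lambda, e.g. {{warner.warn(() -> "No live nodes contain block " + block)}} (with {{warner}} and {{block}} as hypothetical variables), so the string concatenation and any {{Arrays.toString(nodes)}} call only run when a line is actually written.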





[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901761#comment-14901761
 ] 

Jing Zhao commented on HDFS-9053:
-

Thanks for the great work, Yi! So far I have only reviewed the B-Tree implementation 
part, and it looks good to me. Just some minor comments:
# "static" can be removed
{code}
  public static interface Element extends Comparable {
K getKey();
  }
{code}
# The parameter is never used.
{code}
Node(boolean allocateMaxElements) {
  elements = new Object[maxElements()];
}
{code}
# It may be helpful to add some more Preconditions/assert checks to verify the 
parameters and internal state. For example, some verification of the index i 
in the following code.
{code}
SplitResult split(int i) {
  E e = (E)elements[i];
  Node next = new Node(true);
  
{code}
# Optional: in insertElement maybe we can copy elements only once if we need to 
expand the array.
# Rename {{put}} to {{addOrReplace}} to make its semantics clearer?
# Need to update the javadoc of {{removeElement}} and {{removeChild}}.
# {{SplitResult#element}} and {{SplitResult#node}} can be declared as final.
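The optional {{insertElement}} suggestion can be illustrated with plain arrays. When the array is full, a two-pass approach first copies everything into a bigger array ({{Arrays.copyOf}}) and then shifts the suffix again; a single-pass approach copies the prefix and the suffix straight into their final slots. {{ArrayInsert}} is a hypothetical helper written for this illustration, not the patch's {{insertElement}}; its bounds check also reflects the suggested Preconditions-style validation.

```java
import java.util.Arrays;

final class ArrayInsert {
  private ArrayInsert() {}

  // Inserts e at index i of the first `size` slots of `elems` and returns the
  // array holding the result (a new, larger array if `elems` was full).
  static Object[] insert(Object[] elems, int size, int i, Object e) {
    if (i < 0 || i > size || size > elems.length) {
      throw new IndexOutOfBoundsException("i=" + i + ", size=" + size);
    }
    if (size < elems.length) {
      // Enough room: shift the suffix right by one, in place.
      System.arraycopy(elems, i, elems, i + 1, size - i);
      elems[i] = e;
      return elems;
    }
    // Full: grow, copying each existing element exactly once to its final slot.
    Object[] grown = new Object[Math.max(4, elems.length * 2)];
    System.arraycopy(elems, 0, grown, 0, i);
    grown[i] = e;
    System.arraycopy(elems, i, grown, i + 1, size - i);
    return grown;
  }

  // For comparison: on the growth path the suffix is copied twice.
  static Object[] insertTwoPass(Object[] elems, int size, int i, Object e) {
    Object[] target = size < elems.length
        ? elems : Arrays.copyOf(elems, Math.max(4, elems.length * 2));
    System.arraycopy(target, i, target, i + 1, size - i);
    target[i] = e;
    return target;
  }
}
```

Both variants are O(n); the single-pass version simply avoids touching the suffix twice when the array has to grow.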

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch
>
>
> This is a long-standing issue that we have tried to improve in the past. 
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept sorted in the list, so finding a child by binary search is 
> O(log n), but insertion/deletion shifts elements and causes re-allocations and 
> copies of big arrays, so these operations are costly. For example, if a 
> directory grows to 1M children, the ArrayList will resize to a capacity of more 
> than 1M, which needs more than 1M * 4 bytes = 4 MB of contiguous heap memory; 
> this easily causes a full GC in an HDFS cluster where NameNode heap memory is 
> already highly used. To recap, there are 3 main issues:
> # Insertion/deletion operations in large directories are expensive because of 
> re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous heap memory that will be 
> long-lived can easily cause full GC problems.
> # Even if most children are removed later, the directory INode still 
> occupies the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to 
> solve the problem, as suggested by [~shv]. 
> So the target of this JIRA is to implement a B-Tree with a low memory footprint 
> and use it to replace the ArrayList. 
> If the number of elements is small (less than the maximum degree of a B-Tree 
> node), the B-Tree has only one root node, which contains an array of the 
> elements. When the size grows large enough, nodes split automatically, 
> and when elements are removed, B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree). This solves the above 3 
> issues.





[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901756#comment-14901756
 ] 

Hadoop QA commented on HDFS-9111:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  7s | Findbugs (version 3.0.0) appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  8s | The applied patch generated 160 new checkstyle issues (total was 40, now 200). |
| {color:green}+1{color} | whitespace |   6m 55s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 29s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 171m  7s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 29s | Tests passed in hadoop-hdfs-client. |
| | | 227m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFSNamesystem |
|   | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.hdfs.web.TestWebHDFSOAuth2 |
| Timed out tests | org.apache.hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761468/HDFS-9111.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/testrun_hadoop-hdfs-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/console |


This message was automatically generated.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client- and server-side data structures from/to protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-related PB converters to the 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with the 
> converters for server-side data structures.





[jira] [Commented] (HDFS-9026) Support for include/exclude lists on IPv6 setup

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901745#comment-14901745
 ] 

Hadoop QA commented on HDFS-9026:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 57s | Findbugs (version ) appears to be broken on HADOOP-11890. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |  10m  6s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  11m  9s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 49s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 44s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 45s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 42s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 154m  1s | Tests failed in hadoop-hdfs. |
| | | 204m 20s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestWriteRead |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.security.TestPermission |
|   | hadoop.hdfs.TestParallelRead |
|   | hadoop.fs.viewfs.TestViewFsHdfs |
|   | hadoop.hdfs.TestMiniDFSCluster |
|   | hadoop.hdfs.TestWriteConfigurationToDFS |
|   | hadoop.hdfs.web.TestWebHDFSXAttr |
|   | hadoop.hdfs.TestDFSRollback |
|   | hadoop.hdfs.TestDataTransferKeepalive |
|   | hadoop.hdfs.TestDatanodeConfig |
|   | hadoop.fs.TestWebHdfsFileContextMainOperations |
|   | hadoop.fs.TestGlobPaths |
|   | hadoop.hdfs.TestDFSShell |
|   | hadoop.fs.loadGenerator.TestLoadGenerator |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMkdir |
|   | hadoop.hdfs.TestAbandonBlock |
|   | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary |
|   | hadoop.hdfs.TestReadWhileWriting |
|   | hadoop.fs.viewfs.TestViewFileSystemWithAcls |
|   | hadoop.fs.contract.hdfs.TestHDFSContractConcat |
|   | hadoop.fs.TestSymlinkHdfsDisable |
|   | hadoop.fs.contract.hdfs.TestHDFSContractRootDirectory |
|   | hadoop.hdfs.TestMissingBlocksAlert |
|   | hadoop.hdfs.TestBlocksScheduledCounter |
|   | hadoop.hdfs.TestSmallBlock |
|   | hadoop.cli.TestDeleteCLI |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.fs.viewfs.TestViewFsWithXAttrs |
|   | hadoop.hdfs.tools.TestDFSAdmin |
|   | hadoop.hdfs.web.TestWebHDFSForHA |
|   | hadoop.fs.viewfs.TestViewFsDefaultValue |
|   | hadoop.fs.contract.hdfs.TestHDFSContractOpen |
|   | hadoop.hdfs.TestFSInputChecker |
|   | hadoop.hdfs.web.TestWebHdfsWithAuthenticationFilter |
|   | hadoop.fs.contract.hdfs.TestHDFSContractRename |
|   | hadoop.hdfs.TestRemoteBlockReader |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.fs.viewfs.TestViewFsAtHdfsRoot |
|   | hadoop.hdfs.TestBlockReaderLocal |
|   | hadoop.fs.contract.hdfs.TestHDFSContractGetFileStatus |
|   | hadoop.cli.TestCryptoAdminCLI |
|   | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForXAttr |
|   | hadoop.hdfs.tools.TestDebugAdmin |
|   | hadoop.security.TestRefreshUserMappings |
|   | hadoop.hdfs.TestLargeBlock |
|   | hadoop.fs.viewfs.TestViewFileSystemWithXAttrs |
|   | hadoop.hdfs.TestListFilesInFileContext |
|   | hadoop.fs.TestFcHdfsSetUMask |
|   | hadoop.hdfs.TestDatanodeReport |
|   | hadoop.hdfs.TestFileAppend2 |
|   | hadoop.fs.shell.TestHdfsTextCommand |
|   | hadoop.hdfs.TestFsShellPermission |
|   | hadoop.TestGenericRefresh |
|   | hadoop.fs.TestSymlinkHdfsFileSystem |
|   | hadoop.hdfs.TestGetBlocks |
|   | hadoop.fs.contract.hdfs.TestHDFSContractAppend |
|   | hadoop.fs.contract.hdfs.TestHDFSContractDelete |
|   | hadoop.hdfs.web.TestWebHdfsTokens |
|   | hadoop.hdfs.TestEncryptionZonesWithKMS |
|   | hadoop.hdfs.TestClientReportBadBlock |
|   | hadoop.cli.TestHDFSCLI |
|   | hadoop.fs.TestSWebHdfsFileContextMainOperations |
|   | hadoop.hdfs.TestRestartDFS |
|   | hadoop.hdfs.TestFileAppend4 |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement |
|   | hadoop.hdfs.TestSetTimes |
|   | hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot |
|   | hadoop.fs.contract.hdfs.TestHDFSContractSeek |
|   | hadoop.hdfs.TestSetrepIncreasing |
|   | hadoop.fs.viewfs.TestViewFsWithAcls |
|   | hadoop.hdfs.TestLease |
|   | hadoop.hdfs.TestDFSUpgrade |
|

[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901722#comment-14901722
 ] 

Haohui Mai commented on HDFS-9117:
--

I suggest bringing in RapidXML (http://rapidxml.sourceforge.net/) to parse the 
configurations and convert the XML to the {{Options}} object.

> Config file reader / options classes for libhdfs++
> --
>
> Key: HDFS-9117
> URL: https://issues.apache.org/jira/browse/HDFS-9117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>
> For environmental compatibility with HDFS installations, libhdfs++ should be 
> able to read the configurations from Hadoop XML files and behave in line with 
> the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML 
> configuration files.
> Similarly, an internal Options architecture for libhdfs++ should be developed 
> to efficiently transport the configuration information within the system.
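For reference, the Hadoop XML files mentioned above use a flat {{<configuration>}} element containing {{<property>}} entries, each with a {{<name>}} and a {{<value>}} (for example {{fs.defaultFS}}, which carries the NameNode host and port). libhdfs++ itself is C++ (RapidXML is suggested above), so the following Java sketch only illustrates that file layout using the JDK's DOM parser. The class {{HadoopXmlConfig}} is hypothetical, and the sketch ignores {{<final>}}, variable expansion and the layering of multiple files.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for the <configuration><property><name/><value/></property> layout. */
final class HadoopXmlConfig {
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() == 0 || values.getLength() == 0) {
                    continue; // skip malformed entries in this sketch
                }
                // In this sketch a later duplicate name simply overwrites an earlier one.
                props.put(names.item(0).getTextContent().trim(),
                          values.item(0).getTextContent().trim());
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse Hadoop configuration XML", e);
        }
    }
}
```

The resulting map is the kind of input an internal Options object could be populated from; how libhdfs++ actually maps keys onto its options is not specified here.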





[jira] [Updated] (HDFS-8663) sys cpu usage high on namenode server

2015-09-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HDFS-8663:
-
Assignee: (was: Eugene Koifman)

> sys cpu usage high on namenode server
> -
>
> Key: HDFS-8663
> URL: https://issues.apache.org/jira/browse/HDFS-8663
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, namenode
>Affects Versions: 2.3.0
> Environment: hadoop 2.3.0 centos5.8
>Reporter: tangjunjie
>
> High sys CPU usage on the NameNode server makes jobs run very slowly.
> Using ps -elf, I see many zombie processes.
> Checking the HDFS log, I found many exceptions like:
> org.apache.hadoop.util.Shell$ExitCodeException: id: sem_410: No such user
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>   at org.apache.hadoop.util.Shell.run(Shell.java:418)
>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
>   at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83)
>   at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52)
>   at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)
>   at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409)
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.<init>(FSPermissionChecker.java:81)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3310)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3491)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> After I created all the users (such as sem_410) that appear in the exceptions, 
> the sys CPU usage on the NameNode went down.
> BTW, my Hadoop 2.3.0 cluster has Hadoop ACLs enabled.





[jira] [Assigned] (HDFS-8663) sys cpu usage high on namenode server

2015-09-21 Thread tangjunjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangjunjie reassigned HDFS-8663:


Assignee: Eugene Koifman

> sys cpu usage high on namenode server
> -
>
> Key: HDFS-8663
> URL: https://issues.apache.org/jira/browse/HDFS-8663
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, namenode
>Affects Versions: 2.3.0
> Environment: hadoop 2.3.0 centos5.8
>Reporter: tangjunjie
>Assignee: Eugene Koifman
>





[jira] [Commented] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity

2015-09-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901678#comment-14901678
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8287:
---

> ... moving DoubleCellBuffer and CellBuffers out of DFSStripedOutputStream 
> should be done with separate JIRA, ...

Sounds good.  Some comments on the patch:

{code}
+    if (submittedParityGenTask) {
+      try {
+        // Wait for parity gen task for previout cell.
+        Future ret = completionService.take();
+        ByteBuffer[] encoded = ret.get();
+        for (int i = numDataBlocks; i < numAllBlocks; i++) {
+          writeParity(i, encoded[i], doubleCellBuffer.getReadyBuf().getChecksumArray(i));
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("Caught InterruptedException: ", e);
+      } catch (ExecutionException e) {
+        LOG.warn("Caught ExecutionException: ", e);
+      }
{code}
- The caught exception should be re-thrown as an IOException.
- Typo: "previout" should be "previous".
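A minimal sketch of the first suggestion, assuming the futures come from a {{CompletionService<ByteBuffer[]>}} as in the excerpt: the hypothetical helper {{takeEncoded}} waits for the next parity task and turns both failure modes into {{IOException}}s instead of only logging them. Restoring the interrupt flag and using {{InterruptedIOException}} is a common Java convention, not something the patch is known to do.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class ParityWaitSketch {
    /**
     * Waits for the next finished parity-generation task and returns its
     * encoded buffers, rethrowing failures as IOExceptions.
     */
    static ByteBuffer[] takeEncoded(CompletionService<ByteBuffer[]> completionService)
            throws IOException {
        try {
            // Wait for the parity generation task of the previous cell.
            Future<ByteBuffer[]> ret = completionService.take();
            return ret.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            InterruptedIOException iioe =
                    new InterruptedIOException("Interrupted while waiting for parity generation");
            iioe.initCause(e);
            throw iioe;
        } catch (ExecutionException e) {
            // Surface the task's own failure as the cause.
            throw new IOException("Parity generation failed", e.getCause());
        }
    }
}
```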

> DFSStripedOutputStream.writeChunk should not wait for writing parity 
> -
>
> Key: HDFS-8287
> URL: https://issues.apache.org/jira/browse/HDFS-8287
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Kai Sasaki
> Attachments: HDFS-8287-HDFS-7285.00.patch, 
> HDFS-8287-HDFS-7285.01.patch, HDFS-8287-HDFS-7285.02.patch, 
> HDFS-8287-HDFS-7285.03.patch, HDFS-8287-HDFS-7285.04.patch, 
> HDFS-8287-HDFS-7285.05.patch, HDFS-8287-HDFS-7285.06.patch, 
> HDFS-8287-HDFS-7285.07.patch, HDFS-8287-HDFS-7285.08.patch, 
> HDFS-8287-HDFS-7285.09.patch, HDFS-8287-HDFS-7285.10.patch, 
> HDFS-8287-HDFS-7285.WIP.patch, HDFS-8287-performance-report.pdf, 
> h8287_20150911.patch, jstack-dump.txt
>
>
> When a striping cell is full, writeChunk computes and generates parity 
> packets.  It sequentially calls waitAndQueuePacket, so the user client cannot 
> continue to write data until it finishes.
> We should allow the user client to continue writing instead of blocking it 
> while parity is being written.





[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901673#comment-14901673
 ] 

Hadoop QA commented on HDFS-9107:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 54s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 21s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 25s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 22s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 16s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 198m  6s | Tests failed in hadoop-hdfs. |
| | | 244m 36s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761485/HDFS-9107.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12574/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12574/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12574/console |


This message was automatically generated.

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch, HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues, which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.
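For scale, the "dead node interval" is derived from two settings. The sketch below assumes the formula used by {{DatanodeManager}} in Hadoop 2.x, {{2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval}}, with the defaults of 5 minutes and 3 seconds. That gives 630,000 ms (10.5 minutes). The class {{DeadNodeCheck}} is hypothetical.

```java
/**
 * Hypothetical sketch of the NameNode's dead-node test, assuming the Hadoop 2.x
 * DatanodeManager formula: expire = 2 * recheckInterval + 10 * heartbeatInterval.
 */
final class DeadNodeCheck {
    static final long DEFAULT_RECHECK_INTERVAL_MS = 5 * 60 * 1000; // dfs.namenode.heartbeat.recheck-interval
    static final long DEFAULT_HEARTBEAT_INTERVAL_SEC = 3;          // dfs.heartbeat.interval

    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    /** A node counts as dead once its last processed heartbeat is older than the interval. */
    static boolean isDead(long lastHeartbeatMs, long nowMs, long expireIntervalMs) {
        return nowMs - lastHeartbeatMs > expireIntervalMs;
    }
}
```

Any pause or partition that keeps heartbeats from being processed for longer than this makes every affected node look dead at once, which is the starting point of the cycle described above.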





[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9112:
---
Attachment: HDFS-9112.002.patch

Based on [~jingzhao]'s comments, this change makes the error message more 
explicit: it tells the user to pass -ns if needed.

As for the test failures with patch 1, they do not seem related to the 
patch.



> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch
>
>
> In HDFS-6376 we added a feature to distcp that allows multiple 
> NameService IDs to be specified, so that we can copy between two HA-enabled 
> clusters.
> That confuses the haadmin command, since DFSUtil#getNamenodeServiceAddr has a 
> check that fails if it finds more than one name in that property.





[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901650#comment-14901650
 ] 

Hadoop QA commented on HDFS-9112:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 55s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 24s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 27s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 50s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 48s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  24m 29s | Tests passed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 197m 36s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 45s | Tests passed in hadoop-hdfs-client. |
| | | 276m 37s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761473/HDFS-9112.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b00392d |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-hdfs-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/console |


This message was automatically generated.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we added a feature to distcp that allows multiple 
> NameService IDs to be specified, so that we can copy between two HA-enabled 
> clusters.
> That confuses the haadmin command, since DFSUtil#getNamenodeServiceAddr has a 
> check that fails if it finds more than one name in that property.





[jira] [Commented] (HDFS-7858) Improve HA Namenode Failover detection on the client

2015-09-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901642#comment-14901642
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7858:
---

> ... then those clients might not get a response soon enough to try the other 
> NN.

[~asuresh], do you recall how long you have seen clients wait?  I may have 
hit a similar problem recently.

> Improve HA Namenode Failover detection on the client
> 
>
> Key: HDFS-7858
> URL: https://issues.apache.org/jira/browse/HDFS-7858
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HDFS-7858.1.patch, HDFS-7858.10.patch, 
> HDFS-7858.10.patch, HDFS-7858.11.patch, HDFS-7858.12.patch, 
> HDFS-7858.13.patch, HDFS-7858.2.patch, HDFS-7858.2.patch, HDFS-7858.3.patch, 
> HDFS-7858.4.patch, HDFS-7858.5.patch, HDFS-7858.6.patch, HDFS-7858.7.patch, 
> HDFS-7858.8.patch, HDFS-7858.9.patch
>
>
> In an HA deployment, clients are configured with the hostnames of both the 
> Active and Standby NameNodes. Clients will first try one of the NNs 
> (non-deterministically), and if it is the Standby NN, it will respond telling 
> the client to retry the request on the other NameNode.
> If clients happen to talk to the Standby first, and the Standby is 
> undergoing GC or is otherwise busy, then those clients might not get a response 
> soon enough to try the other NN.
> Proposed approach to solve this:
> 1) Use hedged RPCs to simultaneously call multiple configured NNs to decide 
> which is the active NameNode.
> 2) Subsequent calls will invoke the previously successful NN.
> 3) On failover of the currently active NN, the remaining NNs will be invoked 
> to decide which is the new active NN.
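Steps 1 to 3 can be sketched with the JDK's {{ExecutorService.invokeAny}}, which returns the result of the first task that completes successfully and cancels the others. This is an illustration under assumptions, not the HDFS-7858 patch: each probe is a hypothetical {{Callable}} that returns an NN id if that NN answers as active and throws otherwise (standing in for a {{StandbyException}}), and {{HedgedActiveProbe}} is a made-up name. For simplicity, step 3 here re-probes every NN rather than only the remaining ones.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical client-side helper that finds the active NN with a hedged call. */
final class HedgedActiveProbe {
    private final ExecutorService pool;
    private volatile String cachedActive; // step 2: reuse the last NN that answered

    HedgedActiveProbe(ExecutorService pool) {
        this.pool = pool;
    }

    /**
     * Step 1: run all probes concurrently; invokeAny returns the first successful
     * result and cancels the remaining probes (e.g. a Standby stuck in GC).
     */
    String findActive(List<Callable<String>> probes, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        String active = pool.invokeAny(probes, timeoutMs, TimeUnit.MILLISECONDS);
        cachedActive = active;
        return active;
    }

    String cachedActive() {
        return cachedActive;
    }

    /** Step 3 (simplified): forget the cached NN so that the next call probes again. */
    void onFailover() {
        cachedActive = null;
    }
}
```

The pool needs at least as many threads as there are NNs; otherwise a slow Standby can occupy the only thread and the hedged call degenerates into sequential tries.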





[jira] [Commented] (HDFS-9118) Add logging system for libdhfs++

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901630#comment-14901630
 ] 

Haohui Mai commented on HDFS-9118:
--

The interfaces of the logging class are quite close to the ones used in snappy and 
glog. A reasonable choice is to make it an abstract class and allow users to 
specify the logger instance in the {{Options}} object.

> Add logging system for libdhfs++
> 
>
> Key: HDFS-9118
> URL: https://issues.apache.org/jira/browse/HDFS-9118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>
> With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
> library are going to have their own logging infrastructure that we're going 
> to want to provide data to.  
> libhdfs++ should have a logging library that:
> * Is overridable and can provide sufficient information to work well with 
> common C++ logging frameworks
> * Has a sensible default implementation 
> * Is performant
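The suggested shape (an abstract logger class with a default implementation, supplied through the {{Options}} object) is sketched below. libhdfs++ is C++, so this Java version only illustrates the design; {{LogSink}}, {{StderrLogSink}} and {{LibOptions}} are hypothetical names. Checking {{isEnabled}} before the message is built is one way to address the "performant" requirement.

```java
import java.util.function.Supplier;

/** Hypothetical pluggable logger; users subclass it to bridge to their own framework. */
abstract class LogSink {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    /** Cheap check so that disabled messages are never formatted. */
    public abstract boolean isEnabled(Level level);

    public abstract void write(Level level, String component, String message);
}

/** Default implementation: INFO and above go to stderr. */
final class StderrLogSink extends LogSink {
    @Override
    public boolean isEnabled(Level level) {
        return level.compareTo(Level.INFO) >= 0;
    }

    @Override
    public void write(Level level, String component, String message) {
        System.err.println("[" + level + "] " + component + ": " + message);
    }
}

/** Stand-in for the library's Options object, which carries the logger instance. */
final class LibOptions {
    LogSink logSink = new StderrLogSink();

    /** Builds the message only if the configured sink wants this level. */
    void log(LogSink.Level level, String component, Supplier<String> message) {
        if (logSink.isEnabled(level)) {
            logSink.write(level, component, message.get());
        }
    }
}
```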





[jira] [Updated] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9116:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: HDFS-8707
Target Version/s: HDFS-8707
  Status: Resolved  (was: Patch Available)

Committed to the HDFS-8707 branch. Thanks James for the reviews.

> Suppress false positives from Valgrind on uninitialized variables in tests
> --
>
> Key: HDFS-9116
> URL: https://issues.apache.org/jira/browse/HDFS-9116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Fix For: HDFS-8707
>
> Attachments: HDFS-9116.000.patch
>
>
> Valgrind complains about uninitialized variables in the unit tests. It should 
> be fixed.





[jira] [Commented] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901614#comment-14901614
 ] 

Haohui Mai commented on HDFS-9103:
--

There are definitely use cases that need fully flexible APIs (rigorous testing 
is one of those). However, it would be great to build an easier version of the 
APIs on top of them.

Speaking of the patch itself, {{AsyncPreadSome}} needs to be completely 
stateless. The name {{InputStream}} might be a little bit confusing now, but I 
don't think it is a good idea to put this functionality there, at least for now.



> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.
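Below is a minimal, synchronous sketch of the retry policy described above. It assumes each read attempt reports which DataNode it used. DataNodeId, ReadAttempt and ReadWithRetry are hypothetical names. The real libhdfs++ read path is asynchronous and callback-based, and this version only shows the exclusion logic.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

using DataNodeId = std::string;

// Result of one read attempt against a single DataNode.
struct ReadAttempt {
  bool ok;
  DataNodeId node;  // DataNode that served (or failed) the read; empty if none
};

// `try_read` picks a replica that is not in `excluded` and reads from it.
// On failure, that DataNode is added to the excluded set and the read is
// retried until it succeeds, no candidates remain, or attempts run out.
inline bool ReadWithRetry(
    const std::function<ReadAttempt(const std::set<DataNodeId>&)>& try_read,
    std::size_t max_attempts, std::set<DataNodeId>* excluded) {
  for (std::size_t i = 0; i < max_attempts; ++i) {
    ReadAttempt attempt = try_read(*excluded);
    if (attempt.ok) return true;
    if (attempt.node.empty()) return false;  // no eligible DataNode left
    excluded->insert(attempt.node);
  }
  return false;
}
```

Passing the excluded set in from the caller lets it persist across several reads of the same block, so a dead DataNode is not retried on every call.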





[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901613#comment-14901613
 ] 

Jing Zhao commented on HDFS-9112:
-

Thanks for the clarification, [~anu]! I think admins and other clients do not 
need to clearly distinguish internal from external name services; the 
internal/external distinction probably matters only to DataNodes. Thus I'm 
currently leaning towards requiring admins to explicitly specify the name 
service using the "-ns" option. But I completely agree with you that we should 
improve the error message.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we added a distcp feature that allows multiple 
> NameService IDs to be specified so that data can be copied between two 
> HA-enabled clusters.
> This confuses the haadmin command, since DFSUtil#getNamenodeServiceAddr 
> fails if it finds more than one name in that property.





[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9109:

Attachment: HDFS-9109.02.patch

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch, HDFS-9109.02.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.
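For comparison, the sketch below resolves an address through the system resolver using POSIX getnameinfo(). On typical glibc systems this follows /etc/nsswitch.conf and therefore consults /etc/hosts, whereas a lookup that queries DNS servers directly does not. It only illustrates the distinction described above. It is not how DNS#reverseDns or the attached patches work, and ReverseLookup is a hypothetical helper.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>

// Reverse-resolves an IPv4 literal via the system resolver (nsswitch order).
// Without NI_NAMEREQD, getnameinfo() falls back to the numeric form when no
// name is found, so a valid address normally yields a non-empty string.
inline std::string ReverseLookup(const std::string& ip) {
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  if (inet_pton(AF_INET, ip.c_str(), &addr.sin_addr) != 1) return "";
  char host[1025];  // NI_MAXHOST
  int rc = getnameinfo(reinterpret_cast<const sockaddr*>(&addr), sizeof(addr),
                       host, sizeof(host), nullptr, 0, 0);
  return rc == 0 ? std::string(host) : "";
}
```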





[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901601#comment-14901601
 ] 

Haohui Mai commented on HDFS-9108:
--

I didn't check the assembly, but I'm surprised that running the 
{{inputstream_test}} under valgrind fails to uncover the problem.

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, 
> HDFS-9108.000.patch
>
>
> Somewhere between InputStream->PositionRead and the asio code, the pointer to 
> the destination buffer gets lost. PositionRead correctly returns the 
> number of bytes read, but the buffer is not filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead.
> Reproduced when using a promise to make the async call block until completion:
> {code}
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> char buf[50];
> auto h = [stat, &readCount](const Status &s, size_t bytes) {
>   readCount = bytes;  // record the count before waking the waiter
>   stat->set_value(s);
> };
> inputStream->PositionRead(buf, 50, 0, h);
> // wait for async to finish
> future.get();
> {code}





[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901599#comment-14901599
 ] 

Hadoop QA commented on HDFS-9108:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761517/HDFS-9108.000.patch |
| Optional Tests | javac unit |
| git revision | trunk / b00392d |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12577/console |


This message was automatically generated.

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, 
> HDFS-9108.000.patch





[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901600#comment-14901600
 ] 

Anu Engineer commented on HDFS-9112:


[~jingzhao] Thanks for the pointer to [~dlmarion]'s comments. I see that 
we had assumed it is better to let users specify the -ns option if they have 
this kind of HA setup. However, it looks like both we and Cloudera have run into 
this issue in the field, so I think the error messages need more clarity; with 
the current code, the error message is very cryptic.
{code}
hdfs haadmin -getServiceState nn2
Illegal argument: Unable to determine the nameservice id.
{code}
This gives the user no clue that they are expected to specify the -ns option. 
Also, from the comments you pointed me to, I am not able to work out why it 
is better for the user to specify "-ns" when we have that information in the 
config files. Since I don't have much context on HDFS-6376, I would appreciate 
it if you could provide some rationale. (From a cursory reading of the comments, 
it looks to me like Dave originally had exclude settings, which created some 
issues, but [~wheat9] changed them to internal nameservices. If so, using 
internal name services should hopefully not cause a failure.)

If you like, I can modify this patch to print an error message that asks the 
user to add the -ns option explicitly, instead of reading the name service names 
from config; that would be a trivial change. Please let me know if you think I 
should do that or if this change looks good enough.
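The alternative Anu describes is to fail with an explicit hint instead of "Unable to determine the nameservice id." It could look roughly like the sketch below. The real logic is Java code in DFSUtil/haadmin. This C++ sketch only mirrors the decision, and ResolveNameserviceId is a hypothetical helper.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Picks the nameservice an HA admin command should act on.
// `ns_option` is the value passed with -ns (empty if the flag was omitted).
inline std::string ResolveNameserviceId(
    const std::vector<std::string>& configured, const std::string& ns_option) {
  if (!ns_option.empty()) {
    for (const auto& ns : configured) {
      if (ns == ns_option) return ns;
    }
    throw std::invalid_argument("Nameservice '" + ns_option +
                                "' is not configured.");
  }
  if (configured.size() == 1) return configured[0];
  if (configured.empty()) {
    throw std::invalid_argument("No nameservice is configured.");
  }
  // Ambiguous: name the candidates and tell the user exactly what to add.
  std::string list;
  for (const auto& ns : configured) {
    list += (list.empty() ? "" : ", ") + ns;
  }
  throw std::invalid_argument("Multiple nameservices are configured (" + list +
                              "); specify one with the -ns option.");
}
```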
 


> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch





[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901597#comment-14901597
 ] 

Haohui Mai commented on HDFS-9108:
--

The root cause is that {{ReadBlockContinuation}} makes a copy of a reference 
instead of the value during template instantiation. The v0 patch fixes the 
problem and adds a static assert to ensure it won't happen again.
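The following is a simplified, self-contained illustration of this class of bug and of a static_assert guard. It does not reproduce ReadBlockContinuation itself. MutableBuffer, ReadContinuation and MakeReadContinuation are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for an asio-style buffer descriptor.
struct MutableBuffer {
  char* data;
  std::size_t size;
};

template <class Buffer>
class ReadContinuation {
  // Guard: if Buffer were deduced as a reference (e.g. from a forwarding
  // reference `T&&` bound to an lvalue), buffer_ would alias the caller's
  // object, which may no longer exist when the async read completes.
  static_assert(!std::is_reference<Buffer>::value,
                "ReadContinuation must hold its buffer by value");

 public:
  explicit ReadContinuation(const Buffer& buffer) : buffer_(buffer) {}
  const Buffer& buffer() const { return buffer_; }

 private:
  Buffer buffer_;  // owned copy of the descriptor
};

// Instantiating ReadContinuation<T> directly with T deduced from `T&&` would
// pick MutableBuffer& for lvalues (and trip the static_assert). std::decay
// strips references and cv-qualifiers so a value is stored instead.
template <class T>
ReadContinuation<typename std::decay<T>::type> MakeReadContinuation(T&& b) {
  return ReadContinuation<typename std::decay<T>::type>(b);
}
```

The static_assert turns a silent aliasing bug into a compile error, which is the property the comment above attributes to the v0 patch.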

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, 
> HDFS-9108.000.patch





[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Attachment: HDFS-9108.000.patch

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, 
> HDFS-9108.000.patch





[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Attachment: (was: HDFS-9108.000.patch)

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1





[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Attachment: HDFS-9108.000.patch

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1





[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Status: Patch Available  (was: In Progress)

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1





[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9108:
-
Summary: InputStreamImpl::ReadBlockContinuation stores wrong pointers of 
buffers  (was: Pointer to read buffer isn't being passed to recvmsg syscall)

> InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
> ---
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1





[jira] [Assigned] (HDFS-9108) Pointer to read buffer isn't being passed to recvmsg syscall

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reassigned HDFS-9108:


Assignee: Haohui Mai  (was: James Clampffer)

> Pointer to read buffer isn't being passed to recvmsg syscall
> 
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1





[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901549#comment-14901549
 ] 

Jing Zhao commented on HDFS-9112:
-

We had a discussion about this in HDFS-6376, and [~dlmarion]'s point was that 
it's better to require admins to specify the name service ID using the "-ns" 
option in haadmin commands in such a complex configuration scenario (please see 
his comment 
[here|https://issues.apache.org/jira/browse/HDFS-6376?focusedCommentId=14108157&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14108157]).
 

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch





[jira] [Commented] (HDFS-8882) Use datablocks, parityblocks and cell size from ErasureCodingPolicy

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901545#comment-14901545
 ] 

Hadoop QA commented on HDFS-8882:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m  7s | Findbugs (version ) appears to be broken on HDFS-7285. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 26 new or modified test files. |
| {color:green}+1{color} | javac |   8m 14s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  0s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  8s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 50s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 41s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 186m 25s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 29s | Tests passed in hadoop-hdfs-client. |
| | | 236m 22s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-client |
| Failed unit tests | hadoop.hdfs.web.TestWebHDFSOAuth2 |
|   | hadoop.hdfs.TestWriteStripedFileWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761177/HDFS-8882-HDFS-7285-02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / b762199 |
| Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/patchReleaseAuditProblems.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-client.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/testrun_hadoop-hdfs-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/console |


This message was automatically generated.

> Use datablocks, parityblocks and cell size from ErasureCodingPolicy
> ---
>
> Key: HDFS-8882
> URL: https://issues.apache.org/jira/browse/HDFS-8882
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-8882-HDFS-7285-01.patch, 
> HDFS-8882-HDFS-7285-02.patch
>
>
> Earlier in development, constants were used for the number of data blocks, 
> the number of parity blocks, and the cell size.
> All of these are now available from the EC zone. Use them from there and 
> stop using the constant values.
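The sketch below shows why these values matter. It maps a logical offset within a striped block group to an internal block index, taking the data-block count and cell size from a policy object rather than from constants. EcPolicy, CellLocation and LocateCell are hypothetical. The round-robin cell layout is an assumption made for illustration, not something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the fields an erasure coding policy carries.
struct EcPolicy {
  int num_data_units;
  int num_parity_units;  // not needed to locate data cells
  int64_t cell_size;     // bytes per cell
};

struct CellLocation {
  int block_index;         // which data block in the group holds the byte
  int64_t stripe_index;    // which stripe (row of cells) it falls in
  int64_t offset_in_cell;  // byte offset inside that cell
};

// Cells are assumed to be laid out round-robin across the data blocks:
// cell 0 -> block 0, cell 1 -> block 1, ..., then wrap to the next stripe.
inline CellLocation LocateCell(const EcPolicy& policy, int64_t offset) {
  int64_t cell = offset / policy.cell_size;
  CellLocation loc;
  loc.block_index = static_cast<int>(cell % policy.num_data_units);
  loc.stripe_index = cell / policy.num_data_units;
  loc.offset_in_cell = offset % policy.cell_size;
  return loc;
}
```

Because both divisors come from the policy, the same code serves any schema. With hard-coded constants, it would only be correct for one schema.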





[jira] [Commented] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-09-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901534#comment-14901534
 ] 

Zhe Zhang commented on HDFS-9119:
-

We have a few options to fix the discrepancy:
# Shorten the edit log tailing interval from 2 mins to 1 min.
# Change the timeout of {{transitionToActive}} to 2 mins. This would also let 
us add logic to support per-RPC timeout configuration.
# A more complex solution is to add a {{prepareTransitionToActive}} RPC call.

I'm leaning toward solution #1 because it's the simplest, and more frequent 
edit log tailing (and consequently, more edit log segments) should be 
acceptable behavior. Please let me know if you have any concerns about this 
approach.
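Here is a back-of-the-envelope check of the discrepancy, using hypothetical rates. Suppose the active NameNode produces edits at r ops/sec and the standby replays them at p ops/sec. Then a backlog built up over one tailing interval of T seconds takes roughly T*r/p seconds to replay, and failover succeeds only if that fits within the RPC timeout. This model assumes constant rates and a worst-case full-interval backlog. Option #1 shrinks T.

```cpp
#include <cassert>

// Estimated seconds for the standby to replay edits that accumulated over one
// full tailing interval, assuming constant edit and replay rates (ops/sec).
inline double WorstCaseCatchUpSeconds(double tail_interval_sec,
                                      double edit_rate, double replay_rate) {
  return tail_interval_sec * edit_rate / replay_rate;
}

// True if that worst-case catch-up fits inside the transitionToActive timeout.
inline bool FitsInTimeout(double tail_interval_sec, double edit_rate,
                          double replay_rate, double rpc_timeout_sec) {
  return WorstCaseCatchUpSeconds(tail_interval_sec, edit_rate, replay_rate) <=
         rpc_timeout_sec;
}
```

For example, with made-up rates of 20000 edits/sec produced and 30000 edits/sec replayed, a 120 s interval needs about 80 s to catch up and misses a 60 s timeout, while a 60 s interval needs about 40 s. Under this model, halving the interval halves the worst-case backlog. However, it can still miss the timeout if the edit rate is high enough relative to the replay rate. That caveat follows from the assumed model, not from measurements.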

> Discrepancy between edit log tailing interval and RPC timeout for 
> transitionToActive
> 
>
> Key: HDFS-9119
> URL: https://issues.apache.org/jira/browse/HDFS-9119
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>
> {{EditLogTailer}} on the standby NameNode tails edits from the active NameNode 
> every 2 minutes, but the {{transitionToActive}} RPC call has a timeout of 1 minute.
> If the active NameNode encounters a very intensive metadata workload (in 
> particular, many {{AddOp}} and {{MkDir}} operations creating new files 
> and directories), the standby NameNode may be unable to replay the updates 
> accumulated during the 2-minute tailing interval within the 1-minute 
> timeout window. If that happens, the FailoverController will time out and give 
> up trying to transition the standby to active. The old ANN will resume adding 
> more edits. When the SbNN finally finishes catching up on the edits and tries 
> to become active, it will crash.





[jira] [Created] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-09-21 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9119:
---

 Summary: Discrepancy between edit log tailing interval and RPC 
timeout for transitionToActive
 Key: HDFS-9119
 URL: https://issues.apache.org/jira/browse/HDFS-9119
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.7.1
Reporter: Zhe Zhang


{{EditLogTailer}} on the standby NameNode tails edits from the active NameNode 
every 2 minutes, but the {{transitionToActive}} RPC call has a timeout of 1 minute.

If the active NameNode encounters a very intensive metadata workload (in 
particular, many {{AddOp}} and {{MkDir}} operations creating new files and 
directories), the standby NameNode may be unable to replay the updates 
accumulated during the 2-minute tailing interval within the 1-minute timeout 
window. If that happens, the FailoverController will time out and give up trying 
to transition the standby to active. The old ANN will resume adding more edits. 
When the SbNN finally finishes catching up on the edits and tries to become 
active, it will crash.





[jira] [Assigned] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-09-21 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reassigned HDFS-9119:
---

Assignee: Zhe Zhang

> Discrepancy between edit log tailing interval and RPC timeout for 
> transitionToActive
> 
>
> Key: HDFS-9119
> URL: https://issues.apache.org/jira/browse/HDFS-9119
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang





[jira] [Updated] (HDFS-9026) Support for include/exclude lists on IPv6 setup

2015-09-21 Thread Nemanja Matkovic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemanja Matkovic updated HDFS-9026:
---
Attachment: HDFS-9026-HADOOP-11890.002.patch

Rename patch to match branch name.

> Support for include/exclude lists on IPv6 setup
> ---
>
> Key: HDFS-9026
> URL: https://issues.apache.org/jira/browse/HDFS-9026
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
> Environment: This affects only IPv6 cluster setup
>Reporter: Nemanja Matkovic
>Assignee: Nemanja Matkovic
>  Labels: ipv6
> Attachments: HDFS-9026-1.patch, HDFS-9026-2.patch, 
> HDFS-9026-HADOOP-11890.002.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This is a tracking item for end-to-end (e2e) IPv6 support in HDFS.
> Nate did great groundwork in HDFS-8078, but this is one of the items still 
> missing for the whole feature to work e2e.
> Today the NN cannot parse IPv6 addresses if they are present 
> in the include or exclude list.
> The patch depends on HDFS-8078.14.patch and has been tested on top of it on an 
> IPv6-only cluster.
> This should be committed to the HADOOP-11890 branch. 
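
The parsing problem can be seen in a short Java sketch. Splitting a "host:port" entry on its first colon, an IPv4-era shortcut, breaks on IPv6 literals. The sketch accepts "host", "host:port", "[v6addr]:port", and a bare IPv6 literal. `HostEntry` and its rules are hypothetical illustrations, not the code in the HDFS-9026 patch.

```java
/**
 * Parses one include/exclude entry that may be "host", "host:port",
 * "[v6addr]:port" or a bare IPv6 literal (which cannot carry a port).
 */
final class HostEntry {
    final String host;
    final int port; // -1 when no port was given

    private HostEntry(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static HostEntry parse(String entry) {
        String s = entry.trim();
        if (s.startsWith("[")) {
            // Bracketed IPv6 literal, optionally followed by ":port".
            int close = s.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("unclosed '[' in " + entry);
            }
            String host = s.substring(1, close);
            String rest = s.substring(close + 1);
            if (rest.isEmpty()) {
                return new HostEntry(host, -1);
            }
            if (!rest.startsWith(":")) {
                throw new IllegalArgumentException("bad port in " + entry);
            }
            return new HostEntry(host, Integer.parseInt(rest.substring(1)));
        }
        int first = s.indexOf(':');
        if (first >= 0 && first == s.lastIndexOf(':')) {
            // Exactly one colon: hostname or IPv4 address followed by a port.
            return new HostEntry(s.substring(0, first), Integer.parseInt(s.substring(first + 1)));
        }
        // No colon (plain host) or several colons (bare IPv6 literal without a port).
        return new HostEntry(s, -1);
    }
}
```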





[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901442#comment-14901442
 ] 

Jing Zhao commented on HDFS-9106:
-

bq. Transfer timeout needs to be different from per-packet timeout.

+1 for changing the timeout.

bq. if the partial block transfer fails, the write will fail permanently 
without retrying or continuing with whatever is in the pipeline

If the partial block transfer fails and {{bestEffort}} is enabled, won't the 
current code still use the remaining datanodes to set up the pipeline? It looks 
like {{nodes}} may still include the new DN after the failure, though.

> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-9106-poc.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - Transfer timeout needs to be different from per-packet timeout.
> - Transfer should be retried if it fails.
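
Both proposed changes fit in a short Java sketch. The current deadline is modeled as the description states it, configured timeout plus 2 * 5 s of slack regardless of size. The alternative scales with the number of bytes, and the transfer is wrapped in a bounded retry. `minBytesPerSec`, `maxAttempts`, and all names are hypothetical knobs, not existing HDFS settings.

```java
import java.util.concurrent.Callable;

/** Hypothetical transfer policy: size-aware deadline plus bounded retries. */
final class TransferPolicy {
    /** The "2 * 5 seconds slack" mentioned in the description. */
    static final long SLACK_MS = 2 * 5_000L;

    private TransferPolicy() {}

    /** Deadline as described: independent of how many bytes are transferred. */
    static long fixedDeadlineMs(long socketTimeoutMs) {
        return socketTimeoutMs + SLACK_MS;
    }

    /** Allows bytes / minBytesPerSec seconds plus slack, never less than the fixed deadline. */
    static long sizeAwareDeadlineMs(long socketTimeoutMs, long bytes, long minBytesPerSec) {
        if (minBytesPerSec <= 0) {
            throw new IllegalArgumentException("minBytesPerSec must be positive");
        }
        long budgetMs = Math.multiplyExact(bytes, 1000L) / minBytesPerSec;
        return Math.max(fixedDeadlineMs(socketTimeoutMs), budgetMs + SLACK_MS);
    }

    /** Runs the transfer up to maxAttempts times and rethrows the last failure. */
    static <T> T withRetries(Callable<T> transfer, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return transfer.call();
            } catch (Exception e) {
                last = e; // e.g. a timeout: try again instead of failing the write for good
            }
        }
        throw last;
    }
}
```

With a 60 s timeout, the fixed deadline is 70 s for any block. At a hypothetical floor of 1 MiB/s, a 128 MiB transfer would get 138 s instead.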





[jira] [Commented] (HDFS-4981) chmod 777 the .snapshot directory does not error that modification on RO snapshot is disallowed

2015-09-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901423#comment-14901423
 ] 

Xiao Chen commented on HDFS-4981:
-

Hi Stephen,

After some investigation, the root cause is that 
{{FsShellPermissions#processPath}} in hadoop-common has an optimization: if the 
new permission is the same as the current one, no further checking is done. (The 
'Modification on a read-only snapshot is disallowed' message comes from 
{{FSDirectory#getINodesInPath4Write}} in hadoop-hdfs.)

At this point, the most reasonable enhancement I can think of is to add a 
special check for the .snapshot dir in FsShellPermissions. However, consider that:
  1. Since the permission check is skipped, no action is taken. The only thing 
missing is the error message.
  2. The possible fix would live in hadoop-common, where the snapshot feature should 
not be exposed.
  3. According to the design in HDFS-2802, RW snapshots may be supported in the 
future. In that case we would have to revert the check (or at least change 
the message).
I suggest not adding the fix for now.

Please let me know if you have any suggestions/feedback.
Thanks.
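
The short-circuit can be modeled in a few lines of Java. `FakeFs` is a hypothetical stand-in for the file system. Its `setPermission` rejects any path under `.snapshot`, playing the role of the server-side check. The sketch also assumes the `.snapshot` directory reports mode 777, which is consistent with the transcript below, where only `chmod 777` succeeds silently.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the client-side "skip if unchanged" chmod optimization. */
final class ChmodShortCircuit {
    /** Hypothetical file system: paths under a snapshot reject any permission change. */
    static final class FakeFs {
        final Map<String, Integer> modes = new HashMap<>();

        int getMode(String path) {
            return modes.get(path);
        }

        void setPermission(String path, int mode) {
            if (path.contains("/.snapshot")) {
                throw new IllegalStateException("Modification on a read-only snapshot is disallowed");
            }
            modes.put(path, mode);
        }
    }

    private ChmodShortCircuit() {}

    /** Only calls setPermission when the mode actually changes, so an unchanged mode never hits the check. */
    static void chmod(FakeFs fs, String path, int newMode) {
        if (fs.getMode(path) != newMode) {
            fs.setPermission(path, newMode);
        }
    }
}
```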

> chmod 777 the .snapshot directory does not error that modification on RO 
> snapshot is disallowed
> ---
>
> Key: HDFS-4981
> URL: https://issues.apache.org/jira/browse/HDFS-4981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Stephen Chu
>Assignee: Xiao Chen
>Priority: Trivial
>
> Snapshots currently are RO, so it's expected that when someone tries to 
> modify the .snapshot directory s/he is denied.
> However, if the user tries to chmod 777 the .snapshot directory, the 
> operation does not error. The user should be alerted that modifications are 
> not allowed, even if this operation didn't actually change anything.
> Using other modes will trigger the error, though.
> {code}
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 777 
> /user/schu/test_dir_1/.snapshot/
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 755 
> /user/schu/test_dir_1/.snapshot/
> chmod: changing permissions of '/user/schu/test_dir_1/.snapshot': 
> Modification on a read-only snapshot is disallowed
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 435 
> /user/schu/test_dir_1/.snapshot/
> chmod: changing permissions of '/user/schu/test_dir_1/.snapshot': 
> Modification on a read-only snapshot is disallowed
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chown hdfs 
> /user/schu/test_dir_1/.snapshot/
> chown: changing ownership of '/user/schu/test_dir_1/.snapshot': Modification 
> on a read-only snapshot is disallowed
> [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chown schu 
> /user/schu/test_dir_1/.snapshot/
> chown: changing ownership of '/user/schu/test_dir_1/.snapshot': Modification 
> on a read-only snapshot is disallowed
> [schu@hdfs-snapshots-1 hdfs]$ 
> {code}





[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901424#comment-14901424
 ] 

Hadoop QA commented on HDFS-9111:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 10s | Findbugs (version 3.0.0) 
appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 14s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 14s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  6s | The applied patch generated  
160 new checkstyle issues (total was 41, now 201). |
| {color:green}+1{color} | whitespace |   4m 11s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 20s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  96m 44s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 30s | Tests passed in 
hadoop-hdfs-client. |
| | | 150m 44s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestHDFSConcat |
|   | hadoop.hdfs.server.blockmanagement.TestSequentialBlockId |
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler |
|   | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
|   | hadoop.hdfs.server.namenode.TestFSNamesystemMBean |
|   | hadoop.hdfs.TestDFSInputStream |
|   | hadoop.hdfs.TestSeekBug |
|   | hadoop.hdfs.TestFileAppendRestart |
|   | hadoop.hdfs.server.namenode.TestNameNodeXAttr |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
|   | hadoop.TestRefreshCallQueue |
|   | hadoop.hdfs.TestGetBlocks |
|   | hadoop.hdfs.server.mover.TestMover |
|   | hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling |
|   | hadoop.hdfs.qjournal.client.TestQJMWithFaults |
|   | hadoop.hdfs.server.datanode.TestHSync |
|   | hadoop.hdfs.server.namenode.TestStartup |
|   | hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer |
|   | hadoop.hdfs.server.datanode.TestDeleteBlockPool |
|   | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
|   | hadoop.hdfs.TestRollingUpgradeRollback |
|   | hadoop.hdfs.TestDatanodeDeath |
|   | hadoop.hdfs.server.namenode.ha.TestHAMetrics |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
|   | hadoop.hdfs.server.namenode.TestFileLimit |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing |
|   | hadoop.hdfs.qjournal.client.TestQuorumJournalManagerUnit |
|   | hadoop.hdfs.TestParallelShortCircuitLegacyRead |
|   | hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes |
|   | hadoop.hdfs.server.datanode.TestBlockRecovery |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.TestFileAppend4 |
|   | hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks |
|   | hadoop.hdfs.server.namenode.TestStorageRestore |
|   | hadoop.hdfs.server.blockmanagement.TestPendingReplication |
|   | hadoop.hdfs.server.namenode.TestSecurityTokenEditLog |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.TestDataTransferKeepalive |
|   | hadoop.hdfs.server.namenode.TestAclConfigFlag |
|   | hadoop.hdfs.TestReservedRawPaths |
|   | hadoop.hdfs.TestExternalBlockReader |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS |
|   | hadoop.hdfs.server.datanode.TestIncrementalBlockReports |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory |
|   | hadoop.tracing.TestTracingShortCircuitLocalRead |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.TestFileCreationClient |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
|   | hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade |
|   | hadoop.hdfs.server.namenode.TestAddBlockRetry |
|   | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.Te

[jira] [Commented] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901402#comment-14901402
 ] 

Hadoop QA commented on HDFS-9109:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m  0s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m  0s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 46s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 29s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 22s | The applied patch generated  1 
new checkstyle issues (total was 60, now 60). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 49s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  25m 50s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |  95m  8s | Tests failed in hadoop-hdfs. |
| | | 174m 58s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.fs.shell.find.TestFind |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.web.TestWebHDFSOAuth2 |
| Timed out tests | org.apache.hadoop.hdfs.TestFileCorruption |
|   | org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions |
|   | org.apache.hadoop.hdfs.server.namenode.TestFsck |
|   | org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761465/HDFS-9109.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c9cb6a5 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12569/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12569/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12569/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12569/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12569/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12569/console |


This message was automatically generated.

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.
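
One possible direction is shown in a Java sketch that resolves the addresses on an interface through the JVM's system resolver instead of querying DNS servers directly. On typical platforms, `InetAddress#getCanonicalHostName` goes through the OS name service, which usually consults `/etc/hosts`, depending on resolver configuration such as nsswitch. `InterfaceHostnames` is hypothetical and is not the approach taken by the attached patch.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Looks up hostnames for every address bound to a named network interface. */
final class InterfaceHostnames {
    private InterfaceHostnames() {}

    static List<String> hostnamesFor(String interfaceName) throws SocketException {
        NetworkInterface nif = NetworkInterface.getByName(interfaceName);
        if (nif == null) {
            return Collections.emptyList(); // no interface with that name
        }
        List<String> names = new ArrayList<>();
        for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
            // Best-effort reverse lookup via the system resolver; returns the
            // textual IP address if no name can be found.
            names.add(addr.getCanonicalHostName());
        }
        return names;
    }
}
```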





[jira] [Updated] (HDFS-9091) Erasure Coding: Provide DistributedFilesystem API to getAllErasureCodingPolicies

2015-09-21 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9091:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7285
   Status: Resolved  (was: Patch Available)

Thanks Rakesh for the work! +1 on the patch. I just committed it to the feature 
branch.

> Erasure Coding: Provide DistributedFilesystem API to 
> getAllErasureCodingPolicies
> 
>
> Key: HDFS-9091
> URL: https://issues.apache.org/jira/browse/HDFS-9091
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: HDFS-7285
>
> Attachments: HDFS-9091-HDFS-7285-00.patch
>
>
> This jira is to implement {{DFS#getAllErasureCodingPolicies()}}





[jira] [Updated] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-9107:
--
Attachment: HDFS-9107.patch

Use a stopwatch to abort processing in the inner heartbeat checking loop, and 
then check at the end of the entire scan whether to skip the next scan.  Even added 
a meager test.
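
The stopwatch idea can be sketched in Java under stated assumptions. The scan measures its own elapsed time with an injectable clock. If one pass runs longer than a budget, for example because of a GC pause, it stops declaring nodes dead and skips the following scan so nodes get a chance to heartbeat. `DeadNodeScanner` and its parameters are hypothetical; this is not the code in HDFS-9107.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical time-boxed dead-node scan. */
final class DeadNodeScanner {
    private final LongSupplier clockMs;   // injectable monotonic clock
    private final long heartbeatExpireMs; // silent longer than this => considered dead
    private final long maxScanMs;         // stopwatch budget for one scan
    private boolean skipNextScan;

    DeadNodeScanner(LongSupplier clockMs, long heartbeatExpireMs, long maxScanMs) {
        this.clockMs = clockMs;
        this.heartbeatExpireMs = heartbeatExpireMs;
        this.maxScanMs = maxScanMs;
    }

    boolean willSkipNextScan() {
        return skipNextScan;
    }

    /** lastHeartbeatMs[i] is node i's last heartbeat time; returns indices declared dead. */
    List<Integer> scan(long[] lastHeartbeatMs) {
        List<Integer> dead = new ArrayList<>();
        if (skipNextScan) {
            skipNextScan = false; // give nodes one interval to heartbeat after a stall
            return dead;
        }
        long start = clockMs.getAsLong();
        boolean aborted = false;
        for (int i = 0; i < lastHeartbeatMs.length; i++) {
            long now = clockMs.getAsLong();
            if (now - start > maxScanMs) {
                aborted = true; // the scan stalled: stop; nodes already found are still returned
                break;
            }
            if (now - lastHeartbeatMs[i] > heartbeatExpireMs) {
                dead.add(i);
            }
        }
        if (aborted) {
            skipNextScan = true; // decided once, at the end of the scan
        }
        return dead;
    }
}
```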

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch, HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.





[jira] [Commented] (HDFS-8882) Use datablocks, parityblocks and cell size from ErasureCodingPolicy

2015-09-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901306#comment-14901306
 ] 

Zhe Zhang commented on HDFS-8882:
-

Thanks Vinay for the patch. It looks good overall. A couple of comments:
# Should we use {{FSDirErasureCodingOp.getErasureCodingPolicy(fsn, src)}} 
instead? A side note is that the multiple {{getErasureCodingPolicy}} methods 
are a little confusing. We should clean them up as a follow-on.
{code}
// FSDirWriteFileOp
+  INodesInPath iip = fsn.dir.getINodesInPath4Write(src, false);
+  ecPolicy = FSDirErasureCodingOp.getErasureCodingPolicy(fsn, iip);
{code}
# It would be nice to copy over the Javadoc and comments on the constants from 
{{HdfsConstants}} to {{StripedFileTestUtil}}.

> Use datablocks, parityblocks and cell size from ErasureCodingPolicy
> ---
>
> Key: HDFS-8882
> URL: https://issues.apache.org/jira/browse/HDFS-8882
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-8882-HDFS-7285-01.patch, 
> HDFS-8882-HDFS-7285-02.patch
>
>
> As part of earlier development, constants were used for data blocks, parity 
> blocks and cell size.
> Now all of these are available from the EC zone. Use them from there and stop 
> using constant values.





[jira] [Created] (HDFS-9118) Add logging system for libdhfs++

2015-09-21 Thread Bob Hansen (JIRA)
Bob Hansen created HDFS-9118:


 Summary: Add logging system for libdhfs++
 Key: HDFS-9118
 URL: https://issues.apache.org/jira/browse/HDFS-9118
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-8707
Reporter: Bob Hansen


With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
library are going to have their own logging infrastructure that we're going to 
want to provide data to.  

libhdfs++ should have a logging library that:
* Is overridable and can provide sufficient information to work well with 
common C++ logging frameworks
* Has a rational default implementation 
* Is performant






[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset

2015-09-21 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901293#comment-14901293
 ] 

James Clampffer commented on HDFS-9095:
---

Agree with Bob about making the CMakeLists as robust as possible; otherwise +1 
on the patch.  Getting in the basics for logging is very nice as well.

Re: In RpcConnection methods, should we be calling into the handler while 
holding the lock on the engine state? If any method there does synchronous I/O 
or hangs for any reason, the whole Rpc system locks up.

This was done to avoid using a std::recursive_mutex because right now that 
handler only gets called from OnRecvCompleted.  I don't think the handler is 
going to be changing much unless we start using multiple connections from a 
single RpcEngine.  Lock contention is one of the things I hope to start 
profiling soon; if the overhead is negligible I'll switch that back to a 
recursive_mutex and grab the lock in the handler as well (I'll file a jira if 
that's the case).
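
A recursive mutex is not the only way around this. The handler can be copied while the engine lock is held and invoked only after the lock is released. The sketch is in Java and illustrates the pattern only; libhdfs++ itself is C++. `RpcConnectionSketch` and its methods are hypothetical. Java's `ReentrantLock` happens to be reentrant, but the pattern does not depend on that.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Hypothetical connection that never calls user code while holding its lock. */
final class RpcConnectionSketch {
    private final ReentrantLock engineLock = new ReentrantLock();
    private Consumer<String> handler = msg -> {};

    void setHandler(Consumer<String> h) {
        engineLock.lock();
        try {
            handler = h;
        } finally {
            engineLock.unlock();
        }
    }

    void onRecvCompleted(String response) {
        Consumer<String> h;
        engineLock.lock();
        try {
            h = handler; // snapshot shared state under the lock
            // ... other engine state updates would go here ...
        } finally {
            engineLock.unlock();
        }
        h.accept(response); // a slow or blocking handler no longer stalls other RPCs
    }

    boolean isLockedByCurrentThread() {
        return engineLock.isHeldByCurrentThread();
    }
}
```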

> RPC client should fail gracefully when the connection is timed out or reset
> ---
>
> Key: HDFS-9095
> URL: https://issues.apache.org/jira/browse/HDFS-9095
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9095.000.patch
>
>
> The RPC client should fail gracefully when the connection is timed out or 
> reset, instead of bailing out. 





[jira] [Commented] (HDFS-8873) throttle directoryScanner

2015-09-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901285#comment-14901285
 ] 

Colin Patrick McCabe commented on HDFS-8873:


[~nroberts], I agree that it might be better to keep the old behavior of 
finishing one volume in a thread before moving on to the next.  It might 
increase our cache hit rate.  I can think of reasons to do the opposite (i.e. 
spread the load across disks) that might motivate us to add that mode as an 
option, but it seems better to focus on just throttling in this change.

> throttle directoryScanner
> -
>
> Key: HDFS-8873
> URL: https://issues.apache.org/jira/browse/HDFS-8873
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Daniel Templeton
> Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, 
> HDFS-8873.003.patch, HDFS-8873.004.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791 for details). 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time 
> (assuming the common case of all inodes in cache but no directory blocks 
> cached, 64K seeks are required for a full directory listing, which at ~10 ms 
> per seek translates to about 655 seconds). 
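
The estimate and a duty-cycle throttle can both be expressed as simple Java arithmetic. The ~10 ms per seek figure is inferred from 64K seeks taking about 655 s. The duty-cycle model, in which the scanner may keep the disk busy for at most `busyMsPerSec` out of each second, is a hypothetical illustration rather than the design in the attached patches.

```java
/** Arithmetic for the directory-scan cost estimate and a simple duty-cycle throttle. */
final class ScanCost {
    private ScanCost() {}

    /** Disk time for a listing that needs `seeks` random seeks. */
    static long listingMs(long seeks, long msPerSeek) {
        return seeks * msPerSeek;
    }

    /** Approximate wall-clock time when only busyMsPerSec of every 1000 ms may be spent scanning. */
    static long wallClockMs(long busyMs, long busyMsPerSec) {
        if (busyMsPerSec <= 0 || busyMsPerSec > 1000) {
            throw new IllegalArgumentException("busyMsPerSec must be in (0, 1000]");
        }
        return busyMs * 1000 / busyMsPerSec;
    }

    /** After working `workedMs` in the current 1 s window, how long to pause before continuing. */
    static long pauseMs(long workedMs, long busyMsPerSec) {
        return workedMs >= busyMsPerSec ? Math.max(0, 1000 - workedMs) : 0;
    }
}
```

At a 50% duty cycle, the 655 s of seeking would be spread over roughly 22 minutes of wall-clock time, leaving the disk half of each second for client I/O.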





[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections

2015-09-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901284#comment-14901284
 ] 

Bob Hansen commented on HDFS-8855:
--

There are two separable issues; this is a performance bug in existing 
deployments, and your comment is a good outline for a new and improved 
architecture.

HDFS-7966 and the rest of your proposal could be a very good solution in future 
versions, but doesn't obviate the performance issue with deployed systems, nor 
does it answer the current use case of having a bog-simple path to get hdfs 
data via a "curl -L http:/" call.

> Webhdfs client leaks active NameNode connections
> 
>
> Key: HDFS-8855
> URL: https://issues.apache.org/jira/browse/HDFS-8855
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Bob Hansen
>Assignee: Xiaobing Zhou
> Attachments: HDFS-8855.005.patch, HDFS-8855.1.patch, 
> HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, 
> HDFS_8855.prototype.patch
>
>
> The attached script simulates a process opening ~50 files via webhdfs and 
> performing random reads.  Note that there are at most 50 concurrent reads, 
> and all webhdfs sessions are kept open.  Each read is ~64k at a random 
> position.  
> The script periodically (once per second) shells into the NameNode and 
> produces a summary of the socket states.  For my test cluster with 5 nodes, 
> it took ~30 seconds for the NameNode to reach ~25000 active connections and 
> fail.
> It appears that each request to the webhdfs client is opening a new 
> connection to the NameNode and keeping it open after the request is complete. 
>  If the process continues to run, eventually (~30-60 seconds), all of the 
> open connections are closed and the NameNode recovers.  
> This smells like SoftReference reaping.  Are we using SoftReferences in the 
> webhdfs client to cache NameNode connections but never re-using them?





[jira] [Commented] (HDFS-8632) Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes

2015-09-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901250#comment-14901250
 ] 

Andrew Wang commented on HDFS-8632:
---

Private APIs don't need stability annotations, we're free to change anything 
private as long as it doesn't break public interfaces. So private interfaces 
are all "unstable" in that sense. Also since anything not marked Public is 
Private, adding Private annotations everywhere is, strictly speaking, not 
necessary. It's a good habit though :)

Overall though looks good, thanks for working on this Rakesh!

> Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes
> --
>
> Key: HDFS-8632
> URL: https://issues.apache.org/jira/browse/HDFS-8632
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8632-HDFS-7285-00.patch, 
> HDFS-8632-HDFS-7285-01.patch, HDFS-8632-HDFS-7285-02.patch, 
> HDFS-8632-HDFS-7285-03.patch
>
>
> I've noticed that some of the erasure coding classes are missing the 
> {{@InterfaceAudience}} annotation. It would be good to identify these classes 
> and add the proper annotation.





[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset

2015-09-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901244#comment-14901244
 ] 

Bob Hansen commented on HDFS-9095:
--

Re: CMAKE_CURRENT_LIST_DIR vs. CMAKE_CURRENT_SOURCE_DIR: 
According to ye olde 
[StackOverflow|http://stackoverflow.com/questions/15662497/in-cmake-what-is-the-difference-between-cmake-current-source-dir-and-cmake-curr],
 it becomes more of an issue when files are included across directories (as 
some of the protobuf stuff is).  The difference is what led to hours of angst 
in HDFS-9025 where the cwd was under the CMakeLists.txt.  It's not a super-big 
deal, but once bitten, twice shy.

Re: Options - what you have here is a good start; we can discuss an 
architectural solution under HDFS-9117.

> RPC client should fail gracefully when the connection is timed out or reset
> ---
>
> Key: HDFS-9095
> URL: https://issues.apache.org/jira/browse/HDFS-9095
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9095.000.patch
>
>
> The RPC client should fail gracefully when the connection is timed out or 
> reset, instead of bailing out. 





[jira] [Created] (HDFS-9117) Config file reader / options classes for libhdfs++

2015-09-21 Thread Bob Hansen (JIRA)
Bob Hansen created HDFS-9117:


 Summary: Config file reader / options classes for libhdfs++
 Key: HDFS-9117
 URL: https://issues.apache.org/jira/browse/HDFS-9117
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-8707
Reporter: Bob Hansen


For environmental compatibility with HDFS installations, libhdfs++ should be 
able to read the configurations from Hadoop XML files and behave in line with 
the Java implementation.

Most notably, machine names and ports should be readable from Hadoop XML 
configuration files.

Similarly, an internal Options architecture for libhdfs++ should be developed 
to efficiently transport the configuration information within the system.
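
For reference, the Hadoop `*-site.xml` layout is a `<configuration>` element containing `<property>` entries, each with a `<name>` and a `<value>`. A minimal Java reader for that layout is sketched below; libhdfs++ would need a C++ equivalent. The sketch deliberately ignores variable expansion, `<final>`, and XInclude, all of which Hadoop's Java `Configuration` class handles. `SiteXmlReader` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Minimal reader for <configuration><property><name/><value/></property></configuration>. */
final class SiteXmlReader {
    private SiteXmlReader() {}

    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element p = (Element) properties.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() == 0 || values.getLength() == 0) {
                continue; // skip malformed entries
            }
            // A later definition of the same key overrides an earlier one.
            props.put(names.item(0).getTextContent().trim(),
                      values.item(0).getTextContent().trim());
        }
        return props;
    }
}
```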





[jira] [Commented] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901213#comment-14901213
 ] 

James Clampffer commented on HDFS-9103:
---

I agree with Bob that the C++ API should be reasonably usable on its own; it 
might not be tuned perfectly but that could be added incrementally on the user 
side.  We could supply callback(s) to handle different sorts of failures later, 
something like InputStream::onDroppedConnection.  The InputStream will still 
handle the failure, but it allows a user to see what's going on.

Just to be safe it might be worth wrapping previously_excluded_datanodes in a 
lock.  Or we should agree on threading semantics that say it doesn't need one 
there.
Otherwise +1



> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.
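
The retry behavior can be sketched in Java; libhdfs++ itself is C++. When a read from a DataNode fails, that node is added to an excluded set and the next replica is tried. The excluded set is wrapped in a synchronized view, echoing the suggestion above to guard `previously_excluded_datanodes` with a lock. `RetryingReader`, the string node IDs, and the read function are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical reader that excludes failed DataNodes and retries the remaining replicas. */
final class RetryingReader {
    private final Set<String> excluded = Collections.synchronizedSet(new HashSet<>());

    /** Tries replicas in order, skipping excluded ones; returns the first successful read. */
    byte[] read(List<String> replicas, Function<String, byte[]> readFrom) {
        RuntimeException last = null;
        for (String dn : replicas) {
            if (excluded.contains(dn)) {
                continue;
            }
            try {
                return readFrom.apply(dn);
            } catch (RuntimeException e) {
                excluded.add(dn); // do not pick this DataNode again for this stream
                last = e;
            }
        }
        throw new IllegalStateException("All replicas failed or were excluded", last);
    }

    boolean isExcluded(String dn) {
        return excluded.contains(dn);
    }
}
```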





[jira] [Updated] (HDFS-8882) Use datablocks, parityblocks and cell size from ErasureCodingPolicy

2015-09-21 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8882:

Summary: Use datablocks, parityblocks and cell size from 
ErasureCodingPolicy  (was: Use datablocks, parityblocks and cell size from ec 
zone)

> Use datablocks, parityblocks and cell size from ErasureCodingPolicy
> ---
>
> Key: HDFS-8882
> URL: https://issues.apache.org/jira/browse/HDFS-8882
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-8882-HDFS-7285-01.patch, 
> HDFS-8882-HDFS-7285-02.patch
>
>
> As part of earlier development, constants were used for data blocks, parity 
> blocks and cell size.
> Now all of these are available from the EC zone. Use them from there and stop 
> using constant values.





[jira] [Commented] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests

2015-09-21 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901201#comment-14901201
 ] 

James Clampffer commented on HDFS-9116:
---

+1

> Suppress false positives from Valgrind on uninitialized variables in tests
> --
>
> Key: HDFS-9116
> URL: https://issues.apache.org/jira/browse/HDFS-9116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-9116.000.patch
>
>
> Valgrind complains about uninitialized variables in the unit tests. It should 
> be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9112:
---
Status: Patch Available  (was: Open)

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we added a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy between two HA-enabled 
> clusters.
> That confuses the haadmin command, since DFSUtil#getNamenodeServiceAddr has 
> a check that fails if it finds more than one name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901200#comment-14901200
 ] 

Haohui Mai commented on HDFS-8855:
--

Revisiting the use case: how much benefit are we getting from the cache? Is 
making a connection from the DN to the NN necessary at all?

There are two issues that we have experienced in production here:

* The DN creates too many connections to the NN when serving WebHDFS requests. This 
happens when doing distcp over WebHDFS in a large cluster (~4,000 nodes).
* There are a lot of TIME_WAIT connections when the DN serves a large amount of 
concurrent, bursty reads. The application sees high latency variance when 
there are a lot of TIME_WAIT connections on the NN.

The current workflow is the following:

1. The NN generates a 307 to redirect the client to the DN that is closest to the 
client.
2. The DN receives the request from the client. It creates a new {{DFSClient}}, 
connects to the NN and creates a {{DFSInputStream}}.
3. It streams the {{DFSInputStream}} to the client as an HTTP stream.

My argument is that steps (2) and (3) are unnecessary if the DN 
exposes a {{GET_BLOCK}} call that directly streams the contents of the block. 
That eliminates the problem at its source.

My proposals are:

1. Expose a {{GET_BLOCK}} call in the current DN to return the content of a 
block on the DN.
2. Create a {{WebBlockReader}} that reads the block from {{GET_BLOCK}}
3. {{WebHdfsFileSystem}} can use both {{GET_BLOCK_LOCATIONS}} and the 
{{GET_BLOCK}} to serve the data.

From an implementation perspective, the HDFS-7966 branch already has an 
implementation of (1). It is straightforward to implement (2) (it's just an 
HTTP GET). And (3) can be done by augmenting the responses of 
{{GET_BLOCK_LOCATIONS}} to indicate whether the DN supports the {{GET_BLOCK}} call.
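To make (2) concrete, here is a rough sketch of a {{WebBlockReader}}-style client. {{GET_BLOCK}} is only proposed here, so the URL path and the {{bpid}}, {{blockid}}, {{offset}} and {{length}} parameters are hypothetical placeholders, not an existing WebHDFS API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/**
 * Hypothetical sketch of a WebBlockReader that fetches a block range with a
 * plain HTTP GET. GET_BLOCK and its query parameters are placeholders.
 */
public class WebBlockReaderSketch {

  /** Builds the request URL; assumes the block pool id is already URL-safe. */
  public static String getBlockUrl(String dnHostPort, String blockPoolId,
      long blockId, long offset, long length) {
    return "http://" + dnHostPort + "/webhdfs/v1/?op=GET_BLOCK"
        + "&bpid=" + blockPoolId
        + "&blockid=" + blockId
        + "&offset=" + offset
        + "&length=" + length;
  }

  /** Opens the block stream; the caller must close the returned stream. */
  public static InputStream open(String url) throws IOException {
    HttpURLConnection conn =
        (HttpURLConnection) URI.create(url).toURL().openConnection();
    conn.setRequestMethod("GET");
    int code = conn.getResponseCode();
    if (code != HttpURLConnection.HTTP_OK) {
      conn.disconnect();
      throw new IOException("GET_BLOCK returned HTTP " + code);
    }
    return conn.getInputStream();
  }
}
```

The reader talks only to the DN that holds the block, so no DN-to-NN connection is involved in the data path.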

Thoughts?

> Webhdfs client leaks active NameNode connections
> 
>
> Key: HDFS-8855
> URL: https://issues.apache.org/jira/browse/HDFS-8855
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Bob Hansen
>Assignee: Xiaobing Zhou
> Attachments: HDFS-8855.005.patch, HDFS-8855.1.patch, 
> HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, 
> HDFS_8855.prototype.patch
>
>
> The attached script simulates a process opening ~50 files via webhdfs and 
> performing random reads.  Note that there are at most 50 concurrent reads, 
> and all webhdfs sessions are kept open.  Each read is ~64k at a random 
> position.  
> The script periodically (once per second) shells into the NameNode and 
> produces a summary of the socket states.  For my test cluster with 5 nodes, 
> it took ~30 seconds for the NameNode to reach ~25000 active connections and 
> fail.
> It appears that each request to the webhdfs client is opening a new 
> connection to the NameNode and keeping it open after the request is complete. 
>  If the process continues to run, eventually (~30-60 seconds), all of the 
> open connections are closed and the NameNode recovers.  
> This smells like SoftReference reaping.  Are we using SoftReferences in the 
> webhdfs client to cache NameNode connections but never re-using them?
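To illustrate the suspected failure mode, here is a sketch of a cache that holds its connection only through a {{java.lang.ref.SoftReference}}. The class is hypothetical and is not taken from the webhdfs client. Once the GC clears the reference, the next {{get()}} opens a new connection while the old one is never explicitly closed; it would only go away when it is eventually collected, which could be consistent with the connections disappearing after a while.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical cache that keeps its connection only via a SoftReference. */
public class SoftRefConnectionCache<T> {
  private final Supplier<T> factory;
  private SoftReference<T> ref = new SoftReference<>(null);
  private int created = 0;

  public SoftRefConnectionCache(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Returns the cached connection, creating a new one if it was reaped. */
  public synchronized T get() {
    T conn = ref.get();
    if (conn == null) { // never created, or cleared by the GC
      conn = factory.get(); // the old connection (if any) was never closed
      created++;
      ref = new SoftReference<>(conn);
    }
    return conn;
  }

  /** Simulates the GC clearing the reference under memory pressure. */
  public synchronized void simulateReaping() {
    ref.clear();
  }

  /** How many connections this cache has opened so far. */
  public synchronized int createdCount() {
    return created;
  }
}
```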



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9112:
---
Attachment: HDFS-9112.001.patch

[~atm] Thanks for letting me know. [~templedf] I would appreciate it if you could 
take a look at this patch.

This patch fixes getNamenodeServiceAddr by looking at dfs.internal.nameservices 
and choosing the right name if we have more than one name entry in 
dfs.nameservices.

In addition to unit tests, I manually verified that the haadmin command is now able to 
locate the nameserver URI with the setup described in HDFS-6376.
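The selection rule described above can be sketched roughly as follows. This is not the patch itself: configuration is modeled as a plain {{Map}}, and {{NameserviceChooser}} is a hypothetical helper. Only the keys {{dfs.nameservices}} and {{dfs.internal.nameservices}} come from the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of picking one local nameservice for haadmin. */
public class NameserviceChooser {

  /** Splits a comma-separated config value, trimming and dropping empties. */
  static List<String> split(String value) {
    List<String> out = new ArrayList<>();
    if (value == null) {
      return out;
    }
    for (String s : value.split(",")) {
      String t = s.trim();
      if (!t.isEmpty()) {
        out.add(t);
      }
    }
    return out;
  }

  /**
   * Returns the single entry of dfs.nameservices or, when several are listed
   * (e.g. for cross-cluster distcp), the one named in
   * dfs.internal.nameservices. Fails if the choice is still ambiguous.
   */
  public static String chooseNameservice(Map<String, String> conf) {
    List<String> all = split(conf.get("dfs.nameservices"));
    if (all.size() == 1) {
      return all.get(0);
    }
    List<String> internal = split(conf.get("dfs.internal.nameservices"));
    if (internal.size() == 1 && all.contains(internal.get(0))) {
      return internal.get(0);
    }
    throw new IllegalArgumentException("Ambiguous nameservice: dfs.nameservices="
        + all + ", dfs.internal.nameservices=" + internal);
  }
}
```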

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9112.001.patch
>
>
> In HDFS-6376 we added a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy between two HA-enabled 
> clusters.
> That confuses the haadmin command, since DFSUtil#getNamenodeServiceAddr has 
> a check that fails if it finds more than one name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9108) Pointer to read buffer isn't being passed to recvmsg syscall

2015-09-21 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9108:
--
Priority: Blocker  (was: Major)

> Pointer to read buffer isn't being passed to recvmsg syscall
> 
>
> Key: HDFS-9108
> URL: https://issues.apache.org/jira/browse/HDFS-9108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
> Environment: Ubuntu x86_64, gcc 4.8.2
>Reporter: James Clampffer
>Assignee: James Clampffer
>Priority: Blocker
> Attachments: 9108-async-repro.patch, 9108-async-repro.patch1
>
>
> Somewhere between InputStream->PositionRead and the asio code the pointer to 
> the destination buffer gets lost.  PositionRead will correctly return the 
> number of bytes read but the buffer won't be filled.
> This only seems to affect the remote_block_reader; RPC calls are working.
> Valgrind error:
> Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s)
> msg.msg_iov[0] should equal the buffer pointer passed to PositionRead
> Hit when using a promise to make the async call block until completion. 
> auto stat = std::make_shared<std::promise<Status>>();
> std::future<Status> future(stat->get_future());
> size_t readCount = 0;
> char buf[50];
> auto h = [stat, &readCount](const Status &s, size_t bytes) {
>   readCount = bytes;  // record the count before waking the waiting thread
>   stat->set_value(s);
> };
> inputStream->PositionRead(buf, 50, 0, h);
>   
> //wait for async to finish
> future.get();
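For comparison, the same "block until the callback fires" pattern can be sketched in Java with {{CompletableFuture}}. {{AsyncReader}} and {{ReadHandler}} are hypothetical stand-ins for the C++ {{PositionRead}} API, and the status is modeled as a plain string. The byte count travels through the future itself, so the waiting thread cannot observe it before the callback has run.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/** Hypothetical sketch: block on an asynchronous positional read. */
public class BlockingReadAdapter {

  /** Hypothetical completion callback: a status string and a byte count. */
  public interface ReadHandler {
    void onComplete(String status, int bytesRead);
  }

  /** Hypothetical asynchronous read API, loosely modeled on PositionRead. */
  public interface AsyncReader {
    void positionRead(byte[] buf, int len, long offset, ReadHandler handler);
  }

  /** Starts the read and waits until the handler has been called. */
  public static int readBlocking(AsyncReader reader, byte[] buf, long offset)
      throws InterruptedException, ExecutionException {
    CompletableFuture<Integer> done = new CompletableFuture<>();
    reader.positionRead(buf, buf.length, offset, (status, bytesRead) -> {
      if ("OK".equals(status)) {
        done.complete(bytesRead); // result travels with the completion signal
      } else {
        done.completeExceptionally(new IllegalStateException(status));
      }
    });
    return done.get(); // blocks the caller until the handler completes the future
  }
}
```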



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-21 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-9040:
---

Assignee: Jing Zhao

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Jing Zhao
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer communicates with the NN to allocate/update blocks, and 
> the StripedDataStreamers only have to stream blocks to DNs.
> Proposal 2:
> See the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  below from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901197#comment-14901197
 ] 

Jing Zhao commented on HDFS-9040:
-

bq. In short, bumpGS is useful for choosing the working set (healthy replicas). It's 
not useful for calculating the safe length with a given working set. (I think Jing 
Zhao just said that, if I understand correctly.)
bq. I agree we should bump GS when handling DN failures in write pipeline.

Cool. If we all agree that bumping the GS is still useful, my current proposal is to add 
the logic "flushing data before bumping GS for failure recovery" to the patch. 
I will upload a new patch today or tomorrow.

bq. We can discuss lease recovery at another jira.

Agreed. Lease recovery is tricky, so maybe we can start with a design doc 
first.
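The point that the generation stamp helps choose the working set (but not the safe length) can be sketched as a simple filter. {{Replica}} and {{workingSet}} are hypothetical names; the sketch assumes each internal block reports the GS it was last bumped to, and only blocks at the latest GS count as healthy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pick the working set of a block group by GS. */
public class WorkingSetSketch {

  /** Hypothetical report of one internal block's DataNode and its GS. */
  public static final class Replica {
    public final String dataNode;
    public final long generationStamp;

    public Replica(String dataNode, long generationStamp) {
      this.dataNode = dataNode;
      this.generationStamp = generationStamp;
    }
  }

  /**
   * Keeps only replicas that reached the latest (bumped) GS; older ones
   * missed the last pipeline update and are left out of the working set.
   * This says nothing about how much data in the working set is safe.
   */
  public static List<String> workingSet(List<Replica> replicas, long latestGs) {
    List<String> healthy = new ArrayList<>();
    for (Replica r : replicas) {
      if (r.generationStamp == latestGs) {
        healthy.add(r.dataNode);
      }
    }
    return healthy;
  }
}
```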

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer communicates with the NN to allocate/update blocks, and 
> the StripedDataStreamers only have to stream blocks to DNs.
> Proposal 2:
> See the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  below from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901178#comment-14901178
 ] 

Hadoop QA commented on HDFS-9116:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761470/HDFS-9116.000.patch |
| Optional Tests | javac unit |
| git revision | trunk / b00392d |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12572/console |


This message was automatically generated.

> Suppress false positives from Valgrind on uninitialized variables in tests
> --
>
> Key: HDFS-9116
> URL: https://issues.apache.org/jira/browse/HDFS-9116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-9116.000.patch
>
>
> Valgrind complains about uninitialized variables in the unit tests. It should 
> be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9110) Improve upon HDFS-8480

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901175#comment-14901175
 ] 

Hadoop QA commented on HDFS-9110:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 48s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 58s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 19s | The applied patch generated  5 
new checkstyle issues (total was 2, now 6). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 10s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 194m 17s | Tests failed in hadoop-hdfs. |
| | | 239m 19s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.cli.TestHDFSCLI |
|   | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.TestGenericRefresh |
|   | hadoop.cli.TestAclCLI |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12761437/HDFS-9110.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c9cb6a5 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12567/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12567/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12567/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12567/console |


This message was automatically generated.

> Improve upon HDFS-8480
> --
>
> Key: HDFS-9110
> URL: https://issues.apache.org/jira/browse/HDFS-9110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Charlie Helin
>Assignee: Charlie Helin
>Priority: Minor
> Fix For: 2.7.0, 2.6.1
>
> Attachments: HDFS-9110.00.patch, HDFS-9110.01.patch, 
> HDFS-9110.02.patch
>
>
> This is a request to do some cosmetic improvements on top of HDFS-8480. There 
> are a couple of File -> java.nio.file.Path conversions which are a little 
> distracting. 
> The second aspect is more about efficiency. To be perfectly honest, I'm not 
> sure how many files may be processed. However, as HDFS-8480 
> alludes to, it appears that this number could be significantly large. 
> The current implementation is basically collect-then-process: all files 
> are first examined and put into a collection, and only after that processed. 
> HDFS-8480 could be further enhanced by employing a single iteration, 
> without creating an intermediary collection of filenames, by using a FileWalker.
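A single-pass traversal can be sketched with the JDK's {{Files.walkFileTree}} and {{SimpleFileVisitor}}; whether this is what "FileWalker" above refers to is an assumption. Each regular file is handed to the processor as soon as it is visited, so no intermediate list of paths is built. {{SinglePassWalk}} is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.function.Consumer;

/** Hypothetical sketch: process files during a single directory walk. */
public class SinglePassWalk {

  /** Visits every regular file under root once; returns how many were processed. */
  public static long processAll(Path root, Consumer<Path> processor)
      throws IOException {
    final long[] count = {0};
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (attrs.isRegularFile()) {
          processor.accept(file); // process immediately; nothing is collected
          count[0]++;
        }
        return FileVisitResult.CONTINUE;
      }
    });
    return count[0];
  }
}
```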



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC

2015-09-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901165#comment-14901165
 ] 

Colin Patrick McCabe commented on HDFS-9107:


bq. I don't trust monotonicNow if the thread can suspend between calls; cores 
on different sockets may give different answers, though it's not something I've 
seen in the field.

Oracle's blog here [ 
https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks ] says:

bq. If you are interested in measuring/calculating elapsed time, then always 
use System.nanoTime(). On most systems it will give a resolution on the order 
of microseconds. Be aware though, this call can also take microseconds to 
execute on some platforms.

Of course, {{System#nanoTime}} is just a very thin wrapper around the operating 
system's monotonic clock.  In x86-land, the monotonic clock generally comes 
from one of two sources: the TSC (timestamp counter) or the HPET (high 
precision event timer).

In the 2000s, the TSC started becoming less useful because multi-core systems 
started becoming more common, and at that time, TSC wasn't synchronized across 
cores.  This has since changed (at least for Intel systems), and the TSC is now 
synchronized across cores.  So the alarm you are raising is about 5 years too 
late.  Anyway, if you have a "bad" TSC, you can still get {{System#nanoTime}} 
to behave correctly by switching your operating system's clock source to the 
HPET.  It's slower, but more reliable.

If you want to read more about this, check out 
https://software.intel.com/en-us/forums/intel-isa-extensions/topic/332570

tl;dr
1. Operating systems implement various tricks to work around TSC bad behaviors
2. TSC bad behaviors are becoming less common in modern CPUs
3. You don't have to use the TSC if you don't want to!

Let's let the hardware and OS people do their job and just do ours.

I agree with [~hitliuyi]... +1 for the patch.  It would be even better if we could 
close that small window of a GC happening at a time other than during the 
{{Thread#sleep}}.
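As a sketch of measuring elapsed time around a {{Thread#sleep}} with {{System#nanoTime}} (similar in spirit to Hadoop's {{JvmPauseMonitor}}, not the HDFS-9107 patch), the overrun can be computed from two monotonic readings. {{PauseDetector}} and its threshold are hypothetical. The two {{nanoTime}} values are subtracted before anything else, which keeps the result correct even if the counter wraps.

```java
/**
 * Hypothetical sketch: detect long pauses (e.g. GC) by comparing the
 * requested sleep with the elapsed time measured by System.nanoTime().
 */
public class PauseDetector {

  /** Extra time beyond the requested sleep, in ms; never negative. */
  public static long pauseMillis(long startNanos, long endNanos,
      long requestedSleepMillis) {
    // Subtract first: the difference is correct even if nanoTime wrapped.
    long elapsedMillis = (endNanos - startNanos) / 1_000_000L;
    return Math.max(0L, elapsedMillis - requestedSleepMillis);
  }

  /** Sleeps once and returns how far the wakeup overran the request. */
  public static long sleepAndMeasure(long requestedSleepMillis)
      throws InterruptedException {
    long start = System.nanoTime(); // monotonic, unaffected by wall-clock changes
    Thread.sleep(requestedSleepMillis);
    long end = System.nanoTime();
    return pauseMillis(start, end, requestedSleepMillis);
  }

  public static void main(String[] args) throws InterruptedException {
    final long thresholdMillis = 1000; // hypothetical reporting threshold
    for (int i = 0; i < 10; i++) {
      long extra = sleepAndMeasure(500);
      if (extra > thresholdMillis) {
        System.out.println("Detected a pause of approximately " + extra + " ms");
      }
    }
  }
}
```

A pause that falls outside the measured interval (here, in the short gap between iterations) is not attributed to any sleep, which is the kind of small window mentioned above.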

> Prevent NN's unrecoverable death spiral after full GC
> -
>
> Key: HDFS-9107
> URL: https://issues.apache.org/jira/browse/HDFS-9107
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-9107.patch
>
>
> A full GC pause in the NN that exceeds the dead node interval can lead to an 
> infinite cycle of full GCs.  The most common situation that precipitates an 
> unrecoverable state is a network issue that temporarily cuts off multiple 
> racks.
> The NN wakes up and falsely starts marking nodes dead. This bloats the 
> replication queues which increases memory pressure. The replications create a 
> flurry of incremental block reports and a glut of over-replicated blocks.
> The "dead" nodes heartbeat within seconds. The NN forces a re-registration 
> which requires a full block report - more memory pressure. The NN now has to 
> invalidate all the over-replicated blocks. The extra blocks are added to 
> invalidation queues, tracked in an excess blocks map, etc - much more memory 
> pressure.
> All the memory pressure can push the NN into another full GC which repeats 
> the entire cycle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7766) Add a flag to WebHDFS op=CREATE to not respond with a 307 redirect

2015-09-21 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901152#comment-14901152
 ] 

Ravi Prakash commented on HDFS-7766:


I'm assuming 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/explorer.html#L58
 was commented out for the same reason, [~wheat9]? [~jingzhao]?

> Add a flag to WebHDFS op=CREATE to not respond with a 307 redirect
> --
>
> Key: HDFS-7766
> URL: https://issues.apache.org/jira/browse/HDFS-7766
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: HDFS-7766.01.patch, HDFS-7766.02.patch
>
>
> Please see 
> https://issues.apache.org/jira/browse/HDFS-7588?focusedCommentId=14276192&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14276192
> A backwards-compatible way to fix this is to add a flag on the request 
> that disables the redirect, i.e.
> {noformat}
> curl -i -X PUT 
> "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE[&noredirect=<true|false>]"
> {noformat}
> returns 200 with the DN location in the response.
> This would allow browser clients to get the redirect URL to put the file 
> to.
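A sketch of building the proposed request URL on the client side, assuming the flag is spelled {{noredirect=true}} as in the proposal above. {{WebHdfsCreateUrl}} is hypothetical, and the path is assumed to be already URL-safe (no encoding is done).

```java
/** Hypothetical sketch: build a WebHDFS CREATE URL with the proposed flag. */
public class WebHdfsCreateUrl {

  /**
   * Builds http://HOST:PORT/webhdfs/v1/PATH?op=CREATE, optionally adding the
   * proposed noredirect flag. Assumes path starts with '/' and is URL-safe.
   */
  public static String createUrl(String host, int port, String path,
      boolean noRedirect) {
    StringBuilder sb = new StringBuilder()
        .append("http://").append(host).append(':').append(port)
        .append("/webhdfs/v1").append(path)
        .append("?op=CREATE");
    if (noRedirect) {
      sb.append("&noredirect=true"); // ask for 200 plus the DN location, not a 307
    }
    return sb.toString();
  }
}
```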



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8632) Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes

2015-09-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901153#comment-14901153
 ] 

Zhe Zhang commented on HDFS-8632:
-

Thanks Rakesh for the work! Most annotations in the patch look good. The 
following are worth more discussion. [~andrew.wang] Could you share some 
advice in the context of release management?

{code}
+@InterfaceAudience.Public
+@InterfaceStability.Evolving
 public final class ErasureCodingPolicy
{code}
{{Evolving}} actually sounds right to me. A side note is that we should 
probably have something similar to {{BlockStoragePolicySpi}} that is {{Stable}}.

{code}
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
 public class DFSStripedInputStream extends DFSInputStream {
{code}
{{DFSInputStream}} itself is {{Unstable}} (the default for {{Private}}). I 
guess we should make them consistent. The same applies to {{StripedDataStreamer}} and 
{{BlockInfoStriped}}.

{code}
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
 public class BlockPlacementPolicies{
{code}
As above, should this be {{Evolving}} or the default {{Unstable}}?

> Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes
> --
>
> Key: HDFS-8632
> URL: https://issues.apache.org/jira/browse/HDFS-8632
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-8632-HDFS-7285-00.patch, 
> HDFS-8632-HDFS-7285-01.patch, HDFS-8632-HDFS-7285-02.patch, 
> HDFS-8632-HDFS-7285-03.patch
>
>
> I've noticed that some of the erasure coding classes are missing the 
> {{@InterfaceAudience}} annotation. It would be good to identify these classes 
> and add the proper annotations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9116:
-
Status: Patch Available  (was: Open)

> Suppress false positives from Valgrind on uninitialized variables in tests
> --
>
> Key: HDFS-9116
> URL: https://issues.apache.org/jira/browse/HDFS-9116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-9116.000.patch
>
>
> Valgrind complains about uninitialized variables in the unit tests. It should 
> be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests

2015-09-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9116:
-
Attachment: HDFS-9116.000.patch

> Suppress false positives from Valgrind on uninitialized variables in tests
> --
>
> Key: HDFS-9116
> URL: https://issues.apache.org/jira/browse/HDFS-9116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-9116.000.patch
>
>
> Valgrind complains about uninitialized variables in the unit tests. It should 
> be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests

2015-09-21 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-9116:


 Summary: Suppress false positives from Valgrind on uninitialized 
variables in tests
 Key: HDFS-9116
 URL: https://issues.apache.org/jira/browse/HDFS-9116
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
Priority: Minor


Valgrind complains about uninitialized variables in the unit tests. It should 
be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-5897) TestNNWithQJM#testNewNamenodeTakesOverWriter occasionally fails in trunk

2015-09-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HDFS-5897.
--
Resolution: Cannot Reproduce

> TestNNWithQJM#testNewNamenodeTakesOverWriter occasionally fails in trunk
> 
>
> Key: HDFS-5897
> URL: https://issues.apache.org/jira/browse/HDFS-5897
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
> Attachments: 5897-output.html
>
>
> From 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1665/testReport/junit/org.apache.hadoop.hdfs.qjournal/TestNNWithQJM/testNewNamenodeTakesOverWriter/
>  :
> {code}
> java.lang.Exception: test timed out after 3 milliseconds
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:129)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:412)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:401)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
> {code}
> I saw:
> {code}
> 2014-02-06 11:38:37,970 ERROR namenode.EditLogInputStream 
> (RedundantEditLogInputStream.java:nextOp(221)) - Got error reading edit log 
> input stream 
> http://localhost:40509/getJournal?jid=myjournal&segmentTxId=3&storageInfo=-51%3A1571339494%3A0%3AtestClusterID;
>  failing over to edit log 
> http://localhost:56244/getJournal?jid=myjournal&segmentTxId=3&storageInfo=-51%3A1571339494%3A0%3AtestClusterID
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException:
>  got premature end-of-file at txid 0; expected file to go up to 4
>   at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:194)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:83)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:140)
>   at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:178)
>   at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:83)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:167)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:120)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:708)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:606)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:263)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:874)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:634)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:446)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:502)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:658)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:643)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1291)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:939)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:824)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:678)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
>   at 
> org.apache.hadoop.hdfs.qjournal.TestNNWithQJM.testNewNamenodeTakesOverWriter(TestNNWithQJM.java:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.

[jira] [Assigned] (HDFS-6264) Provide FileSystem#create() variant which throws exception if parent directory doesn't exist

2015-09-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HDFS-6264:


Assignee: Ted Yu

> Provide FileSystem#create() variant which throws exception if parent 
> directory doesn't exist
> 
>
> Key: HDFS-6264
> URL: https://issues.apache.org/jira/browse/HDFS-6264
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>  Labels: hbase
> Attachments: hdfs-6264-v1.txt
>
>
> FileSystem#createNonRecursive() is deprecated.
> However, there is no DistributedFileSystem#create() implementation that 
> throws an exception if the parent directory doesn't exist.
> This limits clients' migration away from the deprecated method.
> For HBase, IO fencing relies on the behavior of 
> FileSystem#createNonRecursive().
> A variant of the create() method should be added that throws an exception if the parent 
> directory doesn't exist.
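The requested semantics can be illustrated with a local-filesystem analogue using {{java.nio.file}}. This is not the HDFS {{FileSystem}} API, and {{NonRecursiveCreate}} is a hypothetical helper. On a local filesystem, {{Files.newOutputStream}} with {{CREATE_NEW}} already fails when the parent is missing; the explicit check only gives a clearer error. For IO fencing the check and the create have to be atomic on the NameNode side, which a client-side check cannot provide.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical local analogue of a create() that never makes parent dirs. */
public class NonRecursiveCreate {

  /** Creates a new file; fails if the parent directory does not exist. */
  public static OutputStream createNonRecursive(Path file) throws IOException {
    Path parent = file.toAbsolutePath().getParent();
    if (parent == null || !Files.isDirectory(parent)) {
      throw new FileNotFoundException("Parent directory does not exist: " + parent);
    }
    // CREATE_NEW also fails if the file itself already exists.
    return Files.newOutputStream(file, StandardOpenOption.CREATE_NEW,
        StandardOpenOption.WRITE);
  }
}
```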



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9111:

Attachment: HDFS-9111.002.patch

Thank you [~wheat9]. The v2 patch rebases onto the {{trunk}} branch, resolving all 
conflicts.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, 
> HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client- and server-side data structures to and from protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-related PB converters to the 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with converters 
> for server-side data structures.
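To make the proposed split concrete, here is a toy sketch in which every type and method body is a hypothetical stand-in: the real converters work with generated protobuf classes, and the real {{PBHelper}} / {{PBHelperClient}} signatures are not reproduced here. Client-side types get their converters in {{PBHelperClient}}; a server-side converter left in {{PBHelper}} reuses them.

```java
/*
 * Toy illustration of the split proposed in HDFS-9111. All types and method
 * bodies are hypothetical stand-ins for the real protobuf-based code.
 */

/** Client-side type (would live in hadoop-hdfs-client in this sketch). */
final class Block {
    final String poolId;
    final long blockId;

    Block(String poolId, long blockId) {
        this.poolId = poolId;
        this.blockId = blockId;
    }
}

/** Stand-in for a generated protobuf message for Block. */
final class BlockProto {
    final String poolId;
    final long blockId;

    BlockProto(String poolId, long blockId) {
        this.poolId = poolId;
        this.blockId = blockId;
    }
}

/** Client-module converters: depend only on client-side types. */
final class PBHelperClient {
    private PBHelperClient() {}

    static BlockProto convert(Block b) {
        return new BlockProto(b.poolId, b.blockId);
    }

    static Block convert(BlockProto p) {
        return new Block(p.poolId, p.blockId);
    }
}

/** Server-side type that embeds a client-side type. */
final class BlockCommand {
    final String action;
    final Block block;

    BlockCommand(String action, Block block) {
        this.action = action;
        this.block = block;
    }
}

/** Stand-in protobuf message for BlockCommand. */
final class BlockCommandProto {
    final String action;
    final BlockProto block;

    BlockCommandProto(String action, BlockProto block) {
        this.action = action;
        this.block = block;
    }
}

/** Server-module converters stay here and delegate to the client converters. */
final class PBHelper {
    private PBHelper() {}

    static BlockCommandProto convert(BlockCommand c) {
        return new BlockCommandProto(c.action, PBHelperClient.convert(c.block));
    }
}
```

The dependency runs one way only: the server-side helper calls into the client-side one, never the reverse, which is what lets the client helper live in a module that does not depend on {{hadoop-hdfs}}.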



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8873) throttle directoryScanner

2015-09-21 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901136#comment-14901136
 ] 

Daniel Templeton commented on HDFS-8873:


The scan job queue does indeed ignore the volume when selecting the next job.  I was 
considering the case where volumes differ greatly in size; there, not binding a 
thread to a volume results in a better distribution of the load.  The same is true 
when the number of threads exceeds the number of volumes.

That said, the point of the JIRA was not to change the load profile of the 
directory scanner; it was just to insert a throttle.  I'll post a changeset 
with a reduced scope shortly.
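The load-distribution argument can be illustrated with a toy makespan model. This is a minimal sketch that assumes each volume's scan is split into several jobs of known cost and that ignores disk contention entirely; {{ScanScheduling}} and its methods are hypothetical and are not the DirectoryScanner code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Toy makespan model (hypothetical) comparing two ways of running directory
 * scan jobs: one thread bound to each volume, versus a shared FIFO queue from
 * which any free thread takes the next job regardless of volume.
 * Job costs are abstract time units; disk contention is ignored.
 */
final class ScanScheduling {
    /** One thread per volume: finish time is the busiest volume's total. */
    static long boundPerVolume(List<long[]> volumes) {
        long makespan = 0;
        for (long[] jobs : volumes) {
            long total = 0;
            for (long cost : jobs) {
                total += cost;
            }
            makespan = Math.max(makespan, total);
        }
        return makespan;
    }

    /** Shared queue: each job goes to whichever thread becomes free first. */
    static long sharedQueue(List<long[]> volumes, int threads) {
        List<Long> queue = new ArrayList<>();
        for (long[] jobs : volumes) {
            for (long cost : jobs) {
                queue.add(cost);
            }
        }
        // Min-heap of the times at which each thread becomes free.
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < threads; i++) {
            freeAt.add(0L);
        }
        long makespan = 0;
        for (long cost : queue) {
            long finish = freeAt.poll() + cost;
            makespan = Math.max(makespan, finish);
            freeAt.add(finish);
        }
        return makespan;
    }
}
```

With one volume holding six unit jobs and two volumes holding one each, binding one thread per volume finishes at time 6, while three threads draining a shared queue finish at time 3.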

> throttle directoryScanner
> -
>
> Key: HDFS-8873
> URL: https://issues.apache.org/jira/browse/HDFS-8873
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Nathan Roberts
>Assignee: Daniel Templeton
> Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, 
> HDFS-8873.003.patch, HDFS-8873.004.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791 for details). 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time 
> (assuming the common case of all inodes in cache but no directory blocks 
> cached, 64K seeks are required for a full directory listing, which translates to 
> 655 seconds).
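The 655-second figure is consistent with an average of roughly 10 ms per seek: 65,536 × 10 ms = 655.36 s, or about 11 minutes of continuous disk activity. The sketch below reproduces that arithmetic and shows one possible shape for the duty cycle the description asks for. {{DutyCycleThrottle}}, {{runMsPerSec}} and the 1000 ms window are hypothetical names and parameters, the 10 ms seek time is an assumption inferred from the numbers above, and none of this is the code that was eventually committed.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical duty-cycle throttle (not Hadoop's implementation). The worker
 * may run for runMsPerSec milliseconds of every 1000 ms window; once that
 * budget is spent, maybeThrottle() sleeps until the window ends.
 */
final class DutyCycleThrottle {
    private static final long WINDOW_MS = 1000;
    private final long runMsPerSec;
    private long windowStartNanos = System.nanoTime();

    DutyCycleThrottle(long runMsPerSec) {
        if (runMsPerSec <= 0 || runMsPerSec > WINDOW_MS) {
            throw new IllegalArgumentException("runMsPerSec must be in (0, 1000]");
        }
        this.runMsPerSec = runMsPerSec;
    }

    /** Call between units of work, e.g. after each directory listing. */
    void maybeThrottle() throws InterruptedException {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - windowStartNanos);
        if (elapsedMs < runMsPerSec) {
            return; // still within this window's run budget
        }
        if (elapsedMs < WINDOW_MS) {
            Thread.sleep(WINDOW_MS - elapsedMs); // sleep out the rest of the window
        }
        windowStartNanos = System.nanoTime(); // begin a new window
    }

    /** Disk time in seconds for the given number of seeks at msPerSeek each. */
    static double seekSeconds(long seeks, double msPerSeek) {
        return seeks * msPerSeek / 1000.0;
    }

    /** Wall-clock lower bound when busySeconds of work runs at this duty cycle. */
    double minWallClockSeconds(double busySeconds) {
        return busySeconds * WINDOW_MS / runMsPerSec;
    }

    public static void main(String[] args) {
        // 64K seeks at an assumed ~10 ms each, throttled to 50% (500 ms per second).
        double busy = seekSeconds(64 * 1024, 10.0);
        DutyCycleThrottle throttle = new DutyCycleThrottle(500);
        System.out.printf("busy %.2f s, wall clock >= %.2f s%n",
                busy, throttle.minWallClockSeconds(busy));
    }
}
```

At a 50% duty cycle, the same 655 s of seeking would take at least about 1,311 s of wall-clock time, trading scan latency for disk availability.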



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901121#comment-14901121
 ] 

Haohui Mai commented on HDFS-9111:
--

It turns out it needs to be rebased onto trunk. [~liuml07] can you please rebase 
the patch? Thanks.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client- and server-side data structures to and from protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-related PB converters to the 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with converters 
> for server-side data structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient

2015-09-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901118#comment-14901118
 ] 

Haohui Mai commented on HDFS-9111:
--

+1. I'll commit it shortly.

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> -
>
> Key: HDFS-9111
> URL: https://issues.apache.org/jira/browse/HDFS-9111
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which 
> convert client-side data structures to and from protobuf, to the 
> {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both 
> client- and server-side data structures to and from protobuf. As we move client 
> (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and 
> [HDFS-9039]), we also need to move the client-related PB converters to the 
> client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the 
> existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with converters 
> for server-side data structures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9109:

Attachment: HDFS-9109.01.patch

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.
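One way around the limitation, sketched here under stated assumptions, is to fall back to the platform resolver when the DNS-only lookup yields nothing. {{InterfaceHostnames}} and its methods are hypothetical, {{dnsOnly}} merely stands in for {{DNS#reverseDns}}, and this is not necessarily how the attached patch solves the problem. The JDK calls used ({{NetworkInterface#getByName}}, {{NetworkInterface#getInetAddresses}}, {{InetAddress#getCanonicalHostName}}) are standard.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Hadoop's DNS class): resolve a hostname for each
 * address on an interface, trying a DNS-only reverse lookup first and falling
 * back to a second resolver when it fails.
 */
final class InterfaceHostnames {
    /**
     * dnsOnly stands in for a DNS-server-only reverse lookup and returns null
     * on failure. fallback may return the literal IP when it cannot resolve,
     * as InetAddress#getCanonicalHostName does; such results are skipped.
     */
    static List<String> resolve(List<InetAddress> addrs,
                                Function<InetAddress, String> dnsOnly,
                                Function<InetAddress, String> fallback) {
        List<String> names = new ArrayList<>();
        for (InetAddress addr : addrs) {
            String name = dnsOnly.apply(addr);
            if (name == null) {
                name = fallback.apply(addr);
            }
            if (name != null && !name.equals(addr.getHostAddress())) {
                names.add(name);
            }
        }
        return names;
    }

    /** All addresses bound to the named interface, or none if it doesn't exist. */
    static List<InetAddress> addressesOf(String ifName) throws SocketException {
        NetworkInterface nif = NetworkInterface.getByName(ifName);
        return nif == null
                ? Collections.<InetAddress>emptyList()
                : Collections.list(nif.getInetAddresses());
    }

    public static void main(String[] args) throws SocketException {
        // "lo" is an assumed Linux loopback name; pass another interface as args[0].
        String ifName = args.length > 0 ? args[0] : "lo";
        // getCanonicalHostName goes through the platform resolver, which on
        // typical Linux setups consults /etc/hosts per nsswitch.conf.
        System.out.println(resolve(addressesOf(ifName),
                addr -> null, InetAddress::getCanonicalHostName));
    }
}
```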



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups

2015-09-21 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9109:

Attachment: (was: HDFS-9109.01.patch)

> dfs.datanode.dns.interface does not work with hosts file based setups
> -
>
> Key: HDFS-9109
> URL: https://issues.apache.org/jira/browse/HDFS-9109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9109.01.patch
>
>
> The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode 
> select its hostname by doing a reverse lookup of IP addresses on the specific 
> network interface. This does not work when {{/etc/hosts}} is used to set up 
> alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9115) Create documentation to describe the overall architecture and rationales of the library

2015-09-21 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-9115:


 Summary: Create documentation to describe the overall architecture 
and rationales of the library
 Key: HDFS-9115
 URL: https://issues.apache.org/jira/browse/HDFS-9115
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS-8707


It would be beneficial to have documentation describing the design decisions and 
rationale of the library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

