[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
[ https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902070#comment-14902070 ]

Rui Li commented on HDFS-8920:
------------------------------

Thanks guys for the review.

> Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-8920
>                 URL: https://issues.apache.org/jira/browse/HDFS-8920
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rui Li
>            Assignee: Rui Li
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch
>
>
> When we test reading data with datanodes killed, {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and effectively blocks the client JVM. This log seems too verbose:
> {code}
> if (chosenNode == null) {
>   DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() +
>       " after checking nodes = " + Arrays.toString(nodes) +
>       ", ignoredNodes = " + ignoredNodes);
>   return null;
> }
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
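The hot spot described above comes from building an expensive WARN message (including `Arrays.toString(nodes)`) on every failed retry of a hot-path method. One common mitigation is to throttle how often such a message is emitted so the string construction is skipped on most calls. The sketch below is purely illustrative and not taken from the actual patch; the `LogThrottle` name and API are assumptions:

```java
// Hypothetical sketch: throttle a hot-path warning so repeated failures
// do not flood the log. Callers check shouldLog() before constructing the
// expensive message string, so most calls skip the concatenation entirely.
public class LogThrottle {
    private final long intervalMillis;
    private long lastLogTime;
    private boolean loggedOnce = false;

    public LogThrottle(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Returns true at most once per interval (and always on the first call).
    public synchronized boolean shouldLog(long nowMillis) {
        if (!loggedOnce || nowMillis - lastLogTime >= intervalMillis) {
            loggedOnce = true;
            lastLogTime = nowMillis;
            return true;
        }
        return false;
    }
}
```

A caller would wrap the WARN with `if (throttle.shouldLog(Time.monotonicNow())) { ... }`; demoting the message to DEBUG level, as discussed in the patch reviews, is the simpler alternative.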
[jira] [Updated] (HDFS-8780) Fetching live/dead datanode list with arg true for removeDecommissionNode, returns list with decom node.
[ https://issues.apache.org/jira/browse/HDFS-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B updated HDFS-8780:
--------------------------------
    Priority: Major  (was: Critical)

> Fetching live/dead datanode list with arg true for removeDecommissionNode, returns list with decom node.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8780
>                 URL: https://issues.apache.org/jira/browse/HDFS-8780
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: J.Andreina
>            Assignee: J.Andreina
>         Attachments: HDFS-8780.1.patch, HDFS-8780.2.patch, HDFS-8780.3.patch
>
>
> Current implementation:
> =======================
> In {{DatanodeManager#removeDecomNodeFromList()}}, a decommissioned node is removed from the dead/live node list only if both of the conditions below are met:
> I.  The include list is not empty.
> II. The node appears in neither the include nor the exclude list, and its state is decommissioned.
> {code}
> if (!hostFileManager.hasIncludes()) {
>   return;
> }
> if ((!hostFileManager.isIncluded(node)) && (!hostFileManager.isExcluded(node))
>     && node.isDecommissioned()) {
>   // Include list is not empty, an existing datanode does not appear
>   // in both include or exclude lists and it has been decommissioned.
>   // Remove it from the node list.
>   it.remove();
> }
> {code}
> As mentioned in the javadoc, a datanode cannot be in an "already decommissioned datanode state". Following the steps mentioned in the javadoc, the datanode state is "dead", not decommissioned.
> *Can we avoid the unnecessary checks and simply remove the node from the list when it is in the decommissioned state?*
> Please provide your feedback.
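To make the two conditions concrete, here is a self-contained, hypothetical distillation of the check. The real code consults `HostFileManager` and a datanode descriptor; the boolean parameters below are stand-ins for those calls:

```java
// Illustrative predicate only: collapses the two guard clauses from
// DatanodeManager#removeDecomNodeFromList() into one boolean function.
// The real implementation queries hostFileManager and the node object.
public class DecomNodeRemoval {
    static boolean shouldRemove(boolean hasIncludes, boolean included,
                                boolean excluded, boolean decommissioned) {
        if (!hasIncludes) {
            return false; // empty include list: the node is always kept
        }
        // Removed only if absent from both lists and decommissioned.
        return !included && !excluded && decommissioned;
    }
}
```

The question raised in the issue is whether the first guard and the list-membership checks are redundant, i.e. whether `decommissioned` alone should decide removal.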
[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
[ https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902027#comment-14902027 ]

Zhe Zhang commented on HDFS-8920:
---------------------------------

Thanks Rui for the work and Kai for the final review. Moving this back to the HDFS-7285 umbrella JIRA.
[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
[ https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-8920:
----------------------------
    Parent Issue: HDFS-7285  (was: HDFS-8031)
[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
[ https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Zheng updated HDFS-8920:
----------------------------
    Fix Version/s: HDFS-7285
[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
[ https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Zheng updated HDFS-8920:
----------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

It was committed to the HDFS-7285 branch. Thanks Rui for the contribution, and Colin and Zhe for the suggestions!
[jira] [Updated] (HDFS-9013) Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
[ https://issues.apache.org/jira/browse/HDFS-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surendra Singh Lilhore updated HDFS-9013:
-----------------------------------------
    Status: Patch Available  (was: Open)

> Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9013
>                 URL: https://issues.apache.org/jira/browse/HDFS-9013
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>         Attachments: HDFS-9013-branch-2.003.patch, HDFS-9013.001-branch-2.patch, HDFS-9013.001.patch, HDFS-9013.002-branch-2.patch
>
>
> HDFS-8388 added a new metric, {{NNStartedTimeInMillis}}, to get the NN start time in milliseconds. Based on [~wheat9]'s and [~ajisakaa]'s suggestions, we should now deprecate {{NameNodeMXBean#getNNStarted}} in branch-2 and remove it from trunk.
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14709614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14709614
> https://issues.apache.org/jira/browse/HDFS-8388?focusedCommentId=14726746&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14726746
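The deprecation pattern under discussion can be sketched in miniature. `NodeTimeMXBean` below is a toy stand-in for the real `NameNodeMXBean`, not the actual interface: the legacy string getter is kept for branch-2 compatibility but marked deprecated in favor of the millisecond metric:

```java
import java.util.Date;

// Toy stand-in for NameNodeMXBean. In branch-2 the string-typed getter is
// retained but deprecated; in trunk it would be removed outright.
interface NodeTimeMXBean {
    /** @deprecated use {@link #getNNStartedTimeInMillis()} instead. */
    @Deprecated
    String getNNStarted();

    long getNNStartedTimeInMillis();
}

public class NodeTimeBean implements NodeTimeMXBean {
    private final long startedMillis;

    public NodeTimeBean(long startedMillis) {
        this.startedMillis = startedMillis;
    }

    @Override
    @Deprecated
    public String getNNStarted() {
        // Legacy human-readable form, derived from the same timestamp
        // so both getters stay consistent.
        return new Date(startedMillis).toString();
    }

    @Override
    public long getNNStartedTimeInMillis() {
        return startedMillis;
    }
}
```

Exposing the raw millisecond value lets monitoring tools do arithmetic (e.g. compute uptime) without parsing a locale-dependent date string, which is the motivation given in HDFS-8388.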
[jira] [Updated] (HDFS-9013) Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
[ https://issues.apache.org/jira/browse/HDFS-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surendra Singh Lilhore updated HDFS-9013:
-----------------------------------------
    Attachment: HDFS-9013-branch-2.003.patch

Changed the patch name so that it will apply on branch-2.
[jira] [Updated] (HDFS-9013) Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
[ https://issues.apache.org/jira/browse/HDFS-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surendra Singh Lilhore updated HDFS-9013:
-----------------------------------------
    Status: Open  (was: Patch Available)
[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902013#comment-14902013 ]

Hudson commented on HDFS-9111:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #429 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/429/])
HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 06022b8fdc40e50eaac63758246353058e8cfa6d)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java

> Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
> --------------------------------------------------------------------------
>
>                 Key: HDFS-9111
>                 URL: https://issues.apache.org/jira/browse/HDFS-9111
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>             Fix For: 2.8.0
>
>         Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, HDFS-9111.002.patch
>
>
> *TL;DR* This jira tracks the effort of moving PB helper methods, which convert client side data structures to and from protobuf, to the {{hadoop-hdfs-client}} module.
> Currently the {{PBHelper}} class contains helper methods converting both client and server side data structures from/to protobuf. As we move client (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and [HDFS-9039]), we also need to move the client-related PB converters to the client module.
> A good place may be a new class named {{PBHelperClient}}. After this, the existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with converters for server side data structures.
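The client/server converter split described above can be illustrated with a toy example. `BlockIdProto` here is a hand-written stand-in for a protobuf-generated class, and the helper names only mirror, and are not, the real `PBHelperClient`/`PBHelper` methods:

```java
// Toy model of the PBHelperClient split. BlockIdProto stands in for a
// protobuf-generated message; in real HDFS these are generated from
// .proto files and converted by static helper methods.
public class PBHelperClientSketch {
    // "Wire" form, normally produced by protoc.
    static final class BlockIdProto {
        final long blockId;
        final long generationStamp;
        BlockIdProto(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }
    }

    // Client-side in-memory form.
    static final class Block {
        final long blockId;
        final long generationStamp;
        Block(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }
    }

    // Both converters operate on client-visible types, so under the
    // HDFS-9111 scheme they belong in PBHelperClient, not PBHelper.
    static BlockIdProto convert(Block b) {
        return new BlockIdProto(b.blockId, b.generationStamp);
    }

    static Block convert(BlockIdProto p) {
        return new Block(p.blockId, p.generationStamp);
    }
}
```

Grouping converters by which module owns their argument and return types is what lets `hadoop-hdfs-client` avoid a compile-time dependency on the server-side `hadoop-hdfs` module.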
[jira] [Updated] (HDFS-9013) Deprecate NameNodeMXBean#getNNStarted in branch2 and remove from trunk
[ https://issues.apache.org/jira/browse/HDFS-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Surendra Singh Lilhore updated HDFS-9013:
-----------------------------------------
    Attachment: (was: HDFS-9013.003-branch-2.patch)
[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902008#comment-14902008 ]

Hudson commented on HDFS-9111:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #1161 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1161/])
HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 06022b8fdc40e50eaac63758246353058e8cfa6d)
[jira] [Commented] (HDFS-8968) New benchmark throughput tool for striping erasure coding
[ https://issues.apache.org/jira/browse/HDFS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901998#comment-14901998 ]

Kai Zheng commented on HDFS-8968:
---------------------------------

Hi Andrew,

It looks like a good idea to have a new module like *hadoop-benchmark* for benchmark tools in *hadoop-tools*. Such tools should be helpful in a production system for identifying and verifying certain performance metrics, given a particular cluster environment. This is particularly useful after HDFS-EC is completed: in addition to the existing storage policies for storage types, we'll then have various file forms (replication, striping, non-striping EC), erasure coding policies using different codec algorithms, striping settings, and coder implementations, which will allow users to benchmark and make trade-offs among these options.

The tool implemented in this issue isn't perfect, but it would be a good beginning. Our ongoing perf test effort found it works fine. It would be great if you could give it more review and confirm how we should proceed. Thanks.

> New benchmark throughput tool for striping erasure coding
> ---------------------------------------------------------
>
>                 Key: HDFS-8968
>                 URL: https://issues.apache.org/jira/browse/HDFS-8968
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Rui Li
>         Attachments: HDFS-8968-HDFS-7285.1.patch, HDFS-8968-HDFS-7285.2.patch
>
>
> We need a new benchmark tool to measure the throughput of client writing and reading, considering these cases and factors:
> * 3-replica or striping;
> * write or read, stateful read or positional read;
> * which erasure coder;
> * striping cell size;
> * concurrent readers/writers using processes or threads.
> The tool should be easy to use and should avoid unnecessary local environment impact, like local disk.
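The core measurement such a benchmark performs is simple arithmetic: bytes moved over elapsed time. A hedged, self-contained sketch of that calculation follows; the class and method names are illustrative, not taken from the attached patches:

```java
// Minimal sketch of a throughput calculation: bytes moved over elapsed
// time, reported in MB/s. A real benchmark would time DFS client reads
// and writes rather than take the two numbers as inputs, and would run
// multiple concurrent readers/writers as the issue description lists.
public class ThroughputSketch {
    static double throughputMBps(long bytes, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        // 1e6 bytes per MB, 1e9 nanoseconds per second.
        return (bytes / 1e6) / (elapsedNanos / 1e9);
    }
}
```

Reading from pre-generated in-memory buffers, rather than local disk, is one way to honor the "avoid unnecessary local environment impact" requirement above.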
[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
[ https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901975#comment-14901975 ]

Kai Zheng commented on HDFS-8920:
---------------------------------

Thanks Rui for the update. The new patch LGTM. +1 and will commit it soon.
[jira] [Assigned] (HDFS-9064) NN old UI (block_info_xml) not available in 2.7.x
[ https://issues.apache.org/jira/browse/HDFS-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kanaka Kumar Avvaru reassigned HDFS-9064:
-----------------------------------------
    Assignee: Kanaka Kumar Avvaru

> NN old UI (block_info_xml) not available in 2.7.x
> -------------------------------------------------
>
>                 Key: HDFS-9064
>                 URL: https://issues.apache.org/jira/browse/HDFS-9064
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: HDFS
>    Affects Versions: 2.7.0
>            Reporter: Rushabh S Shah
>            Assignee: Kanaka Kumar Avvaru
>            Priority: Critical
>
>
> In 2.6.x Hadoop deploys, given a blockId it was very easy to find out the file name and the locations of replicas (also whether they are corrupt or not).
> This was the REST call:
> {noformat}
> http://<namenode-host>:<http-port>/block_info_xml.jsp?blockId=xxx
> {noformat}
> But this was removed by HDFS-6252 in 2.7 builds. Creating this jira to restore that functionality.
[jira] [Commented] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
[ https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901945#comment-14901945 ]

Hadoop QA commented on HDFS-8920:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 16m 7s | Findbugs (version ) appears to be broken on HDFS-7285. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 12m 38s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 2m 5s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 59s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 38s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 114m 34s | Tests failed in hadoop-hdfs. |
| | | 162m 24s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestFileAppend4 |
| | hadoop.hdfs.TestRead |
| | hadoop.hdfs.server.namenode.TestNNStorageRetentionFunctional |
| | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd |
| | hadoop.hdfs.server.datanode.TestRefreshNamenodes |
| | hadoop.hdfs.TestHdfsAdmin |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN |
| | hadoop.hdfs.TestClientReportBadBlock |
| | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
| | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
| | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
| | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy |
| | hadoop.hdfs.server.blockmanagement.TestDatanodeManager |
| | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| | hadoop.hdfs.TestAppendSnapshotTruncate |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
| | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
| | hadoop.hdfs.TestWriteStripedFileWithFailure |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement |
| | hadoop.hdfs.server.namenode.TestNameNodeRpcServer |
| | hadoop.hdfs.TestSafeModeWithStripedFile |
| | hadoop.hdfs.TestFileAppendRestart |
| | hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade |
| | hadoop.cli.TestErasureCodingCLI |
| | hadoop.hdfs.server.namenode.TestEditLogFileInputStream |
| | hadoop.hdfs.protocol.TestBlockListAsLongs |
| | hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant |
| | hadoop.hdfs.TestFileStatusWithECPolicy |
| | hadoop.hdfs.server.namenode.TestHDFSConcat |
| | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
| | hadoop.TestRefreshCallQueue |
| | hadoop.hdfs.TestListFilesInDFS |
| | hadoop.hdfs.server.datanode.TestDnRespectsBlockReportSplitThreshold |
| | hadoop.hdfs.server.namenode.TestNameEditsConfigs |
| | hadoop.hdfs.TestMiniDFSCluster |
| | hadoop.hdfs.server.mover.TestMover |
| | hadoop.hdfs.server.datanode.TestBPOfferService |
| | hadoop.security.TestPermissionSymlinks |
| | hadoop.hdfs.TestDFSRollback |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
| | hadoop.hdfs.TestFileConcurrentReader |
| | hadoop.hdfs.TestFileAppend2 |
| | hadoop.hdfs.server.datanode.TestDataNodeExit |
| | hadoop.hdfs.server.blockmanagement.TestSequentialBlockGroupId |
| | hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA |
| | hadoop.hdfs.TestGetFileChecksum |
| | hadoop.security.TestRefreshUserMappings |
| | hadoop.hdfs.server.namenode.TestNameNodeRespectsBindHostKeys |
| | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
| | hadoop.hdfs.server.namenode.ha.TestHAMetrics |
| | hadoop.hdfs.TestRecoverStripedFile |
| | hadoop.hdfs.server.namenode.TestAllowFormat |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
| | hadoop.hdfs.server.namenode.TestDeadDatanode |
| | hadoop.hdfs.crypto.TestHdfsCryptoStreams |
| | hadoop.hdfs.server.blockmanagement.TestAvailableSpaceBlockPlac
[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901935#comment-14901935 ]

Anu Engineer commented on HDFS-9112:
------------------------------------

The test failure is not related to the patch.

> Haadmin fails if multiple name service IDs are configured
> ---------------------------------------------------------
>
>                 Key: HDFS-9112
>                 URL: https://issues.apache.org/jira/browse/HDFS-9112
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 2.7.1
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>         Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch
>
>
> In HDFS-6376 we supported a feature for distcp that allows multiple NameService IDs to be specified, so that we can copy from two HA-enabled clusters.
> That confuses the haadmin command, since we have a check in {{DFSUtil#getNamenodeServiceAddr}} which fails if it finds more than one name in that property.
[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901928#comment-14901928 ] Hudson commented on HDFS-9111: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #421 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/421/]) HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 06022b8fdc40e50eaac63758246353058e8cfa6d) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java > Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient > - > > Key: HDFS-9111 > URL: https://issues.apache.org/jira/browse/HDFS-9111 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, > HDFS-9111.002.patch > > > *TL;DR* This jira tracks the effort of moving PB helper methods, which > convert client side data structure to and from protobuf, to the > {{hadoop-hdfs-client}} module. > Currently the {{PBHelper}} class contains helper methods converting both > client and server side data structures from/to protobuf. 
As we move client > (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and > [HDFS-9039]), we also need to move client module related PB converters to > client module. > A good place may be a new class named {{PBHelperClient}}. After this, the > existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters > for converting server side data structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901910#comment-14901910 ] Hadoop QA commented on HDFS-9112: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 22m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 9m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 9s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 163m 19s | Tests failed in hadoop-hdfs. 
| | | | 215m 56s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761529/HDFS-9112.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b00392d | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12579/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12579/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12579/console | This message was automatically generated. > Haadmin fails if multiple name service IDs are configured > - > > Key: HDFS-9112 > URL: https://issues.apache.org/jira/browse/HDFS-9112 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch > > > In HDFS-6376 we supported a feature for distcp that allows multiple > NameService IDs to be specified so that we can copy from two HA enabled > clusters. > That confuses haadmin command since we have a check in > DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in > that property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901886#comment-14901886 ] Hudson commented on HDFS-9111: -- FAILURE: Integrated in Hadoop-trunk-Commit #8497 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8497/]) HDFS-9111. Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient. Contributed by Mingliang Liu. (wheat9: rev 06022b8fdc40e50eaac63758246353058e8cfa6d) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsLocatedFileStatus.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/JournalProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java > Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient > - > > Key: HDFS-9111 > URL: https://issues.apache.org/jira/browse/HDFS-9111 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, > HDFS-9111.002.patch > > > *TL;DR* This jira tracks the effort of moving PB helper methods, which > convert client side data structure to and from protobuf, to the > {{hadoop-hdfs-client}} module. > Currently the {{PBHelper}} class contains helper methods converting both > client and server side data structures from/to protobuf. 
As we move client > (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and > [HDFS-9039]), we also need to move client module related PB converters to > client module. > A good place may be a new class named {{PBHelperClient}}. After this, the > existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters > for converting server side data structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9039) Split o.a.h.hdfs.NameNodeProxies class into two classes in hadoop-hdfs-client and hadoop-hdfs modules respectively
[ https://issues.apache.org/jira/browse/HDFS-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9039: Attachment: HDFS-9039.001.patch The v1 patch rebases on the {{trunk}} branch. As we moved the client-side protobuf convert methods from {{PBHelper}} to the {{hadoop-hdfs-client}} module in [HDFS-9111], the v1 patch is much smaller than before. > Split o.a.h.hdfs.NameNodeProxies class into two classes in hadoop-hdfs-client > and hadoop-hdfs modules respectively > -- > > Key: HDFS-9039 > URL: https://issues.apache.org/jira/browse/HDFS-9039 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9039.000.patch, HDFS-9039.001.patch > > > Currently the {{org.apache.hadoop.hdfs.NameNodeProxies}} class is used by > both the {{org.apache.hadoop.hdfs.server}} package (for server-side protocols) > and the {{DFSClient}} class (for {{ClientProtocol}}). The {{DFSClient}} class > should be moved to the {{hadoop-hdfs-client}} module (see [HDFS-8053 | > https://issues.apache.org/jira/browse/HDFS-8053]). As the > {{org.apache.hadoop.hdfs.NameNodeProxies}} class also depends on server-side > protocols (e.g. {{JournalProtocol}} and {{NamenodeProtocol}}), we can't > simply move this class to the {{hadoop-hdfs-client}} module as well. > This jira tracks the effort of moving the {{ClientProtocol}}-related static > methods in the {{org.apache.hadoop.hdfs.NameNodeProxies}} class to the > {{hadoop-hdfs-client}} module. A good place to put these static methods is a > new class named {{NameNodeProxiesClient}}. > The checkstyle warnings can be addressed in [HDFS-8979], and removing the > _slf4j_ logger guards when calling {{LOG.debug()}} and {{LOG.trace()}} can be > addressed in [HDFS-8971]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side
[ https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8733: Component/s: (was: build) > Keep server related definition in hdfs.proto on server side > --- > > Key: HDFS-8733 > URL: https://issues.apache.org/jira/browse/HDFS-8733 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Yi Liu >Assignee: Mingliang Liu > Attachments: HFDS-8733.000.patch > > > In [HDFS-8726], we moved the protobuf files that define the client-server > protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are > some server-related definitions. This jira tracks the effort of moving those > server-related definitions back to the {{hadoop-hdfs}} module. A good place may be > a new file named {{HdfsServer.proto}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side
[ https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8733: Status: Patch Available (was: Open) > Keep server related definition in hdfs.proto on server side > --- > > Key: HDFS-8733 > URL: https://issues.apache.org/jira/browse/HDFS-8733 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: build >Reporter: Yi Liu >Assignee: Mingliang Liu > Attachments: HFDS-8733.000.patch > > > In [HDFS-8726], we moved the protobuf files that define the client-server > protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are > some server-related definitions. This jira tracks the effort of moving those > server-related definitions back to the {{hadoop-hdfs}} module. A good place may be > a new file named {{HdfsServer.proto}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8733) Keep server related definition in hdfs.proto on server side
[ https://issues.apache.org/jira/browse/HDFS-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8733: Attachment: HFDS-8733.000.patch > Keep server related definition in hdfs.proto on server side > --- > > Key: HDFS-8733 > URL: https://issues.apache.org/jira/browse/HDFS-8733 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: build >Reporter: Yi Liu >Assignee: Mingliang Liu > Attachments: HFDS-8733.000.patch > > > In [HDFS-8726], we moved the protobuf files that define the client-server > protocols to the {{hadoop-hdfs-client}} module. In {{hdfs.proto}}, there are > some server-related definitions. This jira tracks the effort of moving those > server-related definitions back to the {{hadoop-hdfs}} module. A good place may be > a new file named {{HdfsServer.proto}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
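The split proposed in HDFS-8733 can be pictured with a small illustrative fragment. The file layout and option lines below are a sketch of what a {{HdfsServer.proto}} might look like, not the committed patch; {{BlockKeyProto}} is used as an example of a definition exchanged only between NameNode and DataNodes, which a pure client module has no reason to carry.

```proto
// HdfsServer.proto (sketch): server-side definitions split out of hdfs.proto
// so that the hadoop-hdfs-client module no longer has to carry them.
option java_package = "org.apache.hadoop.hdfs.protocol.proto";
option java_outer_classname = "HdfsServerProtos";
package hadoop.hdfs;

// Example of a server-only message: block access keys are exchanged between
// the NameNode and DataNodes, never with ordinary clients.
message BlockKeyProto {
  required uint32 keyId = 1;       // key identifier
  required uint64 expiryDate = 2;  // expiry time in milliseconds
  optional bytes keyBytes = 3;     // key material
}
```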
[jira] [Updated] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9111: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~liuml07] for the contribution. > Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient > - > > Key: HDFS-9111 > URL: https://issues.apache.org/jira/browse/HDFS-9111 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, > HDFS-9111.002.patch > > > *TL;DR* This jira tracks the effort of moving PB helper methods, which > convert client side data structure to and from protobuf, to the > {{hadoop-hdfs-client}} module. > Currently the {{PBHelper}} class contains helper methods converting both > client and server side data structures from/to protobuf. As we move client > (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and > [HDFS-9039]), we also need to move client module related PB converters to > client module. > A good place may be a new class named {{PBHelperClient}}. After this, the > existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters > for converting server side data structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree
[ https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901808#comment-14901808 ] Yi Liu commented on HDFS-9053: -- Thanks a lot for your review and for spending lots of time on this, Jing! I will update the B-Tree part to address your comments later. > Support large directories efficiently using B-Tree > -- > > Key: HDFS-9053 > URL: https://issues.apache.org/jira/browse/HDFS-9053 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Critical > Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 > (BTree).patch, HDFS-9053.001.patch > > > This is a long-standing issue; we have tried to improve it in the past. > Currently we use an ArrayList for the children under a directory, and the > children are kept ordered in the list. Lookup takes O(log n) time, but insertion/deletion causes re-allocations and > copies of big arrays, so those operations are costly. For example, if the > children grow to 1M entries, the ArrayList will resize to > 1M capacity, which needs > 1M * 4 bytes = 4 MB of contiguous heap memory; this easily causes full GC in an HDFS > cluster where NameNode heap memory is already highly used. To recap, the 3 > main issues are: > # Insertion/deletion operations in large directories are expensive because of > re-allocations and copies of big arrays. > # Dynamically allocating several MB of long-lived contiguous heap memory can > easily cause full GC problems. > # Even if most children are removed later, the directory INode still > occupies the same amount of heap memory, since the ArrayList never shrinks. > This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to > solve the problem, as suggested by [~shv]. > So the target of this JIRA is to implement a low-memory-footprint B-Tree and > use it to replace the ArrayList. 
> If the number of elements is small (less than the maximum degree of a B-Tree > node), the B-Tree has only one root node, which contains an array of the > elements. If the size grows large enough, the node splits automatically, > and if elements are removed, B-Tree nodes can merge automatically (see > more: https://en.wikipedia.org/wiki/B-tree). This solves the above 3 > issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
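The cost pattern described in the report can be seen in a standalone toy version of the sorted children list (illustrative only; the real {{INodeDirectory}} code differs): binary search makes lookup O(log n), but every interior {{add(idx, ...)}} shifts the tail of one large backing array, and the array never shrinks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedChildren {
  private final List<String> children = new ArrayList<>();

  // O(log n) to find the slot, but add(idx, ...) copies everything after
  // idx in a single large backing array -- the cost the JIRA describes.
  boolean add(String name) {
    int idx = Collections.binarySearch(children, name);
    if (idx >= 0) {
      return false; // already present
    }
    children.add(-idx - 1, name);
    return true;
  }

  boolean contains(String name) {
    return Collections.binarySearch(children, name) >= 0;
  }

  int size() {
    return children.size();
  }
}
```

A B-Tree instead spreads the same elements over many bounded-size nodes, so no single allocation ever needs several MB of contiguous heap and removing children lets nodes merge and free memory.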
[jira] [Commented] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups
[ https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901799#comment-14901799 ] Hadoop QA commented on HDFS-9109: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 48s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 12s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 24s | The applied patch generated 1 new checkstyle issues (total was 61, now 61). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 21m 44s | Tests failed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 69m 58s | Tests failed in hadoop-hdfs. 
| | | | 136m 27s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestEncryptionZonesWithKMS | | | hadoop.hdfs.TestClientBlockVerification | | Timed out tests | org.apache.hadoop.ipc.TestIPC | | | org.apache.hadoop.ha.TestZKFailoverControllerStress | | | org.apache.hadoop.crypto.key.TestKeyProviderFactory | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761519/HDFS-9109.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b00392d | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12578/console | This message was automatically generated. > dfs.datanode.dns.interface does not work with hosts file based setups > - > > Key: HDFS-9109 > URL: https://issues.apache.org/jira/browse/HDFS-9109 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9109.01.patch, HDFS-9109.02.patch > > > The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode > select its hostname by doing a reverse lookup of IP addresses on the specific > network interface. 
This does not work when {{/etc/hosts}} is used to set up > alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
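The distinction behind HDFS-9109 can be sketched with a hypothetical helper (not the Hadoop {{DNS}} class): {{InetAddress#getCanonicalHostName}} goes through the platform resolver, which normally consults {{/etc/hosts}} first, whereas a raw DNS PTR query like the one {{DNS#reverseDns}} issues bypasses the hosts file entirely.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

class IfaceHostname {
  // Resolve a hostname for the addresses on a named interface via the
  // system resolver (hosts file included), instead of a direct PTR lookup.
  static String hostnameFor(String ifaceName) {
    try {
      NetworkInterface iface = NetworkInterface.getByName(ifaceName);
      if (iface == null) {
        return null; // no such interface
      }
      for (InetAddress addr : Collections.list(iface.getInetAddresses())) {
        String name = addr.getCanonicalHostName();
        // getCanonicalHostName falls back to the literal IP text when no
        // mapping exists; skip those and keep looking.
        if (!name.equals(addr.getHostAddress())) {
          return name;
        }
      }
      return null;
    } catch (SocketException e) {
      return null;
    }
  }
}
```

On a hosts-file-based setup this returns the alternate hostname for the interface's address, which is exactly what the reported {{reverseDns}} path misses.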
[jira] [Updated] (HDFS-8920) Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt performance
[ https://issues.apache.org/jira/browse/HDFS-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HDFS-8920: - Attachment: HDFS-8920-HDFS-7285.2.patch Address Kai's comments offline. > Erasure Coding: when recovering lost blocks, logs can be too verbose and hurt > performance > - > > Key: HDFS-8920 > URL: https://issues.apache.org/jira/browse/HDFS-8920 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rui Li >Assignee: Rui Li > Attachments: HDFS-8920-HDFS-7285.1.patch, HDFS-8920-HDFS-7285.2.patch > > > When we test reading data with datanodes killed, > {{DFSInputStream::getBestNodeDNAddrPair}} becomes a hot spot method and > effectively blocks the client JVM. This log seems too verbose: > {code} > if (chosenNode == null) { > DFSClient.LOG.warn("No live nodes contain block " + block.getBlock() + > " after checking nodes = " + Arrays.toString(nodes) + > ", ignoredNodes = " + ignoredNodes); > return null; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
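A common remedy for a hot-path log like the one quoted in HDFS-8920 — a sketch of the general technique, not the committed patch — is to build the expensive message only when the target level is actually enabled, since the {{Arrays.toString}} call over the node list is what makes each failed lookup costly. Here {{java.util.logging}} stands in for the logging API {{DFSClient}} actually uses:

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

class VerboseLogGuard {
  private static final Logger LOG =
      Logger.getLogger(VerboseLogGuard.class.getName());

  // Build the detailed message only when the given level is enabled; returns
  // null otherwise so the caller skips the expensive formatting entirely.
  static String buildDetail(Level level, String block, String[] nodes) {
    if (!LOG.isLoggable(level)) {
      return null;
    }
    return "No live nodes contain block " + block
        + " after checking nodes = " + Arrays.toString(nodes);
  }

  // Hot path: demote the per-failure detail to a fine-grained level so a
  // burst of failed lookups no longer floods the log or blocks the client.
  static void reportNoLiveNodes(String block, String[] nodes) {
    String detail = buildDetail(Level.FINE, block, nodes);
    if (detail != null) {
      LOG.fine(detail);
    }
  }
}
```

With the JDK's default INFO threshold, {{reportNoLiveNodes}} does no string building at all; enabling FINE restores the full diagnostic output.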
[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree
[ https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901761#comment-14901761 ] Jing Zhao commented on HDFS-9053: - Thanks for the great work, Yi! So far I have just reviewed the B-Tree implementation part and it looks good to me. Just some minor comments: # "static" can be removed {code} public static interface Element extends Comparable { K getKey(); } {code} # The parameter is never used. {code} Node(boolean allocateMaxElements) { elements = new Object[maxElements()]; } {code} # It may be helpful to add some more Preconditions/assert checks to verify the parameters and internal state. For example, some verification of the index i in the following code. {code} SplitResult split(int i) { E e = (E)elements[i]; Node next = new Node(true); {code} # Optional: in insertElement maybe we can copy elements only once if we need to expand the array. # Rename {{put}} to {{addOrReplace}} to make its semantics clearer? # Need to update the javadoc of {{removeElement}} and {{removeChild}}. # {{SplitResult#element}} and {{SplitResult#node}} can be declared as final. > Support large directories efficiently using B-Tree > -- > > Key: HDFS-9053 > URL: https://issues.apache.org/jira/browse/HDFS-9053 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Critical > Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 > (BTree).patch, HDFS-9053.001.patch > > > This is a long-standing issue; we have tried to improve it in the past. > Currently we use an ArrayList for the children under a directory, and the > children are kept ordered in the list. Lookup takes O(log n) time, but insertion/deletion causes re-allocations and > copies of big arrays, so those operations are costly. 
For example, if the > children grow to 1M entries, the ArrayList will resize to > 1M capacity, which needs > 1M * 4 bytes = 4 MB of contiguous heap memory; this easily causes full GC in an HDFS > cluster where NameNode heap memory is already highly used. To recap, the 3 > main issues are: > # Insertion/deletion operations in large directories are expensive because of > re-allocations and copies of big arrays. > # Dynamically allocating several MB of long-lived contiguous heap memory can > easily cause full GC problems. > # Even if most children are removed later, the directory INode still > occupies the same amount of heap memory, since the ArrayList never shrinks. > This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to > solve the problem, as suggested by [~shv]. > So the target of this JIRA is to implement a low-memory-footprint B-Tree and > use it to replace the ArrayList. > If the number of elements is small (less than the maximum degree of a B-Tree > node), the B-Tree has only one root node, which contains an array of the > elements. If the size grows large enough, the node splits automatically, > and if elements are removed, B-Tree nodes can merge automatically (see > more: https://en.wikipedia.org/wiki/B-tree). This solves the above 3 > issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
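The split step and the index check the review asks for can be illustrated with a standalone toy (names echo the review's {{SplitResult}}, but this is not the patch code): a full, sorted element array is divided around a chosen element into a left node, a promoted element, and a right node, with the argument verified up front.

```java
import java.util.Arrays;

class SplitSketch {
  static final class SplitResult {
    final int element;   // element promoted to the parent node
    final int[] left;    // elements below the promoted element
    final int[] right;   // elements above the promoted element
    SplitResult(int element, int[] left, int[] right) {
      this.element = element;
      this.left = left;
      this.right = right;
    }
  }

  // Split a sorted element array around index i, with the kind of bounds
  // verification the review suggests adding before touching elements[i].
  static SplitResult split(int[] elements, int i) {
    if (i <= 0 || i >= elements.length - 1) {
      throw new IndexOutOfBoundsException(
          "split index " + i + " must leave elements on both sides of an array of length "
              + elements.length);
    }
    return new SplitResult(elements[i],
        Arrays.copyOfRange(elements, 0, i),
        Arrays.copyOfRange(elements, i + 1, elements.length));
  }
}
```

Splitting {1, 2, 3, 4, 5} at the median index 2 promotes 3 and leaves {1, 2} and {4, 5} as the two child nodes; an out-of-range index now fails fast instead of corrupting the tree.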
[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901756#comment-14901756 ] Hadoop QA commented on HDFS-9111: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 7s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 8s | The applied patch generated 160 new checkstyle issues (total was 40, now 200). | | {color:green}+1{color} | whitespace | 6m 55s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 29s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 171m 7s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 29s | Tests passed in hadoop-hdfs-client. 
| | | | 227m 24s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestFSNamesystem | | | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.web.TestWebHDFSOAuth2 | | Timed out tests | org.apache.hadoop.hdfs.tools.TestDFSZKFailoverController | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761468/HDFS-9111.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b00392d | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12575/console | This message was automatically generated. > Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient > - > > Key: HDFS-9111 > URL: https://issues.apache.org/jira/browse/HDFS-9111 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, > HDFS-9111.002.patch > > > *TL;DR* This jira tracks the effort of moving PB helper methods, which > convert client side data structure to and from protobuf, to the > {{hadoop-hdfs-client}} module. > Currently the {{PBHelper}} class contains helper methods converting both > client and server side data structures from/to protobuf. 
As we move client > (and common) classes to the {{hadoop-hdfs-client}} module (see [HDFS-8053] and > [HDFS-9039]), we also need to move the client-side PB converters to the > client module. > A good place may be a new class named {{PBHelperClient}}. After this, the > existing {{PBHelper}} class stays in the {{hadoop-hdfs}} module with converters > for server side data structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9026) Support for include/exclude lists on IPv6 setup
[ https://issues.apache.org/jira/browse/HDFS-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901745#comment-14901745 ] Hadoop QA commented on HDFS-9026: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 57s | Findbugs (version ) appears to be broken on HADOOP-11890. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 10m 6s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 9s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 44s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 45s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 42s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 154m 1s | Tests failed in hadoop-hdfs. 
| | | | 204m 20s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestWriteRead | | | hadoop.hdfs.TestHFlush | | | hadoop.security.TestPermission | | | hadoop.hdfs.TestParallelRead | | | hadoop.fs.viewfs.TestViewFsHdfs | | | hadoop.hdfs.TestMiniDFSCluster | | | hadoop.hdfs.TestWriteConfigurationToDFS | | | hadoop.hdfs.web.TestWebHDFSXAttr | | | hadoop.hdfs.TestDFSRollback | | | hadoop.hdfs.TestDataTransferKeepalive | | | hadoop.hdfs.TestDatanodeConfig | | | hadoop.fs.TestWebHdfsFileContextMainOperations | | | hadoop.fs.TestGlobPaths | | | hadoop.hdfs.TestDFSShell | | | hadoop.fs.loadGenerator.TestLoadGenerator | | | hadoop.hdfs.TestCrcCorruption | | | hadoop.fs.contract.hdfs.TestHDFSContractMkdir | | | hadoop.hdfs.TestAbandonBlock | | | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary | | | hadoop.hdfs.TestReadWhileWriting | | | hadoop.fs.viewfs.TestViewFileSystemWithAcls | | | hadoop.fs.contract.hdfs.TestHDFSContractConcat | | | hadoop.fs.TestSymlinkHdfsDisable | | | hadoop.fs.contract.hdfs.TestHDFSContractRootDirectory | | | hadoop.hdfs.TestMissingBlocksAlert | | | hadoop.hdfs.TestBlocksScheduledCounter | | | hadoop.hdfs.TestSmallBlock | | | hadoop.cli.TestDeleteCLI | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.fs.viewfs.TestViewFsWithXAttrs | | | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.web.TestWebHDFSForHA | | | hadoop.fs.viewfs.TestViewFsDefaultValue | | | hadoop.fs.contract.hdfs.TestHDFSContractOpen | | | hadoop.hdfs.TestFSInputChecker | | | hadoop.hdfs.web.TestWebHdfsWithAuthenticationFilter | | | hadoop.fs.contract.hdfs.TestHDFSContractRename | | | hadoop.hdfs.TestRemoteBlockReader | | | hadoop.hdfs.TestBlockStoragePolicy | | | hadoop.fs.viewfs.TestViewFsAtHdfsRoot | | | hadoop.hdfs.TestBlockReaderLocal | | | hadoop.fs.contract.hdfs.TestHDFSContractGetFileStatus | | | hadoop.cli.TestCryptoAdminCLI | | | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForXAttr | | | 
hadoop.hdfs.tools.TestDebugAdmin | | | hadoop.security.TestRefreshUserMappings | | | hadoop.hdfs.TestLargeBlock | | | hadoop.fs.viewfs.TestViewFileSystemWithXAttrs | | | hadoop.hdfs.TestListFilesInFileContext | | | hadoop.fs.TestFcHdfsSetUMask | | | hadoop.hdfs.TestDatanodeReport | | | hadoop.hdfs.TestFileAppend2 | | | hadoop.fs.shell.TestHdfsTextCommand | | | hadoop.hdfs.TestFsShellPermission | | | hadoop.TestGenericRefresh | | | hadoop.fs.TestSymlinkHdfsFileSystem | | | hadoop.hdfs.TestGetBlocks | | | hadoop.fs.contract.hdfs.TestHDFSContractAppend | | | hadoop.fs.contract.hdfs.TestHDFSContractDelete | | | hadoop.hdfs.web.TestWebHdfsTokens | | | hadoop.hdfs.TestEncryptionZonesWithKMS | | | hadoop.hdfs.TestClientReportBadBlock | | | hadoop.cli.TestHDFSCLI | | | hadoop.fs.TestSWebHdfsFileContextMainOperations | | | hadoop.hdfs.TestRestartDFS | | | hadoop.hdfs.TestFileAppend4 | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement | | | hadoop.hdfs.TestSetTimes | | | hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot | | | hadoop.fs.contract.hdfs.TestHDFSContractSeek | | | hadoop.hdfs.TestSetrepIncreasing | | | hadoop.fs.viewfs.TestViewFsWithAcls | | | hadoop.hdfs.TestLease | | | hadoop.hdfs.TestDFSUpgrade | |
[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901722#comment-14901722 ] Haohui Mai commented on HDFS-9117: -- I suggest bringing in RapidXML (http://rapidxml.sourceforge.net/) to parse the configurations and convert the XML to the {{Options}} object. > Config file reader / options classes for libhdfs++ > -- > > Key: HDFS-9117 > URL: https://issues.apache.org/jira/browse/HDFS-9117 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen > > For environmental compatibility with HDFS installations, libhdfs++ should be > able to read the configurations from Hadoop XML files and behave in line with > the Java implementation. > Most notably, machine names and ports should be readable from Hadoop XML > configuration files. > Similarly, an internal Options architecture for libhdfs++ should be developed > to efficiently transport the configuration information within the system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8663) sys cpu usage high on namenode server
[ https://issues.apache.org/jira/browse/HDFS-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HDFS-8663: - Assignee: (was: Eugene Koifman) > sys cpu usage high on namenode server > - > > Key: HDFS-8663 > URL: https://issues.apache.org/jira/browse/HDFS-8663 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, namenode >Affects Versions: 2.3.0 > Environment: hadoop 2.3.0 centos5.8 >Reporter: tangjunjie > > High sys cpu usage on the namenode server causes jobs to run very slowly. > Using ps -elf I see many zombie processes. > Checking the hdfs logs I found many exceptions like: > org.apache.hadoop.util.Shell$ExitCodeException: id: sem_410: No such user > at org.apache.hadoop.util.Shell.runCommand(Shell.java:505) > at org.apache.hadoop.util.Shell.run(Shell.java:418) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650) > at org.apache.hadoop.util.Shell.execCommand(Shell.java:739) > at org.apache.hadoop.util.Shell.execCommand(Shell.java:722) > at > org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83) > at > org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52) > at org.apache.hadoop.security.Groups.getGroups(Groups.java:139) > at > org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.<init>(FSPermissionChecker.java:81) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3310) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3491) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764) > at > 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) > Then I created every user, such as sem_410, that appeared in the exceptions, and the sys cpu > usage on the namenode went down. > BTW, my hadoop 2.3.0 enables hadoop acl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8663) sys cpu usage high on namenode server
[ https://issues.apache.org/jira/browse/HDFS-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tangjunjie reassigned HDFS-8663: Assignee: Eugene Koifman > sys cpu usage high on namenode server > - > > Key: HDFS-8663 > URL: https://issues.apache.org/jira/browse/HDFS-8663 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, namenode >Affects Versions: 2.3.0 > Environment: hadoop 2.3.0 centos5.8 >Reporter: tangjunjie >Assignee: Eugene Koifman > > High sys cpu usage on the namenode server causes jobs to run very slowly. > Using ps -elf I see many zombie processes. > Checking the hdfs logs I found many exceptions like: > org.apache.hadoop.util.Shell$ExitCodeException: id: sem_410: No such user > at org.apache.hadoop.util.Shell.runCommand(Shell.java:505) > at org.apache.hadoop.util.Shell.run(Shell.java:418) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650) > at org.apache.hadoop.util.Shell.execCommand(Shell.java:739) > at org.apache.hadoop.util.Shell.execCommand(Shell.java:722) > at > org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:83) > at > org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:52) > at org.apache.hadoop.security.Groups.getGroups(Groups.java:139) > at > org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1409) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.<init>(FSPermissionChecker.java:81) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getPermissionChecker(FSNamesystem.java:3310) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3491) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764) > at > 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) > Then I created every user, such as sem_410, that appeared in the exceptions, and the sys cpu > usage on the namenode went down. > BTW, my hadoop 2.3.0 enables hadoop acl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity
[ https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901678#comment-14901678 ] Tsz Wo Nicholas Sze commented on HDFS-8287: --- > ... moving DoubleCellBuffer and CellBuffers out of DFSStripedOutputStream > should be done with separate JIRA, ... Sounds good. Some comments on the patch: {code} +if (submittedParityGenTask) { + try { +// Wait for parity gen task for previout cell. +Future<ByteBuffer[]> ret = completionService.take(); +ByteBuffer[] encoded = ret.get(); +for (int i = numDataBlocks; i < numAllBlocks; i++) { + writeParity(i, encoded[i], doubleCellBuffer.getReadyBuf().getChecksumArray(i)); +} + } catch (InterruptedException e) { +LOG.warn("Caught InterruptedException: ", e); + } catch (ExecutionException e) { +LOG.warn("Caught ExecutionException: ", e); + } {code} - The caught exception should be re-thrown as an IOException. - Typo: "previout" should be "previous". > DFSStripedOutputStream.writeChunk should not wait for writing parity > - > > Key: HDFS-8287 > URL: https://issues.apache.org/jira/browse/HDFS-8287 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tsz Wo Nicholas Sze >Assignee: Kai Sasaki > Attachments: HDFS-8287-HDFS-7285.00.patch, > HDFS-8287-HDFS-7285.01.patch, HDFS-8287-HDFS-7285.02.patch, > HDFS-8287-HDFS-7285.03.patch, HDFS-8287-HDFS-7285.04.patch, > HDFS-8287-HDFS-7285.05.patch, HDFS-8287-HDFS-7285.06.patch, > HDFS-8287-HDFS-7285.07.patch, HDFS-8287-HDFS-7285.08.patch, > HDFS-8287-HDFS-7285.09.patch, HDFS-8287-HDFS-7285.10.patch, > HDFS-8287-HDFS-7285.WIP.patch, HDFS-8287-performance-report.pdf, > h8287_20150911.patch, jstack-dump.txt > > > When a striping cell is full, writeChunk computes and generates parity > packets. It sequentially calls waitAndQueuePacket, so the user client cannot > continue to write data until it finishes. > We should instead allow the user client to continue writing rather than blocking it while parity is being written. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC
[ https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901673#comment-14901673 ] Hadoop QA commented on HDFS-9107: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 21s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 25s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 22s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 16s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 198m 6s | Tests failed in hadoop-hdfs. 
| | | | 244m 36s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761485/HDFS-9107.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b00392d | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12574/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12574/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12574/console | This message was automatically generated. > Prevent NN's unrecoverable death spiral after full GC > - > > Key: HDFS-9107 > URL: https://issues.apache.org/jira/browse/HDFS-9107 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-9107.patch, HDFS-9107.patch > > > A full GC pause in the NN that exceeds the dead node interval can lead to an > infinite cycle of full GCs. The most common situation that precipitates an > unrecoverable state is a network issue that temporarily cuts off multiple > racks. > The NN wakes up and falsely starts marking nodes dead. This bloats the > replication queues which increases memory pressure. The replications create a > flurry of incremental block reports and a glut of over-replicated blocks. > The "dead" nodes heartbeat within seconds. The NN forces a re-registration > which requires a full block report - more memory pressure. The NN now has to > invalidate all the over-replicated blocks. The extra blocks are added to > invalidation queues, tracked in an excess blocks map, etc - much more memory > pressure. 
> All the memory pressure can push the NN into another full GC which repeats > the entire cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-9112: --- Attachment: HDFS-9112.002.patch Based on [~jingzhao]'s comments, this change makes the error message more explicit. It tells the user to pass -ns if needed. As for the test failures on patch 1, they do not seem related to the patch. > Haadmin fails if multiple name service IDs are configured > - > > Key: HDFS-9112 > URL: https://issues.apache.org/jira/browse/HDFS-9112 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9112.001.patch, HDFS-9112.002.patch > > > In HDFS-6376 we supported a feature for distcp that allows multiple > NameService IDs to be specified so that we can copy from two HA enabled > clusters. > That confuses haadmin command since we have a check in > DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in > that property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901650#comment-14901650 ] Hadoop QA commented on HDFS-9112: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 21m 55s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 8s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 24s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 50s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 48s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 24m 29s | Tests passed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 197m 36s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 45s | Tests passed in hadoop-hdfs-client. 
| | | | 276m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestRollingUpgrade | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761473/HDFS-9112.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b00392d | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12573/console | This message was automatically generated. > Haadmin fails if multiple name service IDs are configured > - > > Key: HDFS-9112 > URL: https://issues.apache.org/jira/browse/HDFS-9112 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9112.001.patch > > > In HDFS-6376 we supported a feature for distcp that allows multiple > NameService IDs to be specified so that we can copy from two HA enabled > clusters. > That confuses haadmin command since we have a check in > DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in > that property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7858) Improve HA Namenode Failover detection on the client
[ https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901642#comment-14901642 ] Tsz Wo Nicholas Sze commented on HDFS-7858: --- > ... then those clients might not get a response soon enough to try the other > NN. [~asuresh], do you recall how long you have seen the client wait? I might have hit a similar problem recently. > Improve HA Namenode Failover detection on the client > > > Key: HDFS-7858 > URL: https://issues.apache.org/jira/browse/HDFS-7858 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Arun Suresh >Assignee: Arun Suresh > Labels: BB2015-05-TBR > Fix For: 2.8.0 > > Attachments: HDFS-7858.1.patch, HDFS-7858.10.patch, > HDFS-7858.10.patch, HDFS-7858.11.patch, HDFS-7858.12.patch, > HDFS-7858.13.patch, HDFS-7858.2.patch, HDFS-7858.2.patch, HDFS-7858.3.patch, > HDFS-7858.4.patch, HDFS-7858.5.patch, HDFS-7858.6.patch, HDFS-7858.7.patch, > HDFS-7858.8.patch, HDFS-7858.9.patch > > > In an HA deployment, clients are configured with the hostnames of both the > Active and Standby Namenodes. Clients will first try one of the NNs > (non-deterministically), and if it's a standby NN, it will respond to the > client to retry the request on the other Namenode. > If the client happens to talk to the Standby first, and the standby is > undergoing some GC / is busy, then those clients might not get a response > soon enough to try the other NN. > Proposed approach to solve this: > 1) Use hedged RPCs to simultaneously call multiple configured NNs to decide > which is the active Namenode. > 2) Subsequent calls will invoke the previously successful NN. > 3) On failover of the currently active NN, the remaining NNs will be invoked > to decide which is the new active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9118) Add logging system for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901630#comment-14901630 ] Haohui Mai commented on HDFS-9118: -- The interface of the logging class is quite close to the ones used in snappy and glog. A rational choice is to make it an abstract class and allow users to specify the instance via the {{Options}} instance. > Add logging system for libhdfs++ > > > Key: HDFS-9118 > URL: https://issues.apache.org/jira/browse/HDFS-9118 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen > > With HDFS-9505, we've started logging data from libhdfs++. Consumers of the > library are going to have their own logging infrastructure that we're going > to want to provide data to. > libhdfs++ should have a logging library that: > * Is overridable and can provide sufficient information to work well with > common C++ logging frameworks > * Has a rational default implementation > * Is performant -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests
[ https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9116: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-8707 Target Version/s: HDFS-8707 Status: Resolved (was: Patch Available) Committed to the HDFS-8707 branch. Thanks James for the reviews. > Suppress false positives from Valgrind on uninitialized variables in tests > -- > > Key: HDFS-9116 > URL: https://issues.apache.org/jira/browse/HDFS-9116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Fix For: HDFS-8707 > > Attachments: HDFS-9116.000.patch > > > Valgrind complains about uninitialized variables in the unit tests. It should > be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9103) Retry reads on DN failure
[ https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901614#comment-14901614 ] Haohui Mai commented on HDFS-9103: -- There are definitely use cases that need fully flexible APIs (rigorous testing is one of them). However, it's great to build an easier version of the API on top of that. Speaking of the patch itself, {{AsyncPreadSome}} needs to be completely stateless. The name {{InputStream}} might be a little bit confusing now, but I don't think it is a good idea to put this functionality there, at least for now. > Retry reads on DN failure > - > > Key: HDFS-9103 > URL: https://issues.apache.org/jira/browse/HDFS-9103 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Bob Hansen > Fix For: HDFS-8707 > > Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch > > > When AsyncPreadSome fails, add the failed DataNode to the excluded list and > try again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901613#comment-14901613 ] Jing Zhao commented on HDFS-9112: - Thanks for the clarification, [~anu]! I think it is not necessary for admins or other clients to clearly distinguish internal/external name services; the internal/external distinction perhaps makes sense only to DataNodes. Thus I'm currently leaning towards requiring admins to explicitly specify the name service using the "-ns" option. But I completely agree with you that we should improve the error message. > Haadmin fails if multiple name service IDs are configured > - > > Key: HDFS-9112 > URL: https://issues.apache.org/jira/browse/HDFS-9112 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9112.001.patch > > > In HDFS-6376 we supported a feature for distcp that allows multiple > NameService IDs to be specified so that we can copy from two HA enabled > clusters. > That confuses haadmin command since we have a check in > DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in > that property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups
[ https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9109: Attachment: HDFS-9109.02.patch > dfs.datanode.dns.interface does not work with hosts file based setups > - > > Key: HDFS-9109 > URL: https://issues.apache.org/jira/browse/HDFS-9109 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9109.01.patch, HDFS-9109.02.patch > > > The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode > select its hostname by doing a reverse lookup of IP addresses on the specific > network interface. This does not work when {{/etc/hosts}} is used to set up > alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901601#comment-14901601 ] Haohui Mai commented on HDFS-9108: -- I didn't check the assembly, but I'm surprised that running the {{inputstream_test}} under valgrind fails to uncover the problem. > InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers > --- > > Key: HDFS-9108 > URL: https://issues.apache.org/jira/browse/HDFS-9108 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client > Environment: Ubuntu x86_64, gcc 4.8.2 >Reporter: James Clampffer >Assignee: Haohui Mai >Priority: Blocker > Attachments: 9108-async-repro.patch, 9108-async-repro.patch1, > HDFS-9108.000.patch > > > Somewhere between InputStream->PositionRead and the asio code the pointer to > the destination buffer gets lost. PositionRead will correctly return the > number of bytes read but the buffer won't be filled. > This only seems to affect the remote_block_reader; RPC calls are working. > Valgrind error: > Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s) > msg.msg_iov[0] should equal the buffer pointer passed to PositionRead > Hit when using a promise to make the async call block until completion. > auto stat = std::make_shared<std::promise<Status>>(); > std::future<Status> future(stat->get_future()); > size_t readCount = 0; > auto h = [stat, &readCount, buf](const Status &s, size_t bytes) { > stat->set_value(s); > readCount = bytes; > }; > char buf[50]; > inputStream->PositionRead(buf, 50, 0, h); > > //wait for async to finish > future.get(); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901599#comment-14901599 ] Hadoop QA commented on HDFS-9108: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761517/HDFS-9108.000.patch | | Optional Tests | javac unit | | git revision | trunk / b00392d | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12577/console | This message was automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901600#comment-14901600 ] Anu Engineer commented on HDFS-9112: [~jingzhao] Thanks for the pointer to [~dlmarion]'s comments. I see that we had assumed it is better to let users specify the -ns option if they have this kind of HA setup. However, it looks like both we and Cloudera ran into this issue in the field, so I think we need a little more clarity in the error messages; with the current code the error message is very cryptic. {code} hdfs haadmin -getServiceState nn2 Illegal argument: Unable to determine the nameservice id. {code} This gives the user no clue that they are expected to specify the -ns option. Also, from the comments you pointed me to, I am not able to decipher why it is better for the user to specify "-ns" when we have that information in the config files. Since I don't have much context on HDFS-6376, I would appreciate it if you could provide some rationale. (From a cursory reading of the comments it looks to me that Dave originally had exclude settings which created some issues, but [~wheat9] changed them to internal nameservices. If so, using internal nameservices hopefully should not cause a failure.) If you like, I can modify this patch to print an error message that asks the user to add the -ns option explicitly, instead of reading the nameservice names from the config; that would be a trivial change. Please let me know if you think I should do that or whether this change looks good enough. 
> Haadmin fails if multiple name service IDs are configured > - > > Key: HDFS-9112 > URL: https://issues.apache.org/jira/browse/HDFS-9112 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9112.001.patch > > > In HDFS-6376 we supported a feature for distcp that allows multiple > NameService IDs to be specified so that we can copy from two HA enabled > clusters. > That confuses haadmin command since we have a check in > DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in > that property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901597#comment-14901597 ] Haohui Mai commented on HDFS-9108: -- The root cause is that {{ReadBlockContinuation}} makes a copy of a reference instead of the value during template instantiation. The v0 patch fixes the problem and adds a static assert to ensure it won't happen again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9108: - Attachment: HDFS-9108.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9108: - Attachment: (was: HDFS-9108.000.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9108: - Attachment: HDFS-9108.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9108: - Status: Patch Available (was: In Progress) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9108) InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9108: - Summary: InputStreamImpl::ReadBlockContinuation stores wrong pointers of buffers (was: Pointer to read buffer isn't being passed to recvmsg syscall) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9108) Pointer to read buffer isn't being passed to recvmsg syscall
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai reassigned HDFS-9108: Assignee: Haohui Mai (was: James Clampffer) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901549#comment-14901549 ] Jing Zhao commented on HDFS-9112: - We had a discussion about this in HDFS-6376, and [~dlmarion]'s point is that in such a complex configuration scenario it's better to require the admin to specify the nameservice id using the "-ns" option in haadmin commands (please see his comment [here|https://issues.apache.org/jira/browse/HDFS-6376?focusedCommentId=14108157&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14108157]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8882) Use datablocks, parityblocks and cell size from ErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901545#comment-14901545 ] Hadoop QA commented on HDFS-8882: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 7s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 26 new or modified test files. | | {color:green}+1{color} | javac | 8m 14s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 0s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 50s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 41s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 186m 25s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 29s | Tests passed in hadoop-hdfs-client. 
| | | | 236m 22s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-client | | Failed unit tests | hadoop.hdfs.web.TestWebHDFSOAuth2 | | | hadoop.hdfs.TestWriteStripedFileWithFailure | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761177/HDFS-8882-HDFS-7285-02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / b762199 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-client.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12570/console | This message was automatically generated. > Use datablocks, parityblocks and cell size from ErasureCodingPolicy > --- > > Key: HDFS-8882 > URL: https://issues.apache.org/jira/browse/HDFS-8882 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7285 >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-8882-HDFS-7285-01.patch, > HDFS-8882-HDFS-7285-02.patch > > > As part of earlier development, constants were used for datablocks, parity > blocks and cellsize. > Now all these are available in ec zone. Use from there and stop using > constant values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive
[ https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901534#comment-14901534 ] Zhe Zhang commented on HDFS-9119: - We have a few options to fix the discrepancy: # Shorten the edit log tailing interval from 2 mins to 1 min. # Change the timeout of {{transitionToActive}} to 2 mins. This will allow us to add the logic to support per-RPC timeout configuration. # A more complex solution is to add a {{prepareTransitionToActive}} RPC call. I'm leaning toward solution #1 because it's the simplest, and more frequent edit log tailing (and consequently, more edit log segments) should be an acceptable behavior. Please let me know if you have any concerns about this approach. > Discrepancy between edit log tailing interval and RPC timeout for > transitionToActive > > > Key: HDFS-9119 > URL: https://issues.apache.org/jira/browse/HDFS-9119 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > > {{EditLogTailer}} on standby NameNode tails edits from active NameNode every > 2 minutes. But the {{transitionToActive}} RPC call has a timeout of 1 minute. > If active NameNode encounters very intensive metadata workload (in > particular, a lot of {{AddOp}} and {{MkDir}} operations to create new files > and directories), the amount of updates accumulated in the 2 mins edit log > tailing interval is hard for the standby NameNode to catch up in the 1 min > timeout window. If that happens, the FailoverController will timeout and give > up trying to transition the standby to active. The old ANN will resume adding > more edits. When the SbNN finally finishes catching up the edits and tries to > become active, it will crash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive
Zhe Zhang created HDFS-9119: --- Summary: Discrepancy between edit log tailing interval and RPC timeout for transitionToActive Key: HDFS-9119 URL: https://issues.apache.org/jira/browse/HDFS-9119 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.7.1 Reporter: Zhe Zhang {{EditLogTailer}} on standby NameNode tails edits from active NameNode every 2 minutes. But the {{transitionToActive}} RPC call has a timeout of 1 minute. If active NameNode encounters very intensive metadata workload (in particular, a lot of {{AddOp}} and {{MkDir}} operations to create new files and directories), the amount of updates accumulated in the 2 mins edit log tailing interval is hard for the standby NameNode to catch up in the 1 min timeout window. If that happens, the FailoverController will timeout and give up trying to transition the standby to active. The old ANN will resume adding more edits. When the SbNN finally finishes catching up the edits and tries to become active, it will crash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive
[ https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reassigned HDFS-9119: --- Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9026) Support for include/exclude lists on IPv6 setup
[ https://issues.apache.org/jira/browse/HDFS-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemanja Matkovic updated HDFS-9026: --- Attachment: HDFS-9026-HADOOP-11890.002.patch Renamed patch to match the branch name. > Support for include/exclude lists on IPv6 setup > --- > > Key: HDFS-9026 > URL: https://issues.apache.org/jira/browse/HDFS-9026 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Environment: This affects only IPv6 cluster setup >Reporter: Nemanja Matkovic >Assignee: Nemanja Matkovic > Labels: ipv6 > Attachments: HDFS-9026-1.patch, HDFS-9026-2.patch, > HDFS-9026-HADOOP-11890.002.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > This is a tracking item for having e2e IPv6 support in HDFS. > Nate did great groundwork in HDFS-8078, but for having the whole feature working > e2e this is one of the missing items. > Basically, today the NN won't be able to parse IPv6 addresses if they are present > in the include or exclude list. > The patch has a dependency on top of HDFS-8078.14.patch (and has been tested on an > IPv6-only cluster). > This should be committed to the HADOOP-11890 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures
[ https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901442#comment-14901442 ] Jing Zhao commented on HDFS-9106: - bq. Transfer timeout needs to be different from per-packet timeout. +1 for changing the timeout. bq. if the partial block transfer fails, the write will fail permanently without retrying or continuing with whatever is in the pipeline If the partial block transfer fails, and if {{bestEffort}} is enabled, the current code will still use the remaining datanodes to set up the pipeline? But it looks like {{nodes}} may still include the new DN after the failure, though. > Transfer failure during pipeline recovery causes permanent write failures > - > > Key: HDFS-9106 > URL: https://issues.apache.org/jira/browse/HDFS-9106 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9106-poc.patch > > > When a new node is added to a write pipeline during flush/sync, if the > partial block transfer fails, the write will fail permanently without > retrying or continuing with whatever is in the pipeline. > The transfer often fails in busy clusters due to timeout. There is no > per-packet ACK between client and datanode or between source and target > datanodes. If the total transfer time exceeds the configured timeout + 10 > seconds (2 * 5 seconds slack), it is considered failed. Naturally, the > failure rate is higher with bigger block sizes. > I propose the following changes: > - The transfer timeout needs to be different from the per-packet timeout. > - The transfer should be retried if it fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4981) chmod 777 the .snapshot directory does not error that modification on RO snapshot is disallowed
[ https://issues.apache.org/jira/browse/HDFS-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901423#comment-14901423 ] Xiao Chen commented on HDFS-4981: - Hi Stephen, After some investigation, the root cause is that {{FsShellPermissions#processPath}} inside common has an optimization: if the new permission is the same as the current one, no further checking is done. (The 'Modification on a read-only snapshot is disallowed' message is from {{FSDirectory#getINodesInPath4Write}} inside hdfs.) At this point, the most reasonable enhancement I can think of is to add a special check for the .snapshot dir in FsShellPermissions. However, considering: 1. Since the perm check is ignored, no action is taken; the only thing missing is the error message. 2. The possible fix is located in common, where the snapshot feature should not be exposed. 3. According to the design in HDFS-2802, RW snapshots may be supported in the future, in which case we would have to revert the check (or at least change the message). I suggest not adding the fix for now. Please let me know if you have any suggestions/feedback. Thanks. > chmod 777 the .snapshot directory does not error that modification on RO > snapshot is disallowed > --- > > Key: HDFS-4981 > URL: https://issues.apache.org/jira/browse/HDFS-4981 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.0.0, 2.0.4-alpha >Reporter: Stephen Chu >Assignee: Xiao Chen >Priority: Trivial > > Snapshots currently are RO, so it's expected that when someone tries to > modify the .snapshot directory s/he is denied. > However, if the user tries to chmod 777 the .snapshot directory, the > operation does not error. The user should be alerted that modifications are > not allowed, even if this operation didn't actually change anything. > Using other modes will trigger the error, though. 
> {code} > [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 777 > /user/schu/test_dir_1/.snapshot/ > [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 755 > /user/schu/test_dir_1/.snapshot/ > chmod: changing permissions of '/user/schu/test_dir_1/.snapshot': > Modification on a read-only snapshot is disallowed > [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chmod 435 > /user/schu/test_dir_1/.snapshot/ > chmod: changing permissions of '/user/schu/test_dir_1/.snapshot': > Modification on a read-only snapshot is disallowed > [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chown hdfs > /user/schu/test_dir_1/.snapshot/ > chown: changing ownership of '/user/schu/test_dir_1/.snapshot': Modification > on a read-only snapshot is disallowed > [schu@hdfs-snapshots-1 hdfs]$ sudo -u hdfs hdfs dfs -chown schu > /user/schu/test_dir_1/.snapshot/ > chown: changing ownership of '/user/schu/test_dir_1/.snapshot': Modification > on a read-only snapshot is disallowed > [schu@hdfs-snapshots-1 hdfs]$ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901424#comment-14901424 ] Hadoop QA commented on HDFS-9111: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 10s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 14s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 14s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 6s | The applied patch generated 160 new checkstyle issues (total was 41, now 201). | | {color:green}+1{color} | whitespace | 4m 11s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 20s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 96m 44s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 30s | Tests passed in hadoop-hdfs-client. 
| | | | 150m 44s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestHDFSConcat | | | hadoop.hdfs.server.blockmanagement.TestSequentialBlockId | | | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics | | | hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler | | | hadoop.hdfs.server.namenode.TestMetadataVersionOutput | | | hadoop.hdfs.server.namenode.TestFSNamesystemMBean | | | hadoop.hdfs.TestDFSInputStream | | | hadoop.hdfs.TestSeekBug | | | hadoop.hdfs.TestFileAppendRestart | | | hadoop.hdfs.server.namenode.TestNameNodeXAttr | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica | | | hadoop.TestRefreshCallQueue | | | hadoop.hdfs.TestGetBlocks | | | hadoop.hdfs.server.mover.TestMover | | | hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling | | | hadoop.hdfs.qjournal.client.TestQJMWithFaults | | | hadoop.hdfs.server.datanode.TestHSync | | | hadoop.hdfs.server.namenode.TestStartup | | | hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer | | | hadoop.hdfs.server.datanode.TestDeleteBlockPool | | | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock | | | hadoop.hdfs.TestRollingUpgradeRollback | | | hadoop.hdfs.TestDatanodeDeath | | | hadoop.hdfs.server.namenode.ha.TestHAMetrics | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery | | | hadoop.hdfs.server.namenode.TestFileLimit | | | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing | | | hadoop.hdfs.qjournal.client.TestQuorumJournalManagerUnit | | | hadoop.hdfs.TestParallelShortCircuitLegacyRead | | | hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes | | | hadoop.hdfs.server.datanode.TestBlockRecovery | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.TestFileAppend4 | | | hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks | | | hadoop.hdfs.server.namenode.TestStorageRestore | | | 
hadoop.hdfs.server.blockmanagement.TestPendingReplication | | | hadoop.hdfs.server.namenode.TestSecurityTokenEditLog | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.TestDataTransferKeepalive | | | hadoop.hdfs.server.namenode.TestAclConfigFlag | | | hadoop.hdfs.TestReservedRawPaths | | | hadoop.hdfs.TestExternalBlockReader | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS | | | hadoop.hdfs.server.datanode.TestIncrementalBlockReports | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory | | | hadoop.tracing.TestTracingShortCircuitLocalRead | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations | | | hadoop.hdfs.TestFileCreationClient | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | | | hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade | | | hadoop.hdfs.server.namenode.TestAddBlockRetry | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.Te
[jira] [Commented] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups
[ https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901402#comment-14901402 ] Hadoop QA commented on HDFS-9109: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 22m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 29s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 22s | The applied patch generated 1 new checkstyle issues (total was 60, now 60). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 49s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 25m 50s | Tests failed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 95m 8s | Tests failed in hadoop-hdfs. 
| | | | 174m 58s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.fs.shell.find.TestFind | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.web.TestWebHDFSOAuth2 | | Timed out tests | org.apache.hadoop.hdfs.TestFileCorruption | | | org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints | | | org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions | | | org.apache.hadoop.hdfs.server.namenode.TestFsck | | | org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761465/HDFS-9109.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c9cb6a5 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12569/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12569/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12569/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12569/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12569/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12569/console | This message was automatically generated. 
> dfs.datanode.dns.interface does not work with hosts file based setups > - > > Key: HDFS-9109 > URL: https://issues.apache.org/jira/browse/HDFS-9109 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9109.01.patch > > > The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode > select its hostname by doing a reverse lookup of IP addresses on the specific > network interface. This does not work when {{/etc/hosts}} is used to set up > alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
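The gap described above is that {{DNS#reverseDns}} only ever asks the DNS servers, while the JDK resolver goes through the OS name service switch and therefore sees {{/etc/hosts}}. A minimal sketch of the hosts-file-aware alternative (class and method names are illustrative, not the patch's code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostsFileAwareLookup {
    // Illustrative sketch, not the patch itself: the JDK resolver behind
    // getCanonicalHostName() goes through the OS name service switch, so it
    // consults /etc/hosts before (or instead of) querying DNS. A raw PTR
    // query, as DNS#reverseDns issues, only ever asks the DNS servers.
    public static String reverseLookup(InetAddress addr) {
        // Returns the textual IP unchanged when no mapping exists, so callers
        // can detect a failed lookup by comparing against getHostAddress().
        return addr.getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        // Typically resolves via the /etc/hosts entry for the loopback address.
        System.out.println(reverseLookup(loopback));
    }
}
```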
[jira] [Updated] (HDFS-9091) Erasure Coding: Provide DistributedFilesystem API to getAllErasureCodingPolicies
[ https://issues.apache.org/jira/browse/HDFS-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9091: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Status: Resolved (was: Patch Available) Thanks Rakesh for the work! +1 on the patch. I just committed it to the feature branch. > Erasure Coding: Provide DistributedFilesystem API to > getAllErasureCodingPolicies > > > Key: HDFS-9091 > URL: https://issues.apache.org/jira/browse/HDFS-9091 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: HDFS-7285 > > Attachments: HDFS-9091-HDFS-7285-00.patch > > > This jira is to implement {{DFS#getAllErasureCodingPolicies()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC
[ https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-9107: -- Attachment: HDFS-9107.patch Use a stopwatch to abort processing in the inner heartbeat checking loop, and then check at the end of the entire scan whether to skip the next scan. Even added a meager test. > Prevent NN's unrecoverable death spiral after full GC > - > > Key: HDFS-9107 > URL: https://issues.apache.org/jira/browse/HDFS-9107 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-9107.patch, HDFS-9107.patch > > > A full GC pause in the NN that exceeds the dead node interval can lead to an > infinite cycle of full GCs. The most common situation that precipitates an > unrecoverable state is a network issue that temporarily cuts off multiple > racks. > The NN wakes up and falsely starts marking nodes dead. This bloats the > replication queues which increases memory pressure. The replications create a > flurry of incremental block reports and a glut of over-replicated blocks. > The "dead" nodes heartbeat within seconds. The NN forces a re-registration > which requires a full block report - more memory pressure. The NN now has to > invalidate all the over-replicated blocks. The extra blocks are added to > invalidation queues, tracked in an excess blocks map, etc - much more memory > pressure. > All the memory pressure can push the NN into another full GC which repeats > the entire cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
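The stopwatch pattern described in the update above can be sketched as follows; this is a hypothetical illustration of the approach, not the attached patch's code, and the type names are made up:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BoundedScan {
    // Hypothetical sketch: Node and its methods are illustrative, not the
    // actual DatanodeManager types. A monotonic stopwatch bounds the inner
    // heartbeat-check loop so that a long GC pause cannot cascade into a
    // mass marking of nodes as dead.
    public interface Node {
        boolean heartbeatExpired();
        void markDead();
    }

    // Returns true if the scan completed within budget; false means the scan
    // aborted, and the caller should also skip the next scheduled scan.
    public static boolean scan(List<Node> nodes, long budgetMillis) {
        final long start = System.nanoTime();
        for (Node n : nodes) {
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsed > budgetMillis) {
                return false;
            }
            if (n.heartbeatExpired()) {
                n.markDead();
            }
        }
        return true;
    }
}
```

The key design point is that the stopwatch reads a monotonic clock, so a GC pause shows up as elapsed time and aborts the scan instead of producing a burst of false dead-node events.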
[jira] [Commented] (HDFS-8882) Use datablocks, parityblocks and cell size from ErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901306#comment-14901306 ] Zhe Zhang commented on HDFS-8882: - Thanks Vinay for the patch. It looks good overall. A couple of comments: # Should we use {{FSDirErasureCodingOp.getErasureCodingPolicy(fsn, src)}} instead? A side note is that the multiple {{getErasureCodingPolicy}} methods are a little confusing. We should clean them up as a follow-on. {code} // FSDirWriteFileOp + INodesInPath iip = fsn.dir.getINodesInPath4Write(src, false); + ecPolicy = FSDirErasureCodingOp.getErasureCodingPolicy(fsn, iip); {code} # It would be nice to copy over the Javadoc and comments on the constants from {{HdfsConstants}} to {{StripedFileTestUtil}}. > Use datablocks, parityblocks and cell size from ErasureCodingPolicy > --- > > Key: HDFS-8882 > URL: https://issues.apache.org/jira/browse/HDFS-8882 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7285 >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-8882-HDFS-7285-01.patch, > HDFS-8882-HDFS-7285-02.patch > > > As part of earlier development, constants were used for datablocks, parity > blocks and cellsize. > Now all these are available in ec zone. Use from there and stop using > constant values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9118) Add logging system for libhdfs++
Bob Hansen created HDFS-9118: Summary: Add logging system for libhdfs++ Key: HDFS-9118 URL: https://issues.apache.org/jira/browse/HDFS-9118 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-8707 Reporter: Bob Hansen With HDFS-9505, we've started logging data from libhdfs++. Consumers of the library are going to have their own logging infrastructure that we're going to want to provide data to. libhdfs++ should have a logging library that: * Is overridable and can provide sufficient information to work well with common C++ logging frameworks * Has a rational default implementation * Is performant -- This message was sent by Atlassian JIRA (v6.3.4#6332)
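The requirements above suggest a pluggable sink behind a cheap level check. libhdfs++ itself is C++, so the fragment below is only a language-neutral sketch of the shape such an API might take, written in Java; every name in it is illustrative:

```java
public class LibLog {
    public enum Level { DEBUG, INFO, WARN, ERROR }

    // Overridable: consumers implement this to route messages into their
    // own logging framework.
    public interface Sink {
        void consume(Level level, String component, String message);
    }

    // Rational default implementation: write to stderr.
    private static volatile Sink sink =
        (lvl, comp, msg) -> System.err.println(lvl + " [" + comp + "] " + msg);
    private static volatile Level threshold = Level.INFO;

    public static void setSink(Sink s) { sink = s; }
    public static void setThreshold(Level l) { threshold = l; }

    // Performant: the level check happens before any message formatting,
    // so disabled log statements cost a comparison, not a string build.
    public static boolean enabled(Level l) {
        return l.ordinal() >= threshold.ordinal();
    }

    public static void log(Level l, String component, String message) {
        if (enabled(l)) {
            sink.consume(l, component, message);
        }
    }
}
```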
[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901293#comment-14901293 ] James Clampffer commented on HDFS-9095: --- Agree with Bob about making the CMakeLists as robust as possible, otherwise +1 on the patch. Getting in the basics for logging is very nice as well. Re: In RpcConnection methods, should we be calling into the handler while holding the lock on the engine state? If any method there does synchronous I/O or hangs for any reason, the whole RPC system locks up. This was done to avoid using a std::recursive_mutex because right now that handler only gets called from OnRecvCompleted. I don't think the handler is going to be changing much unless we start using multiple connections from a single RpcEngine. Lock contention is one of the things I hope to start profiling soon; if the overhead is negligible I'll switch that back to a recursive_mutex and grab the lock in the handler as well (I'll file a jira if that's the case). > RPC client should fail gracefully when the connection is timed out or reset > --- > > Key: HDFS-9095 > URL: https://issues.apache.org/jira/browse/HDFS-9095 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-9095.000.patch > > > The RPC client should fail gracefully when the connection is timed out or > reset, instead of bailing out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
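The trade-off discussed above, invoking a user handler while holding a non-reentrant engine lock, can be sketched generically. The real code is C++ with std::mutex; the Java fragment below uses a binary semaphore as a stand-in for a non-reentrant mutex, and all names are illustrative:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

public class RpcConnectionSketch {
    // Illustrative sketch, not libhdfs++ code. The binary semaphore stands in
    // for a non-reentrant std::mutex: if the user handler ran while it was
    // held, a handler that blocked or called back into the engine would wedge
    // the whole RPC system. Releasing first avoids that without resorting to
    // a recursive mutex.
    private final Semaphore stateLock = new Semaphore(1);
    private int responsesSeen = 0;

    public int responsesSeen() {
        stateLock.acquireUninterruptibly();
        try {
            return responsesSeen;
        } finally {
            stateLock.release();
        }
    }

    public void onRecvCompleted(String response, Consumer<String> handler) {
        stateLock.acquireUninterruptibly();
        try {
            responsesSeen++;       // engine state changes happen under the lock
        } finally {
            stateLock.release();   // release BEFORE invoking the user handler
        }
        handler.accept(response);  // handler may block or re-enter safely
    }
}
```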
[jira] [Commented] (HDFS-8873) throttle directoryScanner
[ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901285#comment-14901285 ] Colin Patrick McCabe commented on HDFS-8873: [~nroberts], I agree that it might be better to keep the old behavior of finishing one volume in a thread before moving on to the next. It might increase our cache hit rate. I can think of reasons to do the opposite (i.e. spread the load across disks), that might motivate us to add that mode as an option, but it seems better to focus on just throttling in this change. > throttle directoryScanner > - > > Key: HDFS-8873 > URL: https://issues.apache.org/jira/browse/HDFS-8873 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Daniel Templeton > Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, > HDFS-8873.003.patch, HDFS-8873.004.patch > > > The new 2-level directory layout can make directory scans expensive in terms > of disk seeks (see HDFS-8791 for details). > It would be good if the directoryScanner() had a configurable duty cycle that > would reduce its impact on disk performance (much like the approach in > HDFS-8617). > Without such a throttle, disks can go 100% busy for many minutes at a time > (assuming the common case of all inodes in cache but no directory blocks > cached, 64K seeks are required for full directory listing which translates to > 655 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
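A duty-cycle throttle of the kind the issue requests can be sketched as follows. This is a hypothetical illustration; the names are not the configuration keys or classes any eventual patch would use:

```java
public class DutyCycleThrottle {
    // Sketch of a configurable duty cycle. If the scanner was busy for
    // runMillis and may only occupy dutyCycle of wall-clock time, it must
    // idle for runMillis * (1 - d) / d before continuing, e.g. 250 ms of
    // work at a 25% duty cycle earns 750 ms of sleep.
    public static long idleMillis(long runMillis, double dutyCycle) {
        if (dutyCycle >= 1.0) {
            return 0; // no throttling requested
        }
        return (long) (runMillis * (1.0 - dutyCycle) / dutyCycle);
    }

    // Usage inside a scan loop: do a slice of work, then pay the idle debt,
    // keeping average disk utilization near the configured fraction.
    public static void throttledScan(Runnable sliceOfWork, int slices, double dutyCycle)
            throws InterruptedException {
        for (int i = 0; i < slices; i++) {
            long start = System.currentTimeMillis();
            sliceOfWork.run();
            long ran = System.currentTimeMillis() - start;
            Thread.sleep(idleMillis(ran, dutyCycle));
        }
    }
}
```

Throttling between slices rather than per-seek keeps the accounting cheap while still preventing the multi-minute 100%-busy stretches described in the issue.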
[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections
[ https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901284#comment-14901284 ] Bob Hansen commented on HDFS-8855: -- There are two separable issues; this is a performance bug in existing deployments, and your comment is a good outline for a new and improved architecture. HDFS-7966 and the rest of your proposal could be a very good solution in future versions, but doesn't obviate the performance issue with deployed systems, nor does it answer the current use case of having a bog-simple path to get hdfs data via a "curl -L http:/" call. > Webhdfs client leaks active NameNode connections > > > Key: HDFS-8855 > URL: https://issues.apache.org/jira/browse/HDFS-8855 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Bob Hansen >Assignee: Xiaobing Zhou > Attachments: HDFS-8855.005.patch, HDFS-8855.1.patch, > HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, > HDFS_8855.prototype.patch > > > The attached script simulates a process opening ~50 files via webhdfs and > performing random reads. Note that there are at most 50 concurrent reads, > and all webhdfs sessions are kept open. Each read is ~64k at a random > position. > The script periodically (once per second) shells into the NameNode and > produces a summary of the socket states. For my test cluster with 5 nodes, > it took ~30 seconds for the NameNode to have ~25000 active connections and > fails. > It appears that each request to the webhdfs client is opening a new > connection to the NameNode and keeping it open after the request is complete. > If the process continues to run, eventually (~30-60 seconds), all of the > open connections are closed and the NameNode recovers. > This smells like SoftReference reaping. Are we using SoftReferences in the > webhdfs client to cache NameNode connections but never re-using them? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8632) Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes
[ https://issues.apache.org/jira/browse/HDFS-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901250#comment-14901250 ] Andrew Wang commented on HDFS-8632: --- Private APIs don't need stability annotations, we're free to change anything private as long as it doesn't break public interfaces. So private interfaces are all "unstable" in that sense. Also since anything not marked Public is Private, adding Private annotations everywhere is, strictly speaking, not necessary. It's a good habit though :) Overall though looks good, thanks for working on this Rakesh! > Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes > -- > > Key: HDFS-8632 > URL: https://issues.apache.org/jira/browse/HDFS-8632 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-8632-HDFS-7285-00.patch, > HDFS-8632-HDFS-7285-01.patch, HDFS-8632-HDFS-7285-02.patch, > HDFS-8632-HDFS-7285-03.patch > > > I've noticed some of the erasure coding classes missing > {{@InterfaceAudience}} annotation. It would be good to identify the classes > and add proper annotation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9095) RPC client should fail gracefully when the connection is timed out or reset
[ https://issues.apache.org/jira/browse/HDFS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901244#comment-14901244 ] Bob Hansen commented on HDFS-9095: -- Re: CMAKE_CURRENT_LIST_DIR vs. CMAKE_CURRENT_SOURCE_DIR: According to ye olde [StackOverflow|http://stackoverflow.com/questions/15662497/in-cmake-what-is-the-difference-between-cmake-current-source-dir-and-cmake-curr], it becomes more of an issue when files are included across directories (as some of the protobuf stuff is). The difference is what led to hours of angst in HDFS-9025 where the cwd was under the CMakeLists.txt. It's not a super-big deal, but once bitten, twice shy. Re: Options - what you have here is a good start; we can discuss an architectural solution under HDFS-9117. > RPC client should fail gracefully when the connection is timed out or reset > --- > > Key: HDFS-9095 > URL: https://issues.apache.org/jira/browse/HDFS-9095 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-9095.000.patch > > > The RPC client should fail gracefully when the connection is timed out or > reset, instead of bailing out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
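For reference, the distinction between the two CMake variables discussed above shows up exactly when a file is include()d across directories. A small hypothetical shared include file:

```cmake
# Suppose common/protos.cmake is include()d from several CMakeLists.txt files.
# Inside common/protos.cmake:

# Always .../common -- the directory of the file currently being read,
# so it follows the include() to where this file actually lives.
set(PROTO_DIR ${CMAKE_CURRENT_LIST_DIR})

# The source directory of whichever CMakeLists.txt did the include() --
# it does NOT change when a .cmake file from elsewhere is pulled in.
set(CALLER_DIR ${CMAKE_CURRENT_SOURCE_DIR})
```

When paths inside a shared snippet must resolve relative to the snippet itself (as with generated protobuf sources), CMAKE_CURRENT_LIST_DIR is the safer choice.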
[jira] [Created] (HDFS-9117) Config file reader / options classes for libhdfs++
Bob Hansen created HDFS-9117: Summary: Config file reader / options classes for libhdfs++ Key: HDFS-9117 URL: https://issues.apache.org/jira/browse/HDFS-9117 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-8707 Reporter: Bob Hansen For environmental compatibility with HDFS installations, libhdfs++ should be able to read the configurations from Hadoop XML files and behave in line with the Java implementation. Most notably, machine names and ports should be readable from Hadoop XML configuration files. Similarly, an internal Options architecture for libhdfs++ should be developed to efficiently transport the configuration information within the system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9103) Retry reads on DN failure
[ https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901213#comment-14901213 ] James Clampffer commented on HDFS-9103: --- I agree with Bob that the C++ API should be reasonably usable on its own; it might not be tuned perfectly, but that could be added incrementally on the user side. We could supply callback(s) to handle different sorts of failures later, something like InputStream::onDroppedConnection. The InputStream will still handle the failure, but it allows a user to see what's going on. Just to be safe it might be worth wrapping previously_excluded_datanodes in a lock. Or we should agree on threading semantics that say it doesn't need one there. Otherwise +1 > Retry reads on DN failure > - > > Key: HDFS-9103 > URL: https://issues.apache.org/jira/browse/HDFS-9103 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Bob Hansen > Fix For: HDFS-8707 > > Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch > > > When AsyncPreadSome fails, add the failed DataNode to the excluded list and > try again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
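The suggestion above to wrap previously_excluded_datanodes in a lock can be sketched generically. The actual client is C++; this Java fragment is only an illustration of the invariant with made-up names:

```java
import java.util.HashSet;
import java.util.Set;

public class ExcludedNodes {
    // Illustrative sketch: if retries can be issued from more than one
    // thread, the shared exclusion set needs a lock; otherwise concurrent
    // failed reads race on the set's internal state. Every method that
    // touches the set synchronizes on the same monitor.
    private final Set<String> excluded = new HashSet<>();

    public synchronized boolean exclude(String datanodeId) {
        return excluded.add(datanodeId); // true if newly excluded
    }

    public synchronized boolean isExcluded(String datanodeId) {
        return excluded.contains(datanodeId);
    }

    public synchronized int size() {
        return excluded.size();
    }
}
```

The alternative the comment mentions, agreeing on threading semantics so only one thread ever touches the set, removes the lock but pushes the burden onto every future caller.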
[jira] [Updated] (HDFS-8882) Use datablocks, parityblocks and cell size from ErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8882: Summary: Use datablocks, parityblocks and cell size from ErasureCodingPolicy (was: Use datablocks, parityblocks and cell size from ec zone) > Use datablocks, parityblocks and cell size from ErasureCodingPolicy > --- > > Key: HDFS-8882 > URL: https://issues.apache.org/jira/browse/HDFS-8882 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7285 >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HDFS-8882-HDFS-7285-01.patch, > HDFS-8882-HDFS-7285-02.patch > > > As part of earlier development, constants were used for datablocks, parity > blocks and cellsize. > Now all these are available in ec zone. Use from there and stop using > constant values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests
[ https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901201#comment-14901201 ] James Clampffer commented on HDFS-9116: --- +1 > Suppress false positives from Valgrind on uninitialized variables in tests > -- > > Key: HDFS-9116 > URL: https://issues.apache.org/jira/browse/HDFS-9116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Attachments: HDFS-9116.000.patch > > > Valgrind complains about uninitialized variables in the unit tests. It should > be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-9112: --- Status: Patch Available (was: Open) > Haadmin fails if multiple name service IDs are configured > - > > Key: HDFS-9112 > URL: https://issues.apache.org/jira/browse/HDFS-9112 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9112.001.patch > > > In HDFS-6376 we supported a feature for distcp that allows multiple > NameService IDs to be specified so that we can copy from two HA enabled > clusters. > That confuses haadmin command since we have a check in > DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in > that property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections
[ https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901200#comment-14901200 ] Haohui Mai commented on HDFS-8855: -- Revisiting the use case -- how much benefit are we getting from the cache? Is making a connection from DN to NN necessary at all? There are two issues that we have experienced in production here: * DN creates too many connections to the NN when serving WebHDFS requests. It happens when doing distcp over webhdfs in a large cluster (~4,000 nodes) * There are a lot of TIME_WAIT connections when the DN serves a large amount of concurrent, bursty reads. The application sees high variances of latency when there are a lot of TIME_WAIT connections on the NN. The current workflow is the following: 1. NN generates a 307 to redirect the client to the DN that is closest to the client 2. DN receives the request from the client. It creates a new {{DFSClient}}, connects to the NN and creates a {{DFSInputStream}} 3. It streams the {{DFSInputStream}} to the client as HTTP streams My argument is that steps (2) and (3) are unnecessary if the DN exposes a {{GET_BLOCK}} call that directly streams the contents of the block. The problem is eliminated at the very beginning. My proposals are: 1. Expose a {{GET_BLOCK}} call in the current DN to return the content of a block on the DN. 2. Create a {{WebBlockReader}} that reads the block from {{GET_BLOCK}} 3. {{WebHdfsFileSystem}} can use both {{GET_BLOCK_LOCATIONS}} and the {{GET_BLOCK}} to serve the data. From an implementation perspective, there is an implementation in the HDFS-7966 branch for (1) already. It is straightforward to implement (2) (it's just an HTTP GET). And (3) can be done by augmenting the responses of {{GET_BLOCK_LOCATIONS}} on whether the DN supports the {{GET_BLOCK}} call. Thoughts? 
> Webhdfs client leaks active NameNode connections > > > Key: HDFS-8855 > URL: https://issues.apache.org/jira/browse/HDFS-8855 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Bob Hansen >Assignee: Xiaobing Zhou > Attachments: HDFS-8855.005.patch, HDFS-8855.1.patch, > HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, > HDFS_8855.prototype.patch > > > The attached script simulates a process opening ~50 files via webhdfs and > performing random reads. Note that there are at most 50 concurrent reads, > and all webhdfs sessions are kept open. Each read is ~64k at a random > position. > The script periodically (once per second) shells into the NameNode and > produces a summary of the socket states. For my test cluster with 5 nodes, > it took ~30 seconds for the NameNode to have ~25000 active connections and > fails. > It appears that each request to the webhdfs client is opening a new > connection to the NameNode and keeping it open after the request is complete. > If the process continues to run, eventually (~30-60 seconds), all of the > open connections are closed and the NameNode recovers. > This smells like SoftReference reaping. Are we using SoftReferences in the > webhdfs client to cache NameNode connections but never re-using them? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9112) Haadmin fails if multiple name service IDs are configured
[ https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-9112: --- Attachment: HDFS-9112.001.patch [~atm] Thanks for letting me know. [~templedf] I would appreciate it if you could take a look at this patch. This patch fixes getNamenodeServiceAddr by looking at dfs.internal.nameservices and choosing the right name if we have more than one name entry in dfs.nameservices. Along with unit tests, I manually verified that the haadmin command is now able to locate the nameservice URI when we have the setup described in HDFS-6376 > Haadmin fails if multiple name service IDs are configured > - > > Key: HDFS-9112 > URL: https://issues.apache.org/jira/browse/HDFS-9112 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9112.001.patch > > > In HDFS-6376 we supported a feature for distcp that allows multiple > NameService IDs to be specified so that we can copy from two HA enabled > clusters. > That confuses haadmin command since we have a check in > DFSUtil#getNamenodeServiceAddr which fails if it finds more than 1 name in > that property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
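The disambiguation described above relies on the dfs.internal.nameservices property introduced by HDFS-6376. A sketch of the relevant hdfs-site.xml fragment, with made-up nameservice IDs:

```xml
<!-- hdfs-site.xml: two nameservices are visible (e.g. for distcp between
     two HA clusters, per HDFS-6376), but only one is hosted locally. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster,remotecluster</value>
</property>
<property>
  <!-- haadmin and other NN-side tools resolve the service address from this
       list, which names only the nameservice(s) this cluster hosts. -->
  <name>dfs.internal.nameservices</name>
  <value>mycluster</value>
</property>
```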
[jira] [Updated] (HDFS-9108) Pointer to read buffer isn't being passed to recvmsg syscall
[ https://issues.apache.org/jira/browse/HDFS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-9108: -- Priority: Blocker (was: Major) > Pointer to read buffer isn't being passed to recvmsg syscall > > > Key: HDFS-9108 > URL: https://issues.apache.org/jira/browse/HDFS-9108 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client > Environment: Ubuntu x86_64, gcc 4.8.2 >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Blocker > Attachments: 9108-async-repro.patch, 9108-async-repro.patch1 > > > Somewhere between InputStream->PositionRead and the asio code the pointer to > the destination buffer gets lost. PositionRead will correctly return the > number of bytes read but the buffer won't be filled. > This only seems to affect the remote_block_reader, RPC calls are working. > Valgrind error: > Syscall param recvmsg(msg.msg_iov) points to uninitialised byte(s) > msg.msg_iov[0] should equal the buffer pointer passed to PositionRead > Hit when using a promise to make the async call block until completion. > char buf[50]; > auto stat = std::make_shared<std::promise<Status>>(); > std::future<Status> future(stat->get_future()); > size_t readCount = 0; > auto h = [stat, &readCount, buf](const Status &s, size_t bytes) { > stat->set_value(s); > readCount = bytes; > }; > inputStream->PositionRead(buf, 50, 0, h); > > // wait for async to finish > future.get(); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
[ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-9040: --- Assignee: Jing Zhao > Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests > to Coordinator) > --- > > Key: HDFS-9040 > URL: https://issues.apache.org/jira/browse/HDFS-9040 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Walter Su >Assignee: Jing Zhao > Attachments: HDFS-9040-HDFS-7285.002.patch, > HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, > HDFS-9040.02.bgstreamer.patch > > > The general idea is to simplify error handling logic. > Proposal 1: > A BlockGroupDataStreamer to communicate with NN to allocate/update block, and > StripedDataStreamer s only have to stream blocks to DNs. > Proposal 2: > See below the > [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388] > from [~jingzhao]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
[ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901197#comment-14901197 ] Jing Zhao commented on HDFS-9040: - bq. In short, bumpGS is useful for choosing working set(healthy replicas). It's not useful for calculating safe length with given working set. (I think Jing Zhao just said that if I understand correctly.) bq. I agree we should bump GS when handling DN failures in write pipeline. Cool, if we all agree bump GS is still useful, my current proposal is to add the logic "flushing data before bumping GS for failure recovery" for the patch. I will upload a new patch today or tomorrow. bq. We can discuss lease recovery at another jira. Agree. Lease recovery is tricky and we can start from some design doc first maybe. > Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests > to Coordinator) > --- > > Key: HDFS-9040 > URL: https://issues.apache.org/jira/browse/HDFS-9040 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Walter Su > Attachments: HDFS-9040-HDFS-7285.002.patch, > HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, > HDFS-9040.02.bgstreamer.patch > > > The general idea is to simplify error handling logic. > Proposal 1: > A BlockGroupDataStreamer to communicate with NN to allocate/update block, and > StripedDataStreamer s only have to stream blocks to DNs. > Proposal 2: > See below the > [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388] > from [~jingzhao]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests
[ https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901178#comment-14901178 ] Hadoop QA commented on HDFS-9116: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761470/HDFS-9116.000.patch | | Optional Tests | javac unit | | git revision | trunk / b00392d | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12572/console | This message was automatically generated. > Suppress false positives from Valgrind on uninitialized variables in tests > -- > > Key: HDFS-9116 > URL: https://issues.apache.org/jira/browse/HDFS-9116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Attachments: HDFS-9116.000.patch > > > Valgrind complains about uninitialized variables in the unit tests. It should > be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9110) Improve upon HDFS-8480
[ https://issues.apache.org/jira/browse/HDFS-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901175#comment-14901175 ] Hadoop QA commented on HDFS-9110: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 48s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 19s | The applied patch generated 5 new checkstyle issues (total was 2, now 6). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 10s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 194m 17s | Tests failed in hadoop-hdfs. 
| | | | 239m 19s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.cli.TestHDFSCLI | | | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.TestGenericRefresh | | | hadoop.cli.TestAclCLI | | | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12761437/HDFS-9110.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c9cb6a5 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12567/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12567/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12567/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12567/console | This message was automatically generated. > Improve upon HDFS-8480 > -- > > Key: HDFS-9110 > URL: https://issues.apache.org/jira/browse/HDFS-9110 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Charlie Helin >Assignee: Charlie Helin >Priority: Minor > Fix For: 2.7.0, 2.6.1 > > Attachments: HDFS-9110.00.patch, HDFS-9110.01.patch, > HDFS-9110.02.patch > > > This is a request to do some cosmetic improvements on top of HDFS-8480. There > a couple of File -> java.nio.file.Path conversions which is a little bit > distracting. > The second aspect is more around efficiency, to be perfectly honest I'm not > sure what the number of files that may be processed. However as HDFS-8480 > eludes to it appears that this number could be significantly large. 
> The current implementation is basically collect-and-process: all files are > first examined and put into a collection, and only after that processed. > HDFS-8480 could simply be further enhanced by employing a single iteration, > without creating an intermediary collection of filenames, by using a FileWalker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9107) Prevent NN's unrecoverable death spiral after full GC
[ https://issues.apache.org/jira/browse/HDFS-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901165#comment-14901165 ] Colin Patrick McCabe commented on HDFS-9107: bq. I don't trust monotonicNow if the thread can suspend between calls; cores on different sockets may give different answers, though it's not something I've seen in the field. Oracle's blog here [ https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks ] says: bq. If you are interested in measuring/calculating elapsed time, then always use System.nanoTime(). On most systems it will give a resolution on the order of microseconds. Be aware though, this call can also take microseconds to execute on some platforms. Of course, {{System#nanoTime}} is just a very thin wrapper around the operating system's monotonic clock. In x86-land, the monotonic clock generally comes from one of two sources: the TSC (timestamp counter) or the HPET (high precision event timer). In the 2000s, the TSC started becoming less useful because multi-core systems started becoming more common, and at that time, TSC wasn't synchronized across cores. This has since changed (at least for Intel systems), and the TSC is now synchronized across cores. So the alarm you are raising is about 5 years too late. Anyway, if you have a "bad" TSC, you can still get {{System#nanoTime}} to behave correctly by switching your operating system's clock source to the HPET. It's slower, but more reliable. If you want to read more about this, check out https://software.intel.com/en-us/forums/intel-isa-extensions/topic/332570 tl;dr 1. Operating systems implement various tricks to work around TSC bad behaviors 2. TSC bad behaviors are becoming less common in modern CPUs 3. You don't have to use the TSC if you don't want to! Let's let the hardware and OS people do their job and just do ours. I agree with [~hitliuyi]... +1 for the patch. 
Would be even better if we could close that small window of a GC happening at a time other than during the {{Thread#sleep}}. > Prevent NN's unrecoverable death spiral after full GC > - > > Key: HDFS-9107 > URL: https://issues.apache.org/jira/browse/HDFS-9107 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-9107.patch > > > A full GC pause in the NN that exceeds the dead node interval can lead to an > infinite cycle of full GCs. The most common situation that precipitates an > unrecoverable state is a network issue that temporarily cuts off multiple > racks. > The NN wakes up and falsely starts marking nodes dead. This bloats the > replication queues which increases memory pressure. The replications create a > flurry of incremental block reports and a glut of over-replicated blocks. > The "dead" nodes heartbeat within seconds. The NN forces a re-registration > which requires a full block report - more memory pressure. The NN now has to > invalidate all the over-replicated blocks. The extra blocks are added to > invalidation queues, tracked in an excess blocks map, etc - much more memory > pressure. > All the memory pressure can push the NN into another full GC which repeats > the entire cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
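The elapsed-time argument above can be made concrete. A minimal sketch (hypothetical class and method names, not from the patch): measure intervals with the monotonic {{System#nanoTime}}, which never runs backwards even if NTP or an operator adjusts the wall clock mid-measurement.

```java
// Hypothetical sketch: wall-clock time (System.currentTimeMillis) can jump,
// while System.nanoTime() is monotonic and therefore safe for measuring
// elapsed intervals such as heartbeat gaps.
public class MonotonicElapsed {
    /** Elapsed milliseconds between two System.nanoTime() readings. */
    static long elapsedMillis(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(50); // stand-in for real work or a GC pause
        long elapsed = elapsedMillis(start, System.nanoTime());
        // A monotonic clock never decreases, so this holds even if the wall
        // clock was adjusted during the sleep.
        System.out.println(elapsed >= 0);
    }
}
```

The same reasoning is why a dead-node check based on wall-clock deltas can misfire after a long GC pause while a monotonic delta cannot go negative.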
[jira] [Commented] (HDFS-7766) Add a flag to WebHDFS op=CREATE to not respond with a 307 redirect
[ https://issues.apache.org/jira/browse/HDFS-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901152#comment-14901152 ] Ravi Prakash commented on HDFS-7766: I'm assuming https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/explorer.html#L58 was commented for the same reason [~wheat9] ? [~jingzhao] ? > Add a flag to WebHDFS op=CREATE to not respond with a 307 redirect > -- > > Key: HDFS-7766 > URL: https://issues.apache.org/jira/browse/HDFS-7766 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: HDFS-7766.01.patch, HDFS-7766.02.patch > > > Please see > https://issues.apache.org/jira/browse/HDFS-7588?focusedCommentId=14276192&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14276192 > A backwards compatible manner we can fix this is to add a flag on the request > which would disable the redirect, i.e. > {noformat} > curl -i -X PUT > "http://:/webhdfs/v1/?op=CREATE[&noredirect=] > {noformat} > returns 200 with the DN location in the response. > This would allow the Browser clients to get the redirect URL to put the file > to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8632) Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes
[ https://issues.apache.org/jira/browse/HDFS-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901153#comment-14901153 ] Zhe Zhang commented on HDFS-8632: - Thanks Rakesh for the work! Most annotations in the patch look good. The following are worth more discussions. [~andrew.wang] Could you share some advice in the context of release management? {code} +@InterfaceAudience.Public +@InterfaceStability.Evolving public final class ErasureCodingPolicy {code} {{Evolving}} actually sounds right to me. A side note is that we should probably have something similar to {{BlockStoragePolicySpi}} that is {{Stable}}. {code} +@InterfaceAudience.Private +@InterfaceStability.Evolving public class DFSStripedInputStream extends DFSInputStream { {code} {{DFSInputStream}} itself is {{Unstable}} (the default for {{Private}}). I guess we should make them consistent. Similar for {{StripedDataStreamer}} and {{BlockInfoStriped}}. {code} +@InterfaceAudience.Private +@InterfaceStability.Evolving public class BlockPlacementPolicies{ {code} Similar as above, should this be {{Evolving}} or the default {{Unstable}}? > Erasure Coding: Add InterfaceAudience annotation to the erasure coding classes > -- > > Key: HDFS-8632 > URL: https://issues.apache.org/jira/browse/HDFS-8632 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-8632-HDFS-7285-00.patch, > HDFS-8632-HDFS-7285-01.patch, HDFS-8632-HDFS-7285-02.patch, > HDFS-8632-HDFS-7285-03.patch > > > I've noticed some of the erasure coding classes missing > {{@InterfaceAudience}} annotation. It would be good to identify the classes > and add proper annotation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
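For readers unfamiliar with the annotation mechanics under discussion, here is a self-contained sketch. The annotation types below are simplified stand-ins for Hadoop's real {{org.apache.hadoop.classification.InterfaceAudience}} and {{InterfaceStability}} (which use nested member annotations); only the marking-and-inspection pattern is illustrated.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-ins for Hadoop's audience/stability annotations,
// showing how a class is marked and how tooling can inspect the marks.
public class AnnotationDemo {
    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Private {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Evolving {}

    // Marked as internal API whose shape may still change between releases,
    // analogous to the proposed marking of DFSStripedInputStream.
    @Private @Evolving
    static class DFSStripedInputStreamish {}

    public static void main(String[] args) {
        Class<?> c = DFSStripedInputStreamish.class;
        System.out.println("private=" + c.isAnnotationPresent(Private.class)
            + " evolving=" + c.isAnnotationPresent(Evolving.class));
    }
}
```

Because the annotations are retained at runtime, release tooling can scan the classpath and flag accidental exposure of {{Private}} types in public signatures.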
[jira] [Updated] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests
[ https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9116: - Status: Patch Available (was: Open) > Suppress false positives from Valgrind on uninitialized variables in tests > -- > > Key: HDFS-9116 > URL: https://issues.apache.org/jira/browse/HDFS-9116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Attachments: HDFS-9116.000.patch > > > Valgrind complains about uninitialized variables in the unit tests. It should > be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests
[ https://issues.apache.org/jira/browse/HDFS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9116: - Attachment: HDFS-9116.000.patch > Suppress false positives from Valgrind on uninitialized variables in tests > -- > > Key: HDFS-9116 > URL: https://issues.apache.org/jira/browse/HDFS-9116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Attachments: HDFS-9116.000.patch > > > Valgrind complains about uninitialized variables in the unit tests. It should > be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9116) Suppress false positives from Valgrind on uninitialized variables in tests
Haohui Mai created HDFS-9116: Summary: Suppress false positives from Valgrind on uninitialized variables in tests Key: HDFS-9116 URL: https://issues.apache.org/jira/browse/HDFS-9116 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Valgrind complains about uninitialized variables in the unit tests. It should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-5897) TestNNWithQJM#testNewNamenodeTakesOverWriter occasionally fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HDFS-5897. -- Resolution: Cannot Reproduce > TestNNWithQJM#testNewNamenodeTakesOverWriter occasionally fails in trunk > > > Key: HDFS-5897 > URL: https://issues.apache.org/jira/browse/HDFS-5897 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu > Attachments: 5897-output.html > > > From > https://builds.apache.org/job/Hadoop-Hdfs-trunk/1665/testReport/junit/org.apache.hadoop.hdfs.qjournal/TestNNWithQJM/testNewNamenodeTakesOverWriter/ > : > {code} > java.lang.Exception: test timed out after 3 milliseconds > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) > at java.io.BufferedInputStream.read(BufferedInputStream.java:317) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:412) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:401) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > {code} > I saw: > {code} > 2014-02-06 11:38:37,970 ERROR namenode.EditLogInputStream > (RedundantEditLogInputStream.java:nextOp(221)) - Got error reading edit log > input stream > http://localhost:40509/getJournal?jid=myjournal&segmentTxId=3&storageInfo=-51%3A1571339494%3A0%3AtestClusterID; > failing over to edit log > 
http://localhost:56244/getJournal?jid=myjournal&segmentTxId=3&storageInfo=-51%3A1571339494%3A0%3AtestClusterID > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 0; expected file to go up to 4 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:194) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:83) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:140) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:178) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:83) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:167) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:120) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:708) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:606) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:263) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:874) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:634) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:446) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:502) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:658) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:643) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1291) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:939) > at > 
org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:824) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:678) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340) > at > org.apache.hadoop.hdfs.qjournal.TestNNWithQJM.testNewNamenodeTakesOverWriter(TestNNWithQJM.java:145) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.
[jira] [Assigned] (HDFS-6264) Provide FileSystem#create() variant which throws exception if parent directory doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HDFS-6264: Assignee: Ted Yu > Provide FileSystem#create() variant which throws exception if parent > directory doesn't exist > > > Key: HDFS-6264 > URL: https://issues.apache.org/jira/browse/HDFS-6264 > Project: Hadoop HDFS > Issue Type: Task > Components: namenode >Affects Versions: 2.4.0 >Reporter: Ted Yu >Assignee: Ted Yu > Labels: hbase > Attachments: hdfs-6264-v1.txt > > > FileSystem#createNonRecursive() is deprecated. > However, there is no DistributedFileSystem#create() implementation which > throws exception if parent directory doesn't exist. > This limits clients' migration away from the deprecated method. > For HBase, IO fencing relies on the behavior of > FileSystem#createNonRecursive(). > Variant of create() method should be added which throws exception if parent > directory doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
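The semantics requested above (fail if the parent directory is absent, rather than silently creating it) can be illustrated with plain {{java.nio}} on the local filesystem, which behaves the same way: {{Files.newOutputStream}} throws {{NoSuchFileException}} when the parent does not exist. Class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Illustrates the non-recursive create semantics HDFS-6264 asks for:
// creating a file fails fast when the parent directory is absent, which is
// the behavior HBase's IO fencing relies on from createNonRecursive().
public class NonRecursiveCreateDemo {
    /** Create the file only if its parent already exists; never mkdir. */
    public static boolean createStrict(Path file) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            return true;  // parent existed, file created
        } catch (NoSuchFileException e) {
            return false; // parent missing: refuse, do not create directories
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("demo");
        System.out.println(createStrict(tmp.resolve("ok.txt")));        // true
        System.out.println(createStrict(tmp.resolve("missing/f.txt"))); // false
    }
}
```

A recursive create, by contrast, would first do the equivalent of {{Files.createDirectories(file.getParent())}}, which defeats fencing schemes that use the parent's absence as a lock.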
[jira] [Updated] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9111: Attachment: HDFS-9111.002.patch Thank you [~wheat9]. The v2 patch rebases from {{trunk}} branch resolving all conflicts. > Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient > - > > Key: HDFS-9111 > URL: https://issues.apache.org/jira/browse/HDFS-9111 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch, > HDFS-9111.002.patch > > > *TL;DR* This jira tracks the effort of moving PB helper methods, which > convert client side data structure to and from protobuf, to the > {{hadoop-hdfs-client}} module. > Currently the {{PBHelper}} class contains helper methods converting both > client and server side data structures from/to protobuf. As we move client > (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and > [HDFS-9039]), we also need to move client module related PB converters to > client module. > A good place may be a new class named {{PBHelperClient}}. After this, the > existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters > for converting server side data structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
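The converter-class split described above follows a simple pattern: stateless static {{convert}} overloads grouped by which module owns the types they touch. The sketch below is purely illustrative (hand-written stand-ins, no real protobuf-generated classes or actual PBHelperClient signatures).

```java
// Hypothetical sketch of the PBHelper/PBHelperClient split: client-side
// conversions live in one static utility class so that hadoop-hdfs-client
// need not depend on server-only types left behind in PBHelper.
public class PBHelperClientSketch {
    // Stand-in for a client-side data structure.
    static class DatanodeId {
        final String host; final int port;
        DatanodeId(String host, int port) { this.host = host; this.port = port; }
    }

    // Stand-in for its generated protobuf message (really an immutable
    // builder-constructed class produced by protoc).
    static class DatanodeIdProto {
        final String host; final int port;
        DatanodeIdProto(String host, int port) { this.host = host; this.port = port; }
    }

    // Converters are static, stateless overloads, mirroring PBHelper's style.
    static DatanodeIdProto convert(DatanodeId id) {
        return new DatanodeIdProto(id.host, id.port);
    }

    static DatanodeId convert(DatanodeIdProto p) {
        return new DatanodeId(p.host, p.port);
    }

    public static void main(String[] args) {
        DatanodeId id = new DatanodeId("dn1.example.com", 50010);
        DatanodeId back = convert(convert(id)); // round-trip through "protobuf"
        System.out.println(back.host.equals(id.host) && back.port == id.port);
    }
}
```

Because each converter only references types from one side of the module boundary, moving a converter is mechanical once the types it converts have moved.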
[jira] [Commented] (HDFS-8873) throttle directoryScanner
[ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901136#comment-14901136 ] Daniel Templeton commented on HDFS-8873: The scanjob queue is indeed ignoring volume when selecting the next job. I was considering the case where there are volumes of greatly differing sizes, in which case not binding a thread to a volume will result in a better distribution of the load. That's also true when the number of threads exceeds the number of volumes. That said, the point of the JIRA was not to change the load profile of the directory scanner; it was just to insert a throttle. I'll post a changeset with a reduced scope shortly. > throttle directoryScanner > - > > Key: HDFS-8873 > URL: https://issues.apache.org/jira/browse/HDFS-8873 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Daniel Templeton > Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, > HDFS-8873.003.patch, HDFS-8873.004.patch > > > The new 2-level directory layout can make directory scans expensive in terms > of disk seeks (see HDFS-8791) for details. > It would be good if the directoryScanner() had a configurable duty cycle that > would reduce its impact on disk performance (much like the approach in > HDFS-8617). > Without such a throttle, disks can go 100% busy for many minutes at a time > (assuming the common case of all inodes in cache but no directory blocks > cached, 64K seeks are required for full directory listing which translates to > 655 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
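A configurable duty cycle like the one proposed can be sketched as: after each unit of scanning work, sleep long enough that the busy fraction of total time equals the configured ratio. All names below are hypothetical, not the actual patch.

```java
// Hypothetical duty-cycle throttle for a directory scanner: if the scanner
// was busy for workMillis, sleep so that busy / (busy + idle) == dutyCycle.
public class ScanThrottle {
    /** Sleep time making workMillis a dutyCycle fraction of the period.
     *  dutyCycle is assumed to be in (0, 1]. */
    static long sleepMillis(long workMillis, double dutyCycle) {
        if (dutyCycle <= 0) throw new IllegalArgumentException("dutyCycle");
        if (dutyCycle >= 1.0) return 0; // no throttling
        return (long) (workMillis * (1.0 - dutyCycle) / dutyCycle);
    }

    public static void main(String[] args) {
        // At a 25% duty cycle, 100ms of disk work earns 300ms of idle time,
        // capping the disk at one quarter busy instead of 100% for minutes.
        System.out.println(sleepMillis(100, 0.25)); // prints 300
    }
}
```

Measuring work in short slices (e.g. per directory) keeps the throttle responsive; one long slice followed by one long sleep would reproduce the bursty 100%-busy behavior the JIRA describes.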
[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901121#comment-14901121 ] Haohui Mai commented on HDFS-9111: -- Turns out it needs to be rebased onto trunk. [~liuml07] can you please rebase the patch? Thanks. > Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient > - > > Key: HDFS-9111 > URL: https://issues.apache.org/jira/browse/HDFS-9111 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch > > > *TL;DR* This jira tracks the effort of moving PB helper methods, which > convert client side data structure to and from protobuf, to the > {{hadoop-hdfs-client}} module. > Currently the {{PBHelper}} class contains helper methods converting both > client and server side data structures from/to protobuf. As we move client > (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and > [HDFS-9039]), we also need to move client module related PB converters to > client module. > A good place may be a new class named {{PBHelperClient}}. After this, the > existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters > for converting server side data structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9111) Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient
[ https://issues.apache.org/jira/browse/HDFS-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901118#comment-14901118 ] Haohui Mai commented on HDFS-9111: -- +1. I'll commit it shortly. > Move hdfs-client protobuf convert methods from PBHelper to PBHelperClient > - > > Key: HDFS-9111 > URL: https://issues.apache.org/jira/browse/HDFS-9111 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9111.000.patch, HDFS-9111.001.patch > > > *TL;DR* This jira tracks the effort of moving PB helper methods, which > convert client side data structure to and from protobuf, to the > {{hadoop-hdfs-client}} module. > Currently the {{PBHelper}} class contains helper methods converting both > client and server side data structures from/to protobuf. As we move client > (and common) classes to {{hadoop-hdfs-client}} module (see [HDFS-8053] and > [HDFS-9039]), we also need to move client module related PB converters to > client module. > A good place may be a new class named {{PBHelperClient}}. After this, the > existing {{PBHelper}} class stays in {{hadoop-hdfs}} module with converters > for converting server side data structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups
[ https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9109: Attachment: HDFS-9109.01.patch > dfs.datanode.dns.interface does not work with hosts file based setups > - > > Key: HDFS-9109 > URL: https://issues.apache.org/jira/browse/HDFS-9109 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9109.01.patch > > > The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode > select its hostname by doing a reverse lookup of IP addresses on the specific > network interface. This does not work when {{/etc/hosts}} is used to set up > alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
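The distinction behind this bug can be sketched in plain JDK code: a direct DNS query (what {{DNS#reverseDns}} does) bypasses the local resolver, whereas {{java.net.InetAddress}} goes through the platform resolver, which on most systems consults {{/etc/hosts}} first. Names below are hypothetical, not the actual patch.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: resolve hostnames for a given interface through the OS resolver
// (which honors /etc/hosts on most systems) instead of querying DNS servers
// directly, avoiding the failure mode described in this JIRA.
public class InterfaceHostnames {
    static List<String> hostnamesOf(String ifaceName) throws SocketException {
        List<String> names = new ArrayList<>();
        NetworkInterface nic = NetworkInterface.getByName(ifaceName);
        if (nic == null) return names; // no such interface on this host
        for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
            // getCanonicalHostName() uses the system resolver; a matching
            // /etc/hosts entry yields the configured name, and it falls back
            // to the literal IP when no mapping exists.
            names.add(addr.getCanonicalHostName());
        }
        return names;
    }

    public static void main(String[] args) throws SocketException {
        System.out.println(hostnamesOf("lo")); // loopback names on Linux
    }
}
```

The trade-off is that the system resolver's answer depends on nsswitch configuration, so behavior is host-configurable rather than pinned to DNS.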
[jira] [Updated] (HDFS-9109) dfs.datanode.dns.interface does not work with hosts file based setups
[ https://issues.apache.org/jira/browse/HDFS-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9109: Attachment: (was: HDFS-9109.01.patch) > dfs.datanode.dns.interface does not work with hosts file based setups > - > > Key: HDFS-9109 > URL: https://issues.apache.org/jira/browse/HDFS-9109 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9109.01.patch > > > The configuration setting {{dfs.datanode.dns.interface}} lets the DataNode > select its hostname by doing a reverse lookup of IP addresses on the specific > network interface. This does not work when {{/etc/hosts}} is used to set up > alternate hostnames, since {{DNS#reverseDns}} only queries the DNS servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9115) Create documentation to describe the overall architecture and rationales of the library
Haohui Mai created HDFS-9115: Summary: Create documentation to describe the overall architecture and rationales of the library Key: HDFS-9115 URL: https://issues.apache.org/jira/browse/HDFS-9115 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS-8707 It's beneficial to have documentations to describe the design decisions and rationales of the library. -- This message was sent by Atlassian JIRA (v6.3.4#6332)