[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes
[ https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088928#comment-14088928 ] Arpit Agarwal commented on HDFS-6809: - +1 for the patch. Move some Balancer's inner classes to standalone classes Key: HDFS-6809 URL: https://issues.apache.org/jira/browse/HDFS-6809 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6809_20140802.patch, h6809_20140806.patch Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can be moved out as standalone classes so that these classes can be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor
[ https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6812: Summary: Remove addBlock and replaceBlock from DatanodeDescriptor (was: Reomve addBlock and replaceBlock from DatanodeDescriptor) Remove addBlock and replaceBlock from DatanodeDescriptor Key: HDFS-6812 URL: https://issues.apache.org/jira/browse/HDFS-6812 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6812_20140803.patch DatanodeDescriptor.addBlock(..) is not used anymore. DatanodeDescriptor.replaceBlock(..) is only used once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088933#comment-14088933 ] Hadoop QA commented on HDFS-6781: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660306/HDFS-6781.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7576//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7576//console This message is automatically generated. 
Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor
[ https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088936#comment-14088936 ] Arpit Agarwal commented on HDFS-6812: - Nice little simplification. I think we can also fix findDatanode to return boolean, but let me take care of that under HDFS-6830. +1 Remove addBlock and replaceBlock from DatanodeDescriptor Key: HDFS-6812 URL: https://issues.apache.org/jira/browse/HDFS-6812 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6812_20140803.patch DatanodeDescriptor.addBlock(..) is not used anymore. DatanodeDescriptor.replaceBlock(..) is only used once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6809) Move some Balancer's inner classes to standalone classes
[ https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6809: -- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Arpit for reviewing the patches. I have committed this. Move some Balancer's inner classes to standalone classes Key: HDFS-6809 URL: https://issues.apache.org/jira/browse/HDFS-6809 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h6809_20140802.patch, h6809_20140806.patch Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can be moved out as standalone classes so that these classes can be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088966#comment-14088966 ]
Hadoop QA commented on HDFS-6506:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651956/HDFS-6506.v2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7577//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7577//console
This message is automatically generated.
Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch
TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently: https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
From the error log, the reason seems to be that newly moved block replicas are invalidated and deleted, so some of the balancer's work is reversed.
{noformat}
2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr
2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set
2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set
2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set
2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set
2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK*
[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088972#comment-14088972 ] Arpit Agarwal commented on HDFS-6781: - Hi Akira, thanks for your continued efforts to improve our documentation. It is much appreciated. Nitpick suggestion if it makes sense to you: Instead of _HDFS Commands Manual_ perhaps we could call it _HDFS Commands Reference_ to be consistent with _Hadoop Commands Reference_. I'd be okay with committing the patch either way. Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor
[ https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088977#comment-14088977 ]
Hudson commented on HDFS-6812:
FAILURE: Integrated in Hadoop-trunk-Commit #6027 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6027/])
HDFS-6812. Remove addBlock and replaceBlock from DatanodeDescriptor. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616426)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingDataNodeMessages.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptReplicaInfo.java
Remove addBlock and replaceBlock from DatanodeDescriptor Key: HDFS-6812 URL: https://issues.apache.org/jira/browse/HDFS-6812 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6812_20140803.patch DatanodeDescriptor.addBlock(..) is not used anymore. DatanodeDescriptor.replaceBlock(..) is only used once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes
[ https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088976#comment-14088976 ]
Hudson commented on HDFS-6809:
FAILURE: Integrated in Hadoop-trunk-Commit #6027 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6027/])
HDFS-6809. Move Balancer's inner classes MovedBlocks and Matcher as to standalone classes and separates KeyManager from NameNodeConnector. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616422)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Matcher.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
Move some Balancer's inner classes to standalone classes Key: HDFS-6809 URL: https://issues.apache.org/jira/browse/HDFS-6809 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h6809_20140802.patch, h6809_20140806.patch Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can be moved out as standalone classes so that these classes can be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6781: Attachment: HDFS-6781.2.patch Thanks Arpit for the review. Modified site.xml to use HDFS Commands Reference instead of HDFS Commands Manual. Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6781: Attachment: (was: HDFS-6781.2.patch) Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6781: Attachment: HDFS-6781.2.patch Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-6781: Attachment: HDFS-6781-branch-2.2.patch Updated the patch for branch-2 also. Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089031#comment-14089031 ] Hadoop QA commented on HDFS-6781: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660348/HDFS-6781-branch-2.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7579//console This message is automatically generated. Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089060#comment-14089060 ] Akira AJISAKA commented on HDFS-6682: [~atm], would you please review this patch?
Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: HDFS-6682.patch
In the following case, the data in HDFS is lost and a client needs to put the same file again.
# A client puts a file to HDFS
# A DataNode crashes before replicating a block of the file to other DataNodes
I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way the client can know which files to retain for a retry.
-- This message was sent by Atlassian JIRA (v6.2#6252)
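The metric proposed above could be maintained roughly as follows. This is only a minimal sketch of the idea and is not taken from HDFS-6682.patch: the class OldestUnderReplicatedTracker and its methods are hypothetical names, and block IDs and timestamps are plain long values rather than NameNode types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical tracker, for illustration only; not part of the HDFS-6682 patch. */
class OldestUnderReplicatedTracker {
  // blockId -> time (ms) at which the block was first seen as under-replicated
  private final Map<Long, Long> sinceByBlock = new HashMap<>();
  // timestamp -> number of tracked blocks that share that timestamp
  private final TreeMap<Long, Integer> countByTime = new TreeMap<>();

  /** Record a block as under-replicated; a block that is already tracked keeps its first timestamp. */
  synchronized void markUnderReplicated(long blockId, long nowMs) {
    if (sinceByBlock.putIfAbsent(blockId, nowMs) == null) {
      countByTime.merge(nowMs, 1, Integer::sum);
    }
  }

  /** Stop tracking a block once it has reached its target replication. */
  synchronized void markFullyReplicated(long blockId) {
    Long since = sinceByBlock.remove(blockId);
    if (since != null) {
      // Returning null from the remapping function removes the timestamp entry.
      countByTime.computeIfPresent(since, (t, n) -> n == 1 ? null : n - 1);
    }
  }

  /** Metric value: the oldest tracked timestamp, or 0 when no block is under-replicated. */
  synchronized long getOldestTimestamp() {
    return countByTime.isEmpty() ? 0L : countByTime.firstKey();
  }
}
```

A client could compare the reported timestamp with the time it wrote a file: files written before that timestamp are no longer among the under-replicated blocks being tracked, while newer files may still need to be kept for a retry.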
[jira] [Commented] (HDFS-6830) BlockManager.addStorage fails when DN updates storage
[ https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089092#comment-14089092 ] Hadoop QA commented on HDFS-6830: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660320/HDFS-6830.01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer org.apache.hadoop.hdfs.server.blockmanagement.TestBlockInfo {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7578//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7578//console This message is automatically generated. 
BlockManager.addStorage fails when DN updates storage - Key: HDFS-6830 URL: https://issues.apache.org/jira/browse/HDFS-6830 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6830.01.patch
The call to {{removeStorageInfo}} is wrong because the block is still in the DatanodeStorage's list of blocks and the callee does not expect it to be.
{code}
} else {
  // The block is on the DN but belongs to a different storage.
  // Update our state.
  removeStorage(getStorageInfo(idx));
  added = false; // Just updating storage. Return false.
}
{code}
It is a very unlikely code path to hit since storage updates usually occur via incremental block reports.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor
[ https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6812: -- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Arpit for reviewing the patch. I have committed this. Remove addBlock and replaceBlock from DatanodeDescriptor Key: HDFS-6812 URL: https://issues.apache.org/jira/browse/HDFS-6812 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h6812_20140803.patch DatanodeDescriptor.addBlock(..) is not used anymore. DatanodeDescriptor.replaceBlock(..) is only used once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6134: --- Attachment: HDFSDataatRestEncryption.pdf I've attached a document that discusses the general design of this feature. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089124#comment-14089124 ] Hadoop QA commented on HDFS-6134: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660368/HDFSDataatRestEncryption.pdf against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7580//console This message is automatically generated. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor
[ https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089138#comment-14089138 ]
Hudson commented on HDFS-6812:
FAILURE: Integrated in Hadoop-Yarn-trunk #637 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/637/])
HDFS-6812. Remove addBlock and replaceBlock from DatanodeDescriptor. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616426)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingDataNodeMessages.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptReplicaInfo.java
Remove addBlock and replaceBlock from DatanodeDescriptor Key: HDFS-6812 URL: https://issues.apache.org/jira/browse/HDFS-6812 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h6812_20140803.patch DatanodeDescriptor.addBlock(..) is not used anymore. DatanodeDescriptor.replaceBlock(..) is only used once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes
[ https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089136#comment-14089136 ]
Hudson commented on HDFS-6809:
FAILURE: Integrated in Hadoop-Yarn-trunk #637 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/637/])
HDFS-6809. Move Balancer's inner classes MovedBlocks and Matcher as to standalone classes and separates KeyManager from NameNodeConnector. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616422)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Matcher.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
Move some Balancer's inner classes to standalone classes Key: HDFS-6809 URL: https://issues.apache.org/jira/browse/HDFS-6809 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h6809_20140802.patch, h6809_20140806.patch Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can be moved out as standalone classes so that these classes can be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
Shinichi Yamashita created HDFS-6833: Summary: DirectoryScanner should not register a deleting block with memory of DataNode Key: HDFS-6833 URL: https://issues.apache.org/jira/browse/HDFS-6833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita When a block is deleted on a DataNode, the following messages are normally logged.
{code}
2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
However, in the current implementation DirectoryScanner may run while the DataNode is still deleting the block. In that case the following messages are logged.
{code}
2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED getNumBytes() = 21230663 getBytesOnDisk() = 21230663 getVisibleLength()= 21230663 getVolume() = /hadoop/data1/dfs/data/current getBlockFile()= /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 unlinked =false
2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}
As a result, the block that is being deleted is registered again in the DataNode's memory, and when the DataNode sends a block report, the NameNode receives wrong block information. For example, when we recommission a node or change the replication factor, the NameNode may treat the right block as an excess replica (ExcessReplicate) and delete it because of this problem, and Under-Replicated Blocks and Missing Blocks occur. When the DataNode runs DirectoryScanner, it should not register a block that is being deleted. -- This message was sent by Atlassian JIRA (v6.2#6252)
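The race described in HDFS-6833 can be illustrated with a small sketch. This is only one possible approach under stated assumptions: `ReplicaRegistry` and all of its method names are hypothetical and are not the FsDatasetImpl or DirectoryScanner API. The idea is that the DataNode remembers which blocks are scheduled for deletion, and the scanner's reconciliation step refuses to re-register those blocks even though their files are still on disk.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory replica registry; NOT the real FsDatasetImpl. */
final class ReplicaRegistry {
    // Block IDs the DataNode currently reports as present.
    private final Set<Long> inMemory = new HashSet<>();
    // Block IDs whose files are scheduled for asynchronous deletion.
    private final Set<Long> pendingDeletion = new HashSet<>();

    synchronized void add(long blockId) {
        inMemory.add(blockId);
    }

    /** Deletion scheduled: forget the block, but remember it is being deleted. */
    synchronized void scheduleDeletion(long blockId) {
        pendingDeletion.add(blockId);
        inMemory.remove(blockId);
    }

    /** Called by the (assumed) async deleter once the file is gone. */
    synchronized void deletionFinished(long blockId) {
        pendingDeletion.remove(blockId);
    }

    /**
     * Scanner found a block file on disk that is missing from memory.
     * Returns true only if the block was (re-)registered.
     */
    synchronized boolean reconcileFoundOnDisk(long blockId) {
        if (pendingDeletion.contains(blockId)) {
            return false; // file is about to disappear; do not register it
        }
        return inMemory.add(blockId);
    }

    synchronized boolean contains(long blockId) {
        return inMemory.contains(blockId);
    }
}
```

With this shape, a scan that lands between the "Scheduling ... for deletion" and "Deleted ..." log lines leaves the in-memory state unchanged, so the next block report would not mention the block. All methods are synchronized so the check and the registration happen atomically with respect to scheduling a deletion.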
[jira] [Commented] (HDFS-6517) Remove hadoop-metrics2.properties from hdfs project
[ https://issues.apache.org/jira/browse/HDFS-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089226#comment-14089226 ] Hudson commented on HDFS-6517: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/]) HDFS-6517. Remove hadoop-metrics2.properties from hdfs project (Akira AJISAKA via aw) (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616262) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/conf/hadoop-metrics2.properties HDFS-6517. Remove hadoop-metrics2.properties from hdfs project (Akira AJISAKA via aw) (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616261) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove hadoop-metrics2.properties from hdfs project --- Key: HDFS-6517 URL: https://issues.apache.org/jira/browse/HDFS-6517 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6517.patch HDFS side of HADOOP-9919. HADOOP-9919 updated the hadoop-metrics2.properties examples for YARN; however, the packaged examples are still old, because the hadoop-metrics2.properties that actually gets packaged is the one in the HDFS project. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089222#comment-14089222 ] Hudson commented on HDFS-6791: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/]) HDFS-6791. A block could remain under replicated if all of its replicas are on decommissioned nodes. Contributed by Ming Ma. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616306) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java A block could remain under replicated if all of its replicas are on decommissioned nodes Key: HDFS-6791 URL: https://issues.apache.org/jira/browse/HDFS-6791 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.6.0 Attachments: HDFS-6791-2.patch, HDFS-6791-3.patch, HDFS-6791.patch Here is the scenario. 1. Normally, before NN transitions a DN to the decommissioned state, enough replicas have been copied to other in-service DNs. However, in some rare situations, the cluster gets into a state where a DN is in the decommissioned state and a block's only replica is on that DN. In such a state, the replication count reported by fsck is 1; the block just stays under-replicated; applications can still read the data, given that a decommissioned node can serve read traffic. This can happen in some error situations such as DN failure or NN failover. For example: a) A block's only replica is temporarily on node A. b) The decommission process is started on node A. c) While node A is in the decommission-in-progress state, node A crashes. NN marks node A as dead. d) After node A rejoins the cluster, NN marks node A as decommissioned. 2. In theory, NN should take care of under-replicated blocks. But it doesn't in this special case, where the only replica is on a decommissioned node. That is because NN has a policy that a decommissioned node can't be picked as the source node for replication.
{noformat}
BlockManager.java chooseSourceDatanode
// never use already decommissioned nodes
if(node.isDecommissioned())
  continue;
{noformat}
3. Because NN marks the node as decommissioned, admins will shut down the datanode, and the under-replicated blocks turn into missing blocks. 4. The workaround is to recommission the node so that NN can start replication from the node. -- This message was sent by Atlassian JIRA (v6.2#6252)
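To make step 2 of the HDFS-6791 scenario concrete, here is a minimal sketch of the source-selection policy quoted above. It does not show the committed fix, and every name in it (`SourceSelectionSketch`, `Node`, `AdminState`, `chooseSource`) is hypothetical rather than taken from BlockManager. It only shows why a block whose sole replica sits on a decommissioned node never gets a replication source.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical model of source selection; NOT the real BlockManager code. */
final class SourceSelectionSketch {
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    static final class Node {
        final String name;
        final AdminState state;

        Node(String name, AdminState state) {
            this.name = name;
            this.state = state;
        }
    }

    /**
     * Picks the first usable replica holder as the replication source.
     * Mirrors the quoted policy: decommissioned nodes are never used.
     */
    static Optional<Node> chooseSource(List<Node> replicaHolders) {
        for (Node node : replicaHolders) {
            if (node.state == AdminState.DECOMMISSIONED) {
                continue; // never use already decommissioned nodes
            }
            return Optional.of(node);
        }
        // No eligible source: the block stays under-replicated.
        return Optional.empty();
    }
}
```

If the only holder is node A in the DECOMMISSIONED state, `chooseSource` returns an empty result, which corresponds to the block staying under-replicated until node A is recommissioned (the workaround in step 4).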
[jira] [Commented] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor
[ https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089225#comment-14089225 ] Hudson commented on HDFS-6812: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/]) HDFS-6812. Remove addBlock and replaceBlock from DatanodeDescriptor. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616426) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingDataNodeMessages.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptReplicaInfo.java Remove addBlock and replaceBlock from DatanodeDescriptor Key: HDFS-6812 URL: https://issues.apache.org/jira/browse/HDFS-6812 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h6812_20140803.patch DatanodeDescriptor.addBlock(..) is not used anymore. DatanodeDescriptor.replaceBlock(..) is only used once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes
[ https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089223#comment-14089223 ] Hudson commented on HDFS-6809: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/]) HDFS-6809. Move Balancer's inner classes MovedBlocks and Matcher as to standalone classes and separates KeyManager from NameNodeConnector. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616422) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Matcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java Move some Balancer's inner classes to standalone classes Key: HDFS-6809 URL: https://issues.apache.org/jira/browse/HDFS-6809 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h6809_20140802.patch, h6809_20140806.patch Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can be moved out as standalone classes so that these classes can be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089247#comment-14089247 ] Daryn Sharp commented on HDFS-6776: --- Sorry, but this patch is completely wrong.
# If security is enabled and an {{IOException}} happens for any reason - transient or legit - while acquiring a token, the client will continue to work because of spnego, but if a job is submitted the tasks will all fail due to no token.
# Webhdfs should be using the same insecure fallback policy as RPC.
# Insecure RPC services return null if a token is requested. Like DFSClient, the webhdfs client should be able to handle that condition instead of throwing the exception you see.
# Issuing a malformed OPEN call is not ok...
# Although irrelevant in light of the above, connection.connect() isn't doing what you think. It proves the client could open a connection and send the request. It doesn't prove the server allowed/authenticated the request. If you read the response, the server should have been angry that you issued an invalid open.
distcp from insecure cluster (source) to secure cluster (destination) doesn't work -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch Issuing distcp command at the secure cluster side, trying to copy stuff from insecure cluster to secure cluster, and see the following problem: {code} hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp hdfs://sure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-clister:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403) at
[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes
[ https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089302#comment-14089302 ] Hudson commented on HDFS-6809: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1856 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1856/]) HDFS-6809. Move Balancer's inner classes MovedBlocks and Matcher as to standalone classes and separates KeyManager from NameNodeConnector. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1616422) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Matcher.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java Move some Balancer's inner classes to standalone classes Key: HDFS-6809 URL: https://issues.apache.org/jira/browse/HDFS-6809 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h6809_20140802.patch, h6809_20140806.patch Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can be moved out as standalone classes so that these classes can be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes
[ https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089301#comment-14089301 ] Hudson commented on HDFS-6791: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1856 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1856/]) HDFS-6791. A block could remain under replicated if all of its replicas are on decommissioned nodes. Contributed by Ming Ma. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616306) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java A block could remain under replicated if all of its replicas are on decommissioned nodes Key: HDFS-6791 URL: https://issues.apache.org/jira/browse/HDFS-6791 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.6.0 Attachments: HDFS-6791-2.patch, HDFS-6791-3.patch, HDFS-6791.patch Here is the scenario. 1. Normally, before NN transitions a DN to the decommissioned state, enough replicas have been copied to other in-service DNs. However, in some rare situations, the cluster gets into a state where a DN is in the decommissioned state and a block's only replica is on that DN. In such a state, the replication count reported by fsck is 1; the block just stays under-replicated; applications can still read the data, given that a decommissioned node can serve read traffic. This can happen in some error situations such as DN failure or NN failover. For example: a) A block's only replica is temporarily on node A. b) The decommission process is started on node A. c) While node A is in the decommission-in-progress state, node A crashes. NN marks node A as dead. d) After node A rejoins the cluster, NN marks node A as decommissioned. 2. In theory, NN should take care of under-replicated blocks. But it doesn't in this special case, where the only replica is on a decommissioned node. That is because NN has a policy that a decommissioned node can't be picked as the source node for replication.
{noformat}
BlockManager.java chooseSourceDatanode
// never use already decommissioned nodes
if(node.isDecommissioned())
  continue;
{noformat}
3. Because NN marks the node as decommissioned, admins will shut down the datanode, and the under-replicated blocks turn into missing blocks. 4. The workaround is to recommission the node so that NN can start replication from the node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6517) Remove hadoop-metrics2.properties from hdfs project
[ https://issues.apache.org/jira/browse/HDFS-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089305#comment-14089305 ] Hudson commented on HDFS-6517: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1856 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1856/]) HDFS-6517. Remove hadoop-metrics2.properties from hdfs project (Akira AJISAKA via aw) (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616262) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/conf/hadoop-metrics2.properties HDFS-6517. Remove hadoop-metrics2.properties from hdfs project (Akira AJISAKA via aw) (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616261) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove hadoop-metrics2.properties from hdfs project --- Key: HDFS-6517 URL: https://issues.apache.org/jira/browse/HDFS-6517 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6517.patch HDFS side of HADOOP-9919. HADOOP-9919 updated the hadoop-metrics2.properties examples for YARN; however, the packaged examples are still old, because the hadoop-metrics2.properties that actually gets packaged is the one in the HDFS project. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089372#comment-14089372 ] Yongjun Zhang commented on HDFS-6776: - HI [~daryn], thank you so much for the very helpful comments. I will look into addressing them in next revision. distcp from insecure cluster (source) to secure cluster (destination) doesn't work -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch Issuing distcp command at the secure cluster side, trying to copy stuff from insecure cluster to secure cluster, and see the following problem: {code} hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp hdfs://sure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-clister:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640) at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at
[jira] [Commented] (HDFS-6782) Improve FS editlog logSync
[ https://issues.apache.org/jira/browse/HDFS-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089402#comment-14089402 ] Daryn Sharp commented on HDFS-6782: --- Edit logging is pretty tricky. I need to think about it more. It seems like if {{syncStart}} is an instance member instead of block scoped, this simple condition might work as the last line of {{logEdit}}: {{if (mytxid syncStart) logSync()}} Improve FS editlog logSync -- Key: HDFS-6782 URL: https://issues.apache.org/jira/browse/HDFS-6782 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-6782.001.patch, HDFS-6782.002.patch The NN uses a double buffer (bufCurrent, bufReady) for log sync: bufCurrent buffers newly arriving edit ops and bufReady is used for flushing. This is efficient. When a flush is ongoing and bufCurrent is full, NN forces a log sync, and all new ops are blocked (since the forced log sync is protected by the FSNamesystem write lock). After the flush finishes, the new ops are still blocked, although at that point bufCurrent is free and ops could go ahead and write to the buffer. The following diagram shows the details. This JIRA is for this improvement. Thanks [~umamaheswararao] for confirming this issue.
{code}
edit1(txid1) -- write to bufCurrent logSync - (swap buffer)flushing ---
edit2(txid2) -- write to bufCurrent logSync - waiting ---
edit3(txid3) -- write to bufCurrent logSync - waiting ---
edit4(txid4) -- write to bufCurrent logSync - waiting ---
edit5(txid5) -- write to bufCurrent --full-- force sync - waiting ---
edit6(txid6) -- blocked
...
editn(txidn) -- blocked
{code}
After the flush, it becomes
{code}
edit1(txid1) -- write to bufCurrent logSync - finished
edit2(txid2) -- write to bufCurrent logSync - flushing ---
edit3(txid3) -- write to bufCurrent logSync - waiting ---
edit4(txid4) -- write to bufCurrent logSync - waiting ---
edit5(txid5) -- write to bufCurrent --full-- force sync - waiting ---
edit6(txid6) -- blocked
...
editn(txidn) -- blocked
{code}
After edit1 finishes, bufCurrent is free, and the thread that flushes txid2 will also flush txid3-5, so we should return from the force sync of edit5, and the FSNamesystem write lock will be released (don't worry about the edit5 op returning early: a normal logSync follows the force logSync, and it waits for the sync to finish). This is the idea of this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
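For readers unfamiliar with the double-buffer scheme that HDFS-6782 refers to, the following is a minimal sketch of the general technique. It is an assumption-laden illustration, not the NameNode's edit log code: the class `DoubleBufferSketch` and its methods are hypothetical, only the buffer names bufCurrent and bufReady come from the description above, and a real implementation would perform the flush I/O outside the lock.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical double buffer for edit ops; NOT the real NameNode edit log. */
final class DoubleBufferSketch {
    private List<String> bufCurrent = new ArrayList<>(); // receives new ops
    private List<String> bufReady = new ArrayList<>();   // being flushed
    private final List<String> sink = new ArrayList<>(); // stands in for disk

    /** New edit ops always go to bufCurrent. */
    synchronized void write(String op) {
        bufCurrent.add(op);
    }

    /** Swap the buffers so the accumulated ops can be flushed. */
    synchronized void swap() {
        if (!bufReady.isEmpty()) {
            throw new IllegalStateException("previous flush not finished");
        }
        List<String> tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp; // empty again, so writers can continue
    }

    /** Persist everything in bufReady; bufCurrent is untouched. */
    synchronized void flush() {
        sink.addAll(bufReady);
        bufReady.clear();
    }

    synchronized int pendingInCurrent() {
        return bufCurrent.size();
    }

    synchronized List<String> flushed() {
        return new ArrayList<>(sink);
    }
}
```

The point the JIRA makes maps onto this sketch as follows: once `swap()` has handed the filled buffer over for flushing, bufCurrent is empty again, so writers do not need to stay blocked until `flush()` returns.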
[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6781: Attachment: HDFS-6781.3.patch Resubmitting the trunk patch with a different name for Jenkins. Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089410#comment-14089410 ] Arpit Agarwal commented on HDFS-6781: - +1 pending Jenkins. Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6828) Separate block replica dispatching from Balancer
[ https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6828: -- Status: Patch Available (was: Open) Separate block replica dispatching from Balancer Key: HDFS-6828 URL: https://issues.apache.org/jira/browse/HDFS-6828 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6828_20140808.patch The Balancer class implements two major features, (1) balancing logic for selecting replicas in order to balance the cluster and (2) block replica dispatching for moving the block replica around. This JIRA is to separate (2) from Balancer so that the code could be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6828) Separate block replica dispatching from Balancer
[ https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6828: -- Attachment: h6828_20140808.patch h6828_20140808.patch: separates Dispatcher from Balancer. Separate block replica dispatching from Balancer Key: HDFS-6828 URL: https://issues.apache.org/jira/browse/HDFS-6828 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6828_20140808.patch The Balancer class implements two major features, (1) balancing logic for selecting replicas in order to balance the cluster and (2) block replica dispatching for moving the block replica around. This JIRA is to separate (2) from Balancer so that the code could be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6831) Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help'
[ https://issues.apache.org/jira/browse/HDFS-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089505#comment-14089505 ] Jing Zhao commented on HDFS-6831: - Also, in dfsadmin -help only a couple of commands' help text mentions that superuser permissions are required. Maybe we can move this phrase into the help summary. Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help' --- Key: HDFS-6831 URL: https://issues.apache.org/jira/browse/HDFS-6831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Akira AJISAKA Priority: Minor Labels: newbie There is an inconsistency between the console outputs of the 'hdfs dfsadmin' command and the 'hdfs dfsadmin -help' command.
{code}
[root@trunk ~]# hdfs dfsadmin
Usage: java DFSAdmin
Note: Administrative commands can only be run as the HDFS superuser.
[-report]
[-safemode enter | leave | get | wait]
[-allowSnapshot snapshotDir]
[-disallowSnapshot snapshotDir]
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-finalizeUpgrade]
[-rollingUpgrade [query|prepare|finalize]]
[-metasave filename]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshCallQueue]
[-refresh]
[-printTopology]
[-refreshNamenodes datanodehost:port]
[-deleteBlockPool datanode-host:port blockpoolId [force]]
[-setQuota quota dirname...dirname]
[-clrQuota dirname...dirname]
[-setSpaceQuota quota dirname...dirname]
[-clrSpaceQuota dirname...dirname]
[-setBalancerBandwidth bandwidth in bytes per second]
[-fetchImage local directory]
[-shutdownDatanode datanode_host:ipc_port [upgrade]]
[-getDatanodeInfo datanode_host:ipc_port]
[-help [cmd]]
{code}
{code}
[root@trunk ~]# hdfs dfsadmin -help
hadoop dfsadmin performs DFS administrative commands.
The full syntax is:
hadoop dfsadmin
[-report [-live] [-dead] [-decommissioning]]
[-safemode enter | leave | get | wait]
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-setQuota quota dirname...dirname]
[-clrQuota dirname...dirname]
[-setSpaceQuota quota dirname...dirname]
[-clrSpaceQuota dirname...dirname]
[-finalizeUpgrade]
[-rollingUpgrade [query|prepare|finalize]]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshCallQueue]
[-refresh host:ipc_port key [arg1..argn]
[-printTopology]
[-refreshNamenodes datanodehost:port]
[-deleteBlockPool datanodehost:port blockpoolId [force]]
[-setBalancerBandwidth bandwidth]
[-fetchImage local directory]
[-allowSnapshot snapshotDir]
[-disallowSnapshot snapshotDir]
[-shutdownDatanode datanode_host:ipc_port [upgrade]]
[-getDatanodeInfo datanode_host:ipc_port
[-help [cmd]
{code}
These two outputs should be the same. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6834) Improve the configuration guidance in DFSClient when there are no Codec classes found in configs
Uma Maheswara Rao G created HDFS-6834: - Summary: Improve the configuration guidance in DFSClient when there are no Codec classes found in configs Key: HDFS-6834 URL: https://issues.apache.org/jira/browse/HDFS-6834 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor This is the comment in HADOOP-10886 from Andrew. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6834) Improve the configuration guidance in DFSClient when there are no Codec classes found in configs
[ https://issues.apache.org/jira/browse/HDFS-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-6834: -- Attachment: HDFS-6834.patch Attached a simple patch to handle this. Improve the configuration guidance in DFSClient when there are no Codec classes found in configs Key: HDFS-6834 URL: https://issues.apache.org/jira/browse/HDFS-6834 Project: Hadoop HDFS Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-6834.patch This is the comment in HADOOP-10886 from Andrew. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6830) BlockManager.addStorage fails when DN updates storage
[ https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6830: Attachment: HDFS-6830.02.patch Rebase patch after HDFS-6812 commit. BlockManager.addStorage fails when DN updates storage - Key: HDFS-6830 URL: https://issues.apache.org/jira/browse/HDFS-6830 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch The call to {{removeStorageInfo}} is wrong because the block is still in the DatanodeStorage's list of blocks and the callee does not expect it to be.
{code}
} else {
  // The block is on the DN but belongs to a different storage.
  // Update our state.
  removeStorage(getStorageInfo(idx));
  added = false; // Just updating storage. Return false.
}
{code}
It is a very unlikely code path to hit since storage updates usually occur via incremental block reports. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089546#comment-14089546 ] stack commented on HDFS-6803: - [~cmccabe] Thanks for the very nice feedback. Let me incorporate/fix what has been posted. [~stev...@iseran.com] Thanks for jumping on boss.
bq. Consistency with actual file data metadata
Yes. Let's fold in your text. You are describing the FS as it is today.
bq. ...The second read() would succeed/return -1 depending on the position
Do you mean the third read in the above?
bq. When a pread is in progress, should that change be visible in getPos()?
I like [~cmccabe]'s groupings, which imply pread does not change getPos. How should I proceed with this issue [~ste...@apache.org]? I'd like to get 2.1 and 2.2 (from attached doc) blessed as scripture. Seems like a bit of cleanup is all that is needed in HDFS (and as [~hitliuyi] suggests, we could probably remove some synchronizes). My guess is the other FS implementations have not been implemented the way HDFS has been and that backfilling a pread to run independent of a read would be a bunch of work. Would this work be a blocker on adding 2.1/2.2 to the spec and HDFS? Thanks Steve. Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: DocumentingDFSClientDFSInputStream (1).pdf Reviews of the patch posted to the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit by documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputStream. If there is agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089597#comment-14089597 ] Arpit Agarwal commented on HDFS-6772: - [~mingma], I was unsure whether this delta can result in lost commands, since it will cause the caller {{processCommands}} to discard any subsequent commands.
{code}
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
@@ -531,7 +531,7 @@ boolean processCommandFromActor(DatanodeCommand cmd,
       LOG.info("DatanodeCommand action : DNA_REGISTER from " + actor.nnAddr + " with " + actor.state + " state");
       actor.reRegister();
-      return true;
+      return false;
{code}
On further investigation it works because RegisterCommand is sent by itself. Could you please add a comment to {{RegisterCommand}} stating it must not be combined with other commands in the same response? Thanks for adding a test case. I think you can remove this comment _Connection to NN times due to NN restart._ A timeout is not needed for the test case to work. The NN will always ask the DN to re-register after restart. Get DNs out of blockContentsStale==true state faster when NN restarts - Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772.patch Here is the non-HA scenario.
1. Get HDFS into a block-over-replicated situation.
2. Restart the NN.
3. From the NN's point of view, DNs will remain in blockContentsStale==true state for a long time. That in turn makes postponedMisreplicatedBlocks large. A larger postponedMisreplicatedBlocks will impact blockreport latency. Given that blockreport takes the NN global lock, it has a severe impact on NN performance and makes the cluster unstable.
Why will DNs remain in blockContentsStale==true state for a long time?
1. When a DN reconnects to the NN upon NN restart, the blockreport RPC could come in before the heartbeat RPC. That is due to how BPServiceActor#offerService decides when to send blockreport and heartbeat. In the case of NN restart, the NN will ask the DN to register when the NN gets the first heartbeat request; the DN will then register with the NN, followed by the blockreport RPC; the heartbeat RPC will come after that.
2. So right after the first blockreport, given heartbeatedSinceFailover remains false, blockContentsStale will stay true.
{noformat}
DatanodeStorageInfo.java
void receivedBlockReport() {
  if (heartbeatedSinceFailover) {
    blockContentsStale = false;
  }
  blockReportCount++;
}
{noformat}
3. So the DN will remain in blockContentsStale==true until the next blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to some large value.
-- This message was sent by Atlassian JIRA (v6.2#6252)
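The ordering problem in the HDFS-6772 description above can be reproduced with a small stand-alone model. This is only a sketch: `StaleFlagSketch`, `StorageState`, `receivedHeartbeat` and `staleAfter` are hypothetical names for illustration, not Hadoop classes or methods; only the body of `receivedBlockReport()` mirrors the snippet quoted in the issue, and registration is left out of the model.

```java
// Illustrative model only; not the actual Hadoop DatanodeStorageInfo.
public class StaleFlagSketch {
    static class StorageState {
        boolean heartbeatedSinceFailover = false; // reset when the NN restarts
        boolean blockContentsStale = true;        // stale until proven otherwise
        int blockReportCount = 0;

        // Assumption: a heartbeat simply flips the flag in this model.
        void receivedHeartbeat() {
            heartbeatedSinceFailover = true;
        }

        // Same logic as the receivedBlockReport() quoted in the issue.
        void receivedBlockReport() {
            if (heartbeatedSinceFailover) {
                blockContentsStale = false;
            }
            blockReportCount++;
        }
    }

    // Replays a sequence of events and reports whether the storage is still stale.
    static boolean staleAfter(String... events) {
        StorageState s = new StorageState();
        for (String e : events) {
            if (e.equals("heartbeat")) {
                s.receivedHeartbeat();
            } else if (e.equals("blockreport")) {
                s.receivedBlockReport();
            } else {
                throw new IllegalArgumentException("unknown event: " + e);
            }
        }
        return s.blockContentsStale;
    }

    public static void main(String[] args) {
        // Block report first, heartbeat second: still stale.
        System.out.println(staleAfter("blockreport", "heartbeat")); // true
        // Heartbeat first, block report second: no longer stale.
        System.out.println(staleAfter("heartbeat", "blockreport")); // false
    }
}
```

In the first ordering the flag can only clear on the next block report, which is why a large dfs.blockreport.intervalMsec keeps the DN stale for so long.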
[jira] [Resolved] (HDFS-6821) Atomicity of multi file operations
[ https://issues.apache.org/jira/browse/HDFS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-6821. - Resolution: Won't Fix Hi, [~samera]. Ideas similar to this have been proposed several times. The consensus has always been that pushing a recursive operation all the way to the NameNode for atomicity would impact throughput too severely. The implementation would require holding the write lock while updating every inode in a subtree. During that time, all other RPC caller threads would block waiting for release of the write lock. A finer-grained locking implementation would help mitigate this, but it wouldn't eliminate the problem completely. It's typical behavior in many file systems that recursive operations are driven from user space, and the syscalls modify a single inode at a time. HDFS isn't different in this respect. I'm going to resolve this as won't fix. Atomicity of multi file operations -- Key: HDFS-6821 URL: https://issues.apache.org/jira/browse/HDFS-6821 Project: Hadoop HDFS Issue Type: Bug Reporter: Samer Al-Kiswany Priority: Minor Looking at how HDFS updates the log files in the case of chmod -r or chown -r operations. In these operations, the HDFS name node seems to update each file separately; consequently the strace of the operation looks as follows.
append(edits)
fsync(edits)
append(edits)
fsync(edits)
---
append(edits)
fsync(edits)
append(edits)
fsync(edits)
If a crash happens in the middle of this operation (e.g. at the dashed line in the trace), the system will end up with part of the files updated with the new owner or permissions and part still with the old ones. Isn’t it better to log the whole operation (chown -r) as one entry in the edit file? -- This message was sent by Atlassian JIRA (v6.2#6252)
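The behavior discussed in HDFS-6821 above, a recursive operation driven from the client with one inode updated and one edit logged per call, can be illustrated with a toy in-memory model. Everything here is an assumption made for illustration: `RecursiveChownSketch`, its `owners` map, its `editLog` list and the `maxOps` "crash" cutoff are hypothetical and are not HDFS APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a client-driven recursive chown; not HDFS code.
public class RecursiveChownSketch {
    // path -> owner; LinkedHashMap keeps a deterministic traversal order
    final Map<String, String> owners = new LinkedHashMap<>();
    // one entry per single-inode update, standing in for append+fsync of the edit log
    final List<String> editLog = new ArrayList<>();

    // One "RPC": updates a single inode and logs a single edit.
    void setOwner(String path, String owner) {
        owners.put(path, owner);
        editLog.add("SET_OWNER " + path + " " + owner);
    }

    // Client-side recursion over root and its descendants.
    // Stops after maxOps updates to imitate a crash in the middle.
    int chownRecursive(String root, String owner, int maxOps) {
        int done = 0;
        for (String path : new ArrayList<>(owners.keySet())) {
            boolean inSubtree = path.equals(root) || path.startsWith(root + "/");
            if (!inSubtree) {
                continue;
            }
            if (done == maxOps) {
                break; // simulated crash: remaining inodes keep the old owner
            }
            setOwner(path, owner);
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        RecursiveChownSketch fs = new RecursiveChownSketch();
        for (String p : new String[] {"/data", "/data/a", "/data/b", "/data/c"}) {
            fs.owners.put(p, "old");
        }
        int updated = fs.chownRecursive("/data", "new", 2);
        System.out.println(updated + " inodes updated before the simulated crash");
        System.out.println(fs.owners);
        System.out.println(fs.editLog);
    }
}
```

The sketch shows only the partial-update outcome the reporter describes; it says nothing about locking cost, which is the reason given above for not making the operation atomic.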
[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089650#comment-14089650 ] Arpit Agarwal commented on HDFS-6772: - Also the test case should be wrapped in try..finally so it can shut down the cluster if the test fails midway. Get DNs out of blockContentsStale==true state faster when NN restarts - Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772.patch Here is the non-HA scenario.
1. Get HDFS into a block-over-replicated situation.
2. Restart the NN.
3. From the NN's point of view, DNs will remain in blockContentsStale==true state for a long time. That in turn makes postponedMisreplicatedBlocks large. A larger postponedMisreplicatedBlocks will impact blockreport latency. Given that blockreport takes the NN global lock, it has a severe impact on NN performance and makes the cluster unstable.
Why will DNs remain in blockContentsStale==true state for a long time?
1. When a DN reconnects to the NN upon NN restart, the blockreport RPC could come in before the heartbeat RPC. That is due to how BPServiceActor#offerService decides when to send blockreport and heartbeat. In the case of NN restart, the NN will ask the DN to register when the NN gets the first heartbeat request; the DN will then register with the NN, followed by the blockreport RPC; the heartbeat RPC will come after that.
2. So right after the first blockreport, given heartbeatedSinceFailover remains false, blockContentsStale will stay true.
{noformat}
DatanodeStorageInfo.java
void receivedBlockReport() {
  if (heartbeatedSinceFailover) {
    blockContentsStale = false;
  }
  blockReportCount++;
}
{noformat}
3. So the DN will remain in blockContentsStale==true until the next blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to some large value.
-- This message was sent by Atlassian JIRA (v6.2#6252)
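The try..finally request in the review comment above follows a common test pattern, sketched here without any Hadoop dependency. `FakeCluster` and `runTest` are hypothetical stand-ins invented for this example; a real test would presumably use the project's own mini-cluster class, which this sketch does not attempt to reproduce.

```java
// Sketch of the try..finally cleanup pattern; FakeCluster is a made-up stand-in.
public class TryFinallySketch {
    static class FakeCluster {
        boolean running = true;

        void restartNameNode() {
            if (!running) {
                throw new IllegalStateException("cluster is not running");
            }
        }

        void shutdown() {
            running = false;
        }
    }

    // Runs the "test body" and always shuts the cluster down,
    // even when the body throws midway.
    static void runTest(FakeCluster cluster, boolean failMidway) {
        try {
            cluster.restartNameNode();
            if (failMidway) {
                throw new AssertionError("simulated test failure");
            }
        } finally {
            cluster.shutdown();
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        try {
            runTest(cluster, true);
        } catch (AssertionError expected) {
            // the failure still propagates to the test runner
        }
        System.out.println("cluster running after failed test: " + cluster.running);
    }
}
```

The failure is not swallowed: the `finally` block only guarantees that cleanup runs before the error propagates.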
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089653#comment-14089653 ] Arpit Agarwal commented on HDFS-6425: - Hi Ming, is this problem mitigated by your fix for HDFS-6772? Large postponedMisreplicatedBlocks has impact on blockReport latency Key: HDFS-6425 URL: https://issues.apache.org/jira/browse/HDFS-6425 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6425-Test-Case.pdf, HDFS-6425.patch Sometimes we have a large number of over-replicated blocks when the NN fails over. When the new active NN takes over, over-replicated blocks are put into postponedMisreplicatedBlocks until none of the DNs for that block is stale anymore. We have a case where the NNs flip-flop. Before postponedMisreplicatedBlocks became empty, the NN failed over again and again. So postponedMisreplicatedBlocks just kept increasing until the cluster was stable. In addition, a large postponedMisreplicatedBlocks could make rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks takes the write lock, so it could slow down block report processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089670#comment-14089670 ] Hadoop QA commented on HDFS-6781: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660391/HDFS-6781.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7581//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7581//console This message is automatically generated. 
Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6825: Attachment: HDFS-6825.001.patch Edit log corruption due to delayed block removal Key: HDFS-6825 URL: https://issues.apache.org/jira/browse/HDFS-6825 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6825.001.patch Observed the following stack: {code} 2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false) 2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space. java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} Found this is what happened:
- client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
- client tried to append to this file, but the lease expired, so lease recovery was started and the append failed
- the file got deleted; however, there were still pending blocks of this file not yet deleted
- then the commitBlockSynchronization() method was called (see stack above), and an InodeFile was created out of the pending block, unaware that the file had already been deleted
- FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock
- closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction and wrote a CloseOp to the edit log
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6821) Atomicity of multi file operations
[ https://issues.apache.org/jira/browse/HDFS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089671#comment-14089671 ] Samer Al-Kiswany commented on HDFS-6821: I see. Thanks Chris. -samer Atomicity of multi file operations -- Key: HDFS-6821 URL: https://issues.apache.org/jira/browse/HDFS-6821 Project: Hadoop HDFS Issue Type: Bug Reporter: Samer Al-Kiswany Priority: Minor Looking at how HDFS updates the log files in the case of chmod -r or chown -r operations. In these operations, the HDFS name node seems to update each file separately; consequently the strace of the operation looks as follows.
append(edits)
fsync(edits)
append(edits)
fsync(edits)
---
append(edits)
fsync(edits)
append(edits)
fsync(edits)
If a crash happens in the middle of this operation (e.g. at the dashed line in the trace), the system will end up with part of the files updated with the new owner or permissions and part still with the old ones. Isn’t it better to log the whole operation (chown -r) as one entry in the edit file? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6781: Issue Type: Improvement (was: Bug) Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6825: Status: Patch Available (was: Open) Submitted patch 001 to address the issue. Two additional issues were found and fixed:
1. When a snapshot for a file doesn't exist, FSNamesystem.commitBlockSynchronization would throw an NPE, because the blockCollection of the storedBlock was set to null by a delete operation.
2. BlockInfoUnderConstruction.appendUCParts doesn't check whether replicas is null.
Thanks for reviewing. Edit log corruption due to delayed block removal Key: HDFS-6825 URL: https://issues.apache.org/jira/browse/HDFS-6825 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6825.001.patch Observed the following stack: {code} 2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false) 2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space.
java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} Found this is what happened: - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz - client tried to append to this file, but the lease expired, so lease recovery is started, thus the append failed - the file get deleted, however, there are still pending blocks of this file not deleted - then commitBlockSynchronization() method is called (see stack above), an InodeFile is created out of the pending block, not aware of that the file was deleted already - FileNotExistException was thrown by 
FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction and wrote a CloseOp to the edit log -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6828) Separate block replica dispatching from Balancer
[ https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089690#comment-14089690 ] Hadoop QA commented on HDFS-6828: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660398/h6828_20140808.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancer org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7582//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7582//console This message is automatically generated. 
Separate block replica dispatching from Balancer Key: HDFS-6828 URL: https://issues.apache.org/jira/browse/HDFS-6828 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6828_20140808.patch The Balancer class implements two major features, (1) balancing logic for selecting replicas in order to balance the cluster and (2) block replica dispatching for moving the block replica around. This JIRA is to separate (2) from Balancer so that the code could be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6781: Resolution: Fixed Fix Version/s: 2.6.0 3.0.0 Target Version/s: 2.6.0 (was: 3.0.0, 2.6.0) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089705#comment-14089705 ] Hudson commented on HDFS-6781: -- FAILURE: Integrated in Hadoop-trunk-Commit #6030 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6030/]) HDFS-6781. Separate HDFS commands from CommandsManual.apt.vm. (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616575)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSCommands.apt.vm
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
Separate HDFS commands from CommandsManual.apt.vm - Key: HDFS-6781 URL: https://issues.apache.org/jira/browse/HDFS-6781 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch HDFS-side of HADOOP-10899. The CommandsManual lists very old information about running HDFS subcommands from the 'hadoop' shell CLI. These are deprecated and should be removed. If necessary, the HDFS subcommands should be added to the HDFS documentation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6834) Improve the configuration guidance in DFSClient when there are no Codec classes found in configs
[ https://issues.apache.org/jira/browse/HDFS-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089758#comment-14089758 ] Andrew Wang commented on HDFS-6834: --- +1 thanks Uma Improve the configuration guidance in DFSClient when there are no Codec classes found in configs Key: HDFS-6834 URL: https://issues.apache.org/jira/browse/HDFS-6834 Project: Hadoop HDFS Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-6834.patch This is the comment in HADOOP-10886 from Andrew. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6766) optimize ack notify mechanism to avoid thundering herd issue
[ https://issues.apache.org/jira/browse/HDFS-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089828#comment-14089828 ] stack commented on HDFS-6766: -
Ran tests with 10 concurrent writers to an HDFS file hosted on a small (5-node) cluster, then with 20 concurrent writers, and finally with 100 concurrent writing threads. Each thread wrote 100k times. Tests lasted 2-3 minutes depending on thread count. Below are context switch counts as reported by Linux perf.
||Threads||No patch||With patch||Difference||%||
|10|7,855,181|7,055,688|-799,493|-10.17790679|
|10|7,849,103|7,099,845|-749,258|-9.545778671|
|10|7,592,115|7,183,892|-408,223|-5.376933832|
|20|9,107,196|8,168,499|-938,697|-10.30720103|
|20|8,983,253|8,164,469|-818,784|-9.114560171|
|20|9,192,111|8,149,535|-1,042,576|-11.34207365|
|100|18,503,931|17,013,636|-1,490,295|-8.053937296|
|100|18,553,534|17,051,602|-1,501,932|-8.095126244|
|100|18,691,605|17,058,533|-1,633,072|-8.736927621|
Here is what I ran to test (from hbase trunk -- writes WAL, a sequence file):
{code}for i in 10 20 100; do for j in 1 2 3; do perf stat ${HOME}/hbase-2.0.0-SNAPSHOT/bin/hbase --config $HOME/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -threads $i -iterations 10 -keySize 50 -valueSize 100 /tmp/$1.${i}.${j}.txt; done; done{code}
optimize ack notify mechanism to avoid thundering herd issue Key: HDFS-6766 URL: https://issues.apache.org/jira/browse/HDFS-6766 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6766.txt Currently, DFSOutputStream uses wait/notifyAll to coordinate ack receiving and ack waiting, etc.. say there're 5 threads(t1,t2,t3,t4,t5) wait for ack seq no: 1,2,3,4,5, once the no. 1 ack arrived, the notifyAll be called, so t2/t3/t4/t5 could do nothing except wait again.
we can rewrite it with Condition class, with a fair policy(fifo), we can just make t1 be notified, so a number of context switch be saved. It's possible more than one thread waiting on the same ack seq no(e.g. no more data be written between two flush operations), so once it happened, we need to notify those threads, so i introduced a set to remember this seq no. In a simple HBase ycsb testing, the context switch number per second was reduced about 15%, and reduced sys cpu% about 6%(My HBase has new write model patch, i think the benefit will be higher if w/o it) -- This message was sent by Atlassian JIRA (v6.2#6252)
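The idea in the description above (wake only the threads whose ack has actually arrived, instead of calling notifyAll on everyone) can be sketched roughly as follows. This is a minimal illustration and not the code in HDFS-6766.txt: the class name `AckWaiters`, its methods, and the use of a sorted map in place of the "set to remember this seq no" are all assumptions made for the example. It keeps one `Condition` per awaited sequence number on a fair `ReentrantLock`, so several threads waiting on the same sequence number share a wait set and are released together.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of targeted ack notification; not the HDFS-6766 patch. */
final class AckWaiters {
  // Fair lock: threads acquire it in FIFO order.
  private final ReentrantLock lock = new ReentrantLock(true);
  // One Condition per awaited seqno, shared by every thread waiting on that seqno.
  private final TreeMap<Long, Condition> waiters = new TreeMap<>();
  private long lastAckedSeqno = -1;

  /** Blocks until an ack with a sequence number >= seqno has been recorded. */
  void waitForAck(long seqno) throws InterruptedException {
    lock.lock();
    try {
      while (lastAckedSeqno < seqno) {
        Condition c = waiters.get(seqno);
        if (c == null) {
          c = lock.newCondition();
          waiters.put(seqno, c);
        }
        c.await(); // releases the lock while parked
      }
    } finally {
      lock.unlock();
    }
  }

  /** Records an ack and signals only the wait sets it satisfies; returns how many were signalled. */
  int ackReceived(long seqno) {
    lock.lock();
    try {
      lastAckedSeqno = Math.max(lastAckedSeqno, seqno);
      int signalled = 0;
      Iterator<Map.Entry<Long, Condition>> it =
          waiters.headMap(lastAckedSeqno, true).entrySet().iterator();
      while (it.hasNext()) {
        it.next().getValue().signalAll(); // all threads waiting on this seqno
        it.remove();
        signalled++;
      }
      return signalled;
    } finally {
      lock.unlock();
    }
  }

  /** Number of distinct sequence numbers that currently have waiters. */
  int pendingSeqnos() {
    lock.lock();
    try {
      return waiters.size();
    } finally {
      lock.unlock();
    }
  }
}
```

With this shape, an ack for seq no 1 leaves the threads waiting on 2 through 5 parked, which is where the saved context switches would come from. Whether that saving outweighs the extra bookkeeping is what the perf runs in this thread are trying to measure.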
[jira] [Commented] (HDFS-6766) optimize ack notify mechanism to avoid thundering herd issue
[ https://issues.apache.org/jira/browse/HDFS-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089847#comment-14089847 ] stack commented on HDFS-6766: -
5-10% improvement in context switches. That said, tests with the patch take longer to complete and throughput is lower across the board. Here is output for the last two 100-thread runs:
NoPatch:
{code}
561652.888388 task-clock # 3.519 CPUs utilized
18,691,605 context-switches # 0.033 M/sec
4,612,298 CPU-migrations # 0.008 M/sec
578,966 page-faults # 0.001 M/sec
847,613,184,509 cycles # 1.509 GHz [83.30%]
643,171,621,294 stalled-cycles-frontend # 75.88% frontend cycles idle [83.32%]
378,342,727,102 stalled-cycles-backend # 44.64% backend cycles idle [66.68%]
404,794,735,743 instructions # 0.48 insns per cycle
 # 1.59 stalled cycles per insn [83.39%]
70,996,040,867 branches # 126.406 M/sec [83.36%]
2,599,494,946 branch-misses # 3.66% of all branches [83.34%]

159.595619057 seconds time elapsed
{code}
WithPatch:
{code}
555117.674087 task-clock # 3.248 CPUs utilized
17,058,533 context-switches # 0.031 M/sec
3,928,780 CPU-migrations # 0.007 M/sec
576,656 page-faults # 0.001 M/sec
839,218,544,729 cycles # 1.512 GHz [83.32%]
641,419,880,735 stalled-cycles-frontend # 76.43% frontend cycles idle [83.37%]
386,633,844,790 stalled-cycles-backend # 46.07% backend cycles idle [66.71%]
391,833,659,097 instructions # 0.47 insns per cycle
 # 1.64 stalled cycles per insn [83.34%]
68,406,883,351 branches # 123.230 M/sec [83.25%]
2,674,118,142 branch-misses # 3.91% of all branches [83.36%]

170.934250947 seconds time elapsed
{code}
All numbers are better w/ patch except instructions per cycle.
[jira] [Updated] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6772: -- Attachment: HDFS-6772-2.patch Thanks, Arpit. Here is the updated patch to address the issues you raised. I also changed when the content-stale storages metric is calculated. Instead of being computed on demand when the metric is pulled via JMX, it is now calculated in the background by HeartbeatManager. That should be OK given the freshness requirement of this metric. Get DNs out of blockContentsStale==true state faster when NN restarts - Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772-2.patch, HDFS-6772.patch Here is the non-HA scenario. 1. Get HDFS into block-over-replicated situation. 2. Restart the NN. 3. From NN's point of view, DNs will remain in blockContentsStale==true state for a long time. That in turns make postponedMisreplicatedBlocks size big. Bigger postponedMisreplicatedBlocks size will impact blockreport latency. Given blockreport takes NN global lock, it has severe impact on NN performance and make the cluster unstable. Why will DNs remain in blockContentsStale==true state for a long time? 1. When a DN reconnect to NN upon NN restart, blockreport RPC could come in before heartbeat RPC. That is due to how BPServiceActor#offerService decides when to send blockreport and heartbeat. In the case of NN restart, NN will ask DN to register when NN gets the first heartbeat request; DN will then register with NN; followed by blockreport RPC; the heartbeat RPC will come after that. 2. So right after the first blockreport, given heartbeatedSinceFailover remains false, blockContentsStale will stay true. {noformat} DatanodeStorageInfo.java void receivedBlockReport() { if (heartbeatedSinceFailover) { blockContentsStale = false; } blockReportCount++; } {noformat} 3.
So the DN will remain in blockContentsStale==true until the next blockreport. For big cluster, dfs.blockreport.intervalMsec could be set to some large value. -- This message was sent by Atlassian JIRA (v6.2#6252)
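The ordering problem described above can be reproduced with a small stand-alone model. This is only a sketch: `receivedBlockReport()` mirrors the `DatanodeStorageInfo` snippet quoted in the issue, while the class name `StorageStaleness`, `receivedHeartbeat()`, and the helper `staleAfterFirstRound` are hypothetical names introduced here for illustration.

```java
/** Hypothetical model of the two flags from the DatanodeStorageInfo snippet in the issue. */
final class StorageStaleness {
  private boolean heartbeatedSinceFailover = false;
  private boolean blockContentsStale = true; // state right after an NN restart
  private int blockReportCount = 0;

  /** Assumed counterpart: a heartbeat after restart/failover sets the flag. */
  void receivedHeartbeat() {
    heartbeatedSinceFailover = true;
  }

  /** Same logic as the snippet quoted in the issue description. */
  void receivedBlockReport() {
    if (heartbeatedSinceFailover) {
      blockContentsStale = false;
    }
    blockReportCount++;
  }

  boolean isStale() {
    return blockContentsStale;
  }

  int blockReportCount() {
    return blockReportCount;
  }

  /** Replays one heartbeat and one block report in the given order; returns the resulting staleness. */
  static boolean staleAfterFirstRound(boolean heartbeatFirst) {
    StorageStaleness s = new StorageStaleness();
    if (heartbeatFirst) {
      s.receivedHeartbeat();
      s.receivedBlockReport();
    } else {
      s.receivedBlockReport(); // arrives while heartbeatedSinceFailover is still false
      s.receivedHeartbeat();
    }
    return s.isStale();
  }
}
```

In this model, `staleAfterFirstRound(false)` leaves the storage stale until some later block report, whereas `staleAfterFirstRound(true)` clears the flag on the first one.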
[jira] [Commented] (HDFS-6830) BlockManager.addStorage fails when DN updates storage
[ https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089903#comment-14089903 ] Hadoop QA commented on HDFS-6830: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660426/HDFS-6830.02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockInfo org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7583//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7583//console This message is automatically generated. 
BlockManager.addStorage fails when DN updates storage - Key: HDFS-6830 URL: https://issues.apache.org/jira/browse/HDFS-6830 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch The call to {{removeStorageInfo}} is wrong because the block is still in the DatanodeStorage's list of blocks and the callee does not expect it to be. {code} } else { // The block is on the DN but belongs to a different storage. // Update our state. removeStorage(getStorageInfo(idx)); added = false; // Just updating storage. Return false. } {code} It is a very unlikely code path to hit since storage updates usually occur via incremental block reports. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6830) BlockManager.addStorage fails when DN updates storage
[ https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6830: Attachment: HDFS-6830.03.patch Fix test case. BlockManager.addStorage fails when DN updates storage - Key: HDFS-6830 URL: https://issues.apache.org/jira/browse/HDFS-6830 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, HDFS-6830.03.patch The call to {{removeStorageInfo}} is wrong because the block is still in the DatanodeStorage's list of blocks and the callee does not expect it to be. {code} } else { // The block is on the DN but belongs to a different storage. // Update our state. removeStorage(getStorageInfo(idx)); added = false; // Just updating storage. Return false. } {code} It is a very unlikely code path to hit since storage updates usually occur via incremental block reports. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089967#comment-14089967 ] Ming Ma commented on HDFS-6425: ---
Thanks, Arpit. This jira can address the more common NN failover scenario with lots of content-stale storages. We try to get storages out of the content-stale state as soon as possible. Here are several scenarios.
a. For non-HA NN restart, have the DN send HB before BR right after registration.
b. For an HA setup, the NN becomes active right after it restarts. This can happen if we have to restart both NNs at the same time, due to some rare outage or some incompatible upgrade. In this case, the active NN will first go to standby, then get transitioned to active, at which point all DNs will be marked as stale again. For big clusters, most of the DN reregistrations will come in after the NN becomes active, so the fix to have DNs send HB and BR right after registration will also help.
c. For an HA setup, the NN becomes active after the NN JVM has been up for some time. The failover could happen due to a zk session timeout, or because the other NN just crashes. In this case, there is no DN reregistration because the new active NN has not restarted recently. We could change the NN to ask DNs to resend their blockreports upon failover, but that would cause cluster performance issues.
So there are still scenarios where we might have lots of content-stale storages. This jira tries to make the NN handle them better.
Large postponedMisreplicatedBlocks has impact on blockReport latency Key: HDFS-6425 URL: https://issues.apache.org/jira/browse/HDFS-6425 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6425-Test-Case.pdf, HDFS-6425.patch Sometimes we have large number of over replicates when NN fails over. When the new active NN took over, over replicated blocks will be put to postponedMisreplicatedBlocks until all DNs for that block aren't stale anymore. We have a case where NNs flip flop.
Before postponedMisreplicatedBlocks became empty, NN fail over again and again. So postponedMisreplicatedBlocks just kept increasing until the cluster is stable. In addition, large postponedMisreplicatedBlocks could make rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks takes write lock. So it could slow down the block report processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089969#comment-14089969 ] Hadoop QA commented on HDFS-6825: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660440/HDFS-6825.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCommitBlockSynchronization {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7584//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7584//console This message is automatically generated. 
Edit log corruption due to delayed block removal Key: HDFS-6825 URL: https://issues.apache.org/jira/browse/HDFS-6825 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6825.001.patch Observed the following stack: {code} 2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false) 2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space. java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code}
This is what was found to have happened:
- client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
- client tried to append to this file, but the lease expired, so lease recovery was started and the append failed
- the file got deleted; however, pending blocks of this file were still not deleted
- then the commitBlockSynchronization() method was called (see stack above), and an INodeFile was created out of the pending block, unaware that the file had already been deleted
- FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock
- closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction and wrote a CloseOp to the edit log
-- This message was sent by Atlassian JIRA (v6.2#6252)
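The sequence above ends with a close record being logged for a path that an earlier record already deleted. A toy model makes the hazard concrete. Everything here is hypothetical (the class `EditLogGuard`, its string-based log, and the op names), and the existence check it shows is only one plausible guard, not necessarily what HDFS-6825.001.patch does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy namespace plus edit log; not NameNode code. */
final class EditLogGuard {
  private final Set<String> namespace = new HashSet<>();
  private final List<String> editLog = new ArrayList<>();

  void create(String path) {
    namespace.add(path);
    editLog.add("ADD " + path);
  }

  void delete(String path) {
    namespace.remove(path);
    editLog.add("DELETE " + path);
  }

  /**
   * Models the tail end of block synchronization: closing the file.
   * Assumed guard: refuse to log a close for a path that is no longer in the
   * namespace, instead of swallowing the error and logging anyway.
   * Returns true if a close record was written.
   */
  boolean closeAfterBlockSync(String path) {
    if (!namespace.contains(path)) {
      return false; // file was deleted while its blocks were still pending
    }
    editLog.add("CLOSE " + path);
    return true;
  }

  List<String> editLog() {
    return editLog;
  }
}
```

Without the check, replaying this toy log would hit a close for a file that the preceding delete had already removed, which is the kind of inconsistency the issue title calls edit log corruption.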
[jira] [Commented] (HDFS-6728) Dynamically add new volumes to DataStorage, formatted if necessary.
[ https://issues.apache.org/jira/browse/HDFS-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089980#comment-14089980 ] Aaron T. Myers commented on HDFS-6728: -- Thanks, Eddy. I agree that the test failure is unrelated. +1, I'm going to commit this momentarily. Dynamically add new volumes to DataStorage, formatted if necessary. --- Key: HDFS-6728 URL: https://issues.apache.org/jira/browse/HDFS-6728 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Labels: datanode Attachments: HDFS-6728.000.patch, HDFS-6728.000.patch, HDFS-6728.001.patch, HDFS-6728.002.patch, HDFS-6728.004.patch When dynamically adding a volume to {{DataStorage}}, it should prepare the {{data dir}}, e.g., formatting if it is empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089983#comment-14089983 ] Arpit Agarwal commented on HDFS-6772: - Thanks Ming. The patch looks good. Following minor issues and I apologize for not mentioning #2 and #3 last time. # Local variable {{numOfStorages}} in {{heartbeatCheck}} is unused. It should be removed. # In {{scheduleHeartbeat}}, we can just set {{lastHeartbeat = 0}}. It's easier to follow and has the same effect. # Could we rename {{numContentStaleStorages}} to {{numStaleStorages}} to be consistent with {{numStaleNodes}}? Get DNs out of blockContentsStale==true state faster when NN restarts - Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772-2.patch, HDFS-6772.patch Here is the non-HA scenario. 1. Get HDFS into block-over-replicated situation. 2. Restart the NN. 3. From NN's point of view, DNs will remain in blockContentsStale==true state for a long time. That in turns make postponedMisreplicatedBlocks size big. Bigger postponedMisreplicatedBlocks size will impact blockreport latency. Given blockreport takes NN global lock, it has severe impact on NN performance and make the cluster unstable. Why will DNs remain in blockContentsStale==true state for a long time? 1. When a DN reconnect to NN upon NN restart, blockreport RPC could come in before heartbeat RPC. That is due to how BPServiceActor#offerService decides when to send blockreport and heartbeat. In the case of NN restart, NN will ask DN to register when NN gets the first heartbeat request; DN will then register with NN; followed by blockreport RPC; the heartbeat RPC will come after that. 2. So right after the first blockreport, given heartbeatedSinceFailover remains false, blockContentsStale will stay true. 
{noformat} DatanodeStorageInfo.java void receivedBlockReport() { if (heartbeatedSinceFailover) { blockContentsStale = false; } blockReportCount++; } {noformat} 3. So the DN will remain in blockContentsStale==true until the next blockreport. For big cluster, dfs.blockreport.intervalMsec could be set to some large value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6728) Dynamically add new volumes to DataStorage, formatted if necessary.
[ https://issues.apache.org/jira/browse/HDFS-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089986#comment-14089986 ] Lei (Eddy) Xu commented on HDFS-6728: - Great! Thank you for the reviews, [~atm]. Dynamically add new volumes to DataStorage, formatted if necessary. --- Key: HDFS-6728 URL: https://issues.apache.org/jira/browse/HDFS-6728 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Labels: datanode Attachments: HDFS-6728.000.patch, HDFS-6728.000.patch, HDFS-6728.001.patch, HDFS-6728.002.patch, HDFS-6728.004.patch When dynamically adding a volume to {{DataStorage}}, it should prepare the {{data dir}}, e.g., formatting if it is empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6728) Dynamically add new volumes to DataStorage, formatted if necessary.
[ https://issues.apache.org/jira/browse/HDFS-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6728: - Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Eddy. Dynamically add new volumes to DataStorage, formatted if necessary. --- Key: HDFS-6728 URL: https://issues.apache.org/jira/browse/HDFS-6728 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Labels: datanode Fix For: 2.6.0 Attachments: HDFS-6728.000.patch, HDFS-6728.000.patch, HDFS-6728.001.patch, HDFS-6728.002.patch, HDFS-6728.004.patch When dynamically adding a volume to {{DataStorage}}, it should prepare the {{data dir}}, e.g., formatting if it is empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6740) FSDataset adds data volumes dynamically
[ https://issues.apache.org/jira/browse/HDFS-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089998#comment-14089998 ] Aaron T. Myers commented on HDFS-6740: -- Pretty confident that the test failure is unrelated - it's been failing in other builds as well, and I can't reproduce it on my box. +1, I'm going to commit this momentarily. FSDataset adds data volumes dynamically --- Key: HDFS-6740 URL: https://issues.apache.org/jira/browse/HDFS-6740 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6740.000.patch, HDFS-6740.001.patch To support volume management in DN (HDFS-1362), it requires FSDatasetImpl to be able to add volumes dynamically during runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6740) Make FSDataset support adding data volumes dynamically
[ https://issues.apache.org/jira/browse/HDFS-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6740: - Summary: Make FSDataset support adding data volumes dynamically (was: FSDataset adds data volumes dynamically) Make FSDataset support adding data volumes dynamically -- Key: HDFS-6740 URL: https://issues.apache.org/jira/browse/HDFS-6740 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6740.000.patch, HDFS-6740.001.patch To support volume management in DN (HDFS-1362), it requires FSDatasetImpl to be able to add volumes dynamically during runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6740) Make FSDataset support adding data volumes dynamically
[ https://issues.apache.org/jira/browse/HDFS-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6740: - Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Eddy. Make FSDataset support adding data volumes dynamically -- Key: HDFS-6740 URL: https://issues.apache.org/jira/browse/HDFS-6740 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.6.0 Attachments: HDFS-6740.000.patch, HDFS-6740.001.patch To support volume management in DN (HDFS-1362), it requires FSDatasetImpl to be able to add volumes dynamically during runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6828) Separate block replica dispatching from Balancer
[ https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6828: -- Attachment: h6828_20140808b.patch h6828_20140808b.patch: fixes some bugs. Separate block replica dispatching from Balancer Key: HDFS-6828 URL: https://issues.apache.org/jira/browse/HDFS-6828 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6828_20140808.patch, h6828_20140808b.patch The Balancer class implements two major features, (1) balancing logic for selecting replicas in order to balance the cluster and (2) block replica dispatching for moving the block replica around. This JIRA is to separate (2) from Balancer so that the code could be reused by other code such as the new data migration tool proposed in HDFS-6801. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6728) Dynamically add new volumes to DataStorage, formatted if necessary.
[ https://issues.apache.org/jira/browse/HDFS-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090033#comment-14090033 ] Hudson commented on HDFS-6728: -- FAILURE: Integrated in Hadoop-trunk-Commit #6033 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6033/]) HDFS-6728. Dynamically add new volumes to DataStorage, formatted if necessary. Contributed by Lei Xu. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616615) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataStorage.java Dynamically add new volumes to DataStorage, formatted if necessary. --- Key: HDFS-6728 URL: https://issues.apache.org/jira/browse/HDFS-6728 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Labels: datanode Fix For: 2.6.0 Attachments: HDFS-6728.000.patch, HDFS-6728.000.patch, HDFS-6728.001.patch, HDFS-6728.002.patch, HDFS-6728.004.patch When dynamically adding a volume to {{DataStorage}}, it should prepare the {{data dir}}, e.g., formatting if it is empty. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6740) Make FSDataset support adding data volumes dynamically
[ https://issues.apache.org/jira/browse/HDFS-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090035#comment-14090035 ] Hudson commented on HDFS-6740: -- FAILURE: Integrated in Hadoop-trunk-Commit #6033 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6033/]) HDFS-6740. Make FSDataset support adding data volumes dynamically. Contributed by Lei Xu. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616623) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StorageLocation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java Make FSDataset support adding data volumes dynamically -- Key: HDFS-6740 URL: https://issues.apache.org/jira/browse/HDFS-6740 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.6.0 Attachments: HDFS-6740.000.patch, HDFS-6740.001.patch To support volume management in DN (HDFS-1362), it requires FSDatasetImpl to be able to add volumes dynamically during runtime.
[jira] [Created] (HDFS-6835) Archival Storage: Add a new API to set storage policy
Tsz Wo Nicholas Sze created HDFS-6835: - Summary: Archival Storage: Add a new API to set storage policy Key: HDFS-6835 URL: https://issues.apache.org/jira/browse/HDFS-6835 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Jing Zhao The new data migration tool proposed in HDFS-6801 will determine whether the storage policy of files needs to be updated. The tool needs a new API to set the storage policy.
[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090053#comment-14090053 ] Hadoop QA commented on HDFS-6772: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660464/HDFS-6772-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7585//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7585//console This message is automatically generated. Get DNs out of blockContentsStale==true state faster when NN restarts - Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772-2.patch, HDFS-6772.patch Here is the non-HA scenario. 1. Get HDFS into block-over-replicated situation. 2. Restart the NN. 3. From NN's point of view, DNs will remain in blockContentsStale==true state for a long time. 
That in turn makes postponedMisreplicatedBlocks large. A larger postponedMisreplicatedBlocks increases blockreport latency. Given that blockreport processing takes the NN global lock, this has a severe impact on NN performance and makes the cluster unstable.
Why will DNs remain in blockContentsStale==true state for a long time?
1. When a DN reconnects to the NN upon NN restart, the blockreport RPC could come in before the heartbeat RPC. That is due to how BPServiceActor#offerService decides when to send blockreport and heartbeat. In the case of NN restart, the NN will ask the DN to register when the NN gets the first heartbeat request; the DN will then register with the NN, followed by the blockreport RPC; the heartbeat RPC will come after that.
2. So right after the first blockreport, given that heartbeatedSinceFailover remains false, blockContentsStale will stay true.
{noformat}
DatanodeStorageInfo.java
  void receivedBlockReport() {
    if (heartbeatedSinceFailover) {
      blockContentsStale = false;
    }
    blockReportCount++;
  }
{noformat}
3. So the DN will remain in blockContentsStale==true until the next blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to a large value.
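The ordering problem described above can be reproduced with a small model. This is only a sketch: `StorageStateSketch`, the no-argument `receivedHeartbeat()`, and the getter are hypothetical names made up for illustration and are not the real `DatanodeStorageInfo`. Only the body of `receivedBlockReport()` follows the snippet quoted in the description.

```java
/**
 * Simplified, hypothetical model of the per-storage state kept by the NN.
 * Only receivedBlockReport() mirrors the snippet quoted in the description.
 */
public class StorageStateSketch {
  private boolean heartbeatedSinceFailover = false;
  private boolean blockContentsStale = true;
  private int blockReportCount = 0;

  /** Assumed helper: records that a heartbeat arrived after the restart. */
  void receivedHeartbeat() {
    heartbeatedSinceFailover = true;
  }

  /** Same logic as the quoted snippet. */
  void receivedBlockReport() {
    if (heartbeatedSinceFailover) {
      blockContentsStale = false;
    }
    blockReportCount++;
  }

  boolean isBlockContentsStale() {
    return blockContentsStale;
  }

  public static void main(String[] args) {
    StorageStateSketch storage = new StorageStateSketch();

    // Blockreport arrives before any heartbeat: the storage stays stale.
    storage.receivedBlockReport();
    System.out.println("after first blockreport: stale=" + storage.isBlockContentsStale());

    // The heartbeat alone does not clear the flag.
    storage.receivedHeartbeat();
    System.out.println("after heartbeat: stale=" + storage.isBlockContentsStale());

    // Only the next blockreport clears it.
    storage.receivedBlockReport();
    System.out.println("after second blockreport: stale=" + storage.isBlockContentsStale());
  }
}
```

In this model the flag is cleared only by the second blockreport, which is why a long blockreport interval keeps storages stale for a long time.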
[jira] [Updated] (HDFS-6774) Make FsDataset and DataStore support removing volumes.
[ https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6774: Summary: Make FsDataset and DataStore support removing volumes. (was: Remove volumes from DataStorage) Make FsDataset and DataStore support removing volumes. -- Key: HDFS-6774 URL: https://issues.apache.org/jira/browse/HDFS-6774 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Managing volumes on DataNode includes decommissioning an active volume without restarting DataNode. This task adds support to remove volumes from {{DataStorage}} and {{BlockPoolSliceStorage}} dynamically. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6772: -- Attachment: HDFS-6772-3.patch Thanks, Arpit. Here is the updated patch. I renamed "stale storage" to "content stale storage" so that it is distinguished from a stale datanode, given that the two definitions of stale are different. Get DNs out of blockContentsStale==true state faster when NN restarts - Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch
[jira] [Updated] (HDFS-6774) Make FsDataset and DataStore support removing volumes.
[ https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6774: Attachment: HDFS-6774.000.patch This patch enables {{FsDataset}} and {{DataStorage}} to remove data volumes dynamically. The {{replicaInfos}} that are on the deleted volume will also be removed from {{FsDataset#volumeMap}}. The race condition in which a volume is removed while it is being written to is not addressed in this patch. I will open a new JIRA for that case and other potential race conditions. Make FsDataset and DataStore support removing volumes. -- Key: HDFS-6774 URL: https://issues.apache.org/jira/browse/HDFS-6774 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6774.000.patch Managing volumes on DataNode includes decommissioning an active volume without restarting DataNode. This task adds support to remove volumes from {{DataStorage}} and {{BlockPoolSliceStorage}} dynamically.
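The replica cleanup mentioned in the comment above can be pictured with a toy map. This is a minimal sketch under stated assumptions: `VolumeMapSketch`, `Replica`, and the string volume names are hypothetical and do not reflect the actual `FsDataset#volumeMap` types. It is single-threaded on purpose, so it says nothing about the race with in-progress writes that the comment leaves for a later JIRA.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, single-threaded stand-in for a block-to-replica map. */
public class VolumeMapSketch {
  /** Illustrative replica record: a block ID plus the volume that stores it. */
  static final class Replica {
    final long blockId;
    final String volume;

    Replica(long blockId, String volume) {
      this.blockId = blockId;
      this.volume = volume;
    }
  }

  private final Map<Long, Replica> volumeMap = new HashMap<>();

  void addReplica(long blockId, String volume) {
    volumeMap.put(blockId, new Replica(blockId, volume));
  }

  /** Drops every replica stored on the given volume; returns how many were removed. */
  int removeVolume(String volume) {
    int before = volumeMap.size();
    volumeMap.values().removeIf(replica -> replica.volume.equals(volume));
    return before - volumeMap.size();
  }

  boolean contains(long blockId) {
    return volumeMap.containsKey(blockId);
  }

  public static void main(String[] args) {
    VolumeMapSketch dataset = new VolumeMapSketch();
    dataset.addReplica(1L, "/data/1");
    dataset.addReplica(2L, "/data/2");
    dataset.addReplica(3L, "/data/1");

    int removed = dataset.removeVolume("/data/1");
    System.out.println("removed=" + removed + ", block 2 still present=" + dataset.contains(2L));
  }
}
```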
[jira] [Updated] (HDFS-6774) Make FsDataset and DataStore support removing volumes.
[ https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6774: Status: Patch Available (was: Open) Make FsDataset and DataStore support removing volumes. -- Key: HDFS-6774 URL: https://issues.apache.org/jira/browse/HDFS-6774 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6774.000.patch Managing volumes on DataNode includes decommissioning an active volume without restarting DataNode. This task adds support to remove volumes from {{DataStorage}} and {{BlockPoolSliceStorage}} dynamically. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090097#comment-14090097 ] Arpit Agarwal commented on HDFS-6772: - Thanks Ming, I think we can clarify that difference with documentation. {{numContentStaleStorages}} sounded kind of awkward. I am +1 for your latest patch, pending Jenkins. Get DNs out of blockContentsStale==true state faster when NN restarts - Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch
[jira] [Updated] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6826: - Target Version/s: 2.6.0 Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur When HBase data, HiveMetaStore data or Search data is accessed via services (HBase region servers, HiveServer2, Impala, Solr), the services can enforce permissions on the corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when users access the underlying data files directly (e.g. from a MapReduce job), that the permissions of the data files map to the permissions of the corresponding data entity (e.g. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities' permissions. I’ll be posting a design proposal in the next few days.
[jira] [Updated] (HDFS-6758) block writer should pass the expected block size to DataXceiverServer
[ https://issues.apache.org/jira/browse/HDFS-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6758: Issue Type: Improvement (was: Bug) block writer should pass the expected block size to DataXceiverServer - Key: HDFS-6758 URL: https://issues.apache.org/jira/browse/HDFS-6758 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client Affects Versions: 2.4.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6758.01.patch
DataXceiver initializes the block size to the default block size for the cluster. This size is later used by the FsDatasetImpl when applying VolumeChoosingPolicy.
{code}
block.setNumBytes(dataXceiverServer.estimateBlockSize);
{code}
where
{code}
/**
 * We need an estimate for block size to check if the disk partition has
 * enough space. For now we set it to be the default block size set
 * in the server side configuration, which is not ideal because the
 * default block size should be a client-size configuration.
 * A better solution is to include in the header the estimated block size,
 * i.e. either the actual block size or the default block size.
 */
final long estimateBlockSize;
{code}
In most cases the writer can just pass the maximum expected block size to the DN instead of having to use the cluster default.
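To see why the estimate matters, consider a space check of the kind a volume-choosing step might perform. The sketch below is an assumption-laden illustration: `volumeHasRoom` is a hypothetical helper, not the real `VolumeChoosingPolicy` API, and the sizes (128 MB cluster default, 4 MB writer expectation, 64 MB free) are invented numbers.

```java
/** Hypothetical illustration of how a block size estimate affects volume eligibility. */
public class BlockSizeEstimateSketch {
  static final long MB = 1024L * 1024L;

  /** Assumed rule: a volume qualifies only if it can hold the whole estimated block. */
  static boolean volumeHasRoom(long availableBytes, long estimatedBlockSize) {
    return availableBytes >= estimatedBlockSize;
  }

  public static void main(String[] args) {
    long clusterDefault = 128 * MB;  // server-side default used as the estimate today
    long writerExpected = 4 * MB;    // what the writer actually expects to write
    long available = 64 * MB;        // free space on the candidate volume

    System.out.println("with cluster default: " + volumeHasRoom(available, clusterDefault));
    System.out.println("with writer estimate: " + volumeHasRoom(available, writerExpected));
  }
}
```

With these made-up numbers the volume is rejected under the cluster default even though the writer's block would fit, which is the mismatch the issue aims to avoid.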
[jira] [Updated] (HDFS-6825) Edit log corruption due to delayed block removal
[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6825: Attachment: HDFS-6825.002.patch Uploaded patch 002 to address the test failure. Edit log corruption due to delayed block removal Key: HDFS-6825 URL: https://issues.apache.org/jira/browse/HDFS-6825 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch
Observed the following stack:
{code}
2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space.
java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
Found this is what happened:
- The client created the file /solr/hierarchy/core_node1/data/tlog/tlog.xyz.
- The client tried to append to this file, but the lease had expired, so lease recovery was started and the append failed.
- The file was deleted; however, pending blocks of this file had not yet been deleted.
- Then commitBlockSynchronization() was called (see the stack above), and an INodeFile was created from the pending block, unaware that the file had already been deleted.
- FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock.
- closeFileCommitBlocks continued on to call finalizeINodeFileUnderConstruction and wrote a CloseOp to the edit log.
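The sequence above boils down to a swallowed exception followed by a log write. The sketch below models that shape with a toy namespace and edit log. Everything in it is hypothetical: `CloseAfterDeleteSketch`, its methods, and the record strings are illustration only, and `closeChecked` is one possible guard, not the fix in the attached patch.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a namespace plus edit log; not the real FSNamesystem. */
public class CloseAfterDeleteSketch {
  private final Set<String> namespace = new HashSet<>();
  private final List<String> editLog = new ArrayList<>();

  void create(String path) {
    namespace.add(path);
    editLog.add("ADD " + path);
  }

  void delete(String path) {
    namespace.remove(path);
    editLog.add("DELETE " + path);
  }

  /** Problematic shape: the missing-file error is swallowed, so a close record is still written. */
  void closeSwallowingErrors(String path) {
    try {
      updateSpaceConsumed(path);
    } catch (FileNotFoundException e) {
      // Swallowed, mirroring the behaviour described in the report.
    }
    editLog.add("CLOSE " + path);
  }

  /** One possible guard (an assumption): let the error propagate before anything is logged. */
  void closeChecked(String path) throws FileNotFoundException {
    updateSpaceConsumed(path);
    editLog.add("CLOSE " + path);
  }

  private void updateSpaceConsumed(String path) throws FileNotFoundException {
    if (!namespace.contains(path)) {
      throw new FileNotFoundException("Path not found: " + path);
    }
  }

  List<String> editLog() {
    return editLog;
  }

  public static void main(String[] args) {
    CloseAfterDeleteSketch nn = new CloseAfterDeleteSketch();
    nn.create("/tlog.xyz");
    nn.delete("/tlog.xyz");
    nn.closeSwallowingErrors("/tlog.xyz");
    // The log now ends with a CLOSE for a file that was already deleted.
    System.out.println(nn.editLog());
  }
}
```

Replaying such a log would hit a close record for a path that no longer exists, which is the corruption the issue title refers to.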
[jira] [Commented] (HDFS-6830) BlockManager.addStorage fails when DN updates storage
[ https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090253#comment-14090253 ] Hadoop QA commented on HDFS-6830: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660489/HDFS-6830.03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7586//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7586//console This message is automatically generated. BlockManager.addStorage fails when DN updates storage - Key: HDFS-6830 URL: https://issues.apache.org/jira/browse/HDFS-6830 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, HDFS-6830.03.patch The call to {{removeStorageInfo}} is wrong because the block is still in the DatanodeStorage's list of blocks and the callee does not expect it to be. 
{code}
} else {
  // The block is on the DN but belongs to a different storage.
  // Update our state.
  removeStorage(getStorageInfo(idx));
  added = false;      // Just updating storage. Return false.
}
{code}
It is a very unlikely code path to hit since storage updates usually occur via incremental block reports.
[jira] [Commented] (HDFS-6828) Separate block replica dispatching from Balancer
[ https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090274#comment-14090274 ] Hadoop QA commented on HDFS-6828: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660503/h6828_20140808b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7587//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7587//console This message is automatically generated. Separate block replica dispatching from Balancer Key: HDFS-6828 URL: https://issues.apache.org/jira/browse/HDFS-6828 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6828_20140808.patch, h6828_20140808b.patch The Balancer class implements two major features, (1) balancing logic for selecting replicas in order to balance the cluster and (2) block replica dispatching for moving the block replica around. 
This JIRA is to separate (2) from Balancer so that the code could be reused by other code such as the new data migration tool proposed in HDFS-6801.
[jira] [Updated] (HDFS-6722) Display readable last contact time for dead nodes on NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6722: - Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~mingma] for the contribution. Display readable last contact time for dead nodes on NN webUI - Key: HDFS-6722 URL: https://issues.apache.org/jira/browse/HDFS-6722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.6.0 Attachments: HDFS-6722-2.patch, HDFS-6722-3.patch, HDFS-6722.patch For dead node info on the NN webUI, admins want to know when the nodes became dead, to troubleshoot missing blocks, etc. Currently the webUI displays the last contact as the number of seconds since the last contact. It would be useful to display the info in Date format.
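The conversion the description asks for is simple arithmetic: subtract the elapsed seconds from the current time. The Java sketch below only illustrates that calculation; `LastContactSketch` and `lastContactTime` are hypothetical names, and the timestamp and the 3600-second value are invented inputs, not data from the web UI.

```java
import java.time.Instant;

/** Hypothetical helper showing the seconds-since-contact to timestamp conversion. */
public class LastContactSketch {
  /** Converts "seconds since last contact" into the absolute time of that contact. */
  static Instant lastContactTime(Instant now, long secondsSinceLastContact) {
    return now.minusSeconds(secondsSinceLastContact);
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2014-08-08T12:00:00Z");  // made-up "current" time
    long secondsSinceLastContact = 3600;                  // made-up value: one hour ago
    // Prints 2014-08-08T11:00:00Z
    System.out.println(lastContactTime(now, secondsSinceLastContact));
  }
}
```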
[jira] [Commented] (HDFS-6774) Make FsDataset and DataStore support removing volumes.
[ https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090315#comment-14090315 ] Hadoop QA commented on HDFS-6774: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660509/HDFS-6774.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.web.TestWebHDFS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7588//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7588//console This message is automatically generated. Make FsDataset and DataStore support removing volumes. -- Key: HDFS-6774 URL: https://issues.apache.org/jira/browse/HDFS-6774 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6774.000.patch Managing volumes on DataNode includes decommissioning an active volume without restarting DataNode. 
This task adds support to remove volumes from {{DataStorage}} and {{BlockPoolSliceStorage}} dynamically.
[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090317#comment-14090317 ] Hadoop QA commented on HDFS-6772: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660508/HDFS-6772-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7589//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7589//console This message is automatically generated. Get DNs out of blockContentsStale==true state faster when NN restarts - Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch
[jira] [Commented] (HDFS-6722) Display readable last contact time for dead nodes on NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090323#comment-14090323 ] Hudson commented on HDFS-6722: -- FAILURE: Integrated in Hadoop-trunk-Commit #6035 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6035/])
HDFS-6722. Display readable last contact time for dead nodes on NN webUI. Contributed by Ming Ma. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616669)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java
Display readable last contact time for dead nodes on NN webUI - Key: HDFS-6722 URL: https://issues.apache.org/jira/browse/HDFS-6722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.6.0 Attachments: HDFS-6722-2.patch, HDFS-6722-3.patch, HDFS-6722.patch
[jira] [Updated] (HDFS-6772) Get DN storages out of blockContentsStale state faster after NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6772: Summary: Get DN storages out of blockContentsStale state faster after NN restarts (was: Get DNs out of blockContentsStale==true state faster when NN restarts) Get DN storages out of blockContentsStale state faster after NN restarts Key: HDFS-6772 URL: https://issues.apache.org/jira/browse/HDFS-6772 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch
[jira] [Updated] (HDFS-6772) Get DN storages out of blockContentsStale state faster after NN restarts
[ https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-6772:
    Resolution: Fixed
    Fix Version/s: 2.6.0
                   3.0.0
    Target Version/s: 2.6.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed to trunk and branch-2. Thanks for the contribution [~mingma]!

Get DN storages out of blockContentsStale state faster after NN restarts
    Key: HDFS-6772
    URL: https://issues.apache.org/jira/browse/HDFS-6772
    Project: Hadoop HDFS
    Issue Type: Improvement
    Reporter: Ming Ma
    Assignee: Ming Ma
    Fix For: 3.0.0, 2.6.0
    Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch

Here is the non-HA scenario.
1. Get HDFS into a block-over-replicated situation.
2. Restart the NN.
3. From the NN's point of view, DNs will remain in the blockContentsStale==true state for a long time. That in turn makes postponedMisreplicatedBlocks large, and a larger postponedMisreplicatedBlocks increases blockreport latency. Because blockreport processing takes the NN global lock, this severely impacts NN performance and makes the cluster unstable.

Why will DNs remain in the blockContentsStale==true state for a long time?
1. When a DN reconnects to the NN after an NN restart, the blockreport RPC can arrive before the heartbeat RPC. This is due to how BPServiceActor#offerService decides when to send the blockreport and the heartbeat. After an NN restart, the NN asks the DN to register when it gets the first heartbeat request; the DN then registers with the NN and sends the blockreport RPC; the heartbeat RPC comes after that.
2. So right after the first blockreport, because heartbeatedSinceFailover is still false, blockContentsStale stays true.
{noformat}
DatanodeStorageInfo.java
  void receivedBlockReport() {
    if (heartbeatedSinceFailover) {
      blockContentsStale = false;
    }
    blockReportCount++;
  }
{noformat}
3. So the DN will remain in blockContentsStale==true until the next blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to a large value.