[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088928#comment-14088928
 ] 

Arpit Agarwal commented on HDFS-6809:
-

+1 for the patch.

 Move some Balancer's inner classes to standalone classes
 

 Key: HDFS-6809
 URL: https://issues.apache.org/jira/browse/HDFS-6809
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h6809_20140802.patch, h6809_20140806.patch


 Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can 
 be moved out as standalone classes so that these classes can be reused by 
 other code such as the new data migration tool proposed in HDFS-6801.
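The reuse argument above can be illustrated with a small sketch. The types and names below are invented for illustration and are not the actual Hadoop classes: the idea is that once a matcher is a standalone type rather than an inner class of Balancer, any tool can supply its own matching policy.

```java
import java.util.Arrays;
import java.util.List;

// Before such a refactoring, a matcher like this would be nested inside
// Balancer; afterwards it is a top-level type other tools can implement.
interface Matcher {
    boolean match(String sourceRack, String targetRack);
}

public class MatcherDemo {
    // SAME_RACK only pairs nodes within one rack; ANY pairs anything.
    static final Matcher SAME_RACK = (s, t) -> s.equals(t);
    static final Matcher ANY = (s, t) -> true;

    // Count (source, target) rack pairs accepted by a given policy.
    static long countPairs(List<String> racks, Matcher m) {
        return racks.stream()
            .flatMap(s -> racks.stream().map(t -> m.match(s, t)))
            .filter(Boolean::booleanValue)
            .count();
    }

    public static void main(String[] args) {
        List<String> racks = Arrays.asList("r1", "r1", "r2");
        System.out.println(countPairs(racks, SAME_RACK)); // 5
        System.out.println(countPairs(racks, ANY));       // 9
    }
}
```

A standalone interface like this is what lets a second tool (such as the migration tool proposed in HDFS-6801) plug in its own policy without depending on Balancer itself.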



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6812:


Summary: Remove addBlock and replaceBlock from DatanodeDescriptor  (was: 
Reomve addBlock and replaceBlock from DatanodeDescriptor)

 Remove addBlock and replaceBlock from DatanodeDescriptor
 

 Key: HDFS-6812
 URL: https://issues.apache.org/jira/browse/HDFS-6812
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h6812_20140803.patch


 DatanodeDescriptor.addBlock(..) is not used anymore.  
 DatanodeDescriptor.replaceBlock(..) is only used once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088933#comment-14088933
 ] 

Hadoop QA commented on HDFS-6781:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660306/HDFS-6781.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.ha.TestZKFailoverControllerStress
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7576//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7576//console

This message is automatically generated.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.patch, HDFS-6781.patch, 
 HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088936#comment-14088936
 ] 

Arpit Agarwal commented on HDFS-6812:
-

Nice little simplification. I think we can also fix findDatanode to return 
boolean, but let me take care of that under HDFS-6830.

+1



 Remove addBlock and replaceBlock from DatanodeDescriptor
 

 Key: HDFS-6812
 URL: https://issues.apache.org/jira/browse/HDFS-6812
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h6812_20140803.patch


 DatanodeDescriptor.addBlock(..) is not used anymore.  
 DatanodeDescriptor.replaceBlock(..) is only used once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6809) Move some Balancer's inner classes to standalone classes

2014-08-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6809:
--

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Arpit for reviewing the patches.

I have committed this.

 Move some Balancer's inner classes to standalone classes
 

 Key: HDFS-6809
 URL: https://issues.apache.org/jira/browse/HDFS-6809
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h6809_20140802.patch, h6809_20140806.patch


 Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can 
 be moved out as standalone classes so that these classes can be reused by 
 other code such as the new data migration tool proposed in HDFS-6801.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088966#comment-14088966
 ] 

Hadoop QA commented on HDFS-6506:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651956/HDFS-6506.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7577//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7577//console

This message is automatically generated.

 Newly moved block replica been invalidated and deleted in TestBalancer
 --

 Key: HDFS-6506
 URL: https://issues.apache.org/jira/browse/HDFS-6506
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch


 TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently:
 https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
 From the error log, the reason seems to be that newly moved block replicas
 were invalidated and deleted, so some of the balancer's work was reversed.
 {noformat}
 2014-06-06 18:15:51,681 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,683 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:51,682 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,702 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 
 to 127.0.0.1:55468 through 127.0.0.1:49159
 2014-06-06 18:15:54,701 INFO  balancer.Balancer (Balancer.java:dispatch(370)) 
 - Successfully moved blk_1073741829_1005 with size=100 fr
 2014-06-06 18:15:54,706 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to 
 invalidated blocks set
 2014-06-06 18:15:54,709 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to 
 invalidated blocks set
 2014-06-06 18:15:56,421 INFO  BlockStateChange 
 (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 
 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
 2014-06-06 18:15:57,717 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,720 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to 
 invalidated blocks set
 2014-06-06 18:15:57,721 INFO  BlockStateChange 
 (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* 
 

[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088972#comment-14088972
 ] 

Arpit Agarwal commented on HDFS-6781:
-

Hi Akira, thanks for your continued efforts to improve our documentation. It is 
much appreciated.

Nitpick suggestion if it makes sense to you: Instead of _HDFS Commands Manual_ 
perhaps we could call it _HDFS Commands Reference_ to be consistent with 
_Hadoop Commands Reference_.

I'd be okay with committing the patch either way.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.patch, HDFS-6781.patch, 
 HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088977#comment-14088977
 ] 

Hudson commented on HDFS-6812:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6027 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6027/])
HDFS-6812. Remove addBlock and replaceBlock from DatanodeDescriptor. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616426)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingDataNodeMessages.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptReplicaInfo.java


 Remove addBlock and replaceBlock from DatanodeDescriptor
 

 Key: HDFS-6812
 URL: https://issues.apache.org/jira/browse/HDFS-6812
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h6812_20140803.patch


 DatanodeDescriptor.addBlock(..) is not used anymore.  
 DatanodeDescriptor.replaceBlock(..) is only used once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088976#comment-14088976
 ] 

Hudson commented on HDFS-6809:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6027 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6027/])
HDFS-6809. Move Balancer's inner classes MovedBlocks and Matcher as to 
standalone classes and separates KeyManager from NameNodeConnector. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616422)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Matcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java


 Move some Balancer's inner classes to standalone classes
 

 Key: HDFS-6809
 URL: https://issues.apache.org/jira/browse/HDFS-6809
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h6809_20140802.patch, h6809_20140806.patch


 Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can 
 be moved out as standalone classes so that these classes can be reused by 
 other code such as the new data migration tool proposed in HDFS-6801.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6781:


Attachment: HDFS-6781.2.patch

Thanks Arpit for the review.
Modified site.xml to use HDFS Commands Reference instead of HDFS Commands 
Manual.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.patch, HDFS-6781.2.patch, 
 HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6781:


Attachment: (was: HDFS-6781.2.patch)

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.patch, HDFS-6781.2.patch, 
 HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6781:


Attachment: HDFS-6781.2.patch

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.patch, HDFS-6781.2.patch, 
 HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-6781:


Attachment: HDFS-6781-branch-2.2.patch

Updated the patch for branch-2 also.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, 
 HDFS-6781.2.patch, HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089031#comment-14089031
 ] 

Hadoop QA commented on HDFS-6781:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12660348/HDFS-6781-branch-2.2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7579//console

This message is automatically generated.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, 
 HDFS-6781.2.patch, HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block

2014-08-07 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089060#comment-14089060
 ] 

Akira AJISAKA commented on HDFS-6682:
-

[~atm], would you please review this patch?

 Add a metric to expose the timestamp of the oldest under-replicated block
 -

 Key: HDFS-6682
 URL: https://issues.apache.org/jira/browse/HDFS-6682
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
 Attachments: HDFS-6682.patch


 In the following case, data in HDFS is lost and a client needs to put the
 same file again:
 # A client puts a file to HDFS
 # A DataNode crashes before replicating a block of the file to other DataNodes
 I propose a metric to expose the timestamp of the oldest
 under-replicated/corrupt block. That way a client can know which file to
 retain for the retry.
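A minimal sketch of the proposed metric could look like the following. The class and method names are invented for illustration and are not the actual HDFS patch: it tracks when each block first became under-replicated and exposes the oldest such timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UnderReplicatedTimestamps {
    // LinkedHashMap preserves insertion order, so the first entry is the
    // block that has been under-replicated the longest.
    private final Map<Long, Long> blockIdToTimestamp = new LinkedHashMap<>();

    public synchronized void markUnderReplicated(long blockId, long nowMillis) {
        // putIfAbsent keeps the earliest timestamp if the block is re-reported.
        blockIdToTimestamp.putIfAbsent(blockId, nowMillis);
    }

    public synchronized void markReplicated(long blockId) {
        blockIdToTimestamp.remove(blockId);
    }

    /** Returns 0 when no block is currently under-replicated. */
    public synchronized long getOldestUnderReplicatedTimestamp() {
        for (long ts : blockIdToTimestamp.values()) {
            return ts;  // first entry inserted = oldest
        }
        return 0;
    }
}
```

A NameNode metrics gauge could then report `getOldestUnderReplicatedTimestamp()`, letting operators and clients see how long the oldest at-risk block has been waiting.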



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6830) BlockManager.addStorage fails when DN updates storage

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089092#comment-14089092
 ] 

Hadoop QA commented on HDFS-6830:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660320/HDFS-6830.01.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer
  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockInfo

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7578//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7578//console

This message is automatically generated.

 BlockManager.addStorage fails when DN updates storage
 -

 Key: HDFS-6830
 URL: https://issues.apache.org/jira/browse/HDFS-6830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: HDFS-6830.01.patch


 The call to {{removeStorageInfo}} is wrong because the block is still in the 
 DatanodeStorage's list of blocks and the callee does not expect it to be.
 {code}
   } else {
 // The block is on the DN but belongs to a different storage.
 // Update our state.
 removeStorage(getStorageInfo(idx));
 added = false;  // Just updating storage. Return false.
   }
 {code}
 It is a very unlikely code path to hit since storage updates usually occur 
 via incremental block reports.
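The invariant the description refers to can be modeled in a few lines. The names below are invented for illustration and are not the actual HDFS source: removing a storage from a block's list is only safe after the block has been unlinked from that storage's own block list, which is exactly what the buggy path skips.

```java
import java.util.ArrayList;
import java.util.List;

public class StorageModel {
    static class DatanodeStorage {
        final List<Long> blocks = new ArrayList<>();
    }

    static class BlockInfo {
        final long id;
        final List<DatanodeStorage> storages = new ArrayList<>();
        BlockInfo(long id) { this.id = id; }

        // The callee expects the block to already be off the storage's list;
        // calling it too early leaves the two structures inconsistent.
        void removeStorage(DatanodeStorage s) {
            if (s.blocks.contains(id)) {
                throw new IllegalStateException(
                    "block " + id + " still listed on storage");
            }
            storages.remove(s);
        }
    }

    public static void main(String[] args) {
        DatanodeStorage s = new DatanodeStorage();
        BlockInfo b = new BlockInfo(42L);
        b.storages.add(s);
        s.blocks.add(b.id);
        // Safe order: unlink the block from the storage first, then detach.
        s.blocks.remove(Long.valueOf(b.id));
        b.removeStorage(s);
        System.out.println(b.storages.size()); // 0
    }
}
```

Calling `removeStorage` while the block is still listed, as the quoted branch does, trips the precondition in this model; the real code has no such check, so the inconsistency goes unnoticed.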



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor

2014-08-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6812:
--

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Arpit for reviewing the patch.

I have committed this.

 Remove addBlock and replaceBlock from DatanodeDescriptor
 

 Key: HDFS-6812
 URL: https://issues.apache.org/jira/browse/HDFS-6812
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h6812_20140803.patch


 DatanodeDescriptor.addBlock(..) is not used anymore.  
 DatanodeDescriptor.replaceBlock(..) is only used once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6134) Transparent data at rest encryption

2014-08-07 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6134:
---

Attachment: HDFSDataatRestEncryption.pdf

I've attached a document that discusses the general design of this feature.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089124#comment-14089124
 ] 

Hadoop QA commented on HDFS-6134:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12660368/HDFSDataatRestEncryption.pdf
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7580//console

This message is automatically generated.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089138#comment-14089138
 ] 

Hudson commented on HDFS-6812:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #637 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/637/])
HDFS-6812. Remove addBlock and replaceBlock from DatanodeDescriptor. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616426)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingDataNodeMessages.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptReplicaInfo.java


 Remove addBlock and replaceBlock from DatanodeDescriptor
 

 Key: HDFS-6812
 URL: https://issues.apache.org/jira/browse/HDFS-6812
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h6812_20140803.patch


 DatanodeDescriptor.addBlock(..) is not used anymore.  
 DatanodeDescriptor.replaceBlock(..) is only used once.





[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089136#comment-14089136
 ] 

Hudson commented on HDFS-6809:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #637 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/637/])
HDFS-6809. Move Balancer's inner classes MovedBlocks and Matcher as to 
standalone classes and separates KeyManager from NameNodeConnector. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616422)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Matcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java


 Move some Balancer's inner classes to standalone classes
 

 Key: HDFS-6809
 URL: https://issues.apache.org/jira/browse/HDFS-6809
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h6809_20140802.patch, h6809_20140806.patch


 Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can 
 be moved out as standalone classes so that these classes can be reused by 
 other code such as the new data migration tool proposed in HDFS-6801.





[jira] [Created] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode

2014-08-07 Thread Shinichi Yamashita (JIRA)
Shinichi Yamashita created HDFS-6833:


 Summary: DirectoryScanner should not register a deleting block 
with memory of DataNode
 Key: HDFS-6833
 URL: https://issues.apache.org/jira/browse/HDFS-6833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita


When a block is deleted on a DataNode, the following messages are usually output.

{code}
2014-08-07 17:53:11,606 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Scheduling blk_1073741825_1001 file 
/hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 for deletion
2014-08-07 17:53:11,617 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
/hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}

However, in the current implementation, DirectoryScanner may run while the 
DataNode is deleting the block, and the following messages are output.

{code}
2014-08-07 17:53:30,519 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Scheduling blk_1073741825_1001 file 
/hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
 for deletion
2014-08-07 17:53:31,426 INFO 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata 
files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
2014-08-07 17:53:31,426 WARN 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
  getNumBytes() = 21230663
  getBytesOnDisk()  = 21230663
  getVisibleLength()= 21230663
  getVolume()   = /hadoop/data1/dfs/data/current
  getBlockFile()= 
/hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
  unlinked  =false
2014-08-07 17:53:31,531 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file 
/hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
{code}

As a result, the block being deleted is registered again in the DataNode's 
memory, and when the DataNode sends a block report, the NameNode receives 
wrong block information.

For example, when we recommission a node or change the replication factor, 
the NameNode may delete a valid block as an excess replica because of this 
problem, and under-replicated and missing blocks occur.

When the DataNode runs DirectoryScanner, it should not register a block that 
is being deleted.
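One way to close this race, sketched below with illustrative names (this is not the actual FsDatasetImpl/DirectoryScanner code), is for the scan to consult the set of blocks whose deletion has already been scheduled before re-adding anything to memory:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the reconciliation step: blocks found on disk but missing
// from memory are re-added, unless their async deletion is in flight.
// All class and field names here are illustrative, not Hadoop's.
class BlockReconciler {
    final Set<Long> inMemory = new HashSet<>();        // DataNode's block map
    final Set<Long> onDisk = new HashSet<>();          // files the scanner sees
    final Set<Long> pendingDeletion = new HashSet<>(); // scheduled, not yet unlinked

    // FsDatasetAsyncDiskService-style scheduling: the block leaves memory
    // now; the file disappears from disk some time later.
    void scheduleDeletion(long blockId) {
        inMemory.remove(blockId);
        pendingDeletion.add(blockId);
    }

    // DirectoryScanner-style pass. Without the pendingDeletion check, a
    // block whose file still exists would be added back to memory.
    void reconcile() {
        for (long id : onDisk) {
            if (!inMemory.contains(id) && !pendingDeletion.contains(id)) {
                inMemory.add(id);
            }
        }
    }
}
```

With this guard, a scan that runs between the "Scheduling ... for deletion" and "Deleted ..." log lines no longer resurrects the block, so the next block report stays consistent.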






[jira] [Commented] (HDFS-6517) Remove hadoop-metrics2.properties from hdfs project

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089226#comment-14089226
 ] 

Hudson commented on HDFS-6517:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/])
HDFS-6517. Remove hadoop-metrics2.properties from hdfs project (Akira AJISAKA 
via aw) (aw: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616262)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/conf/hadoop-metrics2.properties
HDFS-6517. Remove hadoop-metrics2.properties from hdfs project (Akira AJISAKA 
via aw) (aw: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616261)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Remove hadoop-metrics2.properties from hdfs project
 ---

 Key: HDFS-6517
 URL: https://issues.apache.org/jira/browse/HDFS-6517
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6517.patch


 HDFS-side of HADOOP-9919.
 HADOOP-9919 updated the hadoop-metrics2.properties examples for YARN; however, 
 the examples are still old because the hadoop-metrics2.properties in the HDFS 
 project is the one actually packaged.





[jira] [Commented] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089222#comment-14089222
 ] 

Hudson commented on HDFS-6791:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/])
HDFS-6791. A block could remain under replicated if all of its replicas are on 
decommissioned nodes. Contributed by Ming Ma. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616306)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java


 A block could remain under replicated if all of its replicas are on 
 decommissioned nodes
 

 Key: HDFS-6791
 URL: https://issues.apache.org/jira/browse/HDFS-6791
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.6.0

 Attachments: HDFS-6791-2.patch, HDFS-6791-3.patch, HDFS-6791.patch


 Here is the scenario.
 1. Normally before NN transitions a DN to decommissioned state, enough 
 replicas have been copied to other in-service DNs. However, in some rare 
 situations, the cluster got into a state where a DN is in decommissioned 
 state and a block's only replica is on that DN. In such state, the number of 
 replication reported by fsck is 1; the block just stays in under-replicated 
 state; applications can still read the data, given that a decommissioned node 
 can still serve read traffic.
 This can happen in some error situations such as DN failure or NN failover. For 
 example
 a) a block's only replica is node A temporarily.
 b) Start decommission process on node A.
 c) When node A is in decommission-in-progress state, node A crashed. NN 
 will mark node A as dead.
 d) After node A rejoins the cluster, NN will mark node A as decommissioned. 
 2. In theory, the NN should take care of under-replicated blocks. But it 
 doesn't for this special case where the only replica is on a decommissioned 
 node, because the NN has a policy that a decommissioned node can't be picked 
 as the source node for replication.
 {noformat}
 BlockManager.java
 chooseSourceDatanode
   // never use already decommissioned nodes
   if(node.isDecommissioned())
 continue;
 {noformat}
 3. Given that the NN marks the node as decommissioned, admins will shut down 
 the datanode, and the under-replicated blocks turn into missing blocks.
 4. The workaround is to recommission the node so that NN can start the 
 replication from the node.
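One possible direction, sketched here with made-up names rather than the real BlockManager API (the attached patches may take a different approach), is to relax the policy so a decommissioned replica is used as a last-resort source instead of never:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of choosing a replication source among a block's replicas.
// The unconditional "never use decommissioned nodes" rule leaves a block
// stuck when decommissioned nodes hold its only replicas; falling back to
// one of them as a last resort lets replication proceed.
class SourcePicker {
    enum State { IN_SERVICE, DECOMMISSIONED }

    static String chooseSource(Map<String, State> replicas) {
        String lastResort = null;
        for (Map.Entry<String, State> e : replicas.entrySet()) {
            if (e.getValue() == State.IN_SERVICE) {
                return e.getKey();          // prefer an in-service replica
            }
            lastResort = e.getKey();        // remember a decommissioned holder
        }
        return lastResort;                  // may be null if there are no replicas
    }
}
```

The sketch only illustrates the last-resort idea; the real chooseSourceDatanode also weighs load, staleness, and replication queues before picking a node.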





[jira] [Commented] (HDFS-6812) Remove addBlock and replaceBlock from DatanodeDescriptor

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089225#comment-14089225
 ] 

Hudson commented on HDFS-6812:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/])
HDFS-6812. Remove addBlock and replaceBlock from DatanodeDescriptor. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616426)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingDataNodeMessages.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptReplicaInfo.java


 Remove addBlock and replaceBlock from DatanodeDescriptor
 

 Key: HDFS-6812
 URL: https://issues.apache.org/jira/browse/HDFS-6812
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h6812_20140803.patch


 DatanodeDescriptor.addBlock(..) is not used anymore.  
 DatanodeDescriptor.replaceBlock(..) is only used once.





[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089223#comment-14089223
 ] 

Hudson commented on HDFS-6809:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1830 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1830/])
HDFS-6809. Move Balancer's inner classes MovedBlocks and Matcher as to 
standalone classes and separates KeyManager from NameNodeConnector. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616422)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Matcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java


 Move some Balancer's inner classes to standalone classes
 

 Key: HDFS-6809
 URL: https://issues.apache.org/jira/browse/HDFS-6809
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h6809_20140802.patch, h6809_20140806.patch


 Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can 
 be moved out as standalone classes so that these classes can be reused by 
 other code such as the new data migration tool proposed in HDFS-6801.





[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work

2014-08-07 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089247#comment-14089247
 ] 

Daryn Sharp commented on HDFS-6776:
---

Sorry, but this patch is completely wrong.
# If security is enabled and an {{IOException}} happens for any reason - 
transient or legit - while acquiring a token, the client will continue to 
work because of SPNEGO, but if a job is submitted the tasks will all fail 
because they have no token.
# Webhdfs should be using the same insecure fallback policy as RPC.
# Insecure RPC services return null if a token is requested.  Like DFSClient, 
the webhdfs client should be able to handle that condition instead of throwing 
the exception you see.
# Issuing a malformed OPEN call is not ok...
# Although irrelevant in light of the above, connection.connect() isn't doing 
what you think.  It proves the client could open a connection and send the 
request.  It doesn't prove the server allowed/authenticated the request.  When 
you read the response, the server should have been angry that you issued an 
invalid open.
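Point 3 above (handle a null token the way DFSClient does) could look roughly like this; the interface and class names are illustrative, not the real WebHdfsFileSystem code:

```java
// Toy model of the insecure-fallback behavior described above: a secure
// client asking an insecure service for a delegation token gets null and
// should proceed without one, rather than treating it as an error.
class TokenFallback {
    interface TokenService {
        String getDelegationToken(String renewer); // null when security is off
    }

    private final TokenService service;
    private boolean serverIsSecure = true; // optimistic until we learn otherwise

    TokenFallback(TokenService service) {
        this.service = service;
    }

    // Returns the token to attach to requests, or null to send none.
    String tokenForRequests(String renewer) {
        if (!serverIsSecure) {
            return null;                 // remembered: this service issues no tokens
        }
        String token = service.getDelegationToken(renewer);
        if (token == null) {
            serverIsSecure = false;      // definitive insecure answer: remember it
        }
        return token;
    }
}
```

Only a definitive null answer flips the flag here; a transient {{IOException}} during the fetch should propagate rather than be mistaken for "security is off", which is the failure mode point 1 above warns about.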

 distcp from insecure cluster (source) to secure cluster (destination) doesn't 
 work
 --

 Key: HDFS-6776
 URL: https://issues.apache.org/jira/browse/HDFS-6776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0, 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
 HDFS-6776.003.patch


 Issuing the distcp command on the secure cluster side, trying to copy data 
 from the insecure cluster to the secure cluster, we see the following problem:
 {code}
 hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp 
 hdfs://sure-cluster:8020/tmp/tmptgt
 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
 DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
 ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
 copyStrategy='uniformsize', sourceFileListing=null, 
 sourcePaths=[webhdfs://insecure-cluster:port/tmp], 
 targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true}
 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
 secure-clister:8032
 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
 'ssl.client.truststore.location' has not been set, no TrustStore will be 
 loaded
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403)
   at 
 

[jira] [Commented] (HDFS-6809) Move some Balancer's inner classes to standalone classes

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089302#comment-14089302
 ] 

Hudson commented on HDFS-6809:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1856 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1856/])
HDFS-6809. Move Balancer's inner classes MovedBlocks and Matcher as to 
standalone classes and separates KeyManager from NameNodeConnector. (szetszwo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616422)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Matcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java


 Move some Balancer's inner classes to standalone classes
 

 Key: HDFS-6809
 URL: https://issues.apache.org/jira/browse/HDFS-6809
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.6.0

 Attachments: h6809_20140802.patch, h6809_20140806.patch


 Some of the inner classes in Balancer such as MovedBlocks, Matcher, etc. can 
 be moved out as standalone classes so that these classes can be reused by 
 other code such as the new data migration tool proposed in HDFS-6801.





[jira] [Commented] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089301#comment-14089301
 ] 

Hudson commented on HDFS-6791:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1856 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1856/])
HDFS-6791. A block could remain under replicated if all of its replicas are on 
decommissioned nodes. Contributed by Ming Ma. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616306)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java


 A block could remain under replicated if all of its replicas are on 
 decommissioned nodes
 

 Key: HDFS-6791
 URL: https://issues.apache.org/jira/browse/HDFS-6791
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.6.0

 Attachments: HDFS-6791-2.patch, HDFS-6791-3.patch, HDFS-6791.patch


 Here is the scenario.
 1. Normally before NN transitions a DN to decommissioned state, enough 
 replicas have been copied to other in-service DNs. However, in some rare 
 situations, the cluster got into a state where a DN is in decommissioned 
 state and a block's only replica is on that DN. In such state, the number of 
 replication reported by fsck is 1; the block just stays in under-replicated 
 state; applications can still read the data, given that a decommissioned node 
 can still serve read traffic.
 This can happen in some error situations such as DN failure or NN failover. For 
 example
 a) a block's only replica is node A temporarily.
 b) Start decommission process on node A.
 c) When node A is in decommission-in-progress state, node A crashed. NN 
 will mark node A as dead.
 d) After node A rejoins the cluster, NN will mark node A as decommissioned. 
 2. In theory, the NN should take care of under-replicated blocks. But it 
 doesn't for this special case where the only replica is on a decommissioned 
 node, because the NN has a policy that a decommissioned node can't be picked 
 as the source node for replication.
 {noformat}
 BlockManager.java
 chooseSourceDatanode
   // never use already decommissioned nodes
   if(node.isDecommissioned())
 continue;
 {noformat}
 3. Given that the NN marks the node as decommissioned, admins will shut down 
 the datanode, and the under-replicated blocks turn into missing blocks.
 4. The workaround is to recommission the node so that NN can start the 
 replication from the node.





[jira] [Commented] (HDFS-6517) Remove hadoop-metrics2.properties from hdfs project

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089305#comment-14089305
 ] 

Hudson commented on HDFS-6517:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1856 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1856/])
HDFS-6517. Remove hadoop-metrics2.properties from hdfs project (Akira AJISAKA 
via aw) (aw: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616262)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/conf/hadoop-metrics2.properties
HDFS-6517. Remove hadoop-metrics2.properties from hdfs project (Akira AJISAKA 
via aw) (aw: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616261)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Remove hadoop-metrics2.properties from hdfs project
 ---

 Key: HDFS-6517
 URL: https://issues.apache.org/jira/browse/HDFS-6517
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6517.patch


 HDFS-side of HADOOP-9919.
 HADOOP-9919 updated the hadoop-metrics2.properties examples for YARN; however, 
 the examples are still old because the hadoop-metrics2.properties in the HDFS 
 project is the one actually packaged.





[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work

2014-08-07 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089372#comment-14089372
 ] 

Yongjun Zhang commented on HDFS-6776:
-

Hi [~daryn], thank you so much for the very helpful comments. I will look into 
addressing them in the next revision.


 distcp from insecure cluster (source) to secure cluster (destination) doesn't 
 work
 --

 Key: HDFS-6776
 URL: https://issues.apache.org/jira/browse/HDFS-6776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0, 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, 
 HDFS-6776.003.patch


 Issuing the distcp command on the secure cluster side, trying to copy data 
 from the insecure cluster to the secure cluster, we see the following problem:
 {code}
 hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp 
 hdfs://sure-cluster:8020/tmp/tmptgt
 14/07/30 20:06:19 INFO tools.DistCp: Input Options: 
 DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
 ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
 copyStrategy='uniformsize', sourceFileListing=null, 
 sourcePaths=[webhdfs://insecure-cluster:port/tmp], 
 targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true}
 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at 
 secure-clister:8032
 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 
 'ssl.client.truststore.location' has not been set, no TrustStore will be 
 loaded
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Failed to get the token for hadoopuser, 
 user=hadoopuser
 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 

[jira] [Commented] (HDFS-6782) Improve FS editlog logSync

2014-08-07 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089402#comment-14089402
 ] 

Daryn Sharp commented on HDFS-6782:
---

Edit logging is pretty tricky.  I need to think about it more.  It seems like 
if {{syncStart}} is an instance member instead of block scoped, this simple 
condition might work as the last line of {{logEdit}}: {{if (mytxid > syncStart) 
logSync()}}

 Improve FS editlog logSync
 --

 Key: HDFS-6782
 URL: https://issues.apache.org/jira/browse/HDFS-6782
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-6782.001.patch, HDFS-6782.002.patch


 The NN uses a double buffer (bufCurrent, bufReady) for log sync: bufCurrent 
 buffers newly arriving edit ops while bufReady is flushed. This is efficient. 
 When a flush is ongoing and bufCurrent is full, the NN goes into a force log 
 sync, and all new ops are blocked (since the force log sync is protected by 
 the FSNamesystem write lock). After the flush finishes, the new ops are still 
 blocked, but at this point bufCurrent is actually free and the ops could go 
 ahead and write to the buffer. The following diagram shows the detail. This 
 JIRA is for this improvement. Thanks [~umamaheswararao] for confirming this 
 issue.
 {code}
 edit1(txid1) --> write to bufCurrent & logSync --> (swap buffer) flushing ---
 edit2(txid2) --> write to bufCurrent & logSync --> waiting ---
 edit3(txid3) --> write to bufCurrent & logSync --> waiting ---
 edit4(txid4) --> write to bufCurrent & logSync --> waiting ---
 edit5(txid5) --> write to bufCurrent --full--> force sync --> waiting ---
 edit6(txid6) --> blocked
 ...
 editn(txidn) --> blocked
 {code}
 After the flush, it becomes
 {code}
 edit1(txid1) -- write to bufCurrent  logSync - finished 
 
 edit2(txid2) -- write to bufCurrent  logSync - flushing 
 ---
 edit3(txid3) -- write to bufCurrent  logSync - waiting 
 ---
 edit4(txid4) -- write to bufCurrent  logSync - waiting 
 ---
 edit5(txid5) -- write to bufCurrent --full-- force sync - waiting 
 ---
 edit6(txid6) -- blocked
 ...
 editn(txidn) -- blocked
 {code}
 After edit1 finished, bufCurrent is free, and the thread which flushes txid2 
 will also flushes txid3-5, so we should return from the force sync of edit5 
 and FSNamesystem write lock will be freed (Don't worry that edit5 Op will 
 return, since there will be a normal logSync after the force logSync and 
 there will wait for sync finished). This is the idea of this JIRA. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6781:


Attachment: HDFS-6781.3.patch

Resubmitting the trunk patch with a different name for Jenkins.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, 
 HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089410#comment-14089410
 ] 

Arpit Agarwal commented on HDFS-6781:
-

+1 pending Jenkins.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, 
 HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6828) Separate block replica dispatching from Balancer

2014-08-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6828:
--

Status: Patch Available  (was: Open)

 Separate block replica dispatching from Balancer
 

 Key: HDFS-6828
 URL: https://issues.apache.org/jira/browse/HDFS-6828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6828_20140808.patch


 The Balancer class implements two major features, (1) balancing logic for 
 selecting replicas in order to balance the cluster and (2) block replica 
 dispatching for moving the block replica around.  This JIRA is to separate 
 (2) from Balancer so that the code could be reused by other code such as the 
 new data migration tool proposed in HDFS-6801.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6828) Separate block replica dispatching from Balancer

2014-08-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6828:
--

Attachment: h6828_20140808.patch

h6828_20140808.patch: separates Dispatcher from Balancer.

 Separate block replica dispatching from Balancer
 

 Key: HDFS-6828
 URL: https://issues.apache.org/jira/browse/HDFS-6828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6828_20140808.patch


 The Balancer class implements two major features, (1) balancing logic for 
 selecting replicas in order to balance the cluster and (2) block replica 
 dispatching for moving the block replica around.  This JIRA is to separate 
 (2) from Balancer so that the code could be reused by other code such as the 
 new data migration tool proposed in HDFS-6801.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6831) Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help'

2014-08-07 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089505#comment-14089505
 ] 

Jing Zhao commented on HDFS-6831:
-

Also, in dfsadmin -help only a couple of commands' help texts mention that 
superuser permissions are required. Maybe we can move this phrase into the 
help summary.

 Inconsistency between 'hdfs dfsadmin' and 'hdfs dfsadmin -help'
 ---

 Key: HDFS-6831
 URL: https://issues.apache.org/jira/browse/HDFS-6831
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Priority: Minor
  Labels: newbie

 There is an inconsistency between the console outputs of 'hdfs dfsadmin' 
 command and 'hdfs dfsadmin -help' command.
 {code}
 [root@trunk ~]# hdfs dfsadmin
 Usage: java DFSAdmin
 Note: Administrative commands can only be run as the HDFS superuser.
[-report]
[-safemode enter | leave | get | wait]
[-allowSnapshot snapshotDir]
[-disallowSnapshot snapshotDir]
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-finalizeUpgrade]
[-rollingUpgrade [query|prepare|finalize]]
[-metasave filename]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshCallQueue]
[-refresh]
[-printTopology]
[-refreshNamenodes datanodehost:port]
[-deleteBlockPool datanode-host:port blockpoolId [force]]
[-setQuota quota dirname...dirname]
[-clrQuota dirname...dirname]
[-setSpaceQuota quota dirname...dirname]
[-clrSpaceQuota dirname...dirname]
[-setBalancerBandwidth bandwidth in bytes per second]
[-fetchImage local directory]
[-shutdownDatanode datanode_host:ipc_port [upgrade]]
[-getDatanodeInfo datanode_host:ipc_port]
[-help [cmd]]
 {code}
 {code}
 [root@trunk ~]# hdfs dfsadmin -help
 hadoop dfsadmin performs DFS administrative commands.
 The full syntax is: 
 hadoop dfsadmin
   [-report [-live] [-dead] [-decommissioning]]
   [-safemode enter | leave | get | wait]
   [-saveNamespace]
   [-rollEdits]
   [-restoreFailedStorage true|false|check]
   [-refreshNodes]
   [-setQuota quota dirname...dirname]
   [-clrQuota dirname...dirname]
   [-setSpaceQuota quota dirname...dirname]
   [-clrSpaceQuota dirname...dirname]
   [-finalizeUpgrade]
   [-rollingUpgrade [query|prepare|finalize]]
   [-refreshServiceAcl]
   [-refreshUserToGroupsMappings]
   [-refreshSuperUserGroupsConfiguration]
   [-refreshCallQueue]
   [-refresh host:ipc_port key [arg1..argn]
   [-printTopology]
   [-refreshNamenodes datanodehost:port]
   [-deleteBlockPool datanodehost:port blockpoolId [force]]
   [-setBalancerBandwidth bandwidth]
   [-fetchImage local directory]
   [-allowSnapshot snapshotDir]
   [-disallowSnapshot snapshotDir]
   [-shutdownDatanode datanode_host:ipc_port [upgrade]]
   [-getDatanodeInfo datanode_host:ipc_port
   [-help [cmd]
 {code}
 These two outputs should be the same.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6834) Improve the configuration guidance in DFSClient when there are no Codec classes found in configs

2014-08-07 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-6834:
-

 Summary: Improve the configuration guidance in DFSClient when 
there are no Codec classes found in configs
 Key: HDFS-6834
 URL: https://issues.apache.org/jira/browse/HDFS-6834
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor


This is the comment in HADOOP-10886 from Andrew. 
 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6834) Improve the configuration guidance in DFSClient when there are no Codec classes found in configs

2014-08-07 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-6834:
--

Attachment: HDFS-6834.patch

Attached a simple patch to handle this.

 Improve the configuration guidance in DFSClient when there are no Codec 
 classes found in configs
 

 Key: HDFS-6834
 URL: https://issues.apache.org/jira/browse/HDFS-6834
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HDFS-6834.patch


 This is the comment in HADOOP-10886 from Andrew. 
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6830) BlockManager.addStorage fails when DN updates storage

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6830:


Attachment: HDFS-6830.02.patch

Rebase patch after HDFS-6812 commit.

 BlockManager.addStorage fails when DN updates storage
 -

 Key: HDFS-6830
 URL: https://issues.apache.org/jira/browse/HDFS-6830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch


 The call to {{removeStorageInfo}} is wrong because the block is still in the 
 DatanodeStorage's list of blocks and the callee does not expect it to be.
 {code}
   } else {
 // The block is on the DN but belongs to a different storage.
 // Update our state.
 removeStorage(getStorageInfo(idx));
 added = false;  // Just updating storage. Return false.
   }
 {code}
 It is a very unlikely code path to hit since storage updates usually occur 
 via incremental block reports.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

2014-08-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089546#comment-14089546
 ] 

stack commented on HDFS-6803:
-

[~cmccabe] Thanks for the very nice feedback.  Let me incorporate/fix what has 
been posted.

[~stev...@iseran.com] Thanks for jumping on boss.

bq. Consistency with actual file data & metadata

Yes. Lets fold in your text. You are describing the FS as it is today.

bq. ...The second read() would succeed/return -1 depending on the position

Do you mean the third read above?

bq. When a pread is in progress, should that change be visible in getPos()?

I like [~cmccabe]'s groupings which implies pread does not change getPos.

How should I proceed with this issue [~ste...@apache.org]? I'd like to get 2.1 
and 2.2 (from attached doc) blessed as scripture.  Seems like a bit of cleanup 
is all that is needed in HDFS (and as [~hitliuyi] suggests, we could probably 
remove some synchronizes). My guess is the other FS implementations have not 
been implemented the way HDFS has, and that backfilling a pread to run 
independently of a read would be a bunch of work.  Would this work be a 
blocker on adding 2.1/2.2 to the spec and HDFS?  Thanks Steve.
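For reference, the grouping under discussion (a pread that leaves getPos() untouched) is the same contract java.nio's FileChannel already documents: the read(ByteBuffer, long position) overload reads at an explicit offset and does not modify the channel's position. A small standalone demonstration (the temp file and offsets are arbitrary):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreadPositionSketch {
    // Returns the channel's position after a positional read ("pread") at offset 5.
    static long posAfterPread() throws IOException {
        Path tmp = Files.createTempFile("pread", ".dat");
        try {
            Files.write(tmp, "0123456789".getBytes());
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(4);
                ch.read(buf, 5L);         // positional read at an explicit offset
                return ch.position();     // documented to be unchanged by the pread
            }
        } finally {
            Files.delete(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("position after pread = " + posAfterPread()); // prints 0
    }
}
```

This is only an analogy for the DFSInputStream semantics being debated, not a statement about HDFS's current implementation.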





 Documenting DFSClient#DFSInputStream expectations reading and preading in 
 concurrent context
 

 Key: HDFS-6803
 URL: https://issues.apache.org/jira/browse/HDFS-6803
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.4.1
Reporter: stack
 Attachments: DocumentingDFSClientDFSInputStream (1).pdf


 Reviews of the patch posted the parent task suggest that we be more explicit 
 about how DFSIS is expected to behave when being read by contending threads. 
 It is also suggested that presumptions made internally be made explicit 
 documenting expectations.
 Before we put up a patch we've made a document of assertions we'd like to 
 make into tenets of DFSInputSteam.  If agreement, we'll attach to this issue 
 a patch that weaves the assumptions into DFSIS as javadoc and class comments. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089597#comment-14089597
 ] 

Arpit Agarwal commented on HDFS-6772:
-

[~mingma], I was unsure whether this delta can result in lost commands, since 
it will cause the caller {{processCommands}} to discard any subsequent commands.

{code}
--- 
a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
+++ 
b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
@@ -531,7 +531,7 @@ boolean processCommandFromActor(DatanodeCommand cmd,
   LOG.info("DatanodeCommand action : DNA_REGISTER from " + actor.nnAddr
   + " with " + actor.state + " state");
   actor.reRegister();
-  return true;
+  return false;
{code}

On further investigation it works because RegisterCommand is sent by itself. 
Could you please add a comment to {{RegisterCommand}} stating it must not be 
combined with other commands in the same response?

Thanks for adding a test case. I think you can remove this comment: _Connection 
to NN times out due to NN restart._ A timeout is not needed for the test case 
to work. The NN will always ask the DN to re-register after a restart.

 Get DNs out of blockContentsStale==true state faster when NN restarts
 -

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772.patch


 Here is the non-HA scenario.
 1. Get HDFS into block-over-replicated situation.
 2. Restart the NN.
 3. From NN's point of view, DNs will remain in blockContentsStale==true state 
 for a long time. That in turn makes postponedMisreplicatedBlocks big. A 
 bigger postponedMisreplicatedBlocks will increase blockreport latency. 
 Given that blockreport takes the NN global lock, it has a severe impact on 
 NN performance and makes the cluster unstable.
 Why will DNs remain in blockContentsStale==true state for a long time?
 1. When a DN reconnects to NN upon NN restart, blockreport RPC could come in 
 before heartbeat RPC. That is due to how BPServiceActor#offerService decides 
 when to send blockreport and heartbeat. In the case of NN restart, NN will 
 ask DN to register when NN gets the first heartbeat request; DN will then 
 register with NN; followed by blockreport RPC; the heartbeat RPC will come 
 after that.
 2. So right after the first blockreport, given heartbeatedSinceFailover 
 remains false, blockContentsStale will stay true.
 {noformat}
 DatanodeStorageInfo.java
   void receivedBlockReport() {
 if (heartbeatedSinceFailover) {
   blockContentsStale = false;
 }
 blockReportCount++;
   }
 {noformat}
 3. So the DN will remain in blockContentsStale==true until the next 
 blockreport. For big clusters, dfs.blockreport.intervalMsec could be set to 
 some large value.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6821) Atomicity of multi file operations

2014-08-07 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-6821.
-

Resolution: Won't Fix

Hi, [~samera].

Ideas similar to this have been proposed several times.  The consensus has 
always been that pushing a recursive operation all the way to the NameNode for 
atomicity would impact throughput too severely.  The implementation would 
require holding the write lock while updating every inode in a subtree.  During 
that time, all other RPC caller threads would block waiting for release of the 
write lock.  A finer-grained locking implementation would help mitigate this, 
but it wouldn't eliminate the problem completely.

It's typical behavior in many file systems that recursive operations are driven 
from user space, and the syscalls modify a single inode at a time.  HDFS isn't 
different in this respect.

I'm going to resolve this as won't fix.

 Atomicity of multi file operations
 --

 Key: HDFS-6821
 URL: https://issues.apache.org/jira/browse/HDFS-6821
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Samer Al-Kiswany
Priority: Minor

 Looking at how HDFS updates the log files in the case of chmod -R or chown 
 -R operations: in these operations, the HDFS NameNode seems to update each 
 file separately; consequently the strace of the operation looks as follows.
 append(edits)
 fsync(edits)
 append(edits)
 fsync(edits)
 ---
 append(edits)
 fsync(edits)
 append(edits)
 fsync(edits)
 If a crash happens in the middle of this operation (e.g. at the dashed line 
 in the trace), the system will end up with part of the files updated with 
 the new owner or permissions and part still with the old ones.
 Isn't it better to log the whole operation (chown -R) as one entry in the 
 edit file?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089650#comment-14089650
 ] 

Arpit Agarwal commented on HDFS-6772:
-

Also, the test case should be wrapped in try...finally so it can shut down the 
cluster if the test fails midway.
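A generic sketch of the shutdown pattern being requested. MiniDFSCluster and the real test body belong to the Hadoop test harness; this standalone version uses an invented FakeCluster stand-in purely to show the shape: start the cluster before the try, run the assertions inside it, and shut down in finally so the cluster is released even when the test fails partway through.

```java
public class TryFinallySketch {
    static class FakeCluster {            // hypothetical stand-in for MiniDFSCluster
        boolean running = true;
        void shutdown() { running = false; }
    }

    static FakeCluster runTest(boolean failMidway) {
        FakeCluster cluster = new FakeCluster();
        try {
            if (failMidway) {
                throw new AssertionError("test failed midway");
            }
            // ... test assertions would go here ...
        } catch (AssertionError e) {
            // JUnit would report this failure; swallowed here only so the
            // demo can inspect cluster state afterwards
        } finally {
            cluster.shutdown();           // always runs, pass or fail
        }
        return cluster;
    }

    public static void main(String[] args) {
        // The cluster is shut down even though the test body threw.
        System.out.println(runTest(true).running); // prints false
    }
}
```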

 Get DNs out of blockContentsStale==true state faster when NN restarts
 -

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772.patch


 Here is the non-HA scenario.
 1. Get HDFS into block-over-replicated situation.
 2. Restart the NN.
 3. From NN's point of view, DNs will remain in blockContentsStale==true state 
 for a long time. That in turn makes postponedMisreplicatedBlocks big. A 
 bigger postponedMisreplicatedBlocks will increase blockreport latency. 
 Given that blockreport takes the NN global lock, it has a severe impact on 
 NN performance and makes the cluster unstable.
 Why will DNs remain in blockContentsStale==true state for a long time?
 1. When a DN reconnects to NN upon NN restart, blockreport RPC could come in 
 before heartbeat RPC. That is due to how BPServiceActor#offerService decides 
 when to send blockreport and heartbeat. In the case of NN restart, NN will 
 ask DN to register when NN gets the first heartbeat request; DN will then 
 register with NN; followed by blockreport RPC; the heartbeat RPC will come 
 after that.
 2. So right after the first blockreport, given heartbeatedSinceFailover 
 remains false, blockContentsStale will stay true.
 {noformat}
 DatanodeStorageInfo.java
   void receivedBlockReport() {
 if (heartbeatedSinceFailover) {
   blockContentsStale = false;
 }
 blockReportCount++;
   }
 {noformat}
 3. So the DN will remain in blockContentsStale==true until the next 
 blockreport. For big clusters, dfs.blockreport.intervalMsec could be set to 
 some large value.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089653#comment-14089653
 ] 

Arpit Agarwal commented on HDFS-6425:
-

Hi Ming, is this problem mitigated by your fix for HDFS-6772?

 Large postponedMisreplicatedBlocks has impact on blockReport latency
 

 Key: HDFS-6425
 URL: https://issues.apache.org/jira/browse/HDFS-6425
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6425-Test-Case.pdf, HDFS-6425.patch


 Sometimes we have large number of over replicates when NN fails over. When 
 the new active NN took over, over replicated blocks will be put to 
 postponedMisreplicatedBlocks until all DNs for that block aren't stale 
 anymore.
 We have a case where NNs flip flop. Before postponedMisreplicatedBlocks 
 became empty, NN fail over again and again. So postponedMisreplicatedBlocks 
 just kept increasing until the cluster is stable. 
 In addition, large postponedMisreplicatedBlocks could make 
 rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
 takes write lock. So it could slow down the block report processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089670#comment-14089670
 ] 

Hadoop QA commented on HDFS-6781:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660391/HDFS-6781.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7581//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7581//console

This message is automatically generated.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, 
 HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6825:


Attachment: HDFS-6825.001.patch

 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Found this is what happened:
 - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - client tried to append to this file, but the lease had expired, so lease 
 recovery was started and the append failed
 - the file got deleted; however, there were still pending blocks of this 
 file not yet deleted
 - then the commitBlockSynchronization() method was called (see stack above), 
 and an INodeFile was created out of the pending block, unaware that the file 
 had already been deleted
 - a FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction 
 and wrote a CloseOp to the edit log



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6821) Atomicity of multi file operations

2014-08-07 Thread Samer Al-Kiswany (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089671#comment-14089671
 ] 

Samer Al-Kiswany commented on HDFS-6821:


I see.
Thanks Chris.

-samer

 Atomicity of multi file operations
 --

 Key: HDFS-6821
 URL: https://issues.apache.org/jira/browse/HDFS-6821
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Samer Al-Kiswany
Priority: Minor

 Looking at how HDFS updates the log files in the case of chmod -R or chown 
 -R operations: in these operations, the HDFS NameNode seems to update each 
 file separately; consequently the strace of the operation looks as follows.
 append(edits)
 fsync(edits)
 append(edits)
 fsync(edits)
 ---
 append(edits)
 fsync(edits)
 append(edits)
 fsync(edits)
 If a crash happens in the middle of this operation (e.g. at the dashed line 
 in the trace), the system will end up with part of the files updated with 
 the new owner or permissions and part still with the old ones.
 Isn't it better to log the whole operation (chown -R) as one entry in the 
 edit file?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6781:


Issue Type: Improvement  (was: Bug)

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, 
 HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6825:


Status: Patch Available  (was: Open)

Submitting patch 001 to address the issue. 

Two additional issues were found and fixed:
1. When a snapshot for a file doesn't exist, 
FSNamesystem.commitBlockSynchronization would throw an NPE, because the 
blockCollection of the storedBlock was set to null by a delete operation. 
2. BlockInfoUnderConstruction.appendUCParts doesn't check whether replicas is 
null or not

Thanks for reviewing.


 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Here is what happened:
 - the client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - the client tried to append to this file, but the lease had expired, so lease 
 recovery was started and the append failed
 - the file was deleted; however, some pending blocks of this file 
 were not yet deleted
 - then the commitBlockSynchronization() method was called (see stack above) and an 
 INodeFile was created out of the pending block, unaware that the file had 
 already been deleted
 - a FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continued by calling finalizeINodeFileUnderConstruction 
 and wrote a CloseOp to the edit log
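The sequence above suggests an early guard: refuse to finalize a block whose owning file is already gone, rather than letting the exception be swallowed and a CloseOp reach the edit log. A minimal standalone sketch (the `PendingBlock`/`commitBlock` names are hypothetical stand-ins, not the actual FSNamesystem code):

```java
import java.io.IOException;

// Toy model of the guard: a block whose owning file may have been
// deleted between lease expiry and commitBlockSynchronization.
class CommitGuard {
  static class PendingBlock {
    final Object blockCollection; // null once the owning file is deleted
    PendingBlock(Object owner) { this.blockCollection = owner; }
  }

  // Refuse to commit a block whose file no longer exists, instead of
  // silently finalizing and logging a CloseOp for a deleted file.
  static void commitBlock(PendingBlock b) throws IOException {
    if (b.blockCollection == null) {
      throw new IOException("Block no longer belongs to any file; "
          + "the owning file was deleted during lease recovery");
    }
    // ... finalize the INodeFile and log the CloseOp here ...
  }

  public static void main(String[] args) throws IOException {
    commitBlock(new PendingBlock(new Object())); // live file: commits
    boolean rejected = false;
    try {
      commitBlock(new PendingBlock(null)); // deleted file: rejected
    } catch (IOException e) {
      rejected = true;
    }
    System.out.println("orphan block rejected: " + rejected);
  }
}
```
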



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6828) Separate block replica dispatching from Balancer

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089690#comment-14089690
 ] 

Hadoop QA commented on HDFS-6828:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660398/h6828_20140808.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.balancer.TestBalancer
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7582//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7582//console

This message is automatically generated.

 Separate block replica dispatching from Balancer
 

 Key: HDFS-6828
 URL: https://issues.apache.org/jira/browse/HDFS-6828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6828_20140808.patch


 The Balancer class implements two major features, (1) balancing logic for 
 selecting replicas in order to balance the cluster and (2) block replica 
 dispatching for moving the block replica around.  This JIRA is to separate 
 (2) from Balancer so that the code could be reused by other code such as the 
 new data migration tool proposed in HDFS-6801.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6781:


  Resolution: Fixed
   Fix Version/s: 2.6.0
  3.0.0
Target Version/s: 2.6.0  (was: 3.0.0, 2.6.0)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, 
 HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6781) Separate HDFS commands from CommandsManual.apt.vm

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089705#comment-14089705
 ] 

Hudson commented on HDFS-6781:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6030 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6030/])
HDFS-6781. Separate HDFS commands from CommandsManual.apt.vm. (Contributed by 
Akira Ajisaka) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616575)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSCommands.apt.vm
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm
* /hadoop/common/trunk/hadoop-project/src/site/site.xml


 Separate HDFS commands from CommandsManual.apt.vm
 -

 Key: HDFS-6781
 URL: https://issues.apache.org/jira/browse/HDFS-6781
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: newbie
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6781-branch-2.2.patch, HDFS-6781-branch-2.patch, 
 HDFS-6781.2.patch, HDFS-6781.3.patch, HDFS-6781.patch, HDFS-6781.patch


 HDFS-side of HADOOP-10899.
 The CommandsManual lists very old information about running HDFS subcommands 
 from the 'hadoop' shell CLI. These are deprecated and should be removed. If 
 necessary, the HDFS subcommands should be added to the HDFS documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6834) Improve the configuration guidance in DFSClient when there are no Codec classes found in configs

2014-08-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089758#comment-14089758
 ] 

Andrew Wang commented on HDFS-6834:
---

+1 thanks Uma

 Improve the configuration guidance in DFSClient when there are no Codec 
 classes found in configs
 

 Key: HDFS-6834
 URL: https://issues.apache.org/jira/browse/HDFS-6834
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HDFS-6834.patch


 This is the comment in HADOOP-10886 from Andrew. 
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6766) optimize ack notify mechanism to avoid thundering herd issue

2014-08-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089828#comment-14089828
 ] 

stack commented on HDFS-6766:
-

Ran tests with 10 concurrent writers to an HDFS file hosted on a small (5-node) 
cluster, then with 20 concurrent writers, and finally with 100 concurrent 
writing threads. Each thread wrote 100k times. Tests lasted 2-3 minutes 
depending on thread count. Below are context switch counts as reported by Linux 
perf.

||Threads||No patch||With patch||Difference||%||
|10|7,855,181|7,055,688|-799,493|-10.17790679|
|10|7,849,103|7,099,845|-749,258|-9.545778671|
|10|7,592,115|7,183,892|-408,223|-5.376933832|
|20|9,107,196|8,168,499|-938,697|-10.30720103|
|20|8,983,253|8,164,469|-818,784|-9.114560171|
|20|9,192,111|8,149,535|-1,042,576|-11.34207365|
|100|18,503,931|17,013,636|-1,490,295|-8.053937296|
|100|18,553,534|17,051,602|-1,501,932|-8.095126244|
|100|18,691,605|17,058,533|-1,633,072|-8.736927621|

Here is what I ran to test (from hbase trunk -- writes WAL, a sequence file):

{code}for i in 10 20 100; do for j in 1 2 3; do perf stat 
${HOME}/hbase-2.0.0-SNAPSHOT/bin/hbase --config $HOME/conf_hbase 
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -threads $i 
-iterations 10 -keySize 50 -valueSize 100 &> /tmp/$1.${i}.${j}.txt; done; 
done{code}

 optimize ack notify mechanism to avoid thundering herd issue
 

 Key: HDFS-6766
 URL: https://issues.apache.org/jira/browse/HDFS-6766
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6766.txt


 Currently, DFSOutputStream uses wait/notifyAll to coordinate ack receiving 
 and ack waiting.
 Say there are 5 threads (t1,t2,t3,t4,t5) waiting for ack seq nos 1,2,3,4,5. Once 
 the no. 1 ack arrives, notifyAll is called, so t2/t3/t4/t5 can do 
 nothing except wait again.
 We can rewrite this with the Condition class: with a fair (FIFO) policy, we can 
 have just t1 notified, saving a number of context switches.
 It's possible for more than one thread to wait on the same ack seq no (e.g. when no 
 more data is written between two flush operations), so when that happens we 
 need to notify all of those threads; I introduced a set to remember such seq nos.
 In a simple HBase YCSB test, the number of context switches per second was 
 reduced by about 15%, and sys cpu% by about 6% (my HBase has the new write-model 
 patch; I think the benefit would be higher without it)
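The per-sequence-number signaling idea described above can be sketched as follows (a minimal illustration with a hypothetical `AckWaiter` class, not the actual DFSOutputStream code): a fair ReentrantLock gives FIFO wakeup order, and one Condition per ack seq no means an arriving ack wakes only the threads waiting for it, instead of notifyAll() waking every waiter.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class AckWaiter {
  private final ReentrantLock lock = new ReentrantLock(true); // fair: FIFO wakeup
  private final Map<Long, Condition> waiters = new HashMap<>();
  private long lastAckedSeqno = -1;

  /** Block until the given sequence number has been acked. */
  public void waitForAck(long seqno) throws InterruptedException {
    lock.lock();
    try {
      while (lastAckedSeqno < seqno) {
        // Several threads may wait on the same seqno (e.g. two flushes
        // with no data in between), so they share one Condition.
        Condition c = waiters.computeIfAbsent(seqno, s -> lock.newCondition());
        c.await();
      }
    } finally {
      lock.unlock();
    }
  }

  /** Called by the ack-receiving thread; acks arrive in order. */
  public void ackReceived(long seqno) {
    lock.lock();
    try {
      lastAckedSeqno = Math.max(lastAckedSeqno, seqno);
      Iterator<Map.Entry<Long, Condition>> it = waiters.entrySet().iterator();
      while (it.hasNext()) {
        Map.Entry<Long, Condition> e = it.next();
        if (e.getKey() <= lastAckedSeqno) {
          e.getValue().signalAll(); // wake only the satisfied waiters
          it.remove();
        }
      }
    } finally {
      lock.unlock();
    }
  }
}
```

Threads waiting on other, higher sequence numbers are never woken by an unrelated ack, which is where the context-switch savings come from.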



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6766) optimize ack notify mechanism to avoid thundering herd issue

2014-08-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089847#comment-14089847
 ] 

stack commented on HDFS-6766:
-

5-10% improvement in context switches.

That said, tests with the patch take longer to complete and throughput is lower 
across the board.

Here is output for the last two 100 thread runs:

NoPatch:

{code}
 561652.888388 task-clock#3.519 CPUs utilized
18,691,605 context-switches  #0.033 M/sec
 4,612,298 CPU-migrations#0.008 M/sec
   578,966 page-faults   #0.001 M/sec
   847,613,184,509 cycles#1.509 GHz 
[83.30%]
   643,171,621,294 stalled-cycles-frontend   #   75.88% frontend cycles idle
[83.32%]
   378,342,727,102 stalled-cycles-backend#   44.64% backend  cycles idle
[66.68%]
   404,794,735,743 instructions  #0.48  insns per cycle
 #1.59  stalled cycles per insn 
[83.39%]
70,996,040,867 branches  #  126.406 M/sec   
[83.36%]
 2,599,494,946 branch-misses #3.66% of all branches 
[83.34%]

 159.595619057 seconds time elapsed
{code}

WithPatch
{code}
 555117.674087 task-clock#3.248 CPUs utilized
17,058,533 context-switches  #0.031 M/sec
 3,928,780 CPU-migrations#0.007 M/sec
   576,656 page-faults   #0.001 M/sec
   839,218,544,729 cycles#1.512 GHz 
[83.32%]
   641,419,880,735 stalled-cycles-frontend   #   76.43% frontend cycles idle
[83.37%]
   386,633,844,790 stalled-cycles-backend#   46.07% backend  cycles idle
[66.71%]
   391,833,659,097 instructions  #0.47  insns per cycle
 #1.64  stalled cycles per insn 
[83.34%]
68,406,883,351 branches  #  123.230 M/sec   
[83.25%]
 2,674,118,142 branch-misses #3.91% of all branches 
[83.36%]

 170.934250947 seconds time elapsed
{code}

All numbers are better w/ patch except instructions per cycle.

 optimize ack notify mechanism to avoid thundering herd issue
 

 Key: HDFS-6766
 URL: https://issues.apache.org/jira/browse/HDFS-6766
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6766.txt


 Currently, DFSOutputStream uses wait/notifyAll to coordinate ack receiving 
 and ack waiting.
 Say there are 5 threads (t1,t2,t3,t4,t5) waiting for ack seq nos 1,2,3,4,5. Once 
 the no. 1 ack arrives, notifyAll is called, so t2/t3/t4/t5 can do 
 nothing except wait again.
 We can rewrite this with the Condition class: with a fair (FIFO) policy, we can 
 have just t1 notified, saving a number of context switches.
 It's possible for more than one thread to wait on the same ack seq no (e.g. when no 
 more data is written between two flush operations), so when that happens we 
 need to notify all of those threads; I introduced a set to remember such seq nos.
 In a simple HBase YCSB test, the number of context switches per second was 
 reduced by about 15%, and sys cpu% by about 6% (my HBase has the new write-model 
 patch; I think the benefit would be higher without it)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

2014-08-07 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6772:
--

Attachment: HDFS-6772-2.patch

Thanks, Arpit. Here is the updated patch to address the issues you raised.

I also changed when the content-stale storages metric is calculated. Instead of 
on demand when the metric is pulled via JMX, it is calculated in the background 
by HeartbeatManager. This should be ok given the freshness requirement of this 
metric.

 Get DNs out of blockContentsStale==true state faster when NN restarts
 -

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772-2.patch, HDFS-6772.patch


 Here is the non-HA scenario.
 1. Get HDFS into block-over-replicated situation.
 2. Restart the NN.
 3. From the NN's point of view, DNs will remain in the blockContentsStale==true state 
 for a long time. That in turn makes the postponedMisreplicatedBlocks size big. 
 A bigger postponedMisreplicatedBlocks size will impact blockreport latency. 
 Given that blockreport takes the NN global lock, this has a severe impact on NN 
 performance and makes the cluster unstable.
 Why will DNs remain in the blockContentsStale==true state for a long time?
 1. When a DN reconnects to the NN upon NN restart, the blockreport RPC can come in 
 before the heartbeat RPC. That is due to how BPServiceActor#offerService decides 
 when to send blockreports and heartbeats. In the case of an NN restart, the NN will 
 ask the DN to register when the NN gets the first heartbeat request; the DN will then 
 register with the NN, followed by the blockreport RPC; the heartbeat RPC will come 
 after that.
 2. So right after the first blockreport, given that heartbeatedSinceFailover 
 remains false, blockContentsStale will stay true.
 {noformat}
 DatanodeStorageInfo.java
   void receivedBlockReport() {
 if (heartbeatedSinceFailover) {
   blockContentsStale = false;
 }
 blockReportCount++;
   }
 {noformat}
 3. So the DN will remain in blockContentsStale==true until the next 
 blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to 
 some large value.
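The ordering problem described above can be seen in a toy model of the two flags. The `receivedBlockReport` logic mirrors the snippet quoted earlier; everything else is a simplified stand-in, not the actual DatanodeStorageInfo code:

```java
// Toy model: a block report only clears the stale flag if a heartbeat
// has been seen since the failover/restart.
class StorageStaleness {
  boolean heartbeatedSinceFailover = false;
  boolean blockContentsStale = true;

  void receivedHeartbeat() { heartbeatedSinceFailover = true; }

  void receivedBlockReport() {
    if (heartbeatedSinceFailover) {
      blockContentsStale = false;
    }
  }

  public static void main(String[] args) {
    // Current order after NN restart: register -> block report -> heartbeat.
    // The report arrives too early to clear the flag, so the storage
    // stays stale until the NEXT block report.
    StorageStaleness brFirst = new StorageStaleness();
    brFirst.receivedBlockReport();
    brFirst.receivedHeartbeat();
    System.out.println("BR before HB, still stale: " + brFirst.blockContentsStale);

    // Proposed order: a heartbeat first lets the block report clear the flag.
    StorageStaleness hbFirst = new StorageStaleness();
    hbFirst.receivedHeartbeat();
    hbFirst.receivedBlockReport();
    System.out.println("HB before BR, still stale: " + hbFirst.blockContentsStale);
  }
}
```

This is why the patch has the DN send a heartbeat before the block report right after registration.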
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6830) BlockManager.addStorage fails when DN updates storage

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089903#comment-14089903
 ] 

Hadoop QA commented on HDFS-6830:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660426/HDFS-6830.02.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockInfo
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7583//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7583//console

This message is automatically generated.

 BlockManager.addStorage fails when DN updates storage
 -

 Key: HDFS-6830
 URL: https://issues.apache.org/jira/browse/HDFS-6830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch


 The call to {{removeStorageInfo}} is wrong because the block is still in the 
 DatanodeStorage's list of blocks and the callee does not expect it to be.
 {code}
   } else {
 // The block is on the DN but belongs to a different storage.
 // Update our state.
 removeStorage(getStorageInfo(idx));
 added = false;  // Just updating storage. Return false.
   }
 {code}
 It is a very unlikely code path to hit since storage updates usually occur 
 via incremental block reports.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6830) BlockManager.addStorage fails when DN updates storage

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6830:


Attachment: HDFS-6830.03.patch

Fix test case.

 BlockManager.addStorage fails when DN updates storage
 -

 Key: HDFS-6830
 URL: https://issues.apache.org/jira/browse/HDFS-6830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, 
 HDFS-6830.03.patch


 The call to {{removeStorageInfo}} is wrong because the block is still in the 
 DatanodeStorage's list of blocks and the callee does not expect it to be.
 {code}
   } else {
 // The block is on the DN but belongs to a different storage.
 // Update our state.
 removeStorage(getStorageInfo(idx));
 added = false;  // Just updating storage. Return false.
   }
 {code}
 It is a very unlikely code path to hit since storage updates usually occur 
 via incremental block reports.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency

2014-08-07 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089967#comment-14089967
 ] 

Ming Ma commented on HDFS-6425:
---

Thanks, Arpit. This jira can address the more common NN failover scenario with lots 
of content-stale storages.

We try to get storages out of the content-stale state as soon as possible. Here are 
several scenarios.

a. For a non-HA NN restart, have the DN send a HB before the BR right after registration.
b. For an HA setup, the NN becomes active right after it restarts. This can happen if 
we have to restart both NNs at the same time, due to some rare outage or some 
incompatible upgrade. In this case, the active NN will first go to standby, 
then be transitioned to active, at which point all DNs will be marked as stale 
again. For big clusters, most of the DN reregistrations will come in after the 
NN becomes active, so the fix to have DNs send a HB and BR right after 
registration will also help.
c. For an HA setup, the NN becomes active after the NN JVM has been up for some time. 
The failover could happen due to a zk session timeout, or because the other NN just 
crashed. In this case, there is no DN reregistration, given that the new active NN 
was not recently restarted. We could change the NN to ask DNs to resend their 
blockreports upon failover, but that would cause cluster performance issues.

So we still have some scenarios where we might have lots of content-stale 
storages. This jira tries to make the NN handle these scenarios better.


 Large postponedMisreplicatedBlocks has impact on blockReport latency
 

 Key: HDFS-6425
 URL: https://issues.apache.org/jira/browse/HDFS-6425
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6425-Test-Case.pdf, HDFS-6425.patch


 Sometimes we have a large number of over-replicated blocks when the NN fails over. When 
 the new active NN takes over, over-replicated blocks are put into 
 postponedMisreplicatedBlocks until all DNs for each block are no longer 
 stale.
 We had a case where the NNs flip-flopped. Before postponedMisreplicatedBlocks 
 became empty, the NN failed over again and again, so postponedMisreplicatedBlocks 
 just kept increasing until the cluster stabilized. 
 In addition, a large postponedMisreplicatedBlocks set can make 
 rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
 takes the write lock, so it can slow down block report processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089969#comment-14089969
 ] 

Hadoop QA commented on HDFS-6825:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660440/HDFS-6825.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.TestCommitBlockSynchronization

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7584//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7584//console

This message is automatically generated.

 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Here is what happened:
 - the client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - the client tried to append to this file, but the lease had expired, so lease 
 recovery was started and the append failed
 - the file was deleted; however, some pending blocks of this file 
 were not yet deleted
 - then the commitBlockSynchronization() method was called (see stack above) and an 
 INodeFile was created out of the pending block, unaware that the file had 
 already been deleted
 - a FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continued by calling finalizeINodeFileUnderConstruction 
 and wrote a CloseOp to the edit log



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6728) Dynamically add new volumes to DataStorage, formatted if necessary.

2014-08-07 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089980#comment-14089980
 ] 

Aaron T. Myers commented on HDFS-6728:
--

Thanks, Eddy. I agree that the test failure is unrelated.

+1, I'm going to commit this momentarily.

 Dynamically add new volumes to DataStorage, formatted if necessary.
 ---

 Key: HDFS-6728
 URL: https://issues.apache.org/jira/browse/HDFS-6728
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
  Labels: datanode
 Attachments: HDFS-6728.000.patch, HDFS-6728.000.patch, 
 HDFS-6728.001.patch, HDFS-6728.002.patch, HDFS-6728.004.patch


 When dynamically adding a volume to {{DataStorage}}, it should prepare the 
 {{data dir}}, e.g., formatting if it is empty.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089983#comment-14089983
 ] 

Arpit Agarwal commented on HDFS-6772:
-

Thanks Ming. The patch looks good. A few minor issues follow, and I apologize for 
not mentioning #2 and #3 last time.
# Local variable {{numOfStorages}} in {{heartbeatCheck}} is unused. It should 
be removed.
# In {{scheduleHeartbeat}}, we can just set {{lastHeartbeat = 0}}. It's easier 
to follow and has the same effect.
# Could we rename {{numContentStaleStorages}} to {{numStaleStorages}} to be 
consistent with {{numStaleNodes}}?

 Get DNs out of blockContentsStale==true state faster when NN restarts
 -

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772-2.patch, HDFS-6772.patch


 Here is the non-HA scenario.
 1. Get HDFS into block-over-replicated situation.
 2. Restart the NN.
 3. From the NN's point of view, DNs will remain in the blockContentsStale==true state 
 for a long time. That in turn makes the postponedMisreplicatedBlocks size big. 
 A bigger postponedMisreplicatedBlocks size will impact blockreport latency. 
 Given that blockreport takes the NN global lock, this has a severe impact on NN 
 performance and makes the cluster unstable.
 Why will DNs remain in the blockContentsStale==true state for a long time?
 1. When a DN reconnects to the NN upon NN restart, the blockreport RPC can come in 
 before the heartbeat RPC. That is due to how BPServiceActor#offerService decides 
 when to send blockreports and heartbeats. In the case of an NN restart, the NN will 
 ask the DN to register when the NN gets the first heartbeat request; the DN will then 
 register with the NN, followed by the blockreport RPC; the heartbeat RPC will come 
 after that.
 2. So right after the first blockreport, given that heartbeatedSinceFailover 
 remains false, blockContentsStale will stay true.
 {noformat}
 DatanodeStorageInfo.java
   void receivedBlockReport() {
 if (heartbeatedSinceFailover) {
   blockContentsStale = false;
 }
 blockReportCount++;
   }
 {noformat}
 3. So the DN will remain in blockContentsStale==true until the next 
 blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to 
 some large value.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6728) Dynamically add new volumes to DataStorage, formatted if necessary.

2014-08-07 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089986#comment-14089986
 ] 

Lei (Eddy) Xu commented on HDFS-6728:
-

Great! Thank you for the reviews, [~atm]. 

 Dynamically add new volumes to DataStorage, formatted if necessary.
 ---

 Key: HDFS-6728
 URL: https://issues.apache.org/jira/browse/HDFS-6728
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
  Labels: datanode
 Attachments: HDFS-6728.000.patch, HDFS-6728.000.patch, 
 HDFS-6728.001.patch, HDFS-6728.002.patch, HDFS-6728.004.patch


 When dynamically adding a volume to {{DataStorage}}, it should prepare the 
 {{data dir}}, e.g., formatting it if it is empty.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6728) Dynamically add new volumes to DataStorage, formatted if necessary.

2014-08-07 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6728:
-

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Eddy.

 Dynamically add new volumes to DataStorage, formatted if necessary.
 ---

 Key: HDFS-6728
 URL: https://issues.apache.org/jira/browse/HDFS-6728
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
  Labels: datanode
 Fix For: 2.6.0

 Attachments: HDFS-6728.000.patch, HDFS-6728.000.patch, 
 HDFS-6728.001.patch, HDFS-6728.002.patch, HDFS-6728.004.patch


 When dynamically adding a volume to {{DataStorage}}, it should prepare the 
 {{data dir}}, e.g., formatting it if it is empty.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6740) FSDataset adds data volumes dynamically

2014-08-07 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089998#comment-14089998
 ] 

Aaron T. Myers commented on HDFS-6740:
--

Pretty confident that the test failure is unrelated - it's been failing in 
other builds as well, and I can't reproduce it on my box.

+1, I'm going to commit this momentarily. 

 FSDataset adds data volumes dynamically
 ---

 Key: HDFS-6740
 URL: https://issues.apache.org/jira/browse/HDFS-6740
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6740.000.patch, HDFS-6740.001.patch


 To support volume management in the DN (HDFS-1362), FSDatasetImpl needs to be 
 able to add volumes dynamically at runtime. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6740) Make FSDataset support adding data volumes dynamically

2014-08-07 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6740:
-

Summary: Make FSDataset support adding data volumes dynamically  (was: 
FSDataset adds data volumes dynamically)

 Make FSDataset support adding data volumes dynamically
 --

 Key: HDFS-6740
 URL: https://issues.apache.org/jira/browse/HDFS-6740
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6740.000.patch, HDFS-6740.001.patch


 To support volume management in the DN (HDFS-1362), FSDatasetImpl needs to be 
 able to add volumes dynamically at runtime. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6740) Make FSDataset support adding data volumes dynamically

2014-08-07 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6740:
-

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Eddy.

 Make FSDataset support adding data volumes dynamically
 --

 Key: HDFS-6740
 URL: https://issues.apache.org/jira/browse/HDFS-6740
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Fix For: 2.6.0

 Attachments: HDFS-6740.000.patch, HDFS-6740.001.patch


 To support volume management in the DN (HDFS-1362), FSDatasetImpl needs to be 
 able to add volumes dynamically at runtime. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6828) Separate block replica dispatching from Balancer

2014-08-07 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6828:
--

Attachment: h6828_20140808b.patch

h6828_20140808b.patch: fixes some bugs.

 Separate block replica dispatching from Balancer
 

 Key: HDFS-6828
 URL: https://issues.apache.org/jira/browse/HDFS-6828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6828_20140808.patch, h6828_20140808b.patch


 The Balancer class implements two major features, (1) balancing logic for 
 selecting replicas in order to balance the cluster and (2) block replica 
 dispatching for moving the block replica around.  This JIRA is to separate 
 (2) from Balancer so that the code could be reused by other code such as the 
 new data migration tool proposed in HDFS-6801.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6728) Dynamically add new volumes to DataStorage, formatted if necessary.

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090033#comment-14090033
 ] 

Hudson commented on HDFS-6728:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6033 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6033/])
HDFS-6728. Dynamically add new volumes to DataStorage, formatted if necessary. 
Contributed by Lei Xu. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616615)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataStorage.java


 Dynamically add new volumes to DataStorage, formatted if necessary.
 ---

 Key: HDFS-6728
 URL: https://issues.apache.org/jira/browse/HDFS-6728
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
  Labels: datanode
 Fix For: 2.6.0

 Attachments: HDFS-6728.000.patch, HDFS-6728.000.patch, 
 HDFS-6728.001.patch, HDFS-6728.002.patch, HDFS-6728.004.patch


 When dynamically adding a volume to {{DataStorage}}, it should prepare the 
 {{data dir}}, e.g., formatting it if it is empty.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6740) Make FSDataset support adding data volumes dynamically

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090035#comment-14090035
 ] 

Hudson commented on HDFS-6740:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6033 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6033/])
HDFS-6740. Make FSDataset support adding data volumes dynamically. Contributed 
by Lei Xu. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616623)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StorageLocation.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java


 Make FSDataset support adding data volumes dynamically
 --

 Key: HDFS-6740
 URL: https://issues.apache.org/jira/browse/HDFS-6740
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Fix For: 2.6.0

 Attachments: HDFS-6740.000.patch, HDFS-6740.001.patch


 To support volume management in the DN (HDFS-1362), FSDatasetImpl needs to be 
 able to add volumes dynamically at runtime. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6835) Archival Storage: Add a new API to set storage policy

2014-08-07 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-6835:
-

 Summary: Archival Storage: Add a new API to set storage policy
 Key: HDFS-6835
 URL: https://issues.apache.org/jira/browse/HDFS-6835
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Jing Zhao


The new data migration tool proposed in HDFS-6801 will determine whether the 
storage policy of files needs to be updated.  The tool needs a new API to set 
the storage policy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090053#comment-14090053
 ] 

Hadoop QA commented on HDFS-6772:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660464/HDFS-6772-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7585//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7585//console

This message is automatically generated.

 Get DNs out of blockContentsStale==true state faster when NN restarts
 -

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772-2.patch, HDFS-6772.patch


 Here is the non-HA scenario.
 1. Get HDFS into block-over-replicated situation.
 2. Restart the NN.
 3. From NN's point of view, DNs will remain in blockContentsStale==true state 
 for a long time. That in turn makes the postponedMisreplicatedBlocks size big. 
 A bigger postponedMisreplicatedBlocks size will impact blockreport latency. 
 Given that a blockreport takes the NN global lock, this has a severe impact on 
 NN performance and makes the cluster unstable.
 Why will DNs remain in blockContentsStale==true state for a long time?
 1. When a DN reconnects to the NN upon NN restart, the blockreport RPC can 
 come in before the heartbeat RPC. That is due to how 
 BPServiceActor#offerService decides when to send blockreports and heartbeats. 
 In the case of NN restart, the NN will ask the DN to register when it gets 
 the first heartbeat request; the DN will then register with the NN, followed 
 by the blockreport RPC; the heartbeat RPC will come after that.
 2. So right after the first blockreport, since heartbeatedSinceFailover 
 remains false, blockContentsStale will stay true.
 {noformat}
 DatanodeStorageInfo.java
   void receivedBlockReport() {
     if (heartbeatedSinceFailover) {
       blockContentsStale = false;
     }
     blockReportCount++;
   }
 {noformat}
 3. So the DN will remain in blockContentsStale==true until the next 
 blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to 
 some large value.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6774) Make FsDataset and DataStore support removing volumes.

2014-08-07 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-6774:


Summary: Make FsDataset and DataStore support removing volumes.  (was: 
Remove volumes from DataStorage)

 Make FsDataset and DataStore support removing volumes.
 --

 Key: HDFS-6774
 URL: https://issues.apache.org/jira/browse/HDFS-6774
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu

 Managing volumes on DataNode includes decommissioning an active volume 
 without restarting DataNode. 
 This task adds support to remove volumes from {{DataStorage}} and 
 {{BlockPoolSliceStorage}} dynamically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

2014-08-07 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6772:
--

Attachment: HDFS-6772-3.patch

Thanks, Arpit. Here is the updated patch. I renamed stale storage to content 
stale state so that it is distinguished from stale datanode, given that the two 
stale definitions are different.

 Get DNs out of blockContentsStale==true state faster when NN restarts
 -

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch


 Here is the non-HA scenario.
 1. Get HDFS into block-over-replicated situation.
 2. Restart the NN.
 3. From NN's point of view, DNs will remain in blockContentsStale==true state 
 for a long time. That in turn makes the postponedMisreplicatedBlocks size big. 
 A bigger postponedMisreplicatedBlocks size will impact blockreport latency. 
 Given that a blockreport takes the NN global lock, this has a severe impact on 
 NN performance and makes the cluster unstable.
 Why will DNs remain in blockContentsStale==true state for a long time?
 1. When a DN reconnects to the NN upon NN restart, the blockreport RPC can 
 come in before the heartbeat RPC. That is due to how 
 BPServiceActor#offerService decides when to send blockreports and heartbeats. 
 In the case of NN restart, the NN will ask the DN to register when it gets 
 the first heartbeat request; the DN will then register with the NN, followed 
 by the blockreport RPC; the heartbeat RPC will come after that.
 2. So right after the first blockreport, since heartbeatedSinceFailover 
 remains false, blockContentsStale will stay true.
 {noformat}
 DatanodeStorageInfo.java
   void receivedBlockReport() {
     if (heartbeatedSinceFailover) {
       blockContentsStale = false;
     }
     blockReportCount++;
   }
 {noformat}
 3. So the DN will remain in blockContentsStale==true until the next 
 blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to 
 some large value.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6774) Make FsDataset and DataStore support removing volumes.

2014-08-07 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-6774:


Attachment: HDFS-6774.000.patch

This patch enables {{FsDataset}} and {{DataStorage}} to remove data volumes 
dynamically. The {{replicaInfos}} that are on the removed volume are also 
removed from {{FsDataset#volumeMap}}. 

The race condition of removing a volume that is currently being written to is 
not addressed in this patch. I will open a new JIRA for that case and other 
potential race conditions. 
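The volumeMap cleanup described above can be sketched as follows. The map shape and names here are simplified stand-ins for illustration, not the real {{FsDataset#volumeMap}} structures: removing a volume must also drop every replica record that lives on it.

```java
import java.util.Iterator;
import java.util.Map;

// Simplified stand-in for the replica-map cleanup performed when a volume is
// removed: every replica whose record points at the removed volume is dropped.
// (The real FsDataset keys replicas by block-pool id and ReplicaInfo; a
// Long -> volume-path map is used here only to show the shape of the cleanup.)
class ReplicaMapCleanup {
    // Removes all replicas stored on the given volume; returns how many
    // entries were dropped.
    static int removeReplicasOnVolume(Map<Long, String> replicaToVolume,
                                      String volume) {
        int removed = 0;
        Iterator<Map.Entry<Long, String>> it =
            replicaToVolume.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().equals(volume)) {
                it.remove();  // safe removal during iteration
                removed++;
            }
        }
        return removed;
    }
}
```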

 Make FsDataset and DataStore support removing volumes.
 --

 Key: HDFS-6774
 URL: https://issues.apache.org/jira/browse/HDFS-6774
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6774.000.patch


 Managing volumes on DataNode includes decommissioning an active volume 
 without restarting DataNode. 
 This task adds support to remove volumes from {{DataStorage}} and 
 {{BlockPoolSliceStorage}} dynamically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6774) Make FsDataset and DataStore support removing volumes.

2014-08-07 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-6774:


Status: Patch Available  (was: Open)

 Make FsDataset and DataStore support removing volumes.
 --

 Key: HDFS-6774
 URL: https://issues.apache.org/jira/browse/HDFS-6774
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6774.000.patch


 Managing volumes on DataNode includes decommissioning an active volume 
 without restarting DataNode. 
 This task adds support to remove volumes from {{DataStorage}} and 
 {{BlockPoolSliceStorage}} dynamically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

2014-08-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090097#comment-14090097
 ] 

Arpit Agarwal commented on HDFS-6772:
-

Thanks Ming, I think we can clarify that difference with documentation. 
{{numContentStaleStorages}} sounded kind of awkward.

I am +1 for your latest patch, pending Jenkins.

 Get DNs out of blockContentsStale==true state faster when NN restarts
 -

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch


 Here is the non-HA scenario.
 1. Get HDFS into block-over-replicated situation.
 2. Restart the NN.
 3. From NN's point of view, DNs will remain in blockContentsStale==true state 
 for a long time. That in turn makes the postponedMisreplicatedBlocks size big. 
 A bigger postponedMisreplicatedBlocks size will impact blockreport latency. 
 Given that a blockreport takes the NN global lock, this has a severe impact on 
 NN performance and makes the cluster unstable.
 Why will DNs remain in blockContentsStale==true state for a long time?
 1. When a DN reconnects to the NN upon NN restart, the blockreport RPC can 
 come in before the heartbeat RPC. That is due to how 
 BPServiceActor#offerService decides when to send blockreports and heartbeats. 
 In the case of NN restart, the NN will ask the DN to register when it gets 
 the first heartbeat request; the DN will then register with the NN, followed 
 by the blockreport RPC; the heartbeat RPC will come after that.
 2. So right after the first blockreport, since heartbeatedSinceFailover 
 remains false, blockContentsStale will stay true.
 {noformat}
 DatanodeStorageInfo.java
   void receivedBlockReport() {
     if (heartbeatedSinceFailover) {
       blockContentsStale = false;
     }
     blockReportCount++;
   }
 {noformat}
 3. So the DN will remain in blockContentsStale==true until the next 
 blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to 
 some large value.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions

2014-08-07 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6826:
-

Target Version/s: 2.6.0

 Plugin interface to enable delegation of HDFS authorization assertions
 --

 Key: HDFS-6826
 URL: https://issues.apache.org/jira/browse/HDFS-6826
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 When Hbase data, HiveMetaStore data or Search data is accessed via services 
 (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce 
 permissions on corresponding entities (databases, tables, views, columns, 
 search collections, documents). It is desirable, when the data is accessed 
 directly by users accessing the underlying data files (i.e. from a MapReduce 
 job), that the permission of the data files map to the permissions of the 
 corresponding data entity (i.e. table, column family or search collection).
 To enable this we need to have the necessary hooks in place in the NameNode 
 to delegate authorization to an external system that can map HDFS 
 files/directories to data entities and resolve their permissions based on the 
 data entities permissions.
 I’ll be posting a design proposal in the next few days.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6758) block writer should pass the expected block size to DataXceiverServer

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6758:


Issue Type: Improvement  (was: Bug)

 block writer should pass the expected block size to DataXceiverServer
 -

 Key: HDFS-6758
 URL: https://issues.apache.org/jira/browse/HDFS-6758
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs-client
Affects Versions: 2.4.1
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: HDFS-6758.01.patch


 DataXceiver initializes the block size to the default block size for the 
 cluster. This size is later used by the FsDatasetImpl when applying 
 VolumeChoosingPolicy.
 {code}
 block.setNumBytes(dataXceiverServer.estimateBlockSize);
 {code}
 where
 {code}
   /**
    * We need an estimate for block size to check if the disk partition has
    * enough space. For now we set it to be the default block size set
    * in the server side configuration, which is not ideal because the
    * default block size should be a client-side configuration. 
    * A better solution is to include in the header the estimated block size,
    * i.e. either the actual block size or the default block size.
    */
   final long estimateBlockSize;
 {code}
 In most cases the writer can just pass the maximum expected block size to the 
 DN instead of having to use the cluster default.
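The motivation can be made concrete with a small sketch. The names below (SimpleVolume, chooseVolume) are invented for illustration and are not the HDFS VolumeChoosingPolicy API; the point is only that a chooser given the writer's expected block size can skip volumes that are too full, which a cluster-default estimate may get wrong.

```java
import java.util.List;

// Hypothetical illustration of why the expected block size matters when
// choosing a volume: a volume must be skipped when its remaining space is
// smaller than the size the writer is about to stream.
class VolumeChooser {
    static class SimpleVolume {
        final String path;
        final long available;  // bytes of free space on this volume

        SimpleVolume(String path, long available) {
            this.path = path;
            this.available = available;
        }
    }

    // Returns the first volume that can hold the expected block size,
    // or null when no volume has enough room.
    static SimpleVolume chooseVolume(List<SimpleVolume> volumes,
                                     long expectedBlockSize) {
        for (SimpleVolume v : volumes) {
            if (v.available >= expectedBlockSize) {
                return v;
            }
        }
        return null;
    }
}
```

With the server-side default as the estimate, a writer creating a much larger block could be placed on a volume that will run out of space mid-write; passing the writer's maximum expected size avoids that.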



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6825) Edit log corruption due to delayed block removal

2014-08-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6825:


Attachment: HDFS-6825.002.patch

Upload patch 002 to address test failure.


 Edit log corruption due to delayed block removal
 

 Key: HDFS-6825
 URL: https://issues.apache.org/jira/browse/HDFS-6825
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch


 Observed the following stack:
 {code}
 2014-08-04 23:49:44,133 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., 
 newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
 2014-08-04 23:49:44,133 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception 
 while updating disk space. 
 java.io.FileNotFoundException: Path not found: 
 /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 Found that this is what happened:
 - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
 - client tried to append to this file, but the lease had expired, so lease 
 recovery was started and the append failed
 - the file got deleted; however, there were still pending blocks of this file 
 that had not been deleted
 - then the commitBlockSynchronization() method was called (see stack above); 
 an INodeFile was created out of the pending block, unaware that the file had 
 already been deleted
 - a FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
 swallowed by commitOrCompleteLastBlock
 - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction 
 and wrote a CloseOp to the edit log
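One way to picture the direction of a fix is a guard before finalizing. All names below are hypothetical and this is not the actual patch: the idea is simply that commitBlockSynchronization should verify the file still exists before logging a CloseOp for it.

```java
// Hedged sketch (hypothetical names, not the HDFS-6825 patch): guard the
// commit path so a CloseOp is never logged for a file that was deleted
// while its last block was still pending.
class CommitGuard {
    // Minimal stand-in for the namesystem query the guard needs.
    interface Namesystem {
        boolean isFileDeleted(String path);
    }

    // Returns true when the close was logged, false when the commit was
    // skipped because the file had already been deleted.
    static boolean commitIfLive(Namesystem fs, String path) {
        if (fs.isFileDeleted(path)) {
            return false;  // avoid writing a CloseOp for a deleted file
        }
        // ... finalize the INodeFile and log the CloseOp here ...
        return true;
    }
}
```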



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6830) BlockManager.addStorage fails when DN updates storage

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090253#comment-14090253
 ] 

Hadoop QA commented on HDFS-6830:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660489/HDFS-6830.03.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7586//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7586//console

This message is automatically generated.

 BlockManager.addStorage fails when DN updates storage
 -

 Key: HDFS-6830
 URL: https://issues.apache.org/jira/browse/HDFS-6830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: HDFS-6830.01.patch, HDFS-6830.02.patch, 
 HDFS-6830.03.patch


 The call to {{removeStorageInfo}} is wrong because the block is still in the 
 DatanodeStorage's list of blocks and the callee does not expect it to be.
 {code}
   } else {
     // The block is on the DN but belongs to a different storage.
     // Update our state.
     removeStorage(getStorageInfo(idx));
     added = false;  // Just updating storage. Return false.
   }
 {code}
 It is a very unlikely code path to hit since storage updates usually occur 
 via incremental block reports.
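A hedged sketch of the safer direction, updating the mapping in place rather than calling a removal routine whose preconditions are not met. The class and method names below are hypothetical and are not BlockManager's API; only the add-vs-update distinction mirrors the bug described above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not the HDFS code): when a block is reported from a
// different storage on the same datanode, replace the mapping directly
// instead of calling a removal routine that assumes the block is already
// unlinked from the old storage.
class BlockStorageMap {
    private final Map<Long, String> blockToStorage = new HashMap<>();

    // Returns true only when the block is genuinely new; a storage update
    // just rewrites the mapping in place and returns false.
    boolean addStorage(long blockId, String storageId) {
        String prev = blockToStorage.put(blockId, storageId);
        return prev == null;
    }

    String storageOf(long blockId) {
        return blockToStorage.get(blockId);
    }
}
```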



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6828) Separate block replica dispatching from Balancer

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090274#comment-14090274
 ] 

Hadoop QA commented on HDFS-6828:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660503/h6828_20140808b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7587//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7587//console

This message is automatically generated.

 Separate block replica dispatching from Balancer
 

 Key: HDFS-6828
 URL: https://issues.apache.org/jira/browse/HDFS-6828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6828_20140808.patch, h6828_20140808b.patch


 The Balancer class implements two major features, (1) balancing logic for 
 selecting replicas in order to balance the cluster and (2) block replica 
 dispatching for moving the block replica around.  This JIRA is to separate 
 (2) from Balancer so that the code could be reused by other code such as the 
 new data migration tool proposed in HDFS-6801.
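A hypothetical shape of the split described above (illustrative names only, not the actual classes introduced by the patch): the dispatching side becomes a standalone class that both the Balancer and the proposed migration tool can drive.

```java
// (1) balancing logic: decides *which* replicas to move.
interface BlockMovePlanner {
  java.util.List<PendingMove> plan();
}

// (2) replica dispatching: carries the planned moves out. Separated from
// the planner so other tools (e.g. the HDFS-6801 migration tool) can reuse it.
class Dispatcher {
  void dispatch(java.util.List<PendingMove> moves) {
    for (PendingMove m : moves) {
      m.execute();
    }
  }
}

// Stand-in for one scheduled replica transfer between DataNodes.
class PendingMove {
  static int executed = 0;
  void execute() { executed++; }
}
```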





[jira] [Updated] (HDFS-6722) Display readable last contact time for dead nodes on NN webUI

2014-08-07 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6722:
-

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks [~mingma] for the 
contribution.

 Display readable last contact time for dead nodes on NN webUI
 -

 Key: HDFS-6722
 URL: https://issues.apache.org/jira/browse/HDFS-6722
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.6.0

 Attachments: HDFS-6722-2.patch, HDFS-6722-3.patch, HDFS-6722.patch


 For dead node info on the NN webUI, admins want to know when the nodes became 
 dead, to troubleshoot missing blocks, etc. Currently the webUI displays the 
 last contact as the number of seconds since the last contact. It would be 
 more useful to display the info in Date format.
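Recovering an absolute timestamp from a seconds-since-last-contact counter is simple arithmetic; a minimal sketch (hypothetical helper name, not the actual webUI code, which does this in JavaScript):

```java
import java.util.Date;

public class DeadNodeTime {
  // "lastContactSeconds" is how the NN reports a dead node's last contact:
  // seconds elapsed since the DN last heartbeated. Converting it to an
  // absolute Date gives admins the approximate time the node went dead.
  static Date lastContactDate(long lastContactSeconds) {
    return new Date(System.currentTimeMillis() - lastContactSeconds * 1000L);
  }

  public static void main(String[] args) {
    // A node whose last contact was 3600 seconds ago died roughly an hour ago.
    System.out.println(lastContactDate(3600));
  }
}
```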





[jira] [Commented] (HDFS-6774) Make FsDataset and DataStore support removing volumes.

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090315#comment-14090315
 ] 

Hadoop QA commented on HDFS-6774:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660509/HDFS-6774.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.web.TestWebHDFS

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7588//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7588//console

This message is automatically generated.

 Make FsDataset and DataStore support removing volumes.
 --

 Key: HDFS-6774
 URL: https://issues.apache.org/jira/browse/HDFS-6774
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 2.4.1
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6774.000.patch


 Managing volumes on a DataNode includes decommissioning an active volume 
 without restarting the DataNode. 
 This task adds support to remove volumes from {{DataStorage}} and 
 {{BlockPoolSliceStorage}} dynamically.
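Illustrative sketch only (not the actual {{DataStorage}} API): removing a volume dynamically means dropping it from the active set while the service keeps running, rather than restarting with a new configuration.

```java
class VolumeSet {
  private final java.util.Set<String> volumes =
      java.util.Collections.synchronizedSet(new java.util.HashSet<>());

  void addVolume(String dir) {
    volumes.add(dir);
  }

  boolean removeVolume(String dir) {
    // The real DataNode must also quiesce readers/writers on the volume and
    // update per-block-pool storage state; this sketch only drops the entry.
    return volumes.remove(dir);
  }

  int size() {
    return volumes.size();
  }
}
```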





[jira] [Commented] (HDFS-6772) Get DNs out of blockContentsStale==true state faster when NN restarts

2014-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090317#comment-14090317
 ] 

Hadoop QA commented on HDFS-6772:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660508/HDFS-6772-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7589//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7589//console

This message is automatically generated.

 Get DNs out of blockContentsStale==true state faster when NN restarts
 -

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch


 Here is the non-HA scenario.
 1. Get HDFS into a block-over-replicated situation.
 2. Restart the NN.
 3. From the NN's point of view, DNs will remain in the blockContentsStale==true 
 state for a long time. That in turn makes postponedMisreplicatedBlocks large. 
 A larger postponedMisreplicatedBlocks set increases blockreport latency, and 
 since blockreport takes the NN global lock, this severely impacts NN 
 performance and makes the cluster unstable.
 Why do DNs remain in the blockContentsStale==true state for so long?
 1. When a DN reconnects to the NN after an NN restart, the blockreport RPC can 
 arrive before the heartbeat RPC. That is due to how BPServiceActor#offerService 
 decides when to send blockreports and heartbeats. In the case of an NN restart, 
 the NN asks the DN to register when it gets the first heartbeat request; the DN 
 then registers with the NN, followed by the blockreport RPC; the heartbeat RPC 
 comes after that.
 2. So right after the first blockreport, since heartbeatedSinceFailover 
 remains false, blockContentsStale stays true.
 {noformat}
 DatanodeStorageInfo.java
   void receivedBlockReport() {
     if (heartbeatedSinceFailover) {
       blockContentsStale = false;
     }
     blockReportCount++;
   }
 {noformat}
 3. So the DN will remain in the blockContentsStale==true state until the next 
 blockreport. For a big cluster, dfs.blockreport.intervalMsec may be set to a 
 large value.
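The ordering problem described in the report can be reproduced in a minimal standalone sketch (plain Java, not the actual DatanodeStorageInfo class; names mirror the quoted snippet only for illustration):

```java
// State machine for one DN storage's staleness flag. If the first block
// report after an NN restart arrives before the first heartbeat,
// heartbeatedSinceFailover is still false, so the storage stays stale
// until the *next* report, which may be dfs.blockreport.intervalMsec away.
class StorageStaleness {
  boolean heartbeatedSinceFailover = false;
  boolean blockContentsStale = true;
  int blockReportCount = 0;

  void receivedHeartbeat() {
    heartbeatedSinceFailover = true;
  }

  void receivedBlockReport() {
    if (heartbeatedSinceFailover) {
      blockContentsStale = false;
    }
    blockReportCount++;
  }
}
```

Replaying the RPC order from the report (register, then blockreport, then heartbeat) shows the flag surviving the first report and clearing only on the second.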





[jira] [Commented] (HDFS-6722) Display readable last contact time for dead nodes on NN webUI

2014-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090323#comment-14090323
 ] 

Hudson commented on HDFS-6722:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6035 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6035/])
HDFS-6722. Display readable last contact time for dead nodes on NN webUI. 
Contributed by Ming Ma. (wheat9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1616669)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java


 Display readable last contact time for dead nodes on NN webUI
 -

 Key: HDFS-6722
 URL: https://issues.apache.org/jira/browse/HDFS-6722
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.6.0

 Attachments: HDFS-6722-2.patch, HDFS-6722-3.patch, HDFS-6722.patch







[jira] [Updated] (HDFS-6772) Get DN storages out of blockContentsStale state faster after NN restarts

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6772:


Summary: Get DN storages out of blockContentsStale state faster after NN 
restarts  (was: Get DNs out of blockContentsStale==true state faster when NN 
restarts)

 Get DN storages out of blockContentsStale state faster after NN restarts
 

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch







[jira] [Updated] (HDFS-6772) Get DN storages out of blockContentsStale state faster after NN restarts

2014-08-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6772:


  Resolution: Fixed
   Fix Version/s: 2.6.0
  3.0.0
Target Version/s: 2.6.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks for the contribution [~mingma]!

 Get DN storages out of blockContentsStale state faster after NN restarts
 

 Key: HDFS-6772
 URL: https://issues.apache.org/jira/browse/HDFS-6772
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch




