[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-27 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384569#comment-14384569
 ] 

Ming Ma commented on HDFS-7411:
---

[~andrew.wang], thanks for great work. Just to confirm, is HDFS-3087 still 
there? The unit test is there, but the {{DatanodeDescriptor}}'s 
{{checkBlockReportReceived}} isn't referenced anymore.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-27 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384714#comment-14384714
 ] 

Andrew Wang commented on HDFS-7411:
---

Nice find Ming, I think you're right. I'm not sure why that unit test still 
passes, but the right place to add this check is probably 
BlockManager#isNodeHealthyForDecommission. If you want to file a JIRA / patch, 
I'd be happy to review.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-27 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384817#comment-14384817
 ] 

Ming Ma commented on HDFS-7411:
---

Thanks, Andrew. I will file a jira for it.

Another issue is how it handles block under construction. It used to check 
{{curReplicas  minReplication}}, but it is changed  to {{numLive = 
blockManager.minReplication}}. It does change the data availability, but not 
sure if it matters much. Thought?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352834#comment-14352834
 ] 

Hudson commented on HDFS-7411:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #127 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/127/])
HDFS-7411. Change decommission logic to throttle by blocks rather (cdouglas: 
rev 6ee0d32b98bc3aa5ed42859f1325d5a14fd1722a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352856#comment-14352856
 ] 

Hudson commented on HDFS-7411:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #861 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/861/])
HDFS-7411. Change decommission logic to throttle by blocks rather (cdouglas: 
rev 6ee0d32b98bc3aa5ed42859f1325d5a14fd1722a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352862#comment-14352862
 ] 

Hudson commented on HDFS-7411:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2059 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2059/])
HDFS-7411. Change decommission logic to throttle by blocks rather (cdouglas: 
rev 6ee0d32b98bc3aa5ed42859f1325d5a14fd1722a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353126#comment-14353126
 ] 

Hudson commented on HDFS-7411:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2077 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2077/])
HDFS-7411. Change decommission logic to throttle by blocks rather (cdouglas: 
rev 6ee0d32b98bc3aa5ed42859f1325d5a14fd1722a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353070#comment-14353070
 ] 

Hudson commented on HDFS-7411:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #118 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/118/])
HDFS-7411. Change decommission logic to throttle by blocks rather (cdouglas: 
rev 6ee0d32b98bc3aa5ed42859f1325d5a14fd1722a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353094#comment-14353094
 ] 

Hudson commented on HDFS-7411:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #127 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/127/])
HDFS-7411. Change decommission logic to throttle by blocks rather (cdouglas: 
rev 6ee0d32b98bc3aa5ed42859f1325d5a14fd1722a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-09 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353270#comment-14353270
 ] 

Andrew Wang commented on HDFS-7411:
---

Glad to see this committed, thanks Chris for the final push, and everyone else 
for the reviews and comments.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352424#comment-14352424
 ] 

Hudson commented on HDFS-7411:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7281 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7281/])
HDFS-7411. Change decommission logic to throttle by blocks rather (cdouglas: 
rev 6ee0d32b98bc3aa5ed42859f1325d5a14fd1722a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DecommissionManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.7.0

 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-08 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352115#comment-14352115
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

If you believe that the new patch could keep the existing behavior, I am happy 
to remove my -1.  Unfortunately, I won't be able to review the patch.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351449#comment-14351449
 ] 

Chris Douglas commented on HDFS-7411:
-

Looked through the patch; it addresses the feedback. [~szetszwo], do you want 
to review the patch before commit?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-03-03 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345509#comment-14345509
 ] 

Andrew Wang commented on HDFS-7411:
---

Been about a week, [~szetszwo] anything? your veto is still blocking this from 
being fixed.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-26 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339691#comment-14339691
 ] 

Andrew Wang commented on HDFS-7411:
---

[~szetszwo] any comments? I'd like to move forward on this.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335738#comment-14335738
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700583/hdfs-7411.011.patch
  against trunk revision 9a37247.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9658//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9658//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-11 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316793#comment-14316793
 ] 

Chris Douglas commented on HDFS-7411:
-

bq. The -1 is not for the refactoring. It is for keeping the existing behavior.

Andrew, even though you prefer estimates or averages that approximate the 
existing behavior, halting when either of the limits are hit would move this 
forward.

Nicholas, would you be OK changing the default so this uses the new algorithm 
in clusters where the node limit is not explicitly configured (default value 
for nodes is {{Integer.MAX_VALUE}})? You're also OK enforcing the existing 
semantics in the new code?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-11 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316965#comment-14316965
 ] 

Andrew Wang commented on HDFS-7411:
---

bq. Andrew, even though you prefer estimates or averages that approximate the 
existing behavior, halting when either of the limits are hit would move this 
forward.

Saying to use the node limit is underspecified, since the new code only 
iterates over decomming nodes, whereas the old code iterates over all nodes. 
This constitutes a major behavior change, but Nicholas said that iterating over 
non-decomming nodes is a bug that should be fixed.

This is why I've been trying to elevate the discussion to what constitutes good 
or bad user experience. I have a hard time understanding why the iterating over 
just decomming nodes is an allowable change (even though it'll have a huge 
affect on pause times and decom rate), but the rest of my proposals are not 
okay because they constitute a behavior change.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317210#comment-14317210
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 ... would you be OK changing the default so this uses the new algorithm in 
 clusters where the node limit is not explicitly configured (default value for 
 nodes is Integer.MAX_VALUE)?

Agree.  This is the same as [my 
propose|https://issues.apache.org/jira/browse/HDFS-7411?focusedCommentId=14302030page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14302030]
 mentioned multiple times.

The default value could be removed from hdfs-default.xml.  Then then passing -1 
as default in the code.  Then, returning -1 means the conf is not set.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317206#comment-14317206
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 Again I would appreciate examples where user experience is negatively 
 impacted compared to before.

When a node have many small blocks, say 10m, then would setting 100k blocks per 
node be slower then the original node base algorithm?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-11 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317443#comment-14317443
 ] 

Andrew Wang commented on HDFS-7411:
---

bq. When a node have many small blocks, say 10m, then would setting 100k blocks 
per node be slower then the original node base algorithm?

Could you explain how you chose this value? Taking some sample numbers, a 
really big Hadoop cluster might be 4000 nodes and 300 million blocks. This is 
an average of 75k blocks per node. My experience with smaller, denser clusters 
is that they top out at 500k blocks per node, because there are issues with 
block report processing above that. 10m is 20x that already dense number.

Even so, it's not clear that it would be slower. The new algo does incremental 
scans, so it'll be a lot faster toward the end of decom. Also with the old 
code, doing full scans of 5 10m block DNs is going to be many seconds (maybe 
even 1min) of pause each time, which is also certainly not what an admin wants.

bq. Agree. This is the same as my propose mentioned multiple times.

I think it's a bit different. Chris's proposal is to also enforce a # of nodes 
limit in the new code, not to use the old code via a config toggle. The current 
patch already does detection of the old config, so it could be tweaked a bit to 
do this.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315506#comment-14315506
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 Isn't it better that we can quickly mark all these nodes as decommissioned, 
 rather than waiting minutes or hours? ...

Then, why don't you setting it to some larger values, say 1m or 10m?

 ... Really these statements can be made about any patch.

That's correct.  My comments are not subjective to you or your patches.  For 
any patch, if it wants to remove an existing public feature, it should first 
deprecate and then remove the feature in some later release.  It is not 
well-thought-out for anyone to say that one could fix any bug in the future so 
that we have nothing to worry.

 ... could you please address my proposals about how to translate the old 
 config into the new config? If you're okay removing the old code in a later 
 2.x release, ...

I cannot think of  a good way to do the translation.  Later on, when we have 
more experience the new approach, we may be able to make a better decision.

 ... The example you gave exhibits surprising behavior, but it's a pleasant 
 surprise. It's like finding presents under the tree on Christmas day.

It seems that most cluster admins do not like surprise.  A surprising event 
means they cannot understand the system well enough to predict the behavior.

I do agree that children like the surprise if they found presents under the 
Christmas tree.

 As this has already been reviewed by two people, I still feel like a patch 
 split to ease review is a strange request. -1 on an already +1'd patch 
 because it doesn't split a refactor would be quite novel.

The -1 is not for the refactoring.  It is for keeping the existing behavior.  I 
actually don't understand why the patch can get +1'ed without bringing up the 
incompatibility issue.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315571#comment-14315571
 ] 

Andrew Wang commented on HDFS-7411:
---

bq. Then, why don't you setting it to some larger values, say 1m or 10m?

Could you clarify what you mean by this? Faster decommission times seems like a 
good thing from an user's perspective. Performance improvements aren't an 
incompatible change.

bq. For any patch, if it wants to remove an existing public feature, it should 
first deprecate and then remove the feature in some later release.

We're not removing a feature, we're improving an existing one. As I've said 
above, limiting by # of nodes is flawed in very many ways. It's difficult for 
admins to use since the behavior is so variable, it's always surprising. As you 
indicate, this is not pleasing to cluster admins. The new patch actually makes 
decom much more predictable.

Again I would appreciate examples where user experience is negatively impacted 
compared to before.

bq. I cannot think of a good way to do the translation. Later on, when we have 
more experience the new approach, we may be able to make a better decision.

Can you comment on my proposals above specifically? I would appreciate examples 
where user experience is negatively impacted. This will help improve the patch. 
Else we're delaying for unspecified, unknown possible future concerns.

Also I said above, if you're okay with removing the old decom code in a later 
2.x release, we still need to figure out compatibility with the old config 
option now.

bq. The -1 is not for the refactoring. It is for keeping the existing behavior. 
I actually don't understand why the patch can get +1'ed without bringing up the 
incompatibility issue.

Compatibility was brought up quite a bit during review, even before your first 
comment. The patch and comment history demonstrates that. I think what we have 
right now satisfies compatibility concerns. The existing behavior is very hard 
for admins to use, so there is very little value to keeping it exactly as is. 
This patch improves upon that, even for users of the old configuration key, and 
I've proposed some other ways we could avoid admin surprises.

I'll also ask again, do you have any specific tests you'd like to see run to 
improve your confidence in the code quality?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315082#comment-14315082
 ] 

Andrew Wang commented on HDFS-7411:
---

bq. Is it the intended behavior?

Isn't it better that we can quickly mark all these nodes as decommissioned, 
rather than waiting minutes or hours? I don't see why an admin would prefer 
these nodes to stay decom-in-progress for hours. If anything, I hear admins 
complaining about decom being too slow, never the opposite.

bq. Thanks for signing it up. What if you are unavailable later on? What if 
there is a bug you don't know how to fix?

Well, Colin and Ming also reviewed this, so there are two more people who also 
can help maintain. I can't comment on a hypothetical bug you don't know how to 
fix. Really these statements can be made about any patch.

Do you have specific requirements around additional testing? Or comments about 
my proposals?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315141#comment-14315141
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 It does not seem the case to me since no one ever commented on the 
 incompatible change earlier.

I am actually quite surpurised that this was an incompatible change in many 
ways:
- From the JIRA summary, it starts with Refactor and improve which does not 
sound like incompatible.
- From the earlier comments such as ... but this scheme preserves backwards 
compatibility. and ... The idea here was to be compatible with the old config 
option,   It does seem that the patch is compatible.
- Before I said that [dfs.namenode.decommission.nodes.per.interval should be 
deprecated 
first|https://issues.apache.org/jira/browse/HDFS-7411?focusedCommentId=14294224page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14294224].
  No other contributors/reviewers mentioned that the patch removed a public 
conf so that it was incompatible.
- The JIRA was not marked as an Incompatible change.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315086#comment-14315086
 ] 

Andrew Wang commented on HDFS-7411:
---

I'll also note that some of my earlier proposals addresses a low block-count 
per node, if we agree this is a concern.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315112#comment-14315112
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 ... Heh; I don't think you intended to float this as a precondition for 
 commit.

No. I just want to point out that keeping the existing code is important.

 ... the patch has received a fair amount of review. ...

It does not seem the case to me since no one ever commented on the incompatible 
change earlier.

The block base approach does sound a good idea to me.  I never object adding 
it.  However, as we generally cannot guarantee any new code to be perfect and 
the new approach can solve the old problems, we should keep the existing code 
for the existing conf.  The existing code can be removed after one or two 
releases.

Also, I insist to have a simple refactoring patch before adding the new code 
because the refactoring patch is going to be very easily reviewed.  The patch 
with the new code will be smaller and more easier to understand.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315051#comment-14315051
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 ... with pause time being the most important one. ...

It is not necessary the most important in all the cases.  Consider a large 
cluster with many datanode, say 4000, but few data so that each datanode only 
has few number of blocks, say  100.  Then the admin decide to reduce the size 
of the cluster dramatically, say 2000.  With setting 100k blocks per node, it 
could decommission  1000 nodes per iteration.  Is it the intended behavior?

 As usual, I'll sign myself up to fix any issues that surface, so there's no 
 worry about ongoing maintenance.

Thanks for signing it up.  What if you are unavailable later on?  What if there 
is a bug you don't know how to fix?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315092#comment-14315092
 ] 

Chris Douglas commented on HDFS-7411:
-

bq. What if there is a bug you don't know how to fix?
Heh; I don't think you intended to float this as a precondition for commit.

[~szetszwo], the patch has received a fair amount of review. Are there cases 
that should be tested that would increase your confidence in its correctness 
and performance? Per your example, are there additional parameters that should 
be added to the algorithm?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315197#comment-14315197
 ] 

Andrew Wang commented on HDFS-7411:
---

Nicholas, could you please address my proposals about how to translate the old 
config into the new config? If you're okay removing the old code in a later 2.x 
release, then we still need to agree on compatibility now. As I said above, I 
struggle to see downsides from a user point of view. The example you gave 
exhibits surprising behavior, but it's a pleasant surprise. It's like finding 
presents under the tree on Christmas day.

Same for my question about testing. If the concern is code quality, let's think 
up some more testing.

As this has already been reviewed by two people, I still feel like a patch 
split to ease review is a strange request. -1 on an already +1'd patch because 
it doesn't split a refactor would be quite novel.

We're all buds, so I can do a split as a show of good faith, but I'd like 
agreement on the above two questions first.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315201#comment-14315201
 ] 

Chris Douglas commented on HDFS-7411:
-

If the new code were to implement a switch for the current semantics (but not 
the existing code), would that satisfy your reservations? A workaround for 
unforeseen flaws that falls back to the node-based algorithm is prudent, but 
the multi-stage refactor adds more generality to the code that is probably 
required. Per your comment, would you want the algorithm to stop when _either_ 
the max #blocks or max #nodes have been checked? {{numNodesChecked}} is kept 
around for metrics anyway; it looks like it could be straightforward to add 
this (Andrew, please correct this if it's mistaken)

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315220#comment-14315220
 ] 

Andrew Wang commented on HDFS-7411:
---

Yea, it'd be easy to do a node-based limiter, though I'd really prefer some of 
the other schemes I offered up. They offer very similar properties in terms of 
pause times, and make more sense when considering the new incremental scan 
scheme.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-09 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313358#comment-14313358
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 ..., HDFS-7712 and HDFS-7734 also do not relate to this JIRA. HDFS-7706 is 
 what I split out from this JIRA. ...

If HDFS-7706 is not split out (which is your original plan), the work of 
HDFS-7712 may possibly be added here.  So it seems related.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-09 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313355#comment-14313355
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 Since the decom manager iterates over the whole datanode list, both live and 
 decommissioning nodes count towards the limit. Thus, the actual number of 
 decomming nodes processed varies between 0 and the limit.

I agree that live node should not be counted.  It is a bug in the node based 
decommission logic and we probably should fix the bug.

A problem for translating node-based to block-based decommission is that there 
is no good choice of number of nodes per block.  It may result in surprising 
behavior for clusters extremely large or extremely small.  This is a reason 
that we want to keep the old code.

A second reason is that the old code is proven to be working.  If we remove it 
now and then find a serious bug in this patch after a release, then we don't 
have anything to fallback.  That's why the standard procedure to remove an 
existing feature is to deprecate it first and the remove it later.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-09 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313386#comment-14313386
 ] 

Andrew Wang commented on HDFS-7411:
---

Re: node-to-block translation, do any of my proposed schemes meet your 
criteria? I think I laid out concerns from the perspective of the admin 
correctly, with pause time being the most important one. I'll also note by that 
metric, fixing the bug of counting live nodes will quite severely affect pause 
time.

Re: proven to be working, is there additional testing that you will find 
satisfactory? This change passes the existing decommission unit tests, as well 
as adding a fair bit more. I think it's quite rare to preserve old code that's 
functionally the same behind a config. I actually can't think of an example 
from the last two years.

As usual, I'll sign myself up to fix any issues that surface, so there's no 
worry about ongoing maintenance.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-09 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312877#comment-14312877
 ] 

Andrew Wang commented on HDFS-7411:
---

[~szetszwo] have you gotten a chance to read the above? Any other comments from 
other reviewers? I'd like to move on this if possible.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308115#comment-14308115
 ] 

Andrew Wang commented on HDFS-7411:
---

Nicholas, HDFS-7712 and HDFS-7734 also do not relate to this JIRA. HDFS-7706 is 
what I split out from this JIRA. The other two are a separate effort to convert 
more of HDFS to slf4j.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308105#comment-14308105
 ] 

Andrew Wang commented on HDFS-7411:
---

I had an offline request to summarize some of the above.

Nicholas's compatibility concern regards the rate limiting of decommissioning. 
Currently, this is expressed as a number of nodes to process per decom manager 
wakeup. There are a number of flaws with this scheme:

* Since the decom manager iterates over the whole datanode list, both live and 
decommissioning nodes count towards the limit. Thus, the actual number of 
decomming nodes processed varies between 0 and the limit.
* Since datanodes have different number of blocks, the amount of actual work 
can vary based on this as well.

This means:
* This config parameter only very loosely corresponds to decom rate and decom 
pause times, which are the two things that admins care about.
* Trying to tune decom behavior with this parameter is thus somewhat futile.
* In the grand scope of HDFS, this is also not a common parameter to be tweaked.

Because this, we felt it was okay to change the interpretation of this config 
option. I view the old behavior more as a bug than something that is being 
depended upon by a user.

Translating this number of nodes limit instead into a number of blocks limit 
(as done in the current patch) makes the config far more predictable and thus 
usable. Since the new code also supports incremental scans (which is what makes 
it faster), specifying the limit in a number of nodes limit doesn't make much 
sense.

The only potential surprise I see for cluster operators is if the translation 
of the limit from {{# nodes}} to {{# blocks}} is too liberal. This would result 
in longer maximum pause times than before. We thought 100k nodes per block was 
a conservative estimate, but this could be further reduced.

One avenue I do not want to pursue is keeping the old code around, as Nicholas 
has proposed. This increases our maintenance burden, and means many people will 
keep running into the same issues surrounding decom.

If Nicholas still does not agree with the above rationale, I see the following 
potential options for improvement:

* Be even more conservative with translation factor, e.g. assume only 50k 
blocks per node
* Factor in the number of nodes and/or avg blocks per node to the translation. 
This will better approximate the old average pause times.
* Make the new decom manager also support a {{# nodes}} limit. This isn't great 
since scans are incremental now, but it means we'll be doing strictly less work 
per pause than before.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-04 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306261#comment-14306261
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

I do not intention to block the decommission improvement.  However, it is 
really a bad idea to arbitrarily change the behavior of a public conf when 
keeping the existing behavior is easy, and mixing code refactoring with 
improvement in a big patch.

[~andrew.wang], I am glad that you split the slf4j change from here to 
HDFS-7706 and filed another minor JIRA HDFS-7712 for the further work.  If the 
work is included here, the blocker JIRA HDFS-7734 will be broken by the 
unnecessarily complicated patch here.

In order to move faster, how about I volunteer doing the refactoring?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-04 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306167#comment-14306167
 ] 

Kihwal Lee commented on HDFS-7411:
--

HDFS-7735 is also being worked on.  If we can convince our selves that the 
performance benefit of this change is great, it will be worth while to spend 
time to find a way to introduce this without incompatibility.  The performance 
here can be viewed in two angles: 1) impact on name node and 2) decommissioning 
speed.  I will also try to review the patch.  I think everyone agrees that we 
need to improve decommissioning.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-04 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306193#comment-14306193
 ] 

Andrew Wang commented on HDFS-7411:
---

I would definitely welcome some other opinions on the perceived incompatibility 
here. I think Colin and myself have already made our thoughts on the matter 
pretty clear above. Thanks in advance Kihwal.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301870#comment-14301870
 ] 

Colin Patrick McCabe commented on HDFS-7411:


Thanks for splitting off the logging changes, Andrew.

It's reasonable to keep backwards compatibility by supporting 
{{dfs.namenode.decommission.nodes.per.interval}}.  I can see that you have done 
this in the latest patch.  Maybe eventually in Hadoop 3.0 we can drop support 
for this parameter.

+1.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302015#comment-14302015
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

{code}
+// Assume 100k blocks per node.
+blocksPerInterval = 100 * 1000 * numNodes;
{code}
How to come up with such assumption?  It seems invalid for some clusters.  
Also, nodes in a cluster may have different numbers of blocks.  Simply assuming 
all datanodes having the same number of blocks does not seem correct.

Why not keeping the existing code?  It is a simple easy way to support backward 
compatibility.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301881#comment-14301881
 ] 

Andrew Wang commented on HDFS-7411:
---

Thanks for reviewing Colin. I'll wait another day before committing, in case 
there are further review comments. I think we can take most things to a 
follow-on, would be great to get this in to unblock further decom improvements.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302030#comment-14302030
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

I also like to get this work here committed faster.  So, I suggest
# Create another JIRA for a pure code refactoring.  Move all the existing code 
to DecommissionManager.  No logic change.
# Change the patch here to add the new decommission code but NOT removing the 
existing code so that it uses the old code if the old conf is set and the new 
code if new conf is set.

Sound good?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302322#comment-14302322
 ] 

Arpit Agarwal commented on HDFS-7411:
-

bq.  Ming Ma, Arpit Agarwal, and myself have reviewed this.
I have not reviewed the patch. I just commented on a change to a single 
function.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302319#comment-14302319
 ] 

Colin Patrick McCabe commented on HDFS-7411:


I don't see any benefit to splitting the patch further.  The old logic is 
seriously flawed... it's not effective at rate limiting and often goes way too 
fast or too slow, because it is based on number of nodes rather than number of 
blocks.

I think it's great that [~andrew.wang] took on this task, which has been a 
maintenance problem for us for a while.  This is a great example of someone who 
really cares about the project making things better by working on something 
which is boring (it will never make it into a list of new features or 
exciting research talks) but essential to our users.

There is no benefit to keeping around multiple broken implementations of things 
to do the same job.  Hadoop has enough dead and obsolete code as is... more 
than enough.  Things like the {{RemoteBlockReader}} / {{RemoteBlockReader2}} 
split increase our maintenance burden and confuse users and potential 
contributors.  I only allowed the {{BlockReaderLocalLegacy}} / 
{{BlockReaderLocal}} split because we didn't have platform support for file 
descriptor passing on Windows.  But since there are no platform support issues 
here, there is no reason to increase our maintenance burden.

If we are concerned about stability, we can let this soak in trunk for a while.

It has been through three months of review.  [~mingma], [~arpitagarwal], and 
myself have reviewed this.   It's ready to go in, and I think it should.  +1.  
Let's commit this today if there are no other comments about the patch.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302510#comment-14302510
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 ... the old limiting scheme is seriously flawed. ...
 ...  The old logic is seriously flawed...

Sure, you see it this way but some users may feel that the old code works just 
fine.  They may not have time to deal with new behavior.  We cannot force them 
to do so.

 There is no benefit to keeping around multiple broken implementations of 
 things to do the same job. ...

We are not keeping multiple implementations.  The old implementation will be 
removed in the future.

 ...  It's ready to go in, and I think it should. +1. Let's commit this today 
 if there are no other comments about the patch.

Let me clarify my -1.  The patch changes an existing conf property to a 
different behavior.  Instead, we should keep the existing behavior, deprecate 
the conf property first and then remove it later.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302092#comment-14302092
 ] 

Andrew Wang commented on HDFS-7411:
---

As discussed above, the old limiting scheme is seriously flawed. The amount of 
time spent is highly variable, since it's # nodes rather than # blocks, and the 
size of each node is variable. It also counts both decommissioning and not 
decommissioning nodes towards the limit.

That nodes can vary in # of blocks and is really an argument for *not* using # 
nodes as a limit. # of blocks is superior. The 100k was chosen as a 
conservative number that will not lead to overly long wake-up times, which is 
the point of this limit. In fact, with this patch we should see far more 
predictable pause times for decommission work even with the old config. In 
addition, it'll also result in an improvement in overall decommission speed 
because of the incremental scan logic.

Because of this, I do not see any advantage to keeping this old code around. 
The old code is worse in terms of predictable pause times and overall 
decommissioning speed. It also has other flaws that are corrected by this 
patch. The new code is compatible with the old configuration. It also requires 
a lot of work to split the refactoring.

I still plan to commit tomorrow.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302585#comment-14302585
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 I think it's pretty common for us to change the behavior of the system when 
 the behavior change is a strict improvement. Keeping around inferior behavior 
 just for the purpose of consistency seems rather pointless.

First of all, we never have showed that the new behavior is strictly better.  
It is a hypothesis.  No?

 For example, it used to be the case that fsimage transfers ...

The fsimage transfer change is an internal protocol change.  The http interface 
is not a public API.  However, the conf property discussed here is public.

 Similarly, when we find ways that CPU performance or memory usage in the NN 
 can be improved, ...

The change here is not like that.  It changes the scheme from node based to 
block based.  It is not making the node based decommission faster.

 As Andrew Wang has already described, the new behavior should be both more 
 performant and more predictable. ...

As mentioned previously, it is a hypothesis. That why Andrew described as 
should be.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302316#comment-14302316
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 Because of this, I do not see any advantage to keeping this old code around. 
 ...

Users are familiar with the existing behavior so that we have to keep the old 
code, deprecate them first and then remove them later.  It is a standard 
procedure for changing the behavior of a feature.  No?

 I still plan to commit tomorrow.

-1 on the patch.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-02 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302513#comment-14302513
 ] 

Aaron T. Myers commented on HDFS-7411:
--

bq. Users are familiar with the existing behavior so that we have to keep the 
old code, deprecate them first and then remove them later. It is a standard 
procedure for changing the behavior of a feature. No?

I think it's pretty common for us to change the behavior of the system when the 
behavior change is a strict improvement. Keeping around inferior behavior just 
for the purpose of consistency seems rather pointless.

For example, it used to be the case that fsimage transfers from the secondary 
or standby NN happened with a GET request from the standby to the active, which 
then triggered another GET request back from the active to the standby. When we 
did the work to make the transfer work with just a single POST request from the 
standby to the active, we didn't bother keeping around the old GET/GET behavior 
just because people were used to it.

Similarly, when we find ways that CPU performance or memory usage in the NN can 
be improved, we don't keep around the old ways of doing things just because 
operators might be used to the NN being slow or using a lot of memory.

bq. -1 on the patch.

As [~andrew.wang] has already described, the new behavior should be both more 
performant and more predictable. I don't know see why we'd want to keep around 
decommissioning behavior that can be both erratic and slower.

Andrew already quite graciously amended the patch to preserve the backward 
compatibility of {{dfs.namenode.decommission.nodes.per.interval}} per your 
request, [~szetszwo]. I would hope that would have satisfied your feedback. 
Please consider withdrawing your -1.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299353#comment-14299353
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12695598/hdfs-7411.010.patch
  against trunk revision 951b360.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9383//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9383//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9383//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299300#comment-14299300
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 This statement is false. Configuration compatibility was the core of the 
 above discussion. ...

Sure, there is a discussion of how to be compatible with the old conf.  
However, it never mentions that the decision is to have an incompatible change.

Anyway, thanks for the update.  Will review the patch.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299158#comment-14299158
 ] 

Andrew Wang commented on HDFS-7411:
---

I split the logging changes off to HDFS-7706 which I just committed. New rev 
posted.

bq. It seems the discussion above did not consider the incompatibility. I guess 
the unnecessarily complicated and large patch did hide the important details. 
We need to revisit it.

This statement is false. Configuration compatibility was the core of the above 
discussion. In fact, my 003 rev of this patch tried to keep compatibility with 
the old key, and based on the discussion we decided to change that.

This newest rev does bring fallback support for the old key though, which 
satisfies your comment.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299413#comment-14299413
 ] 

Andrew Wang commented on HDFS-7411:
---

RAT complains about a psd file? seems spurious.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-28 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295795#comment-14295795
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 Those methods you mentioned are pretty small, and are far from the bulk of 
 the patch. ...

The GenericHdfsTestUtils change also adds a significant amount of code to the 
patch.  The code refactor does seem occupying half of the patch.

 ... Two other reviewers have also made it through this successfully patch, so 
 I don't think it's so bad to review.

I did not say that it is impossible to review the patch.  It just unnecessarily 
complicated so that reviewers need to spend extra time to review the patch.

 Regarding the removed config property, this is something discussed above. ...

It seems the discussion above did not consider the incompatibility.  I guess 
the unnecessarily complicated and large patch did hide the important details.  
We need to revisit it.

 ... I don't see a way of deprecating this gracefully, since the units of the 
 old and new config properties are incompatible. ...

One way is to keep the original code.  Use the old code if the old conf is set 
and the new code if new conf is set.  Isn't it simple enough?

Since this involve an incompatible change, please move the refactoring code out 
from the improvement.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-27 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294037#comment-14294037
 ] 

Andrew Wang commented on HDFS-7411:
---

Looks like a flake, balancer.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-27 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294224#comment-14294224
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

dfs.namenode.decommission.nodes.per.interval is a public conf property so that 
it cannot simply be replaced.  We should deprecate first.


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-27 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294241#comment-14294241
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

- Why adding a new class GenericHdfsTestUtils?  It only has a very specific 
method setNameNodeLogLevel.  Why not putting it in DFSTestUtil?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-27 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294185#comment-14294185
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

 If you look at version 2 of the patch, you can see the initial refactor, 
 which consisted of moving some methods from BlockManager to DecomManager. I 
 didn't bother splitting this though since it ended up not being very 
 interesting. DecomManager is also basically all new code, so the old code 
 would be moved and then subsequently deleted if we split it.

It seems not true that DecommissionManager is all new code.  Quite a few 
methods such as logBlockReplicationInfo(..), startDecommission(..) and 
stopDecommission(..) are moved from the existing code.  This is what I mean, by 
combining refactoring and improvement, it makes reviewing the patch harder and 
discourages collaboration.  The patch here is unnecessarily big.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-27 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294345#comment-14294345
 ] 

Andrew Wang commented on HDFS-7411:
---

Hi Nicholas,

Those methods you mentioned are pretty small, and are far from the bulk of the 
patch. Would really prefer not to patch split at this point, since it's a lot 
of work. Two other reviewers have also made it through this successfully patch, 
so I don't think it's so bad to review.

Regarding the removed config property, this is something discussed above. I 
don't see a way of deprecating this gracefully, since the units of the old and 
new config properties are incompatible. We needed to change the units because 
limiting by # of nodes is quite erratic. Since it is quite erratic, I don't 
think there's much harm in just removing the old config. It doesn't do much 
right now, and with the patch it'll do nothing.

I can move the test method over as you suggested, but would prefer to get 
clarity on the above before posting another patch rev. Thanks.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-26 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292301#comment-14292301
 ] 

Andrew Wang commented on HDFS-7411:
---

This is the same testIncludeByRegistrationName failure in HDFS-7527. I think my 
test improvements make it fail reliably instead of flakily (which is correct, 
since it's broken), but maybe this should be @Ignore'd in this patch.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292788#comment-14292788
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12694630/hdfs-7411.009.patch
  against trunk revision 1f2b695.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.balancer.TestBalancer

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9331//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9331//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290173#comment-14290173
 ] 

Andrew Wang commented on HDFS-7411:
---

If you look at version 2 of the patch, you can see the initial refactor, which 
consisted of moving some methods from BlockManager to DecomManager. I didn't 
bother splitting this though since it ended up not being very interesting. 
DecomManager is also basically all new code, so the old code would be moved and 
then subsequently deleted if we split it.

I think the easiest way of reviewing it is just to read through DecomManager, 
which really isn't that big of a class. It's quite well commented and has lots 
of logging, which is part of why this change as a whole appears large.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290146#comment-14290146
 ] 

Andrew Wang commented on HDFS-7411:
---

Thanks again for reviewing Colin, fixed with the following notes:

bq. Grammar: is already decomissioning

decommissioning in progress is a state for a node, so I think this is 
accurate, although ugly, language.

bq. What's the rationale for initializing the DecomissionManager configuration 
in activate rather than in the constructor? It seems like if we initialized the 
conf stuff in the constructor we could make more of it final?

I wasn't sure about this either, but it seems like the NN really likes for 
everything to be init'd with the Configuration passed when starting common 
services.

For this particular function, I went ahead and made the config variables final 
since they're just scoped to that function. Since we make a new Monitor each 
time, those members are final there too.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290152#comment-14290152
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

Could we separate code refactoring and improvement into two JIRAs?  The 
refactoring probably with a big patch is easy to review.  The improvement patch 
will be much smaller.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290348#comment-14290348
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12694272/hdfs-7411.008.patch
  against trunk revision 8f26d5a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDecommission

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9320//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9320//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288272#comment-14288272
 ] 

Colin Patrick McCabe commented on HDFS-7411:


{code}
  public static final String  DFS_NAMENODE_DECOMMISSION_BLOCKS_PER_INTERVAL_KEY 
= dfs.namenode.decommission.blocks.per.node;
  public static final int 
DFS_NAMENODE_DECOMMISSION_BLOCKS_PER_INTERVAL_DEFAULT = 50;
{code}
Key name doesn't match key string

TestDNFencing: can we avoid moving around the import static 
org.junit.Assert.assertEquals;?  This kind of churn makes backports annoying.

TestPendingInvalidateBlock:  do we need this import?
{code}
import org.apache.log4j.Logger;
{code}

{{DecomissionManager#activate}}: can we have some debug logs here spitting out 
the configuration parameter values?

{code}
  LOG.trace(startDecommission: Node {} is already decommission in 
  + progress, nothing to do., node);
{code}

Grammar: is already decomissioning

{{getInsufficientlyReplicated}}: this function's name suggests a getter, but it 
seems to also schedule replication work.  How about calling this 
{{handleInsufficientlyReplicated}}?

{{kickMonitor}}: this kicks the monitor, but also waits for the it to complete. 
 That's kind of different from some of our other kick functions. How about 
calling this {{runMonitor}} or something like that?

What's the rationale for initializing the {{DecomissionManager}} configuration 
in {{activate}} rather than in the constructor?  It seems like if we 
initialized the conf stuff in the constructor we could make more of it 
{{final}}?

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-20 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284438#comment-14284438
 ] 

Andrew Wang commented on HDFS-7411:
---

I think these test failures are unrelated. HDFS-7527 is tracking the 
TestDecommission#testIncludeByRegistrationName failure. I ran 
TestRollingUpgrade locally a few times and it worked.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285117#comment-14285117
 ] 

Colin Patrick McCabe commented on HDFS-7411:


Sorry, there's been a bunch of big patches committed lately and I've been stuck 
rebasing some other things on top of them.  Will take a look at this tomorrow 
hopefully.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279802#comment-14279802
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692660/hdfs-7411.007.patch
  against trunk revision c4ccbe6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestRollingUpgrade
  org.apache.hadoop.hdfs.TestDecommission

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9235//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9235//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-14 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277677#comment-14277677
 ] 

Andrew Wang commented on HDFS-7411:
---

Hi Colin, thanks for reviewing. I'll rework the patch after we settle on the 
details, few replies:

bq. Shouldn't we be using that here, rather than creating our own list in 
decomNodeBlocks?

This is actually a feature, not a bug :) Having our own datastructure lets us 
speed up decom by only checking blocks that are still insufficiently 
replicated. We prune out the sufficient ones each iteration. The memory 
overhead here should be pretty small since it's just an 8B reference per block, 
so with 1 million blocks this will be 8MB for a single node, or maybe 160MB for 
a full rack. Nodes are typically smaller than this this, so these are 
conservative estimates, and large decoms aren't that common.

The one thing I could see as a nice improvement is that we could skip the final 
full scan at the end of decom if we immediately propagate block map changes to 
decomNodeBlocks, but that seems like more trouble than it's worth.

bq. have a configuration key like dfs.namenode.decommission.blocks.per.minute 
that expresses directly what we want.

On thinking about it I agree that just using a new config option is fine, but 
I'd prefer to define the DecomManager in terms of both an interval and an 
amount of work, rather than a rate. This is more powerful, and more in-line 
with the existing config. Are you okay with a new {{blocks.per.interval}} 
config?

bq. dfs.namenode.decommission.max.concurrent.tracked.nodes

I agree that it can lead to hangs. At a minimum, I'll add a 0 means no limit 
config, and maybe we can set that by default. I think that NNs should really 
have enough heap headroom to handle the 10-100 of MBs of memory for this, it's 
peanuts compared to the 10s of GBs that are quite typical.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277694#comment-14277694
 ] 

Colin Patrick McCabe commented on HDFS-7411:


bq. This is actually a feature, not a bug  Having our own datastructure lets us 
speed up decom by only checking blocks that are still insufficiently 
replicated. We prune out the sufficient ones each iteration. The memory 
overhead here should be pretty small since it's just an 8B reference per block, 
so with 1 million blocks this will be 8MB for a single node, or maybe 160MB for 
a full rack. Nodes are typically smaller than this this, so these are 
conservative estimates, and large decoms aren't that common.

That's a fair point.  It's too bad we can't use the existing list for this, but 
it's already being re-ordered in the block report processing code, for a 
different purpose.  I agree that it's fine as-is.

bq. On thinking about it I agree that just using a new config option is fine, 
but I'd prefer to define the DecomManager in terms of both an interval and an 
amount of work, rather than a rate. This is more powerful, and more in-line 
with the existing config. Are you okay with a new blocks.per.interval config?

Why is blocks.per.interval more powerful than blocks per minute?  It just 
seems annoying to have to do the mental math to figure out what to configure to 
get a certain blocks per minute going.  Also, the fact that intervals even 
exist is an implementation detail... you could easily imagine an 
event-triggered version that didn't do periodic polling.  I guess I don't feel 
strongly about this, but I'd like to understand the rationale more.

bq. I agree that it can lead to hangs. At a minimum, I'll add a 0 means no 
limit config, and maybe we can set that by default. I think that NNs should 
really have enough heap headroom to handle the 10-100 of MBs of memory for 
this, it's peanuts compared to the 10s of GBs that are quite typical.

Makes sense.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277593#comment-14277593
 ] 

Colin Patrick McCabe commented on HDFS-7411:


DecomissionManager:

Let's document the locking here.  I believe all of these functions are designed 
to be run with the FSNamesystem write lock held, correct?

A question about the data structures here.  We already have a way of iterating 
through all the blocks on a given {{DataNode}} via the implicit linked lists in 
the {{BlockInfo}} objects.  Shouldn't we be using that here, rather than 
creating our own list in {{decomNodeBlocks}}?  This could be a lot of memory 
saved here, since datanodes might have anywhere between 200k and a million 
blocks in the next few years.  (I am fine with doing this in a follow-on, of 
course.)

{{DecomissionManager#startDecommission}}: let's have a debug log message here 
for the case where the node is already decommissioned... it might help with 
debugging.  Similarly in {{stopDecomission}}... if we're not doing anything, 
let's log why we're not doing anything at debug or trace level.

Config Keys:

We talked about this a bit earlier but didn't really come to any consensus.  I 
think we should just get rid of 
{{dfs.namenode.decommission.nodes.per.interval}} and 
{{dfs.namenode.decommission.blocks.per.node}}, and just have a configuration 
key like {{dfs.namenode.decommission.blocks.per.minute}} that expresses 
directly what we want.

If we set a reasonable default for 
{{dfs.namenode.decommission.blocks.per.minute}}, the impact on users will be 
minimal.  The old rate-limiting process was broken anyway... that's a big part 
of what this patch is designed to fix, as I understand.  So we shouldn't need 
to lug around these old config keys that don't really express what we're trying 
to configure.  Let's just add a new configuration key that is easy to 
understand, and maintain in the future.  Decom is a manual process anywhere 
where admins get involved

{{dfs.namenode.decommission.max.concurrent.tracked.nodes}}: I have mixed 
feelings about this configuration key.  It seems like the reason you want to 
add it is to limit NN memory consumption (but you can do that by not 
duplicating data structures-- see above).  However, it may have some value for 
users who would rather finish decomissioning 100 nodes in an hour, than have 
1000 nodes 10% decomissioned in that time.  So I guess it is a good addition, 
maybe?  The only problem is that if some of those first 100 nodes get stuck 
because there is an open-for-write file or something, then the decom process 
will start to slow down and perhaps eventually hang.

On a related note, I feel like we should have a follow-up change to make the 
decom parameters reconfigurable via the configuration reloading interface we 
added recently.  I will file a follow-on JIRA for that.

{code}
if (LOG.isDebugEnabled()) {
  StringBuilder b = new StringBuilder(Node {} );
  if (isHealthy) {
b.append(is );
  } else {
b.append(isn't );
  }
  b.append(and still needs to replicate {} more blocks,  +
  decommissioning is still in progress.);
{code}
Missing healthy  in the printout.

Can we log this at info level like every half hour or something, so that people 
can see issues with nodes getting stuck?  As it is, it seems like they'll get 
no output unless they monkey with log4j manually.

{code}
   * Note also that the reference to the list of under-replicated blocks 
   * will be null on initial addx
{code}
Spelling?

{code}
  final IteratorMap.EntryDatanodeDescriptor, AbstractListBlockInfo
  
  it =
  new CyclicIterationDatanodeDescriptor, AbstractListBlockInfo( 
  
  decomNodeBlocks, iterkey).iterator(); 
  
{code}
Well, this is creative, but I think I'd rather have the standard indentation :)

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-14 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277909#comment-14277909
 ] 

Andrew Wang commented on HDFS-7411:
---

bq. Why is blocks.per.interval more powerful than blocks per minute?

I don't think the end goal is to achieve a certain rate per minute. Rather, 
it's how long the pause is when the DecomManager wakes up, and how often it 
wakes up. This tunes latency vs. throughput; short pause is better latency, 
long run is better throughput. This can't be expressed by just 
blocks.per.minute, since a high blocks.per.minute might mean to wake up very 
often to do a little work, or very occasionally to do a lot of work.

It also fixes the timescale to per minute. This, naively, implies that it'd 
be okay to wake up once a minute to do a minute's worth of work. But maybe the 
user wants to see something happen within a few seconds, rather than a minute. 
Without being able to tune the interval, this flexibility is gone.

The event triggered idea is also something I considered, but even then we'd 
still need to do the full scan at the start of decom, which means some kind of 
limiting scheme.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278114#comment-14278114
 ] 

Colin Patrick McCabe commented on HDFS-7411:


You can compute blocks per interval from blocks per second and interval length, 
both of which are specified.  And in any case, we may eventually need to start 
releasing the lock occasionally during decom intervals as the number of 
blocks continues to double year-over-year.  The fact that we hold the lock 
throughout the whole interval is an implementation detail.  Anyway, I don't 
feel that strongly about this so if you want to keep it as a per-interval 
config, that's probably ok.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-12 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274361#comment-14274361
 ] 

Andrew Wang commented on HDFS-7411:
---

I just kicked Jenkins manually, the mysteries of the buildbot still elude me.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274611#comment-14274611
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690628/hdfs-7411.006.patch
  against trunk revision f3507fa.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9192//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9192//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268306#comment-14268306
 ] 

Andrew Wang commented on HDFS-7411:
---

Thanks for the reviews Colin and Ming, new patch up.

* Rebased on trunk
* Updated for new ChunkedArrayList with a working size function
* Updated the logging calls to use new log level helper methods, consolidated 
some duplicated logging
* Did the rename and spacing fixes that Colin recommended. Log level change was 
necessary since slf4j doesn't have fatal.
* Ming, agree with everything you pointed out. The blocks limit enforcement is 
intentionally inexact; I played with iterating based on both DN+block rather 
than just DN, but it seemed more complex for a small gain.

As to the limit configuration, I didn't touch it for now until we agree on a 
solution. The idea here was to be compatible with the old config option, yet 
also provide a way of migrating to the new one. Ming's proposal seems 
reasonable, but the override makes configuration more complex. I feel that 
computing the limit based on runtime information could also lead to surprises.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-19 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254213#comment-14254213
 ] 

Ming Ma commented on HDFS-7411:
---

Couple more comments:

* dfs.namenode.decommission.blocks.per.node together with 
dfs.namenode.decommission.nodes.per.interval can control how long 
DecommissionManager will hold FSN write lock. Should we just define 
dfs.namenode.decommission.blocks.per.interval instead? If 
dfs.namenode.decommission.blocks.per.interval is defined in the configuration, 
use it. If only dfs.namenode.decommission.nodes.per.interval is defined, use 
the run time average block count per node to estimate.

* It seems exceededNumBlocksPerCheck is called only when DecommissionManager 
moves to to check the next DN. If a DN has lots of blocks, the check won't stop 
earlier.

* This patch has make dfsadmin -refreshNodes asynchronously for decommission; 
e.g. startDecommission no longer calls checkDecommission which takes FSN write 
lock. But it has to wait for DecommissionManager's next check to kick off the 
replication process. https://issues.apache.org/jira/browse/HDFS-5757 and 
https://issues.apache.org/jira/browse/HDFS-7521 provide asynchronous 
notification so replication can start right away.

* We can also make dfsadmin -refreshNodes asynchronously for recommission. 
But it doesn't have be done by this jira. We can let other jiras handle this.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250854#comment-14250854
 ] 

Colin Patrick McCabe commented on HDFS-7411:


* Can you rebase this on trunk?  ChunkedArrayList has moved and this caused a
patch application failure.

* I would really prefer that size stay a O(1) operation for ChunkedArrayList.
We should be able to do this by hooking into the iterator's remove() method,
creating a custom iterator if needed.  If that's too complex to do in this
jira, then let's at least file a follow-on.

{code}
property
  namedfs.namenode.decommission.blocks.per.node/name
  value40/value
  descriptionThe approximate number of blocks per node. 
  This affects the number of blocks processed per 
  decommission interval, as defined in 
dfs.namenode.decommission.interval.
  This is multiplied by dfs.namenode.decommission.nodes.per.interval
  to define the actual processing rate./description
/property
{code}

* Why do we need this parameter?  The NameNode already tracks how many blocks
each DataNode has in each storage.  That information is in
DatanodeStorageInfo#size.

{code}
property
  namedfs.namenode.decommission.max.concurrent.tracked.nodes/name
  value100/value
  description
The maximum number of decommission-in-progress datanodes nodes that 
will be
tracked at one time by the namenode. Tracking a decommission-in-progress
datanode consumes additional NN memory proportional to the number of 
blocks
on the datnode. Having a conservative limit reduces the potential impact
of decomissioning a large number of nodes at once.
  /description
/property
{code}

* Should this be called something like
dfs.namenode.decomission.max.concurrent.nodes?  I'm confused by the mention of
tracking here.  It seems to imply that setting this too low would allow more
nodes to be decommissioned, but we'd stop tracking the decomissioning?

{code}
-  static final Log LOG = LogFactory.getLog(BlockManager.class);
+  static final Logger LOG = LoggerFactory.getLogger(BlockManager.class);
{code}
If you're going to change this, you need to change all the unit tests that are
changing the log level, so that they use the correct function to do so.

{code}
hdoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java:

((Log4JLogger)LogFactory.getLog(BlockManager.class)).getLogger().setLevel(Level.ALL);
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java:

((Log4JLogger)LogFactory.getLog(BlockManager.class)).getLogger().setLevel(Level.ALL);
and many more...
{code}
I can see that you fixed it in TestPendingInvalidateBlock.java, but there's a 
lot more locations.

You probably need something like the GenericTestUtils#disableLog function I
created in the HDFS-7430 patch.  I guess we could split that off into a
separate patch if it's important enough.  Or perhaps we could just put off
changing this until a follow-on JIRA?

{code}
if (node.isAlive) {
  return true;
} else {
  ... long block ...
}
{code}
We can reduce the indentation by getting rid of the else block here.  Similar
with the other nested 'else'.

{code}
-  LOG.fatal(ReplicationMonitor thread received Runtime exception. , 
t);
+  LOG.error(ReplicationMonitor thread received Runtime exception. ,
+  t);
{code}
What's the rationale for changing the log level here?

{code}
   /**
-   * Decommission the node if it is in exclude list.
+   * Decommission the node if it is in the host exclude list.
+   *
+   * @param nodeReg datanode
*/
-  private void checkDecommissioning(DatanodeDescriptor nodeReg) { 
+  void checkDecommissioning(DatanodeDescriptor nodeReg) {
{code}

I realize this isn't introduced by this patch, but this function seems
misleadingly named.  Perhaps it should be named something like
startDecomissioningIfExcluded?  It's definitely not just a check.

more comments coming...

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248953#comment-14248953
 ] 

Colin Patrick McCabe commented on HDFS-7411:


Uh oh.  Buildbot has become self-aware, and is exercising its -1.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247540#comment-14247540
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687303/hdfs-7411.005.patch
  against trunk revision e597249.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9040//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9040//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-15 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247558#comment-14247558
 ] 

Andrew Wang commented on HDFS-7411:
---

This is a first, Jenkins says everything is +1, but gives a -1 overall. 
Buildbot must have it out for me :)

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-15 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247563#comment-14247563
 ] 

Ravi Prakash commented on HDFS-7411:


Lolz! I'm wondering if the jenkins jobs are running in isolated workspaces at 
all. It'd explain a lot

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-12 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244625#comment-14244625
 ] 

Arpit Agarwal commented on HDFS-7411:
-

Hi Andrew, thank you for the heads up.

Looks like the change to {{addStoredBlockUnderConstruction}} is not necessary. 
The function was updated by HDFS-2832 when we added the concept of datanode as 
a collection of storages. Looking at the pre-2832 code:

{code}
block.addReplicaIfNotPresent(node, block, reportedState);
if (reportedState == ReplicaState.FINALIZED  block.findDatanode(node)  
0) {
  addStoredBlock(block, node, null, true);
}
{code}

And in {{addStoredBlock}} we have:

{code}
...
// add block to the datanode
boolean added = node.addBlock(storageID, storedBlock);
{code}

I am not sure what was the original reason for not adding UC blocks to the 
DataNode/storage but if nothing is broken right now perhaps we should leave it 
as it is. One more side effect of this change is that the subsequent if block 
will never be taken as block.findDatanode will always return true.


 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-12 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244641#comment-14244641
 ] 

Andrew Wang commented on HDFS-7411:
---

Hey Arpit,

So what's broken right now is that these UC blocks aren't picked up by decom, 
since decom scans the DN block list. This means these DNs can be erroneously 
decommissioned. I'm not sure how to resolve this without adding the UC blocks 
to the DN block list. Good point about that second if statement too.

I guess I'll dig more on this myself; if anyone else knows about this, a 
comment would be appreciated. At a logical level, I'm not sure why we'd have 
the DN locations in the blocksMap, but not also add the block to the DN's block 
list.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-12 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244728#comment-14244728
 ] 

Arpit Agarwal commented on HDFS-7411:
-

bq. So what's broken right now is that these UC blocks aren't picked up by 
decom, since decom scans the DN block list. This means these DNs can be 
erroneously decommissioned. 

That makes sense. Yes logically it makes sense to include UC blocks in the DN 
block list. Perhaps it's safe to do so in addStoredBlockUnderConstruction if we 
can make check it doesn't break any assumption elsewhere in BlockManager.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243733#comment-14243733
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12686745/hdfs-7411.004.patch
  against trunk revision 5b9fced.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9018//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9018//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9018//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-05 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236549#comment-14236549
 ] 

Ming Ma commented on HDFS-7411:
---

Andrew, nice work. It appears I don't need to continue the work on 
https://issues.apache.org/jira/browse/HDFS-7442. Some initial comments.

1. NN Memory impact on the additional decomNodeBlocks. It shouldn't be an issue 
given admins won't decommission lots of nodes at the same time. But it might be 
worth calling out some limit here. 100 nodes * 400k blocks per node * 8 bytes 
per blockInfo reference, 320MB extra at the start of the decommission process?
2. It appears dfs.namenode.decommission.blocks.per.node description should 
refer to dfs.namenode.decommission.nodes.per.interval instead.
3. It appears this patch also fixed 
https://issues.apache.org/jira/browse/HDFS-5757 by calling decomNodeBlocks.put 
during refreshNodes.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2014-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236615#comment-14236615
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685506/hdfs-7411.003.patch
  against trunk revision 0707e4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
  
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8938//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8938//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)