[ 
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907857#comment-13907857
 ] 

Zhijie Shen commented on YARN-1071:
-----------------------------------

I thought about another scenario:

a. host1 and host2 are in the exclude list
b. refresh nodes, count = 2
c. host1 starts again, count = 1
d. RM stops
e. RM starts
f. count = 2 after NodesListManager inits
g. count = 1 after host1 reconnects

Here, the decrease in the decommissioned count is eventually reflected after 
the RM restarts, once host1 has reconnected. So this scenario should still be 
covered by this approach. Correct me if I'm wrong about the process.
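
To make the steps easier to check, here is a toy model of the counting in 
plain Java. It is only a sketch of the scenario as I described it, not the RM 
code: the class DecommissionCountSketch and its methods are made up for 
illustration, and it assumes the count is rebuilt from the exclude list on 
init and decremented when a host connects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the scenario; hypothetical, not the actual ResourceManager code. */
class DecommissionCountSketch {
    // Hosts currently counted as decommissioned (stand-in for the RM metric).
    private final Set<String> decommissioned = new HashSet<>();

    /** Assumed behaviour on init or refresh: count every host in the exclude list. */
    void loadExcludeList(Set<String> excludedHosts) {
        decommissioned.clear();
        decommissioned.addAll(excludedHosts);
    }

    /** Assumed behaviour when a host connects: it is no longer counted. */
    void nodeConnected(String host) {
        decommissioned.remove(host);
    }

    int count() {
        return decommissioned.size();
    }

    /** Replays the steps and returns the count after steps b, c, f and g. */
    static int[] runScenario() {
        Set<String> exclude = new HashSet<>(Arrays.asList("host1", "host2")); // a
        int[] counts = new int[4];

        DecommissionCountSketch rm = new DecommissionCountSketch();
        rm.loadExcludeList(exclude);          // b. refresh nodes
        counts[0] = rm.count();
        rm.nodeConnected("host1");            // c. host1 starts again
        counts[1] = rm.count();

        rm = new DecommissionCountSketch();   // d, e. restart drops in-memory state
        rm.loadExcludeList(exclude);          // f. init re-reads the exclude list
        counts[2] = rm.count();
        rm.nodeConnected("host1");            // g. host1 reconnects
        counts[3] = rm.count();
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runScenario()));
    }
}
```

The point of the model is that the count after the restart is temporarily 
higher than before the restart, and converges again once the host reconnects.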

Other than that, I'm generally fine with the patch, except that the temp dir 
created for the test should be deleted after the test completes.
{code}
+  private final static File TEMP_DIR = new File(System.getProperty(
+    "test.build.data", "/tmp"), "decommision");
{code}
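
One possible shape for the cleanup, as a sketch only: the helper below uses 
plain java.nio so that it runs on its own, and the names TempDirCleanupSketch, 
createScratchFile and deleteRecursively are hypothetical. In the actual test 
the delete would presumably go into a JUnit teardown method, and Hadoop's 
FileUtil.fullyDelete(File) could be used instead of the hand-written walk.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper showing how a test could remove its temp dir afterwards. */
class TempDirCleanupSketch {

    /** Creates the directory and one file in it, as a test setup might. */
    static File createScratchFile(File dir, String name) {
        try {
            if (!dir.mkdirs() && !dir.isDirectory()) {
                throw new IOException("could not create " + dir);
            }
            File f = new File(dir, name);
            f.createNewFile();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes the directory and everything under it; does nothing if it is absent. */
    static void deleteRecursively(File dir) {
        if (!dir.exists()) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir.toPath())) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A uniquely named scratch dir, so the demo does not touch real test data.
        File dir = new File(System.getProperty("java.io.tmpdir"),
            "decommision-sketch-" + System.nanoTime());
        createScratchFile(dir, "exclude.txt");
        deleteRecursively(dir);   // what the teardown would do
        System.out.println("exists after cleanup: " + dir.exists());
    }
}
```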



> ResourceManager's decommissioned and lost node count is 0 after restart
> -----------------------------------------------------------------------
>
>                 Key: YARN-1071
>                 URL: https://issues.apache.org/jira/browse/YARN-1071
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Srimanth Gunturi
>            Assignee: Jian He
>         Attachments: YARN-1071.1.patch, YARN-1071.2.patch, YARN-1071.3.patch, 
> YARN-1071.4.patch, YARN-1071.5.patch
>
>
> I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's 
> {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin 
> -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
> {noformat}
> "NumActiveNMs" : 3,
> "NumDecommissionedNMs" : 1,
> "NumLostNMs" : 2,
> "NumUnhealthyNMs" : 0,
> "NumRebootedNMs" : 0
> {noformat}
> After restarting RM, the counts were shown as below in JMX.
> {noformat}
> "NumActiveNMs" : 3,
> "NumDecommissionedNMs" : 0,
> "NumLostNMs" : 0,
> "NumUnhealthyNMs" : 0,
> "NumRebootedNMs" : 0
> {noformat}
> Notice that the lost and decommissioned NM counts are both 0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
