[
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907654#comment-13907654
]
Jian He commented on YARN-1071:
-------------------------------
Thanks, Zhijie, for the review!
bq. HostsFileReader#refresh(2params)
That's hadoop-common code, so we should probably not touch it.
bq. Check the ip as well as we do in NodesListManager#isValidNode?
Good catch, fixed! Addressed the other review comments as well.
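For reference, a minimal sketch of the kind of host-plus-IP check being discussed, along the lines of NodesListManager#isValidNode (the NodeValidator class and its fields are illustrative, not the actual YARN code):
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;

// Illustrative only: a standalone validator, not the actual YARN class.
public class NodeValidator {
  private final Set<String> includes; // empty include list means "allow all"
  private final Set<String> excludes;

  public NodeValidator(Set<String> includes, Set<String> excludes) {
    this.includes = includes;
    this.excludes = excludes;
  }

  public boolean isValidNode(String hostName) {
    String ip = null;
    try {
      ip = InetAddress.getByName(hostName).getHostAddress();
    } catch (UnknownHostException e) {
      // If resolution fails, fall back to matching on host name only.
    }
    boolean included = includes.isEmpty()
        || includes.contains(hostName)
        || (ip != null && includes.contains(ip));
    boolean excluded = excludes.contains(hostName)
        || (ip != null && excludes.contains(ip));
    return included && !excluded;
  }
}
{code}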
The patch doesn't handle the include-list scenario, or changes to the exclude
list between RM restarts. For that, the RM may need to persistently save the
decommissioned-NM state.
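If we go that route, a rough sketch of the recovery flow (the store interface and method names below are hypothetical, not part of this patch):
{code}
import java.util.Set;

// Hypothetical interface; not part of this patch or the current RM code.
interface DecommissionStateStore {
  void storeDecommissionedNode(String nodeId);  // called when a node is decommissioned
  Set<String> loadDecommissionedNodes();        // called during RM recovery
}

// On restart, the RM could replay the persisted set so counts such as
// NumDecommissionedNMs survive the restart.
class DecommissionRecovery {
  static void recover(DecommissionStateStore store, Runnable incrDecommissionedMetric) {
    for (String nodeId : store.loadDecommissionedNodes()) {
      incrDecommissionedMetric.run(); // e.g. one metric increment per persisted node
    }
  }
}
{code}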
> ResourceManager's decommissioned and lost node count is 0 after restart
> -----------------------------------------------------------------------
>
> Key: YARN-1071
> URL: https://issues.apache.org/jira/browse/YARN-1071
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Srimanth Gunturi
> Assignee: Jian He
> Attachments: YARN-1071.1.patch, YARN-1071.2.patch, YARN-1071.3.patch,
> YARN-1071.4.patch
>
>
> I had 6 nodes in a cluster, with 2 NMs stopped. Then I put a host into YARN's
> {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin
> -refreshNodes}}, the RM's JMX correctly showed the decommissioned node count:
> {noformat}
> "NumActiveNMs" : 3,
> "NumDecommissionedNMs" : 1,
> "NumLostNMs" : 2,
> "NumUnhealthyNMs" : 0,
> "NumRebootedNMs" : 0
> {noformat}
> After restarting the RM, JMX showed the counts below.
> {noformat}
> "NumActiveNMs" : 3,
> "NumDecommissionedNMs" : 0,
> "NumLostNMs" : 0,
> "NumUnhealthyNMs" : 0,
> "NumRebootedNMs" : 0
> {noformat}
> Notice that the lost and decommissioned NM counts are both 0.
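For anyone verifying this after the fix: the counts above come from the RM's ClusterMetrics MBean and can be pulled from the RM web UI's {{/jmx}} endpoint. A minimal sketch, with the RM host and port as placeholders for your cluster:
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class CheckClusterMetrics {
  public static void main(String[] args) throws Exception {
    // Placeholder host/port; the qry parameter filters output to ClusterMetrics.
    URL url = new URL("http://rm-host:8088/jmx"
        + "?qry=Hadoop:service=ResourceManager,name=ClusterMetrics");
    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON with NumActiveNMs, NumDecommissionedNMs, ...
      }
    }
  }
}
{code}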