[
https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907654#comment-13907654
]
Jian He commented on YARN-1071:
-------------------------------
Thanks, Zhijie, for the review!
bq. HostsFileReader#refresh(2params)
That's hadoop-common code, so we should probably not touch it.
bq. Check the ip as well as we do in NodesListManager#isValidNode?
Good catch, fixed! Addressed the other review comments as well.
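For reference, a minimal sketch of the kind of host-plus-IP check being discussed, along the lines of NodesListManager#isValidNode (the NodeValidator class and its fields are illustrative, not the actual YARN code):
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;

// Illustrative only: a standalone validator, not the actual YARN class.
public class NodeValidator {
  private final Set<String> includes; // empty include list means "allow all"
  private final Set<String> excludes;

  public NodeValidator(Set<String> includes, Set<String> excludes) {
    this.includes = includes;
    this.excludes = excludes;
  }

  public boolean isValidNode(String hostName) {
    String ip = null;
    try {
      ip = InetAddress.getByName(hostName).getHostAddress();
    } catch (UnknownHostException e) {
      // If resolution fails, fall back to matching on host name only.
    }
    boolean included = includes.isEmpty()
        || includes.contains(hostName)
        || (ip != null && includes.contains(ip));
    boolean excluded = excludes.contains(hostName)
        || (ip != null && excludes.contains(ip));
    return included && !excluded;
  }
}
{code}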
The patch doesn't handle the include-list scenario, or changes to the exclude
list between RM restarts. For that, the RM may need to persistently save the
decommissioned-NM state.
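If we go that route, a rough sketch of the recovery flow (the store interface and method names below are hypothetical, not part of this patch):
{code}
import java.util.Set;

// Hypothetical interface; not part of this patch or the current RM code.
interface DecommissionStateStore {
  void storeDecommissionedNode(String nodeId);  // called when a node is decommissioned
  Set<String> loadDecommissionedNodes();        // called during RM recovery
}

// On restart, the RM could replay the persisted set so counts such as
// NumDecommissionedNMs survive the restart.
class DecommissionRecovery {
  static void recover(DecommissionStateStore store, Runnable incrDecommissionedMetric) {
    for (String nodeId : store.loadDecommissionedNodes()) {
      incrDecommissionedMetric.run(); // e.g. one metric increment per persisted node
    }
  }
}
{code}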
> ResourceManager's decommissioned and lost node count is 0 after restart
> -----------------------------------------------------------------------
>
> Key: YARN-1071
> URL: https://issues.apache.org/jira/browse/YARN-1071
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Srimanth Gunturi
> Assignee: Jian He
> Attachments: YARN-1071.1.patch, YARN-1071.2.patch, YARN-1071.3.patch,
> YARN-1071.4.patch
>
>
> I had 6 nodes in a cluster, with 2 NMs stopped. Then I put a host into YARN's
> {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin
> -refreshNodes}}, the RM's JMX correctly showed the decommissioned node count:
> {noformat}
> "NumActiveNMs" : 3,
> "NumDecommissionedNMs" : 1,
> "NumLostNMs" : 2,
> "NumUnhealthyNMs" : 0,
> "NumRebootedNMs" : 0
> {noformat}
> After restarting the RM, JMX showed the counts below.
> {noformat}
> "NumActiveNMs" : 3,
> "NumDecommissionedNMs" : 0,
> "NumLostNMs" : 0,
> "NumUnhealthyNMs" : 0,
> "NumRebootedNMs" : 0
> {noformat}
> Notice that the lost and decommissioned NM counts are both 0.
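For anyone verifying this after the fix: the counts above come from the RM's ClusterMetrics MBean and can be pulled from the RM web UI's {{/jmx}} endpoint. A minimal sketch, with the RM host and port as placeholders for your cluster:
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class CheckClusterMetrics {
  public static void main(String[] args) throws Exception {
    // Placeholder host/port; the qry parameter filters output to ClusterMetrics.
    URL url = new URL("http://rm-host:8088/jmx"
        + "?qry=Hadoop:service=ResourceManager,name=ClusterMetrics");
    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON with NumActiveNMs, NumDecommissionedNMs, ...
      }
    }
  }
}
{code}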