[
https://issues.apache.org/jira/browse/YARN-10896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sushil Ks updated YARN-10896:
-----------------------------
Summary: RM fail over is not reporting the nodes DECOMMISSIONED (was: RM
fail over would not report the DECOMMISSIONED nodes )
> RM fail over is not reporting the nodes DECOMMISSIONED
> -------------------------------------------------------
>
> Key: YARN-10896
> URL: https://issues.apache.org/jira/browse/YARN-10896
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Sushil Ks
> Assignee: Sushil Ks
> Priority: Major
>
> Whenever we add host entries to the exclude file in order to
> DECOMMISSION a NodeManager, we issue the *yarn rmadmin -refreshNodes*
> command to transition the nodes from the RUNNING to the DECOMMISSIONED
> state. However, if a failover to the standby ResourceManager happens while
> the exclude file still lists the hosts to be disallowed, these disallowed
> nodes are never shown in the Cluster Metrics on the new active
> ResourceManager. The host entries present in the exclude file are listed in
> the Cluster Metrics whenever the ResourceManager is restarted, i.e., as part
> of the service init of *NodeListManager*; during failover, however, this
> information is lost. Hence this patch sets the DECOMMISSIONED nodes inside
> the RM Context so that they are available through Cluster Metrics whenever
> we issue the *yarn rmadmin -refreshNodes* command.
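
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual Hadoop code or the patch itself: `RMContext`, `NodeListManagerModel`, `simulate_restart`, `simulate_failover` and the host names are all hypothetical, and the model assumes the standby ResourceManager ran its service init before the hosts were added to the exclude file.

```python
from dataclasses import dataclass, field

DECOMMISSIONED = "DECOMMISSIONED"


@dataclass
class RMContext:
    """Toy stand-in for the RM context: maps inactive host -> state."""
    inactive_nodes: dict = field(default_factory=dict)

    def decommissioned_count(self) -> int:
        # What a "Cluster Metrics" view would report in this model.
        return sum(1 for s in self.inactive_nodes.values() if s == DECOMMISSIONED)


class NodeListManagerModel:
    """Hypothetical, simplified model; not the real NodeListManager API."""

    def __init__(self, context, exclude_hosts):
        self.context = context
        self.exclude_hosts = set(exclude_hosts)

    def service_init(self):
        # Restart path: excluded hosts are recorded during service init.
        self._record_excluded()

    def refresh_nodes(self, exclude_hosts, record_excluded):
        # Models `yarn rmadmin -refreshNodes` re-reading the exclude file.
        self.exclude_hosts = set(exclude_hosts)
        if record_excluded:
            # Assumed shape of the proposed fix: also record on refresh.
            self._record_excluded()

    def _record_excluded(self):
        for host in self.exclude_hosts:
            self.context.inactive_nodes[host] = DECOMMISSIONED


EXCLUDED = ["nm1.example.com", "nm2.example.com"]  # hypothetical hosts


def simulate_restart():
    # RM restarted with the exclude file already populated.
    context = RMContext()
    NodeListManagerModel(context, EXCLUDED).service_init()
    return context.decommissioned_count()


def simulate_failover(record_on_refresh):
    # Standby initialised while the exclude file was still empty (assumption),
    # then became active and received a refreshNodes request.
    context = RMContext()
    manager = NodeListManagerModel(context, exclude_hosts=[])
    manager.service_init()
    manager.refresh_nodes(EXCLUDED, record_excluded=record_on_refresh)
    return context.decommissioned_count()


if __name__ == "__main__":
    print("restart:", simulate_restart())
    print("failover, no recording on refresh:", simulate_failover(False))
    print("failover, recording on refresh:", simulate_failover(True))
```

In this model the restart path and the failover path only agree when the refresh step also records the excluded hosts in the context, which is the gap the description attributes to failover.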