[ https://issues.apache.org/jira/browse/YARN-10896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420718#comment-17420718 ]
Prabhu Joseph commented on YARN-10896:
--------------------------------------
Thanks [~Sushil-K-S] for the patch.
{code} assertEquals(2, rm.getRMContext().getInactiveRMNodes().size()); {code}
1. Why does this return 2? Only one inactive node is present, right?
> RM fail over is not reporting the nodes DECOMMISSIONED
> -------------------------------------------------------
>
> Key: YARN-10896
> URL: https://issues.apache.org/jira/browse/YARN-10896
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Sushil Ks
> Assignee: Sushil Ks
> Priority: Major
> Attachments: YARN-10896.001.patch
>
>
> Whenever we add host entries to the exclude file in order to DECOMMISSION
> a NodeManager, we issue the *yarn rmadmin -refreshNodes* command to
> transition the nodes from RUNNING to DECOMMISSIONED state. However, if a
> failover to the standby ResourceManager happens while the exclude file
> still lists the hosts to be disallowed, these disallowed nodes are never
> shown in the Cluster Metrics on the new active ResourceManager.
> The host entries present in the exclude file are listed in the Cluster
> Metrics whenever the ResourceManager is restarted, i.e. as part of the
> service init of *NodesListManager*; during failover, however, this
> information is lost. Hence this patch sets the *DECOMMISSIONED* nodes in
> the RM context so that they are available through Cluster Metrics
> whenever we issue the *yarn rmadmin -refreshNodes* command.
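
For illustration only, the idea in the description can be modelled with a small stand-alone sketch. The class InactiveNodeTracker and its methods are hypothetical and are not part of the YARN API or of the attached patch; the map merely plays the role of the inactive-node map held in the RM context. It assumes that a newly active RM starts with an empty in-memory map and that a refresh re-derives the DECOMMISSIONED entries from the exclude list.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the RM-side bookkeeping; NOT a YARN class.
public class InactiveNodeTracker {
    // host -> state; plays the role of the inactive-node map in the RM context.
    private final Map<String, String> inactiveNodes = new ConcurrentHashMap<>();

    // Called on refresh: record every excluded host as DECOMMISSIONED
    // unless it is already tracked, so repeated refreshes do not duplicate entries.
    public void refreshNodes(List<String> excludedHosts) {
        for (String host : excludedHosts) {
            inactiveNodes.putIfAbsent(host, "DECOMMISSIONED");
        }
    }

    public int inactiveCount() {
        return inactiveNodes.size();
    }

    public static void main(String[] args) {
        // Assumed contents of the exclude file (illustrative host names).
        List<String> excludeFile = List.of("host1", "host2");

        // A newly active RM has no in-memory record of inactive nodes yet.
        InactiveNodeTracker newActive = new InactiveNodeTracker();
        System.out.println("before refresh: " + newActive.inactiveCount());

        // After the refresh, the excluded hosts are visible again.
        newActive.refreshNodes(excludeFile);
        System.out.println("after refresh: " + newActive.inactiveCount());
    }
}
```

In this model the count after a refresh equals the number of distinct hosts in the exclude list, which may be a useful way to reason about what a test should expect from getInactiveRMNodes().size().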
--
This message was sent by Atlassian Jira
(v8.3.4#803005)