[
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260441#comment-15260441
]
Karthik Kambatla commented on YARN-4676:
----------------------------------------
I haven't looked at the code itself, but I did look at the recent discussion
around RM restart, and [~rkanter] filled me in on some of the details.
If RM work-preserving restart is not enabled, it should be okay to decommission
a node right away. If work-preserving restart is enabled and a node is
decommissioned with a timeout, it would be nice to store *when* the
decommission was requested, along with the timeout, in the state-store, so
that a restarted RM can work out how much of the timeout is left. Note that,
in an HA setup, the two RMs could have a clock skew, so the elapsed time
computed after a failover may be off. Since that work is non-trivial, I am
open to doing it in a follow-up JIRA.
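To make the suggestion concrete, here is a rough sketch of the bookkeeping I
have in mind. DecommissionRecord and its fields are hypothetical names for
illustration only; they are not existing YARN classes, and how the record
would actually be written to and read from the state-store is left out. The
clamping is just one simple way to tolerate clock skew between the two RMs,
not a full solution.

```java
/**
 * Hypothetical per-node record that could be persisted in the state-store
 * when a graceful decommission with a timeout is requested.
 * Illustrative only; not an existing YARN class.
 */
final class DecommissionRecord {
  final String nodeId;
  // Wall-clock time (ms) on the RM that accepted the decommission request.
  final long requestTimeMillis;
  // Timeout (ms) given with the decommission request.
  final long timeoutMillis;

  DecommissionRecord(String nodeId, long requestTimeMillis, long timeoutMillis) {
    this.nodeId = nodeId;
    this.requestTimeMillis = requestTimeMillis;
    this.timeoutMillis = timeoutMillis;
  }

  /**
   * Remaining timeout as seen by the recovering RM, clamped to
   * [0, timeoutMillis]. If the recovering RM's clock is behind the clock
   * that recorded the request, elapsed time is treated as zero rather than
   * letting the node wait longer than the original timeout.
   */
  long remainingMillis(long nowMillis) {
    long elapsed = Math.max(0L, nowMillis - requestTimeMillis);
    return Math.max(0L, timeoutMillis - elapsed);
  }
}
```

On recovery, the RM would load the record for each DECOMMISSIONING node and
call remainingMillis(System.currentTimeMillis()) to decide how much longer to
wait before forcing the node to DECOMMISSIONED.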
> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: Daniel Zhi
> Assignee: Daniel Zhi
> Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch,
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch,
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch,
> YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status
> of DECOMMISSIONING nodes automatically and asynchronously after a
> client/admin makes the graceful decommission request. It uses that status to
> decide when a node, after all running containers on it have completed, is
> transitioned into the DECOMMISSIONED state.
> NodesListManager detects and handles include and exclude list changes to kick
> off decommission or recommission as necessary.