[
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257315#comment-15257315
]
Daniel Zhi commented on YARN-4676:
----------------------------------
1. For client-side timeout tracking, I assume you are talking about the
"private int refreshNodes(long timeout)" method in RMAdminCLI.java, where the
code repeatedly (every second) checks and waits for all decommissioning nodes
to become decommissioned. For any nodes still decommissioning when the timeout
expires, the client sends a FORCEFUL decommission request. The code remains
mostly the same, except that since the RM now also receives the timeout
(through RefreshNodesRequest()) and enforces it, the client normally won't
need to explicitly invoke FORCEFUL decommission, as the nodes will have become
DECOMMISSIONED by then. (Should the server for some reason fail to turn a node
into DECOMMISSIONED, the client will force it.) Does the combined behavior
appear fine to you?
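To make the combined flow concrete, here is a minimal sketch of the control
flow; the poll/decommission helpers are hypothetical stand-ins for the
RMAdminCLI internals, not the actual code:
{code:java}
import java.util.Collections;
import java.util.Set;

public class ClientTimeoutSketch {

  // Hypothetical: would query the RM for nodes still in DECOMMISSIONING.
  static Set<String> pollDecommissioningNodes() {
    return Collections.emptySet();
  }

  // Hypothetical: would send a FORCEFUL decommission request to the RM.
  static void sendForcefulDecommission(Set<String> nodes) {
  }

  // Waits up to timeout seconds for DECOMMISSIONING nodes to drain. Since
  // the RM receives the same timeout via RefreshNodesRequest and enforces
  // it, the FORCEFUL request at the end is only a client-side safety net.
  static int refreshNodes(long timeout) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeout * 1000L;
    Set<String> remaining = pollDecommissioningNodes();
    while (!remaining.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(1000L); // re-check once per second
      remaining = pollDecommissioningNodes();
    }
    if (!remaining.isEmpty()) {
      sendForcefulDecommission(remaining); // timeout expired; force it
    }
    return 0;
  }
}
{code}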
2. I am not very familiar with YARN internals when
yarn.resourcemanager.recovery.enabled is true. My understanding of the current
(pre-YARN-4676) behavior is: when the RM restarts, NodesListManager creates a
pseudo RMNodeImpl for each excluded node and DECOMMISSIONs it right away.
Further, any invalid node is rejected and told to SHUTDOWN inside
registerNodeManager(). So when recovery is enabled and the RM restarts during
DECOMMISSIONING, although applications and containers are likely resumed,
DECOMMISSIONING nodes will be DECOMMISSIONED right away.
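For illustration, a rough sketch of that registration-time check; NodeAction
(NORMAL/RESYNC/SHUTDOWN) mirrors the server-side response enum, but the class
and its exclude-list representation are simplified stand-ins for the real
NodesListManager/ResourceTrackerService logic:
{code:java}
import java.util.Set;

public class RestartRegistrationSketch {

  enum NodeAction { NORMAL, RESYNC, SHUTDOWN }

  private final Set<String> excludedHosts; // hosts in the exclude list

  RestartRegistrationSketch(Set<String> excludedHosts) {
    this.excludedHosts = excludedHosts;
  }

  // After an RM restart, an excluded node that re-registers is rejected
  // and told to SHUTDOWN, so a node that was mid-DECOMMISSIONING before
  // the restart never resumes it; it is decommissioned right away.
  NodeAction registerNodeManager(String host) {
    return excludedHosts.contains(host)
        ? NodeAction.SHUTDOWN
        : NodeAction.NORMAL;
  }
}
{code}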
The RM state store does not appear to serialize and restore RMNode; instead,
after an RM restart, RESYNC is replied in nodeHeartbeat() and a new RMNode is
created in the following registerNodeManager(). So the decommissioning start
time gets lost. To resume DECOMMISSIONING nodes, the decommissioning start
time, and possibly the DECOMMISSIONING state, would need to be stored and
restored.
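If we did want to resume, a purely hypothetical sketch of the persistence
involved might look like the following (in a real implementation the map
would live in the RM state store, not in memory):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DecommissionResumeSketch {

  // host -> wall-clock millis when graceful decommission began
  private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

  // Hypothetical: called when a node enters DECOMMISSIONING; the entry
  // would be written through to the state store.
  void recordStart(String host, long startMillis) {
    startTimes.put(host, startMillis);
  }

  // On re-registration after a restart, resume tracking with the remaining
  // budget instead of restarting the full timeout from zero.
  long remainingTimeoutMillis(String host, long timeoutMillis, long nowMillis) {
    Long start = startTimes.get(host);
    if (start == null) {
      return timeoutMillis; // no record: treat as a fresh start
    }
    return Math.max(0L, timeoutMillis - (nowMillis - start));
  }
}
{code}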
I am not very familiar with the RM state store, but non-trivial work appears
to be involved; the cost-benefit justification would also depend on how
essential it is to resume DECOMMISSIONING nodes after an RM restart, and if
so, whether it's better to create and handle this in a separate task/JIRA.
> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: Daniel Zhi
> Assignee: Daniel Zhi
> Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch,
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch,
> YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch,
> YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackerService tracks
> DECOMMISSIONING nodes' status automatically and asynchronously after the
> client/admin makes the graceful decommission request. It tracks
> DECOMMISSIONING nodes' status to decide when, after all running containers
> on the node have completed, the node will be transitioned into the
> DECOMMISSIONED state. NodesListManager detects and handles include and
> exclude list changes to kick off decommission or recommission as necessary.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)