[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210625#comment-15210625 ]
Robert Kanter commented on YARN-4676:
-------------------------------------
I'm still looking through the patch, especially the DecommissioningNodesWatcher
stuff (which is where the bulk of the changes are), but here's some early
feedback in the meantime:
# Please look into and fix the failed unit tests
# Don't bother deleting that empty line in ClusterMetrics and
NodeRemovedSchedulerEvent. There are no other changes in those files, so it's
just noise in the patch
# Can you give me more details on why the 5sec wait in NodeManager is needed?
# In NodesListManager#handleExcludeNodeList, I don't think we need to allocate
two decom lists. Only one of them is ever used at a time, and there's already
a boolean, {{graceful}}, to indicate if it should be graceful or not.
# In NodesListManager#handleExcludeNodeList, should
{code:java}
} else if (exclude) {
  LOG.info("No action for node " + n.getNodeID() + " with state " + s);
}
{code}
just be an {{else}} statement? And should we have a similar one for the
non-graceful section? Otherwise, some nodes will fall through here and we
don't log anything for them
# Need to update the comment on line 73 of RMServerUtils
# {{yarn.resourcemanager.decommissioning.timeout}} is the default timeout,
right? For when no timeout is specified on the CLI. The description and
property name should reflect that. Something like
{{yarn.resourcemanager.decommissioning.default.timeout}}
# Add/Update docs
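
To make the suggestions about NodesListManager#handleExcludeNodeList and the default timeout more concrete, here is a rough, self-contained sketch. It is not the patch's code: the class {{ExcludeListSketch}}, the {{NodeState}} enum, the {{Result}} holder, the method signatures, and the choice of which states are actionable in each mode are all hypothetical stand-ins. It only illustrates using one list driven by the {{graceful}} flag, a plain {{else}} so every skipped node is logged, and falling back to a default timeout when none is given on the CLI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in; NOT the real NodesListManager.
final class ExcludeListSketch {
  // Assumed subset of node states, for illustration only.
  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

  // Holds the single decommission list plus the log lines produced.
  static final class Result {
    final List<String> toDecommission = new ArrayList<>();
    final List<String> logLines = new ArrayList<>();
  }

  // One pass over the excluded nodes. A single list is filled; the
  // 'graceful' flag alone decides how its entries would be handled later.
  static Result handleExcludeNodeList(Map<String, NodeState> excludedNodes,
                                      boolean graceful) {
    Result result = new Result();
    for (Map.Entry<String, NodeState> e : excludedNodes.entrySet()) {
      String nodeId = e.getKey();
      NodeState s = e.getValue();
      // Assumption: graceful mode acts only on RUNNING nodes, while
      // forceful mode also acts on nodes that are already DECOMMISSIONING.
      boolean actionable = graceful
          ? s == NodeState.RUNNING
          : (s == NodeState.RUNNING || s == NodeState.DECOMMISSIONING);
      if (actionable) {
        result.toDecommission.add(nodeId);
      } else {
        // Plain 'else' in both modes, so no node falls through silently.
        result.logLines.add(
            "No action for node " + nodeId + " with state " + s);
      }
    }
    return result;
  }

  // Use the CLI timeout when given, otherwise the configured default
  // (the value the comment suggests naming
  // yarn.resourcemanager.decommissioning.default.timeout).
  static int effectiveTimeoutSecs(Integer cliTimeoutSecs,
                                  int defaultTimeoutSecs) {
    return cliTimeoutSecs != null ? cliTimeoutSecs : defaultTimeoutSecs;
  }
}
```

With this shape, calling {{handleExcludeNodeList}} with {{graceful}} set to true or false goes through the same loop and the same list, and any node that is not acted on leaves a log line behind.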
> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: Daniel Zhi
> Assignee: Daniel Zhi
> Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch,
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch,
> YARN-4676.008.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status of
> DECOMMISSIONING nodes automatically and asynchronously after a client/admin
> makes the graceful decommission request. It uses that status to decide when a
> node, after all running containers on it have completed, will be transitioned
> into the DECOMMISSIONED state. NodesListManager detects and handles include and
> exclude list changes to kick off decommission or recommission as necessary.