[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210625#comment-15210625 ]
Robert Kanter commented on YARN-4676:
-------------------------------------

I'm still looking through the patch, especially the DecommissioningNodesWatcher stuff (which is where the bulk of the changes are), but here's some early feedback in the meantime:
# Please look into and fix the failed unit tests
# Don't bother deleting that empty line in ClusterMetrics and NodeRemovedSchedulerEvent. There are no other changes in those files, so it's just noise in the patch
# Can you give me more details on why the 5-second wait in NodeManager is needed?
# In NodesListManager#handleExcludeNodeList, I don't think we need to allocate two decom lists. Only one of them is ever used at a time, and there's already a boolean, {{graceful}}, to indicate if it should be graceful or not.
# In NodesListManager#handleExcludeNodeList, should
{code:java}
} else if (exclude) {
  LOG.info("No action for node " + n.getNodeID() + " with state " + s);
}
{code}
just be an {{else}} statement? And should we have a similar one for the non-graceful section? Otherwise, some nodes will fall through here and we don't log anything for them
# Need to update the comment on line 73 of RMServerUtils
# {{yarn.resourcemanager.decommissioning.timeout}} is the default timeout, right? For when no timeout is specified on the CLI. The description and property name should reflect that. Something like {{yarn.resourcemanager.decommissioning.default.timeout}}
# Add/Update docs

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch,
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch,
> YARN-4676.008.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status
> of DECOMMISSIONING nodes automatically and asynchronously after the
> client/admin has made the graceful decommission request. It tracks the
> status of each DECOMMISSIONING node to decide when, after all running
> containers on the node have completed, the node will be transitioned into
> the DECOMMISSIONED state.
> NodesListManager detects and handles include and exclude list changes to
> kick off decommission or recommission as necessary.
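
For illustration, the two suggestions about NodesListManager#handleExcludeNodeList (one decom list selected by the {{graceful}} flag, and a plain {{else}} so that every skipped node is logged in both the graceful and non-graceful paths) could be sketched roughly as below. This is a standalone, hypothetical sketch and not code from the patch: the class {{ExcludeListSketch}}, the {{Node}} and {{Result}} types, the reduced {{NodeState}} enum, and the rules deciding which nodes need action are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the exclude-list handling; not the patch code. */
public class ExcludeListSketch {

    /** Assumed subset of node states, for illustration only. */
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    /** Minimal node model: id, current state, and whether it is on the exclude list. */
    static final class Node {
        final String id;
        final NodeState state;
        final boolean excluded;

        Node(String id, NodeState state, boolean excluded) {
            this.id = id;
            this.state = state;
            this.excluded = excluded;
        }
    }

    /** Collects the single decom list plus the log lines that would be emitted. */
    static final class Result {
        final List<String> toDecommission = new ArrayList<>();
        final List<String> logLines = new ArrayList<>();
    }

    static Result handleExcludeNodeList(List<Node> nodes, boolean graceful) {
        Result result = new Result();
        for (Node n : nodes) {
            boolean needsAction;
            if (graceful) {
                // Assumed rule: only excluded RUNNING nodes start a graceful decommission.
                needsAction = n.excluded && n.state == NodeState.RUNNING;
            } else {
                // Assumed rule: any excluded node that is not yet DECOMMISSIONED is acted on.
                needsAction = n.excluded && n.state != NodeState.DECOMMISSIONED;
            }
            if (needsAction) {
                // One list serves both modes; the graceful flag says how to process it.
                result.toDecommission.add(n.id);
            } else {
                // Plain else: no node falls through without a log line, in either mode.
                result.logLines.add("No action for node " + n.id + " with state " + n.state);
            }
        }
        return result;
    }
}
```

Calling {{ExcludeListSketch.handleExcludeNodeList(nodes, true)}} and then again with {{false}} on the same input shows that each node ends up either in the decom list or in the log output, whichever mode is used.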