[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210625#comment-15210625
 ] 

Robert Kanter commented on YARN-4676:
-------------------------------------

I'm still looking through the patch, especially the DecommissioningNodesWatcher 
stuff (which is where the bulk of the changes are), but here's some early 
feedback in the meantime:
# Please look into and fix the failed unit tests
# Don't bother deleting that empty line in ClusterMetrics and 
NodeRemovedSchedulerEvent.  There are no other changes in those files, so it's 
just noise in the patch
# Can you give me more details on why the 5-second wait in NodeManager is needed?
# In NodesListManager#handleExcludeNodeList, I don't think we need to allocate 
two decom lists.  Only one of them is ever used at a time, and there's already 
a boolean, {{graceful}}, to indicate if it should be graceful or not.
# In NodesListManager#handleExcludeNodeList, should
{code:java}
        } else if (exclude) {
          LOG.info("No action for node " + n.getNodeID() + " with state " + s);
        }
{code}
just be an {{else}} statement?  And should we have a similar one for the 
non-graceful section?  Otherwise, some nodes will fall through here and we 
won't log anything for them.
# Need to update the comment on line 73 of RMServerUtils
# {{yarn.resourcemanager.decommissioning.timeout}} is the default timeout, 
right?  For when no timeout is specified on the CLI.  The description and 
property name should reflect that.  Something like 
{{yarn.resourcemanager.decommissioning.default.timeout}}
# Add/Update docs
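
To illustrate the two NodesListManager#handleExcludeNodeList suggestions above (a single decom list driven by {{graceful}}, and a plain {{else}} so every fall-through node is logged), here is a minimal sketch. It is not the patch's code: the {{ExcludeListSketch}}, {{Node}}, {{NodeState}} and {{Result}} types, the state checks, and the event label strings are all simplified stand-ins assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT the real YARN classes.
class ExcludeListSketch {
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    static final class Node {
        final String id;
        final NodeState state;
        Node(String id, NodeState state) { this.id = id; this.state = state; }
    }

    // Collects what the real code would dispatch and log.
    static final class Result {
        final List<String> events = new ArrayList<>();
        final List<String> log = new ArrayList<>();
    }

    static Result handleExcludeNodeList(List<Node> nodes, Set<String> excluded,
                                        boolean graceful) {
        Result result = new Result();
        // One list serves both modes; 'graceful' only selects the event below.
        List<Node> nodesToDecom = new ArrayList<>();
        for (Node n : nodes) {
            boolean exclude = excluded.contains(n.id);
            if (exclude && n.state == NodeState.RUNNING) {
                nodesToDecom.add(n);
            } else if (exclude && !graceful
                    && n.state == NodeState.DECOMMISSIONING) {
                // Assumed rule: a forced decommission also applies to a node
                // that is already draining.
                nodesToDecom.add(n);
            } else {
                // Plain else: every node that falls through gets a log line.
                result.log.add("No action for node " + n.id
                        + " with state " + n.state);
            }
        }
        // Illustrative event labels, chosen once from the boolean.
        String event = graceful ? "GRACEFUL_DECOMMISSION" : "DECOMMISSION";
        for (Node n : nodesToDecom) {
            result.events.add(event + ":" + n.id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("n1", NodeState.RUNNING),
                new Node("n2", NodeState.DECOMMISSIONING),
                new Node("n3", NodeState.RUNNING));
        Set<String> excluded = new HashSet<>(Arrays.asList("n1", "n2"));
        Result r = handleExcludeNodeList(nodes, excluded, true);
        System.out.println(r.events);
        System.out.println(r.log);
    }
}
```

With this shape, the graceful and non-graceful paths share the same logging branch, so no node is silently skipped in either mode.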

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status 
> of DECOMMISSIONING nodes automatically and asynchronously after a 
> client/admin makes a graceful decommission request. It uses that status to 
> decide when a node, after all of its running containers have completed, 
> will be transitioned into the DECOMMISSIONED state. NodesListManager 
> detects and handles include and exclude list changes to kick off 
> decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
