[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210694#comment-15210694
 ] 

Daniel Zhi commented on YARN-4676:
----------------------------------

Thanks. I will update the patch afterward. Here are quick responses:

1. I will look at unit tests. (Once all the tests reported as failed by Hadoop 
QA actually PASS on my local machine with or without my patch).
3. At least in an AWS EMR cluster, all Hadoop daemons are configured to restart 
automatically if stopped. So the NodeManager, upon being told to shut down, will 
exit, but is then immediately restarted and tries to register itself with the 
RM. Should the node be RECOMMISSIONed later, it will be accepted and become a 
normal node. While the node remains DECOMMISSIONED, this shutdown-restart loop 
will keep going until the node is either terminated or recommissioned. The 
5 second wait is to keep the loop from becoming too tight (1~2 seconds).
4. I will remove one.
5. The particular "} else if (exclude) {" is to avoid the "No action ..." log 
message for a RUNNING node that was not excluded. The "} else {" block 
corresponding to "if (graceful) {" covers the non-graceful case.
   if (graceful) {
     ......
   } else {
     ......
   }
6. will do
7. yarn.resourcemanager.decommissioning.timeout is the key for the timeout; the 
default value is declared as YarnConfiguration.DEFAULT_DECOMMISSIONING_TIMEOUT.
8. will figure out what to add/update.
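
For item 7, here is a minimal sketch of how the timeout lookup could behave. It 
uses java.util.Properties as a stand-in for the Hadoop Configuration object so 
that it runs on its own. The class name, the method name, and the fallback 
value 3600 are assumptions for illustration only and are not taken from the 
patch.

```java
import java.util.Properties;

public class DecommissionTimeoutSketch {
    // Key name quoted from item 7 above.
    static final String TIMEOUT_KEY = "yarn.resourcemanager.decommissioning.timeout";

    // Hypothetical fallback. The real default is declared as
    // YarnConfiguration.DEFAULT_DECOMMISSIONING_TIMEOUT; its value is not stated here.
    static final int ASSUMED_DEFAULT_TIMEOUT = 3600;

    // Returns the configured timeout, or the fallback when the key is
    // missing, blank, or not a valid integer.
    static int resolveTimeout(Properties conf) {
        String raw = conf.getProperty(TIMEOUT_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return ASSUMED_DEFAULT_TIMEOUT;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return ASSUMED_DEFAULT_TIMEOUT;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveTimeout(conf)); // key unset, fallback is used
        conf.setProperty(TIMEOUT_KEY, "120");
        System.out.println(resolveTimeout(conf)); // explicit value is used
    }
}
```

Falling back on a malformed value is a choice made for this sketch; the patch 
may well handle that case differently.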

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, 
> YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, 
> YARN-4676.008.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackerService tracks the status 
> of DECOMMISSIONING nodes automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks each 
> DECOMMISSIONING node's status to decide when, after all running containers 
> on the node have completed, the node will be transitioned into the 
> DECOMMISSIONED state. NodesListManager detects and handles include and 
> exclude list changes to kick off decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
