[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4311:
------------------------------
    Attachment: YARN-4311-v2.patch

This patch addresses the graceful and other variants of refreshNodes. It also 
adds a timestamp-based check, run every {{RM_NODE_REMOVAL_CHK_INTERVAL_MSEC}}, 
for nodes in the inactive list that should be untracked, and removes such 
nodes once {{RM_NODE_REMOVAL_TIMEOUT_MSEC}} has elapsed. A decommissioned node 
is not transitioned to shutdown, but the timer acts on it just as it would on 
a shutdown node.

A decommissioning node transitions to shutdown if it is found to be 
'untracked'. 
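
To make the timer behaviour concrete, here is a minimal, self-contained sketch 
of a timestamp-based sweep over inactive nodes. It is not the code in the 
attached patch. The class {{InactiveNodeSweeper}}, its methods, and the 
definition of 'untracked' used here (an include list is in use, and the host 
appears in neither the include nor the exclude list) are assumptions made for 
illustration. The constructor argument stands in for 
{{RM_NODE_REMOVAL_TIMEOUT_MSEC}}, and the caller is assumed to invoke 
{{sweep}} once per {{RM_NODE_REMOVAL_CHK_INTERVAL_MSEC}}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not taken from YARN-4311-v2.patch. */
class InactiveNodeSweeper {
    // Sentinel meaning "this inactive node is currently tracked".
    private static final long TRACKED = -1L;

    // Stand-in for RM_NODE_REMOVAL_TIMEOUT_MSEC.
    private final long removalTimeoutMsec;

    // Inactive host -> time (ms) it was first seen as untracked, or TRACKED.
    private final Map<String, Long> inactive = new HashMap<>();

    InactiveNodeSweeper(long removalTimeoutMsec) {
        this.removalTimeoutMsec = removalTimeoutMsec;
    }

    /** Record a node that has become inactive (e.g. decommissioned or shutdown). */
    void addInactive(String host) {
        inactive.put(host, TRACKED);
    }

    /**
     * Assumed definition of 'untracked': an include list is in use and the
     * host is in neither list. With an empty include list nothing is untracked,
     * so inactive nodes do not fall off when include lists are not used.
     */
    static boolean isUntracked(String host, Set<String> include, Set<String> exclude) {
        return !include.isEmpty() && !include.contains(host) && !exclude.contains(host);
    }

    /** One periodic check; returns the hosts removed from the inactive list. */
    List<String> sweep(Set<String> include, Set<String> exclude, long nowMsec) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = inactive.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (!isUntracked(e.getKey(), include, exclude)) {
                e.setValue(TRACKED);      // tracked again: reset the clock
            } else if (e.getValue() == TRACKED) {
                e.setValue(nowMsec);      // first time seen untracked: start the clock
            } else if (nowMsec - e.getValue() >= removalTimeoutMsec) {
                removed.add(e.getKey());  // untracked for the full timeout: forget it
                it.remove();
            }
        }
        return removed;
    }
}
```

In this sketch the clock starts at the first sweep that sees the node as 
untracked, so a node is forgotten no earlier than one timeout after that 
sweep. Whether the patch starts the clock at the same point is not stated 
above.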

The unit test exercises several scenarios to check that the metrics and node 
lists are correct. I can break it into more tests if the idea behind it looks 
acceptable.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4311
>                 URL: https://issues.apache.org/jira/browse/YARN-4311
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch
>
>
> In order to fully forget about a node, removing the node from the include 
> and exclude lists is not sufficient. The RM still lists it under Decomm-ed 
> nodes. The tricky part that [~jlowe] pointed out is the case when include 
> lists are not used; in that case we don't want the nodes to fall off if they 
> are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
