[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235307#comment-15235307 ]

Kuhu Shukla commented on YARN-4311:
-----------------------------------

The problem does exist. If the node being removed from the list is already in
the REBOOTED or LOST state, the node metrics go out of sync.
There are two approaches to solving this:
1. Have the appropriate metrics decremented and let the timer remove such nodes.
2. Have the timer remove only nodes that are DECOMMISSIONED.
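A minimal Java sketch of the two approaches, assuming a simplified model in
which the inactive-node list is a map from host name to state and the metrics
are one counter per state. `InactiveNodes`, `removeOption1` and
`removeOption2` are hypothetical names for illustration, not the actual
ResourceManager classes.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical inactive states, named after the states discussed above.
enum InactiveState { DECOMMISSIONED, LOST, REBOOTED }

// Hypothetical model: inactive nodes keyed by host, plus one counter per state.
class InactiveNodes {
  final Map<String, InactiveState> nodes = new HashMap<>();
  final Map<InactiveState, Integer> metrics = new EnumMap<>(InactiveState.class);

  void add(String host, InactiveState state) {
    nodes.put(host, state);
    metrics.merge(state, 1, Integer::sum);
  }

  int count(InactiveState state) {
    return metrics.getOrDefault(state, 0);
  }

  // Option 1: remove the node whatever its inactive state, and decrement
  // the counter for the state it is actually in.
  boolean removeOption1(String host) {
    InactiveState state = nodes.remove(host);
    if (state == null) {
      return false;
    }
    metrics.merge(state, -1, Integer::sum);
    return true;
  }

  // Option 2: the timer only removes DECOMMISSIONED nodes; LOST and REBOOTED
  // nodes stay in the list, so their counters are never touched.
  boolean removeOption2(String host) {
    InactiveState state = nodes.get(host);
    if (state != InactiveState.DECOMMISSIONED) {
      return false;
    }
    nodes.remove(host);
    metrics.merge(state, -1, Integer::sum);
    return true;
  }
}
```

Either variant keeps the counters consistent in this model; the difference is
whether LOST and REBOOTED nodes can ever fall off the list.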

Like DECOMMISSIONED, both the LOST and REBOOTED states have only self-arcs in
the finite state machine, which removes some complexity if we go with
option 1.
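To illustrate why the self-arcs keep option 1 simple, here is a heavily
simplified, hypothetical transition function. The RUNNING state and the event
names are assumptions for illustration, not the real RMNode state machine.
Because every event leaves DECOMMISSIONED, LOST and REBOOTED unchanged, the
state the timer reads is still the state whose counter must be decremented
when the node is removed.

```java
// Hypothetical, simplified states and events.
enum FsmState { RUNNING, DECOMMISSIONED, LOST, REBOOTED }

enum FsmEvent { STATUS_UPDATE, EXPIRE, REBOOTING, DECOMMISSION }

class NodeFsm {
  // Only RUNNING has outgoing transitions; every other state maps each
  // event back to itself (a self-arc).
  static FsmState next(FsmState s, FsmEvent e) {
    if (s != FsmState.RUNNING) {
      return s;
    }
    switch (e) {
      case EXPIRE:
        return FsmState.LOST;
      case REBOOTING:
        return FsmState.REBOOTED;
      case DECOMMISSION:
        return FsmState.DECOMMISSIONED;
      default:
        return FsmState.RUNNING;
    }
  }

  // True when no event can move the node out of state s.
  static boolean onlySelfArcs(FsmState s) {
    for (FsmEvent e : FsmEvent.values()) {
      if (next(s, e) != s) {
        return false;
      }
    }
    return true;
  }
}
```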

[~jlowe], could you comment on the two approaches and which one sounds
better?

I will open a follow-up JIRA shortly.


> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4311
>                 URL: https://issues.apache.org/jira/browse/YARN-4311
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>             Fix For: 2.8.0
>
>         Attachments: YARN-4311-branch-2.7.001.patch, 
> YARN-4311-branch-2.7.002.patch, YARN-4311-branch-2.7.003.patch, 
> YARN-4311-branch-2.7.004.patch, YARN-4311-v1.patch, YARN-4311-v10.patch, 
> YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v12.patch, 
> YARN-4311-v13.patch, YARN-4311-v13.patch, YARN-4311-v14.patch, 
> YARN-4311-v2.patch, YARN-4311-v3.patch, YARN-4311-v4.patch, 
> YARN-4311-v5.patch, YARN-4311-v6.patch, YARN-4311-v7.patch, 
> YARN-4311-v8.patch, YARN-4311-v9.patch
>
>
> In order to fully forget about a node, removing it from the include and
> exclude lists is not sufficient: the RM still lists it under decommissioned
> nodes. The tricky part that [~jlowe] pointed out was the case when include
> lists are not used; in that case we don't want nodes to fall off just because
> they are not active.
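The rule in the description can be sketched as a small predicate, assuming
the include and exclude lists are available as sets of host names (host-name
matching only). `HostLists.isUntracked` is a hypothetical helper for
illustration, not an existing YARN method.

```java
import java.util.Set;

class HostLists {
  // Hypothetical helper: should an inactive node be forgotten entirely?
  static boolean isUntracked(String host, Set<String> includes, Set<String> excludes) {
    // No include list in use: every host is implicitly allowed, so inactive
    // nodes must not fall off the list.
    if (includes.isEmpty()) {
      return false;
    }
    // Forget the node only if it appears in neither list.
    return !includes.contains(host) && !excludes.contains(host);
  }
}
```

The empty-include-list guard is what covers the tricky case: without it,
every inactive node would look untracked whenever include lists are not used.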



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
