Sunil G commented on YARN-4386:

Hi [~kshukla]
Sorry for the late reply.
bq. Unless there are 2 refreshNodes done in parallel such that the first 
deactivateNodeTransition has not finished and the other refreshNodes is also 
trying to do the same transition
Since the transitions happen under a write lock, this should not happen.
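
To illustrate the point about the write lock, here is a minimal sketch. The class, enum and method names are hypothetical stand-ins and not the real RMNodeImpl code; it only assumes that each transition runs entirely while a write lock is held. Under that assumption, a second concurrent request cannot start its transition until the first one has finished, and it then sees the updated state.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a node state machine; not the real RMNodeImpl.
class WriteLockTransitionSketch {
  enum State { RUNNING, DECOMMISSIONED }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private State state = State.RUNNING;
  private int deactivations = 0;

  /** Runs the whole transition under the write lock. */
  void handleDecommission() {
    lock.writeLock().lock();
    try {
      // A second caller only gets here after the first has released the lock,
      // so it observes DECOMMISSIONED and skips the deactivation.
      if (state == State.RUNNING) {
        state = State.DECOMMISSIONED;
        deactivations++;
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Reads the counter under the read lock. */
  int deactivations() {
    lock.readLock().lock();
    try {
      return deactivations;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Simulates two refresh requests racing on the same node. */
  static int runTwoConcurrentRefreshes() {
    WriteLockTransitionSketch node = new WriteLockTransitionSketch();
    Thread first = new Thread(node::handleDecommission);
    Thread second = new Thread(node::handleDecommission);
    first.start();
    second.start();
    try {
      first.join();
      second.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return node.deactivations();
  }

  public static void main(String[] args) {
    System.out.println("deactivations = " + runTwoConcurrentRefreshes());
  }
}
```

Whichever thread takes the lock first performs the deactivation; the other one finds the node already transitioned.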

I have one suggestion here.
You could mark a node for GRACEFUL DECOMMISSION and ensure that the node is 
in the DECOMMISSIONING state (you can try firing an event to RMNodeImpl 
directly to do this). Then invoke {{refreshNodesGracefully}} and verify that a 
RECOMMISSION event is raised to the dispatcher. Similarly, mark a node as 
DECOMMISSIONED, then invoke {{refreshNodesGracefully}} and verify that the 
RECOMMISSION event is *NOT* raised. In the second case it will not enter the 
*for* loop, but I feel this will clearly cover our case here, though it is not 
direct.
Please correct me if I am wrong.
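
As a rough, self-contained sketch of the two checks suggested above: everything here (the enum, the recording dispatcher, the string node ids, and the simplified {{refreshNodesGracefully}}) is a hypothetical stand-in, not the actual YARN classes. A real test would go through the ResourceManager test utilities and a mocked dispatcher instead. The loop is a simplified copy of the one quoted in the issue description and omits the elided parts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these are the real YARN classes.
class RecommissionEventSketch {
  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

  /** Records every event it is handed, like a mocked dispatcher would. */
  static final class RecordingDispatcher {
    final List<String> events = new ArrayList<>();

    void handle(String nodeId, String eventType) {
      events.add(nodeId + " " + eventType);
    }
  }

  /** Simplified loop from the issue description: only the active map is scanned. */
  static void refreshNodesGracefully(Map<String, NodeState> activeNodes,
      RecordingDispatcher dispatcher) {
    for (Map.Entry<String, NodeState> entry : activeNodes.entrySet()) {
      NodeState state = entry.getValue();
      if (state == NodeState.DECOMMISSIONING || state == NodeState.DECOMMISSIONED) {
        dispatcher.handle(entry.getKey(), "RECOMMISSION");
      }
    }
  }

  /** Case 1: the node is DECOMMISSIONING and still in the active map. */
  static List<String> eventsForDecommissioningNode() {
    Map<String, NodeState> active = new LinkedHashMap<>();
    active.put("host1:1234", NodeState.DECOMMISSIONING);
    RecordingDispatcher dispatcher = new RecordingDispatcher();
    refreshNodesGracefully(active, dispatcher);
    return dispatcher.events;
  }

  /** Case 2: the node is DECOMMISSIONED, so it lives only in the inactive map. */
  static List<String> eventsForDecommissionedNode() {
    Map<String, NodeState> active = new LinkedHashMap<>();
    Map<String, NodeState> inactive = new LinkedHashMap<>();
    inactive.put("host2:1234", NodeState.DECOMMISSIONED);
    RecordingDispatcher dispatcher = new RecordingDispatcher();
    refreshNodesGracefully(active, dispatcher); // the inactive map is never consulted
    return dispatcher.events;
  }

  public static void main(String[] args) {
    System.out.println(eventsForDecommissioningNode());
    System.out.println(eventsForDecommissionedNode());
  }
}
```

In the first case exactly one RECOMMISSION event is recorded; in the second the loop body is never entered, so the recorded list stays empty.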

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> ---------------------------------------------------------------------------------------------
>                 Key: YARN-4386
>                 URL: https://issues.apache.org/jira/browse/YARN-4386
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: graceful
>    Affects Versions: 3.0.0
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Minor
>         Attachments: YARN-4386-v1.patch
> In refreshNodesGracefully(), during recommissioning, the entryset from 
> getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is 
> used for checking 'decommissioned' nodes which are present in 
> getInactiveRMNodes() map alone. 
> {code}
> for (Entry<NodeId, RMNode> entry:rmContext.getRMNodes().entrySet()) { 
> .........................
>  // Recommissioning the nodes
>         if (entry.getValue().getState() == NodeState.DECOMMISSIONING
>             || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>           this.rmContext.getDispatcher().getEventHandler()
>               .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
>         }
> {code}

This message was sent by Atlassian JIRA
