[ 
https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482962#comment-16482962
 ] 

Kanwaljeet Sachdev commented on YARN-4677:
------------------------------------------

[~wilfreds], thanks for the patch and the context on it. The diffs look good. 
It would help to add a little more description to the Jira: that an NPE can 
occur because a heartbeat message might arrive after the node has been 
decommissioned, together with the stack trace, so that the full context is 
captured here.

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --------------------------------------------------------------------------
>
>                 Key: YARN-4677
>                 URL: https://issues.apache.org/jira/browse/YARN-4677
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful, resourcemanager, scheduler
>    Affects Versions: 2.7.1
>            Reporter: Brook Zhou
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>         Attachments: YARN-4677-branch-2.001.patch, 
> YARN-4677-branch-2.002.patch, YARN-4677.01.patch
>
>
> When a node is in the decommissioning state, there is a time window between 
> completedContainer() and the RMNodeResourceUpdateEvent being handled in 
> scheduler.nodeUpdate (YARN-3223). 
> If a scheduling attempt happens within this window, a new container could 
> still be allocated on this node. Worse, if the scheduling attempt happens 
> after the RMNodeResourceUpdateEvent is sent out but before it is propagated 
> to the SchedulerNode, the total resource ends up lower than the used 
> resource and the available resource becomes negative. 
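
The worse case described above can be replayed as a fixed interleaving in a 
small toy model. This is only a sketch: DecommissionRaceSketch, Node, and the 
memory figures are hypothetical and are not YARN classes or values from the 
patch. It also assumes that the resource update shrinks the node's total to 
its currently used amount, which is how the description reads.

```java
// Toy model of the race in the issue description. Not YARN code:
// every class, method, and number here is a hypothetical illustration.
final class DecommissionRaceSketch {

    /** Stand-in for the scheduler's view of a node (memory only, in MB). */
    static final class Node {
        long totalMb;
        long usedMb;

        Node(long totalMb, long usedMb) {
            this.totalMb = totalMb;
            this.usedMb = usedMb;
        }

        long availableMb() {
            return totalMb - usedMb;
        }

        /** Allocates only if the node currently appears to have room. */
        boolean tryAllocate(long mb) {
            if (mb > availableMb()) {
                return false;
            }
            usedMb += mb;
            return true;
        }

        void completeContainer(long mb) {
            usedMb -= mb;
        }
    }

    /** Replays the interleaving step by step and returns the final available MB. */
    static long replayWorstCase() {
        Node node = new Node(8192, 4096);

        // 1. A container finishes on the decommissioning node: used drops to 2048.
        node.completeContainer(2048);

        // 2. An update event is created carrying total = used (assumed behavior),
        //    but it has not reached the scheduler's node yet.
        long pendingTotalMb = node.usedMb;

        // 3. A scheduling attempt runs inside the window. The scheduler still
        //    sees the old total, so the allocation succeeds: used becomes 4096.
        node.tryAllocate(2048);

        // 4. The stale update is finally applied: total becomes 2048.
        node.totalMb = pendingTotalMb;

        // Total (2048) is now below used (4096).
        return node.availableMb();
    }

    public static void main(String[] args) {
        System.out.println("available MB after race: " + replayWorstCase());
    }
}
```

In this model the final available value is negative because the update was 
computed before the extra allocation but applied after it.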



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
