Brook Zhou created YARN-4677:
--------------------------------

             Summary: RMNodeResourceUpdateEvent update from scheduler can lead to race condition
                 Key: YARN-4677
                 URL: https://issues.apache.org/jira/browse/YARN-4677
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: graceful, resourcemanager, scheduler
    Affects Versions: 2.7.1
            Reporter: Brook Zhou


When a node is in the decommissioning state, there is a time window between 
completedContainer() and the point where the RMNodeResourceUpdateEvent is 
handled in scheduler.nodeUpdate (YARN-3223). 

If a scheduling attempt happens within this window, a new container can still 
be allocated on this node. Worse, if the scheduling attempt happens after the 
RMNodeResourceUpdateEvent has been sent but before it has propagated to the 
SchedulerNode, the node's total resource ends up lower than its used resource, 
and its available resource becomes negative. 
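
To make the second case concrete, here is a minimal single-threaded sketch that 
replays the problematic ordering step by step. It is not the real scheduler 
code: SketchNode, its methods, and the memory figures are all hypothetical, 
and it assumes the update event carries a new total equal to the node's used 
resource at the moment the event is created. 

```java
// Hypothetical, simplified model of the race; not the real SchedulerNode class.
class DecommissionRaceSketch {

    /** Minimal stand-in for a scheduler-side node: memory only, in MB. */
    static final class SketchNode {
        long totalMb;
        long usedMb;

        SketchNode(long totalMb, long usedMb) {
            this.totalMb = totalMb;
            this.usedMb = usedMb;
        }

        long availableMb() {
            return totalMb - usedMb;
        }

        void completeContainer(long mb) {
            usedMb -= mb;
        }

        /** Allocates only if the node currently appears to have room. */
        boolean tryAllocate(long mb) {
            if (availableMb() < mb) {
                return false;
            }
            usedMb += mb;
            return true;
        }

        /** Stand-in for the resource update event reaching the node. */
        void applyResourceUpdate(long newTotalMb) {
            totalMb = newTotalMb;
        }
    }

    /** Replays the bad ordering and returns the resulting available memory. */
    static long replayBadInterleaving() {
        SketchNode node = new SketchNode(8192, 4096);

        // 1. A container finishes on the decommissioning node.
        node.completeContainer(2048);

        // 2. The update event is created, carrying total = used (assumption).
        long pendingTotalMb = node.usedMb;

        // 3. Before the event is applied, a scheduling pass still sees the old
        //    total and places a new container on the node.
        node.tryAllocate(2048);

        // 4. The stale event is applied: total drops below used.
        node.applyResourceUpdate(pendingTotalMb);

        return node.availableMb();
    }

    public static void main(String[] args) {
        // Prints: available MB after race: -2048
        System.out.println("available MB after race: " + replayBadInterleaving());
    }
}
```

In this sketch the allocation in step 3 succeeds only because the node still 
reports its old total. Applying the update before any further scheduling pass, 
or rejecting allocations on a decommissioning node, would avoid the negative 
value in this toy model; whether either fits the actual scheduler is not 
something this sketch can show. 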



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)