Brook Zhou created YARN-4677:
--------------------------------
Summary: RMNodeResourceUpdateEvent update from scheduler can lead
to race condition
Key: YARN-4677
URL: https://issues.apache.org/jira/browse/YARN-4677
Project: Hadoop YARN
Issue Type: Improvement
Components: graceful, resourcemanager, scheduler
Affects Versions: 2.7.1
Reporter: Brook Zhou
When a node is in the decommissioning state, there is a time window between
completedContainer() and the handling of the RMNodeResourceUpdateEvent in
scheduler.nodeUpdate (YARN-3223).
If a scheduling attempt happens within this window, a new container can
still be allocated on this node. Worse, if a scheduling attempt happens after
the RMNodeResourceUpdateEvent is sent but before it is propagated to the
SchedulerNode, the node's total resource ends up lower than its used resource
and its available resource becomes negative.
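
The interleaving can be reproduced in a small single-threaded sketch. This is
a toy model, not YARN code: ToySchedulerNode, the pendingTotals queue, and the
memory sizes are all hypothetical, and it assumes the update event shrinks the
node's total resource to the amount in use at the time the event is created.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: these classes are hypothetical and are NOT YARN's
// real RMNode, SchedulerNode, or event dispatcher.
final class DecommissionRaceSketch {

    /** Simplified stand-in for the scheduler's per-node bookkeeping (memory in GB). */
    static final class ToySchedulerNode {
        int total;
        int used;

        ToySchedulerNode(int total, int used) {
            this.total = total;
            this.used = used;
        }

        int available() {
            return total - used;
        }

        /** Allocates only if the node currently appears to have room. */
        boolean tryAllocate(int size) {
            if (available() < size) {
                return false;
            }
            used += size;
            return true;
        }
    }

    /** Replays the problematic ordering and returns the node's final available memory. */
    static int simulate() {
        ToySchedulerNode node = new ToySchedulerNode(8, 6);
        // Stand-in for the asynchronous event queue between RM and scheduler.
        Queue<Integer> pendingTotals = new ArrayDeque<>();

        // 1. A 2 GB container completes on the decommissioning node.
        node.used -= 2; // used = 4

        // 2. Assumption: the update event carries total = used at this moment.
        pendingTotals.add(node.used); // queued total = 4, not yet applied

        // 3. A scheduling attempt lands inside the window: the node still
        //    advertises 8 - 4 = 4 GB free, so the allocation succeeds.
        boolean allocated = node.tryAllocate(2); // used = 6
        if (!allocated) {
            throw new IllegalStateException("allocation was expected to succeed");
        }

        // 4. The stale update finally reaches the scheduler's view of the node.
        node.total = pendingTotals.remove(); // total = 4

        return node.available(); // 4 - 6 = -2
    }

    public static void main(String[] args) {
        System.out.println("available after update: " + simulate());
    }
}
```

In this model the final available value is -2, matching the negative available
resource described above. Closing the window would require either rejecting
allocations on a decommissioning node or computing the new total at the moment
the update is applied rather than when the event is created.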
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)