Jian He commented on YARN-4138:

I think this can indeed lead to a deadlock, because the two code paths take 
the same pair of locks in opposite order:
- CapacityScheduler#allocateContainersToNode grabs the scheduler lock and then 
the SchedulerApp's lock at LeafQueue#assignContainers.
- CapacityScheduler#rollbackContainerResource first acquires the SchedulerApp's 
lock and then the scheduler lock.
-- This will also happen when the AM calls CapacityScheduler#allocate to 
decrease a container's resource. This was introduced in YARN-1651. I had 
commented earlier that every AM allocate call will hold the scheduler's and 
queue's locks, which is too expensive, but missed that this may also lead to 
deadlock.
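
To make the lock-order inversion concrete, here is a minimal standalone 
sketch. It is not the actual CapacityScheduler code: the class name 
LockOrderSketch and the two ReentrantLock objects are hypothetical stand-ins 
for the scheduler and SchedulerApp monitors, and tryLock with a timeout 
replaces synchronized so that the demo reports the stall instead of hanging. 
Latches force the bad interleaving where each thread already holds its first 
lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {

    /** Runs two threads over the same two locks; returns how many obtained both. */
    public static int run(boolean consistentOrder) {
        // Hypothetical stand-ins for the scheduler and SchedulerApp monitors.
        ReentrantLock schedulerLock = new ReentrantLock();
        ReentrantLock appLock = new ReentrantLock();
        // With inverted order both threads can hold their first lock at once,
        // so wait for both of them to force that interleaving deterministically.
        int parties = consistentOrder ? 1 : 2;
        CountDownLatch holdingFirst = new CountDownLatch(parties);
        CountDownLatch attempted = new CountDownLatch(parties);
        AtomicInteger gotBoth = new AtomicInteger();

        // Path 1 (like allocateContainersToNode): scheduler lock, then app lock.
        Thread allocate = new Thread(() ->
                attempt(schedulerLock, appLock, holdingFirst, attempted, gotBoth));
        // Path 2 (like rollbackContainerResource): app lock first, unless fixed.
        ReentrantLock first = consistentOrder ? schedulerLock : appLock;
        ReentrantLock second = consistentOrder ? appLock : schedulerLock;
        Thread rollback = new Thread(() ->
                attempt(first, second, holdingFirst, attempted, gotBoth));

        allocate.start();
        rollback.start();
        try {
            allocate.join();
            rollback.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gotBoth.get();
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CountDownLatch holdingFirst, CountDownLatch attempted,
                                AtomicInteger gotBoth) {
        first.lock();
        try {
            holdingFirst.countDown();
            holdingFirst.await();
            // With plain synchronized blocks this is where a thread would block forever.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                try {
                    gotBoth.incrementAndGet();
                } finally {
                    second.unlock();
                }
            }
            // Keep holding the first lock until every party has made its attempt.
            attempted.countDown();
            attempted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("inverted order:   " + run(false) + " of 2 threads got both locks");
        System.out.println("consistent order: " + run(true) + " of 2 threads got both locks");
    }
}
```

With inverted order neither thread obtains its second lock; when both paths 
take the scheduler lock first, both complete. This suggests one possible 
direction for a fix (a single acquisition order on every path), though 
whether that suits the real code is a separate question.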

> Roll back container resource allocation after resource increase token expires
> -----------------------------------------------------------------------------
>                 Key: YARN-4138
>                 URL: https://issues.apache.org/jira/browse/YARN-4138
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, nodemanager, resourcemanager
>            Reporter: MENG DING
>            Assignee: MENG DING
>         Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.

This message was sent by Atlassian JIRA
