Wangda Tan commented on YARN-4519:

Thanks [~jianhe] for finding this issue, and [~sandflee]/[~mding] for the analysis.

I think the simplest solution is to move
      // Decrease containers
      decreaseContainers(normalizedDecreaseRequests, application);
out of the synchronized block on the application:
    synchronized (application) {
   // put it here.

Also, in {{AbstractYarnScheduler#decreaseContainers}}, it would be better to move
      boolean hasIncreaseRequest =
              request.getPriority(), request.getContainerId());
into {{decreaseContainer}}.

After these changes, decreasing a container acquires the CS lock first, which 
matches the lock order used by the scheduler thread. YARN-4136 can then 
directly use {{decreaseContainer}} to roll back a container.
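
To illustrate the intended lock order, here is a minimal sketch. It is not the actual CapacityScheduler code: the class {{LockOrderSketch}}, the two plain lock objects, the counter, and {{runDemo}} are hypothetical stand-ins, and only the method names mirror the ones discussed above. The point it tries to show is that once the decrease call runs outside the application lock, both threads take the scheduler lock before the application lock, so the two paths can no longer wait on each other.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the locking pattern only; not Hadoop code.
class LockOrderSketch {
    // Stand-ins for the CapacityScheduler and FiCaSchedulerApp monitors.
    static final Object scheduler = new Object();
    static final Object application = new Object();
    static final AtomicInteger completed = new AtomicInteger();

    // Decrease path after the change: scheduler lock first, then application lock.
    static void decreaseContainer() {
        synchronized (scheduler) {
            synchronized (application) {
                completed.incrementAndGet();
            }
        }
    }

    // allocate(): application-local work stays under the application lock,
    // but the decrease call is made after that lock has been released.
    static void allocate() {
        synchronized (application) {
            // update the application's requests here (omitted)
        }
        decreaseContainer();
    }

    // Scheduler thread: scheduler lock first, then application lock.
    static void allocateContainersToNode() {
        synchronized (scheduler) {
            synchronized (application) {
                completed.incrementAndGet();
            }
        }
    }

    // Runs both paths concurrently and returns how many operations finished.
    static int runDemo(int iterations) {
        completed.set(0);
        Thread allocator = new Thread(() -> {
            for (int i = 0; i < iterations; i++) allocate();
        });
        Thread schedulerThread = new Thread(() -> {
            for (int i = 0; i < iterations; i++) allocateContainersToNode();
        });
        allocator.start();
        schedulerThread.start();
        try {
            allocator.join();
            schedulerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed = " + runDemo(1000));
    }
}
```

If {{allocate()}} instead called {{decreaseContainer()}} while still holding the application lock, the two threads would acquire the same pair of locks in opposite orders, which is the deadlock pattern described in the issue.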


> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> ----------------------------------------------------------------------------------------
>                 Key: YARN-4519
>                 URL: https://issues.apache.org/jira/browse/YARN-4519
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: sandflee
> In CapacityScheduler.allocate(), the thread first gets the FiCaSchedulerApp 
> sync lock and may then get the CapacityScheduler's sync lock in 
> decreaseContainer().
> In the scheduler thread, it first gets the CapacityScheduler's sync lock in 
> allocateContainersToNode() and may then get the FiCaSchedulerApp sync lock in 
> FiCaSchedulerApp.assignContainers().

This message was sent by Atlassian JIRA
