MENG DING commented on YARN-4519:

I feel that the correct solution would simply be to put all decrease requests 
into a pendingDecrease list in the allocate() call (after some initial sanity 
checks, of course), and then, in the allocateContainersToNode() call, process 
all pendingDecrease requests before allocating new/increase resources. This 
would also make resource rollback easier.
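A minimal sketch of that flow, with hypothetical names (PendingDecreaseSketch, DecreaseRequest, and the plain Object locks are all stand-ins, not the actual CapacityScheduler types): allocate() only enqueues, and the scheduler-thread path drains the queue under the scheduler lock before doing any new/increase allocation, so the app-side call never has to wait on the scheduler lock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PendingDecreaseSketch {
    // Stand-in for SchedContainerChangeRequest.
    static final class DecreaseRequest {
        final String containerId;
        DecreaseRequest(String containerId) { this.containerId = containerId; }
    }

    private final Object schedulerLock = new Object();
    private final Queue<DecreaseRequest> pendingDecrease = new ArrayDeque<>();

    // Called from allocate(): sanity-check, then just enqueue.
    public void allocate(List<DecreaseRequest> decreases) {
        // ...initial sanity checks would go here...
        synchronized (pendingDecrease) {
            pendingDecrease.addAll(decreases);
        }
    }

    // Called on the scheduler thread from allocateContainersToNode():
    // drain pending decreases first, then allocate new/increase resources.
    public List<String> allocateContainersToNode() {
        List<String> decreased = new ArrayList<>();
        synchronized (schedulerLock) {
            List<DecreaseRequest> drained;
            synchronized (pendingDecrease) {
                drained = new ArrayList<>(pendingDecrease);
                pendingDecrease.clear();
            }
            for (DecreaseRequest r : drained) {
                decreased.add(r.containerId); // decreaseContainer(r) would run here
            }
            // ...then assign new/increase containers...
        }
        return decreased;
    }
}
```

Because the decrease work now runs entirely on the scheduler thread, the allocate() path never needs the scheduler lock while holding the app lock.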

Also, the following code may have an issue:
// Pre-process increase requests
List<SchedContainerChangeRequest> normalizedIncreaseRequests =
    checkAndNormalizeContainerChangeRequests(increaseRequests, true);

// Pre-process decrease requests
List<SchedContainerChangeRequest> normalizedDecreaseRequests =
    checkAndNormalizeContainerChangeRequests(decreaseRequests, false);
There could be race conditions when calculating the delta resource for the 
SchedContainerChangeRequest, since the above code is not synchronized with the 
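One way to read that hazard (this is an illustrative sketch, not the actual code; StaleDeltaSketch and its fields are hypothetical, and the "concurrent" update is simulated on one thread to keep the interleaving deterministic): the delta is derived from the container's current resource without synchronization, so an update that lands between computing and applying the delta leaves the container at the wrong target.

```java
public class StaleDeltaSketch {
    // Stand-in for the container's current resource.
    private int containerMemoryGb = 8;

    // Computed without holding any scheduler lock, as in the quoted snippet.
    int computeDelta(int targetGb) {
        return targetGb - containerMemoryGb;
    }

    // Simulates another thread changing the resource in between.
    void concurrentUpdate(int newGb) {
        containerMemoryGb = newGb;
    }

    // Applies the precomputed (now possibly stale) delta.
    int apply(int delta) {
        containerMemoryGb += delta;
        return containerMemoryGb;
    }
}
```

With an intended target of 4 GB from a current 8 GB, the precomputed delta is -4; if the resource is concurrently changed to 6 GB before the delta is applied, the container ends up at 2 GB instead of 4.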

Thoughts, [~leftnoteasy]?

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> ----------------------------------------------------------------------------------------
>                 Key: YARN-4519
>                 URL: https://issues.apache.org/jira/browse/YARN-4519
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: sandflee
> In CapacityScheduler.allocate(), we first get the FiCaSchedulerApp sync lock, 
> and may then get the CapacityScheduler sync lock in decreaseContainer().
> In the scheduler thread, we first get the CapacityScheduler sync lock in 
> allocateContainersToNode(), and may then get the FiCaSchedulerApp sync lock 
> in FiCaSchedulerApp.assignContainers().
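The cycle in the quoted report is a classic lock-ordering inversion. A self-contained sketch, with plain ReentrantLocks standing in for the FiCaSchedulerApp and CapacityScheduler sync locks (all names here are hypothetical, and tryLock is used so the demonstration terminates instead of actually hanging):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
    public static boolean demonstrateCycle() throws InterruptedException {
        ReentrantLock appLock = new ReentrantLock();       // FiCaSchedulerApp sync lock
        ReentrantLock schedulerLock = new ReentrantLock(); // CapacityScheduler sync lock
        CountDownLatch appLockHeld = new CountDownLatch(1);
        CountDownLatch proceed = new CountDownLatch(1);
        boolean[] acquiredSecond = new boolean[2];

        // Scheduler-thread path: holds the scheduler lock for the whole demo,
        // as allocateContainersToNode() would.
        schedulerLock.lock();

        // allocate() path: takes the app lock first, then needs the scheduler lock.
        Thread allocateCall = new Thread(() -> {
            appLock.lock();
            try {
                appLockHeld.countDown();
                proceed.await();
                // decreaseContainer() would block here in the real deadlock.
                acquiredSecond[0] = schedulerLock.tryLock();
                if (acquiredSecond[0]) schedulerLock.unlock();
            } catch (InterruptedException ignored) {
            } finally {
                appLock.unlock();
            }
        });
        allocateCall.start();

        appLockHeld.await();
        // assignContainers() would block here: the app lock is already held.
        acquiredSecond[1] = appLock.tryLock();
        if (acquiredSecond[1]) appLock.unlock();

        proceed.countDown();
        allocateCall.join();
        schedulerLock.unlock();

        // Neither side could take its second lock: the lock-order cycle.
        return !acquiredSecond[0] && !acquiredSecond[1];
    }
}
```

Queuing the decrease work as proposed above removes one edge of this cycle, since allocate() no longer needs the scheduler lock while holding the app lock.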

This message was sent by Atlassian JIRA
