Tao Yang created YARN-6029:
------------------------------

             Summary: CapacityScheduler deadlock when 
ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that 
Thread_B calls LeafQueue#assignContainers to release a reserved container
                 Key: YARN-6029
                 URL: https://issues.apache.org/jira/browse/YARN-6029
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 2.8.0
            Reporter: Tao Yang
            Assignee: Tao Yang


When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
YarnClient#getQueueAclsInfo) while LeafQueue#assignContainers is running but 
has not yet notified the parent queue to release resources (it is about to 
release a reserved container), the ResourceManager can deadlock. I found this 
problem in our testing environment for Hadoop 2.8.

Steps to reproduce the deadlock, in chronological order:
* 1. Thread A (ResourceManager Event Processor) calls synchronized 
LeafQueue#assignContainers (acquires the LeafQueue instance lock of queue 
root.a)
* 2. Thread B (IPC Server handler) calls synchronized 
ParentQueue#getQueueUserAclInfo (acquires the ParentQueue instance lock of 
queue root), iterates over the child queues' ACLs, and blocks when calling 
synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
queue root.a is held by Thread A)
* 3. Thread A wants to inform the parent queue that a container is being 
completed and blocks when invoking the synchronized 
ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
queue root is held by Thread B)
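
The lock ordering in the steps above can be sketched in isolation. The classes 
below are simplified, hypothetical stand-ins and not the real CapacityScheduler 
code: only the method names and the use of synchronized mirror this report, 
and the latches exist solely to force the interleaving described. The sketch 
assumes a JVM that exposes ThreadMXBean, and uses daemon threads so the process 
can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the queue hierarchy; not the real YARN classes.
class QueueDeadlockSketch {
    // Latches force each thread to take its first lock before either
    // thread tries to take its second one.
    private final CountDownLatch leafLocked = new CountDownLatch(1);
    private final CountDownLatch parentLocked = new CountDownLatch(1);
    private final ParentQueue root = new ParentQueue();
    private final LeafQueue leaf = new LeafQueue();

    class ParentQueue {
        synchronized void getQueueUserAclInfo() {
            parentLocked.countDown();       // step 2: root lock is now held
            awaitQuietly(leafLocked);
            leaf.getQueueUserAclInfo();     // blocks: leaf lock held by Thread A
        }

        synchronized void internalReleaseResource() { }
    }

    class LeafQueue {
        synchronized void assignContainers() {
            leafLocked.countDown();         // step 1: leaf lock is now held
            awaitQuietly(parentLocked);
            root.internalReleaseResource(); // step 3: blocks: root lock held by Thread B
        }

        synchronized void getQueueUserAclInfo() { }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if the JVM reports a monitor deadlock within about 5 seconds.
    static boolean deadlocks() {
        QueueDeadlockSketch s = new QueueDeadlockSketch();
        Thread a = new Thread(() -> s.leaf.assignContainers(), "Thread-A");
        Thread b = new Thread(() -> s.root.getQueueUserAclInfo(), "Thread-B");
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && ids.length > 0) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + deadlocks());
    }
}
```

Each thread holds one monitor and waits for the other, which is the cycle 
described in steps 1 to 3.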

I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
removed to solve this problem, since this method does not appear to modify any 
fields of the LeafQueue instance.
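
As a rough illustration of the proposed change (again with hypothetical 
stand-in classes, not the actual patch), dropping synchronized from the leaf's 
ACL getter breaks the cycle: Thread B no longer needs the leaf lock, so it 
returns and releases the root lock, and Thread A can then proceed. The returned 
ACL string is a placeholder, and the sketch assumes the getter reads no state 
that is guarded by the leaf lock.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the queue hierarchy; not the real YARN classes.
class QueueAclFixSketch {
    // Same forced interleaving as in the reported scenario.
    private final CountDownLatch leafLocked = new CountDownLatch(1);
    private final CountDownLatch parentLocked = new CountDownLatch(1);
    private final ParentQueue root = new ParentQueue();
    private final LeafQueue leaf = new LeafQueue();

    class ParentQueue {
        synchronized List<String> getQueueUserAclInfo() {
            parentLocked.countDown();          // root lock is now held
            awaitQuietly(leafLocked);
            return leaf.getQueueUserAclInfo(); // no leaf lock required
        }

        synchronized void internalReleaseResource() { }
    }

    class LeafQueue {
        synchronized void assignContainers() {
            leafLocked.countDown();            // leaf lock is now held
            awaitQuietly(parentLocked);
            root.internalReleaseResource();    // waits only until Thread B returns
        }

        // Not synchronized: assumed to read no state guarded by the leaf lock.
        List<String> getQueueUserAclInfo() {
            return Collections.singletonList("SUBMIT_APPLICATIONS"); // placeholder ACL
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if both threads finish within the join timeouts.
    static boolean completes() {
        QueueAclFixSketch s = new QueueAclFixSketch();
        Thread a = new Thread(() -> s.leaf.assignContainers(), "Thread-A");
        Thread b = new Thread(() -> s.root.getQueueUserAclInfo(), "Thread-B");
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(5000);
            b.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads completed: " + completes());
    }
}
```

Whether removing the lock is safe in the real LeafQueue depends on what the 
method actually reads, which is for the patch review to confirm.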

Attaching a patch with a unit test for review.



