[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695578#comment-15695578
 ] 

zhengchenyu edited comment on YARN-4090 at 11/25/16 11:25 AM:
--------------------------------------------------------------

Here we see a deadlock: 
"IPC Server handler 98 on 8032" is waiting for lock (0x00007f42e17a5ed8).
"IPC Server handler 76 on 8032" holds lock (0x00007f42e17a5ed8) and is 
waiting for lock (0x00007f42df3e8450).
"ResourceManager Event Processor" holds lock (0x00007f42df3e8450) and is 
waiting for lock (0x00007f42e17a5ed8).

In fact, 0x00007f42e17a5ed8 is the object lock of an FSParentQueue, which I 
call root.Parent here.
0x00007f42df3e8450 is the object lock of another FSParentQueue, the child 
queue of 0x00007f42e17a5ed8, which I call root.Parent.Child here.

Let's trace these threads.
(1) ResourceManager Event Processor
{code}
FairScheduler.handle
  FairScheduler.nodeUpdate
    FairScheduler.completedContainer
      FSAppAttempt.containerCompleted
        FSLeafQueue.decResourceUsage
         // got the lock 0x00007f42e0c7cf50
          FSParentQueue.decResourceUsage
           // got the lock 0x00007f42df3e8450, the object lock of root.Parent.Child
            FSParentQueue.decResourceUsage
             // waiting for 0x00007f42e17a5ed8, the object lock of root.Parent
{code}
(2) IPC Server handler 76 on 8032
{code}
ClientRMService.getQueueUserAcls
  FairScheduler.getQueueUserAclInfo
    FSParentQueue.getQueueUserAclInfo
     //got the lock 0x00007f42e17a5ed8
      FSParentQueue.getQueueUserAclInfo
       //wait for the lock 0x00007f42df3e8450
{code}
                                        
The remaining thread (IPC Server handler 98) does not need to be analysed; it 
is only waiting for the lock of root.Parent. Here we can see that 
decResourceUsage acquires the object locks from bottom to top, while 
getQueueUserAcls acquires them from top to bottom.
getQueueUserAcls holds the object locks of root and root.Parent and waits for 
root.Parent.Child, while decResourceUsage holds the object lock of 
root.Parent.Child and waits for root.Parent. That's a deadlock.
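
To make the opposite lock orders concrete, here is a minimal standalone sketch 
(not Hadoop code). The class LockOrderDemo and the two plain Object monitors 
standing in for root.Parent and root.Parent.Child are assumptions for 
illustration only. It asks the JVM, via ThreadMXBean, whether the two threads 
ended up deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; the monitors only imitate the two FSParentQueue locks.
class LockOrderDemo {

    /** Returns true if the JVM reports a monitor deadlock between the two threads. */
    static boolean reproduce() {
        final Object parent = new Object();  // stands in for root.Parent
        final Object child = new Object();   // stands in for root.Parent.Child
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Bottom-up, like the decResourceUsage call chain: child first, then parent.
        Thread bottomUp = new Thread(() -> {
            synchronized (child) {
                awaitOther(bothHoldFirstLock);
                synchronized (parent) { /* never reached */ }
            }
        }, "bottom-up");

        // Top-down, like the getQueueUserAclInfo call chain: parent first, then child.
        Thread topDown = new Thread(() -> {
            synchronized (parent) {
                awaitOther(bothHoldFirstLock);
                synchronized (child) { /* never reached */ }
            }
        }, "top-down");

        // Daemon threads so the stuck threads do not keep the JVM alive.
        bottomUp.setDaemon(true);
        topDown.setDaemon(true);
        bottomUp.start();
        topDown.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && ids.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Signals that this thread holds its first lock, then waits for the other one.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The latch only makes the interleaving deterministic; in the ResourceManager the 
same interleaving happens by chance under load.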
I recommend rewriting decResourceUsage so that it acquires the object locks 
from top to bottom. Another option is to use a ReadWriteLock in place of the 
object lock.
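
A rough sketch of the top-to-bottom idea, assuming a simplified queue class. 
SketchQueue, its fields, and the use of one ReentrantLock per queue are 
hypothetical and are not the real FSQueue API; an actual patch would have to 
fit the existing synchronized methods of FSLeafQueue and FSParentQueue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a scheduler queue; not the real Hadoop class.
class SketchQueue {
    private final String name;
    private final SketchQueue parent;  // null for the root queue
    private final ReentrantLock lock = new ReentrantLock();
    private long usage;

    SketchQueue(String name, SketchQueue parent, long usage) {
        this.name = name;
        this.parent = parent;
        this.usage = usage;
    }

    String getName() {
        return name;
    }

    // Builds the path root -> ... -> this, so locks can be taken top-down.
    private Deque<SketchQueue> pathFromRoot() {
        Deque<SketchQueue> path = new ArrayDeque<>();
        for (SketchQueue q = this; q != null; q = q.parent) {
            path.addFirst(q);
        }
        return path;
    }

    // Decrements usage on this queue and all ancestors, locking root first.
    void decResourceUsage(long amount) {
        Deque<SketchQueue> path = pathFromRoot();
        Deque<SketchQueue> held = new ArrayDeque<>();
        try {
            for (SketchQueue q : path) {
                q.lock.lock();
                held.push(q);
            }
            for (SketchQueue q : path) {
                q.usage -= amount;
            }
        } finally {
            // Release in reverse order of acquisition (leaf first, root last).
            while (!held.isEmpty()) {
                held.pop().lock.unlock();
            }
        }
    }

    long getUsage() {
        lock.lock();
        try {
            return usage;
        } finally {
            lock.unlock();
        }
    }
}
```

Because every code path in this sketch takes the root's lock before any 
descendant's lock, a top-down reader such as getQueueUserAclInfo and this 
method agree on the order, so the cycle described above cannot form between 
them.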



> Make Collections.sort() more efficient in FSParentQueue.java
> ------------------------------------------------------------
>
>                 Key: YARN-4090
>                 URL: https://issues.apache.org/jira/browse/YARN-4090
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: Xianyin Xin
>            Assignee: Xianyin Xin
>         Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg, 
> sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



