[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695182#comment-15695182 ]

zhangyubiao commented on YARN-4090:
-----------------------------------

Found one Java-level deadlock:
=============================
"IPC Server handler 98 on 8032":
  waiting to lock monitor 0x00007f4e48b1f808 (object 0x00007f42e17a5ed8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "IPC Server handler 76 on 8032"
"IPC Server handler 76 on 8032":
  waiting to lock monitor 0x00007f4e388b94f8 (object 0x00007f42df3e8450, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "ResourceManager Event Processor"
"ResourceManager Event Processor":
  waiting to lock monitor 0x00007f4e48b1f808 (object 0x00007f42e17a5ed8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "IPC Server handler 76 on 8032"

Java stack information for the threads listed above:
===================================================
"IPC Server handler 98 on 8032":
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:149)
        - waiting to lock <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1468)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"IPC Server handler 76 on 8032":
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:149)
        - waiting to lock <0x00007f42df3e8450> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:156)
        - locked <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1468)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"ResourceManager Event Processor":
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:307)
        - waiting to lock <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:309)
        - locked <0x00007f42df3e8450> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:309)
        - locked <0x00007f42e0c7cf50> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.containerCompleted(FSAppAttempt.java:157)
        - locked <0x00007f42deaf9aa8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:829)
        - eliminated <0x00007f42deaf8288> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:984)
        - locked <0x00007f42deaf8288> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1195)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:121)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
        at java.lang.Thread.run(Thread.java:745)

Found 1 deadlock.
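
Reading the dump above, the cycle looks like a lock-order inversion between two code paths: decResourceUsage appears to take queue monitors bottom-up (FSLeafQueue, then FSParentQueue 0x00007f42df3e8450, then FSParentQueue 0x00007f42e17a5ed8), while getQueueUserAclInfo takes them top-down (0x00007f42e17a5ed8 first, then 0x00007f42df3e8450). The sketch below only illustrates that pattern under those assumptions. The Queue class and the decUsage/collectAcls methods are hypothetical stand-ins, not the real FSQueue code, and the example is single-threaded so that it records the acquisition order without actually deadlocking.

```java
import java.util.ArrayList;
import java.util.List;

public class LockOrderSketch {
    /** Hypothetical stand-in for a scheduler queue; NOT the real FSQueue. */
    static class Queue {
        final String name;
        final Queue parent;
        final List<Queue> children = new ArrayList<>();

        Queue(String name, Queue parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        // Bottom-up path: holds this queue's monitor while entering the parent's.
        synchronized void decUsage(List<String> order) {
            order.add(name);
            if (parent != null) {
                parent.decUsage(order);
            }
        }

        // Top-down path: holds this queue's monitor while entering each child's.
        synchronized void collectAcls(List<String> order) {
            order.add(name);
            for (Queue child : children) {
                child.collectAcls(order);
            }
        }
    }

    /** Monitor acquisition order when walking from a leaf up to the root. */
    static List<String> upwardOrder() {
        Queue root = new Queue("root", null);
        Queue parent = new Queue("parent", root);
        Queue leaf = new Queue("leaf", parent);
        List<String> order = new ArrayList<>();
        leaf.decUsage(order);
        return order;
    }

    /** Monitor acquisition order when walking from the root down to the leaf. */
    static List<String> downwardOrder() {
        Queue root = new Queue("root", null);
        Queue parent = new Queue("parent", root);
        new Queue("leaf", parent);
        List<String> order = new ArrayList<>();
        root.collectAcls(order);
        return order;
    }

    public static void main(String[] args) {
        System.out.println("bottom-up: " + upwardOrder());
        System.out.println("top-down:  " + downwardOrder());
    }
}
```

If two threads run these two paths concurrently on the same queues, one can hold "root" and wait for "parent" while the other holds "parent" and waits for "root", which matches the shape of the cycle reported above. Taking the monitors in one consistent order on both paths, or not holding a queue's monitor while calling into another queue, would avoid the cycle in this sketch.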

> Make Collections.sort() more efficient in FSParentQueue.java
> ------------------------------------------------------------
>
>                 Key: YARN-4090
>                 URL: https://issues.apache.org/jira/browse/YARN-4090
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: Xianyin Xin
>            Assignee: Xianyin Xin
>         Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



