[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695182#comment-15695182 ]
zhangyubiao commented on YARN-4090:
-----------------------------------
Found one Java-level deadlock:
=============================
"IPC Server handler 98 on 8032":
  waiting to lock monitor 0x00007f4e48b1f808 (object 0x00007f42e17a5ed8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "IPC Server handler 76 on 8032"
"IPC Server handler 76 on 8032":
  waiting to lock monitor 0x00007f4e388b94f8 (object 0x00007f42df3e8450, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "ResourceManager Event Processor"
"ResourceManager Event Processor":
  waiting to lock monitor 0x00007f4e48b1f808 (object 0x00007f42e17a5ed8, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue),
  which is held by "IPC Server handler 76 on 8032"

Java stack information for the threads listed above:
===================================================
"IPC Server handler 98 on 8032":
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:149)
	- waiting to lock <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1468)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"IPC Server handler 76 on 8032":
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:149)
	- waiting to lock <0x00007f42df3e8450> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:156)
	- locked <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1468)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:903)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:280)
	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:431)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"ResourceManager Event Processor":
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:307)
	- waiting to lock <0x00007f42e17a5ed8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:309)
	- locked <0x00007f42df3e8450> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.decResourceUsage(FSQueue.java:309)
	- locked <0x00007f42e0c7cf50> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.containerCompleted(FSAppAttempt.java:157)
	- locked <0x00007f42deaf9aa8> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:829)
	- eliminated <0x00007f42deaf8288> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:984)
	- locked <0x00007f42deaf8288> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1195)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:121)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
	at java.lang.Thread.run(Thread.java:745)

Found 1 deadlock.
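Reading the deadlock summary as a wait-for graph (thread, then the monitor it waits on, then the thread that owns that monitor) shows that the cycle itself contains only "IPC Server handler 76 on 8032" and "ResourceManager Event Processor"; handler 98 is blocked behind the cycle rather than being part of it. Below is a minimal Java sketch of that walk. The class WaitForGraph and its method findCycle are hypothetical; only the thread names and object addresses are copied from the dump.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WaitForGraph {
    // Thread name -> object address of the monitor it is waiting to lock (from the summary).
    static final Map<String, String> WAITS_ON = new LinkedHashMap<>();
    // Object address -> thread name that currently holds that monitor.
    static final Map<String, String> HELD_BY = new LinkedHashMap<>();

    static {
        WAITS_ON.put("IPC Server handler 98 on 8032", "0x00007f42e17a5ed8");
        WAITS_ON.put("IPC Server handler 76 on 8032", "0x00007f42df3e8450");
        WAITS_ON.put("ResourceManager Event Processor", "0x00007f42e17a5ed8");

        HELD_BY.put("0x00007f42e17a5ed8", "IPC Server handler 76 on 8032");
        HELD_BY.put("0x00007f42df3e8450", "ResourceManager Event Processor");
    }

    /** Follows thread -> monitor -> owner edges from start; returns the cycle reached, or an empty list. */
    static List<String> findCycle(String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null && seen.add(current)) {
            path.add(current);
            String monitor = WAITS_ON.get(current);
            current = (monitor == null) ? null : HELD_BY.get(monitor);
        }
        if (current == null) {
            return List.of(); // the chain ends at a thread that is not waiting: no deadlock
        }
        // current was visited before, so the cycle is the part of the path starting there
        return path.subList(path.indexOf(current), path.size());
    }

    public static void main(String[] args) {
        for (String thread : WAITS_ON.keySet()) {
            System.out.println(thread + " -> cycle " + findCycle(thread));
        }
    }
}
```

Starting from handler 98, the walk passes through handler 76 and then loops between handler 76 and the Event Processor, so only those two threads would need to change their locking to break the deadlock.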
> Make Collections.sort() more efficient in FSParentQueue.java
> ------------------------------------------------------------
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Reporter: Xianyin Xin
> Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch,
> YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg,
> sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.
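The stacks in the deadlock report above show two opposite traversal orders over FSParentQueue monitors: getQueueUserAclInfo holds a parent queue (0x00007f42e17a5ed8) and then tries to lock its child (0x00007f42df3e8450), while decResourceUsage holds the leaf queue and that child, and then tries to lock the same parent. The sketch below reproduces this lock-order inversion with two plain monitors standing in for the queues, and uses the JDK's ThreadMXBean.findDeadlockedThreads() to confirm the cycle, the same kind of monitor deadlock jstack reports. It is an illustration under those simplifying assumptions, not FairScheduler code; QueueLockInversion, reproduce and the thread names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class QueueLockInversion {

    /** Starts two threads that take the same two monitors in opposite orders and returns the
     *  names of the deadlocked threads found within timeoutMillis (empty array if none). */
    static String[] reproduce(long timeoutMillis) {
        // Hypothetical stand-ins for a parent FSParentQueue and one of its child queues.
        final Object parentQueue = new Object();
        final Object childQueue = new Object();
        // Makes sure both threads hold their first monitor before reaching for the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Top-down, like getQueueUserAclInfo: lock the parent, then the child.
        Thread topDown = new Thread(() -> {
            synchronized (parentQueue) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (childQueue) {
                    // read the child's ACLs (never reached once deadlocked)
                }
            }
        }, "top-down (ACL query)");

        // Bottom-up, like decResourceUsage: lock the child, then the parent.
        Thread bottomUp = new Thread(() -> {
            synchronized (childQueue) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (parentQueue) {
                    // propagate the usage change (never reached once deadlocked)
                }
            }
        }, "bottom-up (usage update)");

        // Daemon threads, so the JVM can still exit while they stay blocked.
        topDown.setDaemon(true);
        bottomUp.setDaemon(true);
        topDown.start();
        bottomUp.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no monitor deadlock exists
            if (ids != null) {
                ThreadInfo[] infos = mx.getThreadInfo(ids);
                String[] names = new String[infos.length];
                for (int i = 0; i < infos.length; i++) {
                    names[i] = infos[i].getThreadName();
                }
                return names;
            }
            try {
                Thread.sleep(50); // the threads may still be on the latch; poll again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new String[0];
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        for (String name : reproduce(5_000)) {
            System.out.println("deadlocked: " + name);
        }
    }
}
```

If both paths instead acquire the two monitors in the same order (for example, always parent before child), the cycle cannot form and findDeadlockedThreads() keeps returning null.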
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)