[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432311#comment-17432311 ] Anton Ippolitov commented on YARN-10492: [~Jufeng] could you share the patch please? What was the exact root cause? > deadlock in rm > --- > > Key: YARN-10492 > URL: https://issues.apache.org/jira/browse/YARN-10492 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.1.1 >Reporter: brick yang >Priority: Critical > Labels: 3.1.1 > > version: HDP-3.1.5.0-152 hadoop3.1 > capacity scheduler > yarn sometimes not change to active > we found that jstack dump has deadlocked: > "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 > tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry > [0x7fee09633000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323) > - waiting to lock <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > > > > > "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 > tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0003c0d0d6c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > - locked <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at
[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428325#comment-17428325 ] Vincent Barbaresi commented on YARN-10492: -- We are having a similar issue on Spark 3.1.2 (on Google Cloud Dataproc). Though it happens with the Fair Scheduler. It always happens with the first application in the cluster lifetime: it gets stuck in ACCEPTED state. It seems to be holding resources and prevents the other application from running. After restarting the resourcemanager, the stuck application is able to start running and everything goes back to normal. I attached a thread dump, hoping it can help. We also enabled DEBUG logs on the resourcemanager, so next time it happens we'll probably have more information. > deadlock in rm > --- > > Key: YARN-10492 > URL: https://issues.apache.org/jira/browse/YARN-10492 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.1.1 >Reporter: brick yang >Priority: Critical > Labels: 3.1.1 > > version: HDP-3.1.5.0-152 hadoop3.1 > capacity scheduler > yarn sometimes not change to active > we found that jstack dump has deadlocked: > "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 > tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry > [0x7fee09633000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323) > - waiting to lock <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > > > > > "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 > tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0003c0d0d6c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > - locked <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at >
[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235833#comment-17235833 ] Wangda Tan commented on YARN-10492: --- That will be helpful, thanks Jufeng! > deadlock in rm > --- > > Key: YARN-10492 > URL: https://issues.apache.org/jira/browse/YARN-10492 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.1.1 >Reporter: brick yang >Priority: Critical > Labels: 3.1.1 > > version: HDP-3.1.5.0-152 hadoop3.1 > capacity scheduler > yarn sometimes not change to active > we found that jstack dump has deadlocked: > "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 > tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry > [0x7fee09633000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323) > - waiting to lock <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > > > > > "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 > tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0003c0d0d6c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > - locked <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at
[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235828#comment-17235828 ] jufeng li commented on YARN-10492: -- We are using the same version(HDP-3.1.5.0-152 hadoop3.1),and we got the same issue,I solved this issue,do you want patch? > deadlock in rm > --- > > Key: YARN-10492 > URL: https://issues.apache.org/jira/browse/YARN-10492 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.1.1 >Reporter: brick yang >Priority: Critical > Labels: 3.1.1 > > version: HDP-3.1.5.0-152 hadoop3.1 > capacity scheduler > yarn sometimes not change to active > we found that jstack dump has deadlocked: > "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 > tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry > [0x7fee09633000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323) > - waiting to lock <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > > > > > "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 > tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0003c0d0d6c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > - locked <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at
[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234230#comment-17234230 ] Peter Bacsko commented on YARN-10492: - We need the full thread dump. We can see one thread having "AllocateResponseLock" and another waiting for it, but we don't see what holds "0x0003c0d0d6c0". > deadlock in rm > --- > > Key: YARN-10492 > URL: https://issues.apache.org/jira/browse/YARN-10492 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.1.1 >Reporter: brick yang >Priority: Critical > Labels: 3.1.1 > > version: HDP-3.1.5.0-152 hadoop3.1 > capacity scheduler > yarn sometimes not change to active > we found that jstack dump has deadlocked: > "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 > tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry > [0x7fee09633000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323) > - waiting to lock <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > > > > > "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 > tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0003c0d0d6c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > - locked <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at
[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233880#comment-17233880 ] Wangda Tan commented on YARN-10492: --- cc: [~snemeth] , [~pbacsko], [~tangzhankun], [~sunil.gov...@gmail.com]. > deadlock in rm > --- > > Key: YARN-10492 > URL: https://issues.apache.org/jira/browse/YARN-10492 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.1.1 >Reporter: brick yang >Priority: Critical > Labels: 3.1.1 > > version: HDP-3.1.5.0-152 hadoop3.1 > capacity scheduler > yarn sometimes not change to active > we found that jstack dump has deadlocked: > "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 > tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry > [0x7fee09633000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323) > - waiting to lock <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > > > > > "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 > tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0003c0d0d6c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > - locked <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at
[jira] [Commented] (YARN-10492) deadlock in rm
[ https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232675#comment-17232675 ] brick yang commented on YARN-10492: --- please help me!!! thank you! > deadlock in rm > --- > > Key: YARN-10492 > URL: https://issues.apache.org/jira/browse/YARN-10492 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.4 >Reporter: brick yang >Priority: Critical > > version: HDP-3.1.5.0-152 hadoop3.1 > yarn sometimes not change to active > we found that jstak has deadlocked: > "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 > tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry > [0x7fee09633000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323) > - waiting to lock <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > > > > > > > > "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 > tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0003c0d0d6c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > - locked <0x00043e2e19d0> (a > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at >