[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster
[ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:
    Priority: Blocker (was: Critical)

Deadlock in DAGAppMaster
    Key: TEZ-2359
    URL: https://issues.apache.org/jira/browse/TEZ-2359
    Project: Apache Tez
    Issue Type: Bug
    Reporter: Jeff Zhang
    Priority: Blocker

{code}
Found one Java-level deadlock:
=============================
Timer-1:
  waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by Dispatcher thread: Central
Dispatcher thread: Central:
  waiting to lock monitor 0x7fb829866d18 (object 0x0007cd5ab958, a org.apache.tez.dag.app.rm.YarnTaskSchedulerService),
  which is held by DelayedContainerManager
DelayedContainerManager:
  waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by Dispatcher thread: Central

Java stack information for the threads listed above:
===================================================
Timer-1:
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for 0x0007cd0f8a30 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
        at org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015)
        - locked 0x0007cd0f2ff0 (a org.apache.tez.dag.app.DAGAppMaster)
        at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)
Dispatcher thread: Central:
        at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842)
        - waiting to lock 0x0007cd5ab958 (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
        at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566)
        at org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832)
        at org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201)
        at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362)
        at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        - locked 0x0007cd1d0208 (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
        at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510)
        at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879)
        at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
        at java.lang.Thread.run(Thread.java:745)
DelayedContainerManager:
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for 0x0007cd0f8a30 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
        at org.apache.tez.dag.app.DAGAppMaster.getState(DAGAppMaster.java:531)
        at
[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster
[ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:
    Target Version/s: 0.7.0

Deadlock in DAGAppMaster
    Key: TEZ-2359
    URL: https://issues.apache.org/jira/browse/TEZ-2359
    Project: Apache Tez
    Issue Type: Bug
    Reporter: Jeff Zhang
    Priority: Blocker
[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster
[ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:
    Priority: Critical (was: Major)

Deadlock in DAGAppMaster
    Key: TEZ-2359
    URL: https://issues.apache.org/jira/browse/TEZ-2359
    Project: Apache Tez
    Issue Type: Bug
    Reporter: Jeff Zhang
    Priority: Critical