[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

2015-04-27 Thread Jeff Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:

Priority: Blocker  (was: Critical)

 Deadlock in DAGAppMaster
 

 Key: TEZ-2359
 URL: https://issues.apache.org/jira/browse/TEZ-2359
 Project: Apache Tez
 Issue Type: Bug
 Reporter: Jeff Zhang
 Priority: Blocker

 {code}
 Found one Java-level deadlock:
 =
 Timer-1:
   waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by Dispatcher thread: Central
 Dispatcher thread: Central:
   waiting to lock monitor 0x7fb829866d18 (object 0x0007cd5ab958, a org.apache.tez.dag.app.rm.YarnTaskSchedulerService),
   which is held by DelayedContainerManager
 DelayedContainerManager:
   waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by Dispatcher thread: Central
 Java stack information for the threads listed above:
 ===
 Timer-1:
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007cd0f8a30 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
   at org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015)
   - locked 0x0007cd0f2ff0 (a org.apache.tez.dag.app.DAGAppMaster)
   at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 Dispatcher thread: Central:
   at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842)
   - waiting to lock 0x0007cd5ab958 (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
   at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566)
   at org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832)
   at org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201)
   at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362)
   at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356)
   at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   - locked 0x0007cd1d0208 (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
   at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510)
   at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879)
   at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868)
   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
   at java.lang.Thread.run(Thread.java:745)
 DelayedContainerManager:
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007cd0f8a30 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
   at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
   at org.apache.tez.dag.app.DAGAppMaster.getState(DAGAppMaster.java:531)
   at 
 
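Read together, the summary and stacks above describe a lock-order inversion. Dispatcher thread: Central holds the ReentrantReadWriteLock (0x0007cd0f8a30) and, via DAGAppMaster.checkForCompletion, blocks on the YarnTaskSchedulerService monitor in dagComplete. DelayedContainerManager holds that monitor and blocks on the read lock in DAGAppMaster.getState. Timer-1 is not part of the cycle: it holds the DAGAppMaster monitor and waits behind the Dispatcher for the write lock. The following is a minimal, hypothetical reproduction of the two-thread cycle using plain JDK classes, not Tez code; `stateLock`, `schedulerMonitor`, `LockOrderDeadlockDemo` and `reproduce` are invented stand-ins for illustration. It uses `ThreadMXBean.findDeadlockedThreads()`, which, unlike `findMonitorDeadlockedThreads()`, also reports cycles that involve ownable synchronizers such as a ReentrantReadWriteLock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical reproduction of the lock-order inversion in the dump; not Tez code.
final class LockOrderDeadlockDemo {

    /** Starts two daemon threads that deadlock; returns the ids the JVM reports, or null if none. */
    static long[] reproduce() {
        // Stand-in for the RW lock used by DAGAppMaster.getState / checkAndHandleSessionTimeout.
        ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
        // Stand-in for the YarnTaskSchedulerService monitor.
        Object schedulerMonitor = new Object();
        // Ensures each thread holds its first lock before either asks for its second.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread dispatcher = new Thread(() -> {
            stateLock.writeLock().lock();          // RW lock held, as by "Dispatcher thread: Central"
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (schedulerMonitor) {  // like dagComplete(): blocks on the monitor
                    System.out.println("unreachable while deadlocked");
                }
            } finally {
                stateLock.writeLock().unlock();
            }
        }, "Dispatcher thread: Central");

        Thread delayed = new Thread(() -> {
            synchronized (schedulerMonitor) {      // monitor held, as by DelayedContainerManager
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                stateLock.readLock().lock();       // like getState(): blocks on the read lock
                stateLock.readLock().unlock();
            }
        }, "DelayedContainerManager");

        // A real deadlock never clears; daemon threads only let the demo JVM exit.
        dispatcher.setDaemon(true);
        delayed.setDaemon(true);
        dispatcher.start();
        delayed.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) { // poll for up to about 5 seconds
            long[] ids = mx.findDeadlockedThreads();     // monitors and ownable synchronizers
            if (ids != null) {
                return ids;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] ids = reproduce();
        if (ids == null) {
            System.out.println("no deadlock detected");
            return;
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            System.out.println(info.getThreadName() + " waiting on " + info.getLockName()
                    + ", held by " + info.getLockOwnerName());
        }
    }
}
```

Running `main` should print one line per deadlocked thread, naming the lock it waits on and the thread that holds it, much like the jstack summary. The latch is there only to make the interleaving deterministic; in the real AM the same cycle depends on timing between the dispatcher and the delayed-container thread.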

[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

2015-04-27 Thread Jeff Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:

Target Version/s: 0.7.0


[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

2015-04-23 Thread Jeff Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:

Priority: Critical  (was: Major)
