[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2017-01-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3878:
-
Fix Version/s: 2.8.0

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
>
> Attachments: YARN-3878-branch-2.6.01.patch, YARN-3878.01.patch, 
> YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, 
> YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, 
> YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h
>
>
> The sequence of events is as under :
> # RM is stopped while putting a RMStateStore Event to RMStateStore's 
> AsyncDispatcher. This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we will check if all events have been drained and wait for 
> event queue to drain(as RM State Store dispatcher is configured for queue to 
> drain on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting 
> incessantly for dispatcher event queue to drain till JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e 
> waiting on condition [0x7fb9654e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000700b79250> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
> at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
> 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-11-23 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3878:
--
Fix Version/s: 2.6.3

+1. Committed it to branch-2.6. Thanks [~varun_saxena]!

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-3878-branch-2.6.01.patch, YARN-3878.01.patch, 
> YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, 
> YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, 
> YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h
>
>
> The sequence of events is as under :
> # RM is stopped while putting a RMStateStore Event to RMStateStore's 
> AsyncDispatcher. This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we will check if all events have been drained and wait for 
> event queue to drain(as RM State Store dispatcher is configured for queue to 
> drain on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting 
> incessantly for dispatcher event queue to drain till JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e 
> waiting on condition [0x7fb9654e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000700b79250> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
> at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-11-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878-branch-2.6.01.patch

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---
>
> Key: YARN-3878
> URL: https://issues.apache.org/jira/browse/YARN-3878
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-3878-branch-2.6.01.patch, YARN-3878.01.patch, 
> YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, 
> YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, 
> YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h
>
>
> The sequence of events is as under :
> # RM is stopped while putting a RMStateStore Event to RMStateStore's 
> AsyncDispatcher. This leads to an Interrupted Exception being thrown.
> # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
> {{serviceStop}}, we will check if all events have been drained and wait for 
> event queue to drain(as RM State Store dispatcher is configured for queue to 
> drain on stop). 
> # This condition never becomes true and AsyncDispatcher keeps on waiting 
> incessantly for dispatcher event queue to drain till JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
> (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
> STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
> thread interrupted
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>   at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>   at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e 
> waiting on condition [0x7fb9654e9000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000700b79250> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
> at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
> 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-17 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3878:

Attachment: YARN-3878.09_reprorace.pat_h

Attaching a patch just to demonstrate the race. Since its trying to demonstrate 
the race it injects an artificial delay, hence not making it an official patch. 
Run the test testBlockNewEvents to show that an event can be in the queue while 
serviceStop happens.

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
 YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, 
 YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-14 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.09.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
 YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: (was: YARN-3878-addendum.patch)

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
 YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: (was: YARN-3878-addendum.patch)

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
 YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878-addendum.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, 
 YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, 
 YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878-addendum.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, 
 YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, 
 YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.08.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
 YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.05.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.06.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.07.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, 
 YARN-3878.06.patch, YARN-3878.07.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.04.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch, YARN-3878.04.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
   - locked 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-03 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.03.patch

[~devaraj.k], addressed your comments

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, 
 YARN-3878.03.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-02 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.02.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch, YARN-3878.02.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
   - locked 0x000700b79430 (a java.lang.Object)
 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-02 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.02.patch

[~jianhe], added a test case

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch, YARN-3878.02.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
   - locked 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-02 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: (was: YARN-3878.02.patch)

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
   - locked 0x000700b79430 (a java.lang.Object)
   at 
 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-01 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Description: 
The sequence of events is as under :
# RM is stopped while putting a RMStateStore Event to RMStateStore's 
AsyncDispatcher. This leads to an Interrupted Exception being thrown.
# As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
{{serviceStop}}, we will check if all events have been drained and wait for 
event queue to drain(as RM State Store dispatcher is configured for queue to 
drain on stop). 
# This condition never becomes true and AsyncDispatcher keeps on waiting 
incessantly for dispatcher event queue to drain till JVM exits.

*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
(AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
STOPPED
2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
thread interrupted
java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
at 
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at 
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}

*JStack of AsyncDispatcher hanging on stop*
{noformat}
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
waiting on condition [0x7fb9654e9000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x000700b79250 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
at java.lang.Thread.run(Thread.java:744)

main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
[0x7fb989851000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0x000700b79430 (a java.lang.Object)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
- locked 0x000700b79430 (a java.lang.Object)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
- locked 0x000700b79420 (a java.lang.Object)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:515)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
- locked 0x000700b79630 (a java.lang.Object)
at 
org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
at 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-01 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Description: 
The sequence of events is as under :
# RM is stopped while putting a RMStateStore Event to RMStateStore's 
AsyncDispatcher. This leads to an Interrupted Exception being thrown.
# As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
{{serviceStop}}, we will check if all events have been drained and wait for 
event queue to drain(as RM State Store dispatcher is configured for queue to 
drain on stop). 
# This condition never becomes true and AsyncDispatcher keeps on waiting 
incessantly for dispatcher event queue to drain till JVM exits.

*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
(AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
STOPPED
2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
thread interrupted
java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
at 
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at 
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}

*JStack of AsyncDispatcher hanging on stop*
{noformat}
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
waiting on condition [0x7fb9654e9000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x000700b79250 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
at java.lang.Thread.run(Thread.java:744)

main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
[0x7fb989851000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0x000700b79430 (a java.lang.Object)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
- locked 0x000700b79430 (a java.lang.Object)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
- locked 0x000700b79420 (a java.lang.Object)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:515)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
- locked 0x000700b79630 (a java.lang.Object)
at 
org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
at 

[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop

2015-07-01 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3878:
---
Attachment: YARN-3878.01.patch

 AsyncDispatcher can hang while stopping if it is configured for draining 
 events on stop
 ---

 Key: YARN-3878
 URL: https://issues.apache.org/jira/browse/YARN-3878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-3878.01.patch


 The sequence of events is as under :
 # RM is stopped while putting a RMStateStore Event to RMStateStore's 
 AsyncDispatcher. This leads to an Interrupted Exception being thrown.
 # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On 
 {{serviceStop}}, we will check if all events have been drained and wait for 
 event queue to drain(as RM State Store dispatcher is configured for queue to 
 drain on stop). 
 # This condition never becomes true and AsyncDispatcher keeps on waiting 
 incessantly for dispatcher event queue to drain till JVM exits.
 *Initial exception while posting RM State store event to queue*
 {noformat}
 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService 
 (AbstractService.java:enterState(452)) - Service: Dispatcher entered state 
 STOPPED
 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
 {noformat}
 *JStack of AsyncDispatcher hanging on stop*
 {noformat}
 AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e 
 waiting on condition [0x7fb9654e9000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000700b79250 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:744)
 main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() 
 [0x7fb989851000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x000700b79430 (a java.lang.Object)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
   - locked 0x000700b79430 (a java.lang.Object)
   at