[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020623#comment-15020623 ] Varun Saxena commented on YARN-3878: [~sjlee0], yes this exists in branch-2.6 too. I will help in rebase. > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, > YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. > *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b79250> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) > at java.lang.Thread.run(Thread.java:744) > "main" prio=10 tid=0x7fb98000a800
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020632#comment-15020632 ] Varun Saxena commented on YARN-3878: [~sjlee0], I have backported the changes and updated a branch-2.6 patch > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878-branch-2.6.01.patch, YARN-3878.01.patch, > YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, > YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, > YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. > *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b79250> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) > at java.lang.Thread.run(Thread.java:744) >
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15019117#comment-15019117 ] Sangjin Lee commented on YARN-3878: --- Does this issue exist in 2.6.x? Should this be backported to branch-2.6? > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, > YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. > *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b79250> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) > at java.lang.Thread.run(Thread.java:744) > "main" prio=10 tid=0x7fb98000a800
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637049#comment-14637049 ] Hudson commented on YARN-3878: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #261 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/261/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. Contributed by Varun Saxena (jianhe: rev 393fe71771e3ac6bc0efe59d9aaf19d3576411b3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637087#comment-14637087 ] Hudson commented on YARN-3878: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2210/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. Contributed by Varun Saxena (jianhe: rev 393fe71771e3ac6bc0efe59d9aaf19d3576411b3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636673#comment-14636673 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Yarn-trunk #994 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/994/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. Contributed by Varun Saxena (jianhe: rev 393fe71771e3ac6bc0efe59d9aaf19d3576411b3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636700#comment-14636700 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #264 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/264/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. Contributed by Varun Saxena (jianhe: rev 393fe71771e3ac6bc0efe59d9aaf19d3576411b3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636742#comment-14636742 ] Varun Saxena commented on YARN-3878: Thanks [~jianhe] for the commit and several others for review AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State:
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636995#comment-14636995 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #253 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/253/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. Contributed by Varun Saxena (jianhe: rev 393fe71771e3ac6bc0efe59d9aaf19d3576411b3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636976#comment-14636976 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2191 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2191/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. Contributed by Varun Saxena (jianhe: rev 393fe71771e3ac6bc0efe59d9aaf19d3576411b3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635892#comment-14635892 ] Anubhav Dhoot commented on YARN-3878: - Agree this is ok to ignore AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635897#comment-14635897 ] Jian He commented on YARN-3878: --- thanks ! committing this. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635927#comment-14635927 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-trunk-Commit #8197 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8197/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. Contributed by Varun Saxena (jianhe: rev 393fe71771e3ac6bc0efe59d9aaf19d3576411b3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632787#comment-14632787 ] Varun Saxena commented on YARN-3878: Thanks [~adhoot] for the review. I agree there will be race. To fix this, we would need to introduce some sort of synchronization in frequently called path of GenericEventHandler#handle. But as the race is very rare, and impacts only one event and there might be other events as well which may not be handled due to stop, I think it would be fine to ignore this. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631816#comment-14631816 ] Hadoop QA commented on YARN-3878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:red}-1{color} | pre-patch | 15m 1s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | javac | 7m 37s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 1m 57s | Tests failed in hadoop-yarn-common. | | | | 38m 34s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.event.TestAsyncDispatcher | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745865/YARN-3878.09_reprorace.pat_h | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 419c51d | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8576/artifact/patchprocess/diffJavacWarnings.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8576/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8576/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8576/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631849#comment-14631849 ] Jian He commented on YARN-3878: --- Anubhav, thanks for reviewing the patch. I think given that we cannot guarantee shutdown will process all events - main dispatcher may also have some events pending which are not drained - in any case we are going to lose those events, to keep it simple, it's ok to not handle this rare condition. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch, YARN-3878.09_reprorace.pat_h The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628530#comment-14628530 ] Anubhav Dhoot commented on YARN-3878: - LGTM. Unrelated to this patch there is an existing tiny race between GenericEventHandler checking for blockNewEvents and setting drained to false. If serviceStop happens in between this, it can set blockNewEvents and with drained still true, it can cause eventHandlingThread to finish before the one last event gets added in the queue. Not sure its worth changing the product code for this since shutdown cannot guarantee all events will be processed. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625953#comment-14625953 ] Varun Saxena commented on YARN-3878: [~jianhe], [~kasha], [~devaraj.k], kindly review. Have retained the drained flag now. Just added a check for resetting drained flag on interrupted exception if queue is empty. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625946#comment-14625946 ] Hadoop QA commented on YARN-3878: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 11s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 30s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 40m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745193/YARN-3878.09.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a431ed9 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8528/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8528/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8528/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626242#comment-14626242 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Yarn-trunk #986 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/986/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626232#comment-14626232 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #256 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/256/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626453#comment-14626453 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2202 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2202/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626493#comment-14626493 ] Hudson commented on YARN-3878: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #254 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/254/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626388#comment-14626388 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2183 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2183/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626404#comment-14626404 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #244 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/244/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627238#comment-14627238 ] Jian He commented on YARN-3878: --- latest patch looks good to me, will commit if no comments from others. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch, YARN-3878.09.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625826#comment-14625826 ] Varun Saxena commented on YARN-3878: [~jianhe], will update a patch soon. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624769#comment-14624769 ] Varun Saxena commented on YARN-3878: [~jianhe] / [~kasha], added an addendum patch. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624834#comment-14624834 ] Varun Saxena commented on YARN-3878: Should I reopen the issue ? AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625396#comment-14625396 ] Jian He commented on YARN-3878: --- re-opened this, also reverted the previous patch. [~varun_saxena], could you upload a clean delta patch for this ? thanks ! AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000]
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625436#comment-14625436 ] Hudson commented on YARN-3878: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8157 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8157/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622126#comment-14622126 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Yarn-trunk #982 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/982/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622117#comment-14622117 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #252 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/252/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622285#comment-14622285 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2179 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2179/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622295#comment-14622295 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2198 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2198/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622367#comment-14622367 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #240 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/240/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622430#comment-14622430 ] Hudson commented on YARN-3878: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #250 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/250/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/CHANGES.txt AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620836#comment-14620836 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-trunk-Commit #8140 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8140/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621786#comment-14621786 ] Varun Saxena commented on YARN-3878: Thanks [~kasha] for the commit and review. Thanks to [~jianhe] and [~devaraj.k] for the reviews as well. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618738#comment-14618738 ] Varun Saxena commented on YARN-3878: Yeah but we still need to check the thread state. Then let me add something in DrainDispatcher(subclass of AsyncDispatcher) to return thread state and wait on it. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State:
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619478#comment-14619478 ] Karthik Kambatla commented on YARN-3878: +1, pending Jenkins. Will go ahead and commit this once Jenkins is okay. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618101#comment-14618101 ] Karthik Kambatla commented on YARN-3878: bq. If we want have a more reliable way to know that whether thread is started or not, we can have a protected and test scoped method which just returns the eventHandlingThread state(or tells whether thread is started or not) and then we can wait in the test till the eventHandlingThread has started. Like this approach, but I am not against the sleep approach. With a sleep approach, my concern is with a relying on a single sleep. Instead, we could do the following: {code} for (int i = 0; i numRetries; i++) { // check Thread.sleep(100); } // check {code} Given all this, Devaraj's suggestion of adding a protected, test-scoped method is probably equally simple :) AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618067#comment-14618067 ] Varun Saxena commented on YARN-3878: Ok. Will update a patch with 2 seconds sleep by today evening AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619551#comment-14619551 ] Hadoop QA commented on YARN-3878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 33s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 28s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | | | 39m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744343/YARN-3878.08.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2e3d83f | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8462/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8462/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8462/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616315#comment-14616315 ] Tsuyoshi Ozawa commented on YARN-3878: -- FYI, I found a news which has impacts against this JIRA, so sharing it: http://www.infoq.com/news/2015/05/redhat-futex Quoting from the URL: {quote} “The impact of this kernel bug is very simple: user processes can deadlock and hang in seemingly impossible situations. A futex wait call (and anything using a futex wait) can stay blocked forever, even though it had been properly woken up by someone. Thread.park() in Java may stay parked. Etc. If you are lucky you may also find soft lockup messages in your dmesg logs. If you are not that lucky (like us, for example), you'll spend a couple of months of someone's time trying to find the fault in your code, when there is nothing there to find.” RHEL 6 (and CentOS 6, and SL 6): 6.0-6.5 are good. 6.6 is BAD. 6.6.z is good. RHEL 7 (and CentOS 7, and SL 7): 7.1 is BAD. As of yesterday. there does not yet appear to be a 7.x fix. \[May 13, 2015\] RHEL 5 (and CentOS 5, and SL 5): All versions are good (including 5.11). {quote} AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617097#comment-14617097 ] Varun Saxena commented on YARN-3878: [~jianhe] / [~devaraj.k] / [~kasha], Checking for service state won't work as AsyncDispatcher's service state will become STARTED even before event handler thread actually starts running. So should I go back to putting hardcoded sleep in test ? I think 2 seconds should be more than enough. Other solutions would require some sort of code to be added in AsyncDispatcher. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617093#comment-14617093 ] Varun Saxena commented on YARN-3878: Thanks a lot for sharing the info [~ozawa]. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x000700b79430 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617199#comment-14617199 ] Varun Saxena commented on YARN-3878: [~jianhe], we have a {{DrainDispatcher}} class which is used only for test. I think I add it there then. But I would still need to change the visibility of eventHandlingThread variable (in AsyncDispatcher) to protected to check whether this thread has started or not. We can wait there(by calling Thread#yield}} till event handling thread has started. Thoughts ? AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617200#comment-14617200 ] Varun Saxena commented on YARN-3878: Or have a protected function returning reference to event handling thread AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) -
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617174#comment-14617174 ] Jian He commented on YARN-3878: --- You may try creating a new test sub class which extends AsyncDispatcher class and overrides the serviceStart method. Test can wait till the serviceStart method to finish. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State:
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617681#comment-14617681 ] Jian He commented on YARN-3878: --- bq. Or have a protected function returning reference to event handling thread Sounds ok to me. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617683#comment-14617683 ] Jian He commented on YARN-3878: --- Given dispatcher is a sensitive piece of code, waiting 2-3 seconds in test is ok to me too. I don't have strong preference. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617944#comment-14617944 ] Devaraj K commented on YARN-3878: - bq. Given dispatcher is a sensitive piece of code, waiting 2-3 seconds in test is ok to me too. It is ok for me too. If we want have a more reliable way to know that whether thread is started or not, we can have a protected and test scoped method which just returns the eventHandlingThread state(or tells whether thread is started or not) and then we can wait in the test till the eventHandlingThread has started. I think this may not be required for dispatcher eventHandlingThread and waiting for 2-3 seconds would be ok. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615714#comment-14615714 ] Jian He commented on YARN-3878: --- bq. I don't convince that it is a right way to modify the source code and add a public method for test purpose. agree. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614375#comment-14614375 ] Hadoop QA commented on YARN-3878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 52s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 35s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | | | 40m 30s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-common | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743636/YARN-3878.05.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 688617d | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8428/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8428/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8428/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8428/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614398#comment-14614398 ] Hadoop QA commented on YARN-3878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 20s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 9m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 34s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 43s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 49s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 49s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 57s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 15s | Tests passed in hadoop-yarn-common. | | | | 49m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743638/YARN-3878.06.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 688617d | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8429/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8429/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8429/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614355#comment-14614355 ] Varun Saxena commented on YARN-3878: Thanks for the review [~kasha] Will add a method {{hasNoPendingEvents}} to check if any events exist in the queue. Because as per current code only this is required. And that will address both yours and [~devaraj.k]'s concern. Will handle the other comment as well AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait()
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614431#comment-14614431 ] Karthik Kambatla commented on YARN-3878: {{hasNoPendingEvents}} seems a little backwards. Can we do {{hasPendingEvents}} instead and check for the negation of it? AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614538#comment-14614538 ] Devaraj K commented on YARN-3878: - Adding to [~kasha] comment, 1. Can you add @VisibleForTesting annotation to this method? {code:xml} + protected boolean hasPendingEvents() { +return !eventQueue.isEmpty(); + } + {code} 2. I don't convince that it is a right way to modify the source code and add a public method for test purpose. For waiting to start the AsyncDispatcher, can't we do like below similar to the way we do in other tests for waiting to change the service state? {code:xml} int waitCount = 0; while (disp.getServiceState() != STATE.STARTED waitCount++ 60) { Thread.sleep(1500); } {code} AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614560#comment-14614560 ] Hadoop QA commented on YARN-3878: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 9m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 52s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 2s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 51s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 43s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 36s | Tests passed in hadoop-yarn-common. | | | | 48m 45s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743662/YARN-3878.07.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 688617d | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8432/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8432/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8432/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613661#comment-14613661 ] Hadoop QA commented on YARN-3878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 53s | The applied patch generated 2 new checkstyle issues (total was 6, now 8). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 34s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | | | 40m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743549/YARN-3878.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2eae130 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8424/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8424/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8424/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8424/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613933#comment-14613933 ] Devaraj K commented on YARN-3878: - Thanks [~varun_saxena] for the updated patch. {code:xml} - private final BlockingQueueEvent eventQueue; + protected final BlockingQueueEvent eventQueue; {code} In my previous comment, I meant not to increase the eventQueue visibility to protected for accessing in the tests, I was thinking to check the eventQueue in DrainDispatcher in the same way how you are checking in TestAsyncDispatcher.testDispatcherOnCloseIfQueueEmpty(). I realize that it is not possible to check the due to explicit parameterized constructor{{ this(new LinkedBlockingQueueEvent());}} in DrainDispatcher, I am sorry for not checking this previously. I think you can keep the AsyncDispatcher.isDrained() to access in DrainDispatcher with the same name or better name. It should the fix the induced checkstyle as well. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614097#comment-14614097 ] Hadoop QA commented on YARN-3878: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 1s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 40m 13s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743604/YARN-3878.04.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2eae130 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8426/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8426/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8426/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614100#comment-14614100 ] Karthik Kambatla commented on YARN-3878: Thanks for reporting and working on this issue, [~varun_saxena]. The latest patch looks mostly good, but for a few nits: # I am not sure exposing the event queue is warranted here. Exposing {{numPendingEvents}} or leaving {{isDrained}} as is or with another name seems better. # The test uses a sleep to wait for the dispatcher to start. The sleep could be too small on some of our Jenkins machines. We could sleep for short durations in a loop, but that is not particularly clean either. How about blocking on {{start()}} until it actually starts, or adding an {{awaitStart}} method that uses some form of wait-notify? If the latter is too complicated, I am okay with punting it to another JIRA. Otherwise, it looks good to me. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613959#comment-14613959 ] Varun Saxena commented on YARN-3878: Yeah had seen that in checkstyle report. Will fix both checkstyle issues and post a patch. Will have a getter function for event queue instead of isDrained AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) -
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613009#comment-14613009 ] Varun Saxena commented on YARN-3878: Thanks for the review [~devaraj.k] bq. I think AsyncDispatcher.isDrained() method can be removed now from AsyncDispatcher and eventQueue.isEmpty() can verified directly in the tests. Hmm. As queue is a private variable, I think we need some method to return its status. Maybe isDrained isnt semantically correct. I can change the method name. Although as currently isdrained is not checked anywhere except in {{DrainDispatcher#await}}, we can easily remove it. And if somebody else requires, they can add it later. Thoughts ? bq. In TestAsyncDispatcher, can you remove this jira number comment and add the comment about what test does? Ok. bq. In TestAsyncDispatcher, Please use disp.close() instead of disp.stop(). In this case, {{close()}} directly calls {{stop()}} so both of them are in essence same same. Any reason you prefer close over stop ? AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613049#comment-14613049 ] Devaraj K commented on YARN-3878: - bq. Although as currently isdrained is not checked anywhere except in DrainDispatcher#await, we can easily remove it. And if somebody else requires, they can add it later. Thoughts ? eventQueue is getting passed from DrainDispatcher constructers and the same can be used in DrainDispatcher.await(). If anybody else wants it in future then they can add the method based on the scenario. I don't see any real reason for keeping this method. bq. In this case, close() directly calls stop() so both of them are in essence same same. Any reason you prefer close over stop ? I agree both do the same but we can avoid the javac warning for disp i.e {{Resource leak: 'disp' is never closed}}. And also there are some rawtypes warnings showing up in the test, you can probably suppress them as well. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613055#comment-14613055 ] Varun Saxena commented on YARN-3878: bq. I agree both do the same but we can avoid the javac warning for disp i.e Resource leak: 'disp' is never closed. Ok...Will make the change. Will go ahead and remove the isDrained function as well. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612511#comment-14612511 ] Jian He commented on YARN-3878: --- ah, sorry, I overlooked. lgtm, thanks ! AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x000700b79430 (a java.lang.Object) at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612729#comment-14612729 ] Devaraj K commented on YARN-3878: - Thanks [~varun_saxena] for the patch and [~jianhe] for the review. There are few minor comments on the patch, you can address these before getting this patch in. * I think AsyncDispatcher.isDrained() method can be removed now from AsyncDispatcher and eventQueue.isEmpty() can verified directly in the tests. * In TestAsyncDispatcher, can you remove this jira number comment and add the comment about what test does? {code:xml} /* Test to verify fix for YARN-3878 */ {code} * In TestAsyncDispatcher, Please use *disp.close()* instead of disp.stop(). AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612320#comment-14612320 ] Jian He commented on YARN-3878: --- Hi [~varun_saxena], the test seems not adequate. It doesn't prove the AsyncDispatcher will hang in this case. Could you update the test case to simulate this scenario and will actually hang without the core changes of the patch ? AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612342#comment-14612342 ] Varun Saxena commented on YARN-3878: [~jianhe], the test case as such is adequate. I have basically added two assert statements. If first statement is true, and second is false, hang will occur. But as assert statements are there, test would fail before hang occurs. Without core changes of patch, test will fail at second assertion point. But if you remove this second assertion point, hang will occur and test case time out. {code} Assert.assertTrue(Event Queue should have been empty, eventQueue.isEmpty()); Assert.assertTrue(Async Dispatcher should have been drained as event + queue is empty, disp.isDrained()); {code} So do you want me to remove this second assertion statement so that test case doesnt fail before hang ? (without core changes). Let me know. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch, YARN-3878.02.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611484#comment-14611484 ] Jian He commented on YARN-3878: --- [~varun_saxena], thanks for reporting this. Mind adding a test case to verify this ? AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x000700b79430 (a java.lang.Object) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611376#comment-14611376 ] Hadoop QA commented on YARN-3878: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 34s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 4s | Tests passed in hadoop-yarn-common. | | | | 40m 43s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743200/YARN-3878.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a78d507 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8418/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8418/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8418/console | This message was automatically generated. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3878.01.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611249#comment-14611249 ] Varun Saxena commented on YARN-3878: According to me, {{drained}} flag is unnecessary. We can directly use {{LinkedBlockingQueue#isEmpty()}} instead. It is a thread safe call as it uses {{AtomicInteger}} internally and accurately determine if the queue has been drained or not as well AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611245#comment-14611245 ] Varun Saxena commented on YARN-3878: The reason for this issue is as under : * On call to {{GenericEventHandler#handle}} we make the flag drained as false even before putting the event in queue. If {{InterruptedException}} occurs, this flag is never reset. {code} public void handle(Event event) { if (blockNewEvents) { return; } drained = false; ... try { eventQueue.put(event); } catch (InterruptedException e) { if (!stopped) { LOG.warn(AsyncDispatcher thread interrupted, e); } throw new YarnRuntimeException(e); } {code} * In {{AsyncDispatcher#serviceStop}}, code is as under. As can be seen we wait for drained flag to become true {code} protected void serviceStop() throws Exception { if (drainEventsOnStop) { blockNewEvents = true; LOG.info(AsyncDispatcher is draining to stop, igonring any new events.); synchronized (waitForDrained) { while (!drained eventHandlingThread.isAlive()) { waitForDrained.wait(1000); LOG.info(Waiting for AsyncDispatcher to drain. Thread state is : + eventHandlingThread.getState()); } } } {code} * In event queue thread's run method though, we make a call to {{LinkedBlockingQueue#take}} which is a blocking call till it finds elements in the queue. So if queue is already empty and as nothing is posting elements to this queue, the drained flag would never become true. {code} Runnable createThread() { return new Runnable() { @Override public void run() { while (!stopped !Thread.currentThread().isInterrupted()) { drained = eventQueue.isEmpty(); .. try { event = eventQueue.take(); } catch(InterruptedException ie) { } if (event != null) { dispatch(event); } {code} AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at