[ 
https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611376#comment-14611376
 ] 

Hadoop QA commented on YARN-3878:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 10s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 46s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 34s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m  4s | Tests passed in hadoop-yarn-common. |
| | |  40m 43s | |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12743200/YARN-3878.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a78d507 |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8418/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8418/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8418/console |


This message was automatically generated.

> AsyncDispatcher can hang while stopping if it is configured for draining 
> events on stop
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3878
>                 URL: https://issues.apache.org/jira/browse/YARN-3878
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: YARN-3878.01.patch
>
>
> The sequence of events is as follows:
> # The RM is stopped while an RMStateStore event is being put on RMStateStore's
> AsyncDispatcher queue. This causes an InterruptedException to be thrown.
> # As the RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. In
> {{serviceStop}}, we check whether all events have been drained and wait for the
> event queue to drain (the RM state store dispatcher is configured to drain its
> queue on stop).
> # This condition never becomes true, so AsyncDispatcher keeps waiting for the
> dispatcher event queue to drain until the JVM exits.
> *Initial exception while posting RM State store event to queue*
> {noformat}
> 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
> 2015-06-27 20:08:35,923 WARN  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>       at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>       at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
> {noformat}
> *JStack of AsyncDispatcher hanging on stop*
> {noformat}
> "AsyncDispatcher event handler" prio=10 tid=0x00007fb980222800 nid=0x4b1e waiting on condition [0x00007fb9654e9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000700b79250> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
>         at java.lang.Thread.run(Thread.java:744)
> "main" prio=10 tid=0x00007fb98000a800 nid=0x49c3 in Object.wait() [0x00007fb989851000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x0000000700b79430> (a java.lang.Object)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:156)
>         - locked <0x0000000700b79430> (a java.lang.Object)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         - locked <0x0000000700b79420> (a java.lang.Object)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:515)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         - locked <0x0000000700b79630> (a java.lang.Object)
>         at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:599)
> {noformat}
> The following log messages then repeat indefinitely:
> {noformat}
> 2015-06-27 20:08:35,926 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(140)) - AsyncDispatcher is draining to stop, igonring any new events.
> 2015-06-27 20:08:36,926 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:37,927 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:38,927 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:39,928 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:40,929 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:41,929 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:42,930 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:43,930 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:44,931 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:45,931 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> 2015-06-27 20:08:46,932 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
> {noformat}
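
The hang described in the issue can be reproduced in miniature. What follows is a simplified sketch, not the AsyncDispatcher source: the class names DrainHangSketch and MiniDispatcher, the method names, and the 500 ms bound on the wait are all hypothetical. It assumes the "drained" flag is cleared before the event is enqueued, which is consistent with the traces above but is not stated in this message. Unlike the real stop path, the wait here is bounded so that the example terminates.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DrainHangSketch {

  /** Hypothetical, heavily simplified stand-in for a drain-on-stop dispatcher. */
  static class MiniDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Object waitForDrained = new Object();
    private volatile boolean drained = true;
    private volatile boolean blockNewEvents = false;
    private volatile boolean stopped = false;
    private final Thread worker = new Thread(this::loop, "mini-dispatcher");

    void start() {
      worker.setDaemon(true);
      worker.start();
    }

    private void loop() {
      while (!stopped && !Thread.currentThread().isInterrupted()) {
        drained = queue.isEmpty();           // only refreshed at the top of the loop
        if (blockNewEvents) {
          synchronized (waitForDrained) {
            if (drained) {
              waitForDrained.notify();
            }
          }
        }
        try {
          queue.take().run();                // parks here while the queue is empty
        } catch (InterruptedException e) {
          return;
        }
      }
    }

    void handle(Runnable event) {
      if (blockNewEvents) {
        return;
      }
      drained = false;                       // assumption: cleared BEFORE the enqueue
      try {
        queue.put(event);
      } catch (InterruptedException e) {
        // The event never reached the queue, yet drained is left false.
        throw new IllegalStateException(e);
      }
    }

    /** Returns true if the queue was seen as drained within maxWaitMs. */
    boolean stopWithDrain(long maxWaitMs) throws InterruptedException {
      blockNewEvents = true;
      long deadline = System.currentTimeMillis() + maxWaitMs;
      synchronized (waitForDrained) {
        while (!drained && worker.isAlive()
            && System.currentTimeMillis() < deadline) {
          waitForDrained.wait(50);
        }
      }
      boolean result = drained;
      stopped = true;
      worker.interrupt();                    // release the worker from take()
      return result;
    }
  }

  /** Posts one event, optionally with the caller interrupted, then stops. */
  static boolean drainsAfterPost(boolean interruptedWhilePosting) {
    try {
      MiniDispatcher d = new MiniDispatcher();
      d.start();
      // Wait until the worker is parked in take() on the empty queue.
      while (d.worker.getState() != Thread.State.WAITING) {
        Thread.sleep(1);
      }
      if (interruptedWhilePosting) {
        Thread.currentThread().interrupt();  // put() will throw immediately
      }
      try {
        d.handle(() -> { });
      } catch (IllegalStateException expected) {
        // put() was interrupted; nothing was enqueued
      }
      Thread.interrupted();                  // make sure the flag is clear
      return d.stopWithDrain(500);
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("normal post drains: " + drainsAfterPost(false));
    System.out.println("interrupted post drains: " + drainsAfterPost(true));
  }
}
```

In this sketch a normal post drains, while the interrupted post leaves "drained" false with an empty queue and the worker parked in take(), so only the artificial timeout ends the wait. That mirrors the two thread states in the jstack output above.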



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
