[ 
https://issues.apache.org/jira/browse/YARN-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919654#comment-16919654
 ] 

Tao Yang commented on YARN-9798:
--------------------------------

Thanks [~abmodi] for the review.
The failure rate was only 1 or 2 in 2000 runs, and the failure did not happen
again after this fix.

> ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails intermittently
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-9798
>                 URL: https://issues.apache.org/jira/browse/YARN-9798
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Minor
>         Attachments: YARN-9798.001.patch
>
>
> An intermittent failure of 
> ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster was found 
> in the YARN-9714 Jenkins report. The cause is that the assertion checks that 
> the dispatcher has handled the UNREGISTERED event, but the test does not wait 
> until all events in the dispatcher have been handled before asserting. Adding 
> {{rm.drainEvents()}} before that assertion fixes the issue.
> Failure info:
> {noformat}
> [ERROR] testRepeatedFinishApplicationMaster(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceCapacity)  Time elapsed: 0.559 s  <<< FAILURE!
> java.lang.AssertionError: Expecting only one event expected:<1> but was:<0>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:834)
>       at org.junit.Assert.assertEquals(Assert.java:645)
>       at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterServiceTestBase.testRepeatedFinishApplicationMaster(ApplicationMasterServiceTestBase.java:385)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Standard output:
> {noformat}
> 2019-08-29 06:59:54,458 ERROR [AsyncDispatcher event handler] resourcemanager.ResourceManager (ResourceManager.java:handle(1088)) - Error in handling event type REGISTERED for applicationAttempt appattempt_1567061994047_0001_000001
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:276)
>       at org.apache.hadoop.yarn.event.DrainDispatcher$2.handle(DrainDispatcher.java:91)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1679)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1658)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:914)
>       at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1086)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1067)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
>       at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterServiceTestBase$CountingDispatcher.dispatch(ApplicationMasterServiceTestBase.java:401)
>       at org.apache.hadoop.yarn.event.DrainDispatcher$1.run(DrainDispatcher.java:76)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.InterruptedException
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
>       at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
>       at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:268)
>       ... 15 more
> {noformat}
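
For readers unfamiliar with the race described above, here is a minimal, self-contained sketch of the "drain before asserting" pattern. It is not the Hadoop code or the attached patch: TinyDispatcher, post(), and unregisteredCount() are hypothetical names made up for illustration, and this drainEvents() only mirrors the idea of {{rm.drainEvents()}} by blocking until every posted event has been handled.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an asynchronous dispatcher (not Hadoop's AsyncDispatcher). */
final class TinyDispatcher {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  // Events that are queued or currently being handled.
  private final AtomicInteger pending = new AtomicInteger();
  // Number of UNREGISTERED events the worker thread has handled so far.
  private final AtomicInteger unregistered = new AtomicInteger();
  private final Thread worker = new Thread(this::runLoop, "tiny-dispatcher");

  TinyDispatcher start() {
    worker.setDaemon(true);
    worker.start();
    return this;
  }

  /** Enqueues an event; handling happens later on the worker thread. */
  void post(String event) {
    pending.incrementAndGet(); // count it before it becomes visible to the worker
    queue.add(event);
  }

  /** Blocks until every event posted so far has been fully handled. */
  void drainEvents() {
    try {
      while (pending.get() != 0) {
        Thread.sleep(1);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while draining", e);
    }
  }

  int unregisteredCount() {
    return unregistered.get();
  }

  void stop() {
    worker.interrupt();
  }

  private void runLoop() {
    try {
      while (true) {
        String event = queue.take();
        try {
          if ("UNREGISTERED".equals(event)) {
            unregistered.incrementAndGet();
          }
        } finally {
          pending.decrementAndGet(); // mark the event as handled
        }
      }
    } catch (InterruptedException e) {
      // stop() was called; exit the loop.
    }
  }
}

final class DrainBeforeAssertSketch {
  public static void main(String[] args) {
    TinyDispatcher dispatcher = new TinyDispatcher().start();
    dispatcher.post("REGISTERED");
    dispatcher.post("UNREGISTERED");

    // Without this call, the check below races with the worker thread and
    // can observe 0, much like the "expected:<1> but was:<0>" failure above.
    dispatcher.drainEvents();

    if (dispatcher.unregisteredCount() != 1) {
      throw new AssertionError("Expecting only one event");
    }
    dispatcher.stop();
  }
}
```

In this sketch the pending counter is incremented before the event is enqueued and decremented only after the handler finishes, so drainEvents() cannot return while an event is still queued or in flight.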



