[ 
https://issues.apache.org/jira/browse/YARN-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260035#comment-14260035
 ] 

Rohith commented on YARN-2991:
------------------------------

In serviceStop(), eventHandlingThread is interrupted and then joined to wait 
for it to complete. The test case uses DrainDispatcher, which creates its own 
thread. But the real cause of the randomness is that when Thread.interrupt() 
is called, the thread is not guaranteed to see the interrupt unless it is 
blocked. So there should be a mechanism to exit the thread by checking a 
boolean flag in the while loop.
I have updated the patch to handle this. I ran the test many times, and it 
completed without hanging.
Kindly review the patch.
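
For illustration only, here is a minimal sketch of the flag-based exit 
described above. It is not the attached patch and not the real 
AsyncDispatcher code; the class name StoppableDispatcherSketch, the field 
name stopped, and the use of Runnable as the event type are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified stand-in for a dispatcher; not the YARN class.
class StoppableDispatcherSketch {
    private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();

    // volatile so the write in stop() is visible to the handling thread.
    private volatile boolean stopped = false;

    private final Thread eventHandlingThread = new Thread(() -> {
        // The loop exits on the flag, so it does not depend on the
        // interrupt being observed.
        while (!stopped) {
            Runnable event;
            try {
                event = eventQueue.take(); // blocks until an event arrives
            } catch (InterruptedException e) {
                // Woken up while blocked; re-check the flag.
                continue;
            }
            // If the handler swallows an interrupt, the flag check above
            // still ends the loop after this event returns.
            event.run();
        }
    }, "event-handling-sketch");

    void start() {
        eventHandlingThread.start();
    }

    void post(Runnable event) {
        eventQueue.add(event);
    }

    void stop() throws InterruptedException {
        stopped = true;                  // set the flag first
        eventHandlingThread.interrupt(); // unblock take() if it is waiting
        eventHandlingThread.join();      // returns once the loop exits
    }

    boolean isAlive() {
        return eventHandlingThread.isAlive();
    }
}
```

In this sketch the interrupt is used only to wake the thread if it is 
blocked in take(); the decision to exit is made by the flag, so join() 
returns even if the interrupt arrives while an event is being handled.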

> TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on 
> trunk
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-2991
>                 URL: https://issues.apache.org/jira/browse/YARN-2991
>             Project: Hadoop YARN
>          Issue Type: Test
>            Reporter: Zhijie Shen
>            Assignee: Rohith
>            Priority: Blocker
>         Attachments: 0001-YARN-2991.patch
>
>
> {code}
> Error Message
> test timed out after 60000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 60000 milliseconds
>       at java.lang.Object.wait(Native Method)
>       at java.lang.Thread.join(Thread.java:1281)
>       at java.lang.Thread.join(Thread.java:1355)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:150)
>       at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>       at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>       at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>       at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>       at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>       at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1106)
>       at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testDecomissionedNMsMetricsOnRMRestart(TestRMRestart.java:1873)
> {code}
> It happened twice this month:
> https://builds.apache.org/job/PreCommit-YARN-Build/6096/
> https://builds.apache.org/job/PreCommit-YARN-Build/6182/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
