[ 
https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239107#comment-14239107
 ] 

Wangda Tan commented on YARN-2917:
----------------------------------

[~rohithsharma],
Good catch! Thanks for thinking about this. 

My take is that this happens as follows:
Step 1: Thread #1 (the event dispatcher thread) hits an exception while 
dispatching and calls System.exit.
Step 2: Thread #2 (the RM main thread) has registered a ShutdownHook, which 
eventually calls AsyncDispatcher.serviceStop.
Step 3: Thread #1 is waiting for System.exit(-1) to return, and at the same 
time Thread #2 is waiting for thread #1 to exit. Each waits on the other, so 
it is a deadlock.
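
To make the steps above concrete, here is a minimal sketch of the same wait cycle. It is not the AsyncDispatcher code: the class name, method name, and thread names are hypothetical, and System.exit is only simulated (the "dispatcher" thread starts the "hook" thread and joins it, which is what the JVM does with shutdown hooks during exit). The hook uses a bounded join so the example terminates; with an unbounded join it would hang.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownDeadlockSketch {

    /**
     * Returns true if the simulated hook gave up waiting for the dispatcher
     * thread, i.e. an unbounded join() at that point would never return.
     */
    static boolean hookTimesOut(long joinTimeoutMs) throws InterruptedException {
        AtomicBoolean timedOut = new AtomicBoolean(false);
        Thread[] dispatcherRef = new Thread[1];

        // Stand-in for the shutdown hook: stopping the dispatcher means
        // waiting for the dispatcher thread to finish.
        Thread hook = new Thread(() -> {
            try {
                dispatcherRef[0].join(joinTimeoutMs);
                timedOut.set(dispatcherRef[0].isAlive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "simulated-shutdown-hook");

        // Stand-in for the dispatcher thread calling System.exit after a
        // fatal error: exit starts the hooks and blocks until they finish.
        Thread dispatcher = new Thread(() -> {
            try {
                hook.start();
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "simulated-dispatcher");

        dispatcherRef[0] = dispatcher;
        dispatcher.start();
        dispatcher.join();
        return timedOut.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("hook timed out waiting for dispatcher: "
            + hookTimesOut(200));
    }
}
```

The only reason this sketch finishes is the timeout on the hook's join; the dispatcher thread cannot finish until the hook does, and the hook cannot finish until the dispatcher thread does.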

But my question is: is it correct to set drainEventsOnStop to false when 
such a fatal error happens? Shouldn't we wait for the dispatcher to drain even 
if a fatal error happens?
Any thoughts?

> Potential deadlock in AsyncDispatcher when System.exit called in 
> AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2917
>                 URL: https://issues.apache.org/jira/browse/YARN-2917
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith
>            Assignee: Rohith
>            Priority: Critical
>         Attachments: 0001-YARN-2917.patch
>
>
> I encountered a scenario where the RM hung while shutting down and kept 
> logging 
> {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Waiting for AsyncDispatcher to drain.}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)