Wangda Tan commented on YARN-2917:

Good catch! Thanks for thinking about this. 

My take is that this happens as follows:
Step 1: Thread #1 (the event dispatcher thread) hits an exception while 
dispatching and calls System.exit.
Step 2: Thread #2 (the RM main thread) has registered a ShutdownHook, which 
eventually calls AsyncDispatcher.serviceStop.
Step 3: Thread #1 is blocked in System.exit(-1), which waits for the shutdown 
hooks to finish, while Thread #2 is waiting for Thread #1 to exit. Each waits 
on the other, so the two deadlock.
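
To make the cycle in the steps above concrete, here is a minimal sketch. It 
does not use the real AsyncDispatcher or call System.exit. simulatedExit is a 
hypothetical stand-in that copies only the relevant behaviour of System.exit: 
it starts the hook on its own thread and blocks the caller until the hook 
finishes. The class and method names are made up for illustration, and the 
hook uses a bounded join so the demo terminates instead of hanging.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownDeadlockSketch {

    // Hypothetical stand-in for System.exit: start the shutdown hook on its
    // own thread, then block the calling thread until the hook completes.
    static void simulatedExit(Thread hook) throws InterruptedException {
        hook.start();
        hook.join();
    }

    // Returns true if the "dispatcher" thread was still blocked inside
    // simulatedExit when the hook's bounded wait expired. timeoutMs must be
    // greater than 0, because join(0) waits forever (the real deadlock).
    static boolean hookTimedOutWaitingForDispatcher(long timeoutMs) {
        AtomicBoolean stillBlocked = new AtomicBoolean(false);
        Thread[] dispatcherRef = new Thread[1];

        // Plays the role of the shutdown hook that ends up in serviceStop:
        // it waits for the dispatcher thread to finish.
        Thread hook = new Thread(() -> {
            try {
                dispatcherRef[0].join(timeoutMs);
                stillBlocked.set(dispatcherRef[0].isAlive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "shutdown-hook");

        // Plays the role of the dispatcher thread that hits a fatal error
        // and calls exit from inside its own dispatch loop.
        Thread dispatcher = new Thread(() -> {
            try {
                simulatedExit(hook);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "dispatcher");

        dispatcherRef[0] = dispatcher;
        dispatcher.start();
        try {
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stillBlocked.get();
    }

    public static void main(String[] args) {
        System.out.println("dispatcher still blocked at timeout: "
                + hookTimedOutWaitingForDispatcher(200));
    }
}
```

The hook only gets past its wait because of the timeout. With an unbounded 
wait, neither thread could make progress, which matches the hang described in 
this issue.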

But my question is: is it correct to set drainEventsOnStop to false when 
such a fatal error happens? Shouldn't we still wait for the events to be 
drained even after a fatal error?
Any thoughts?

> Potential deadlock in AsyncDispatcher when System.exit called in 
> AsyncDispatcher#dispatch and AsyncDispatcher#serviceStop from shutdown hook
> --------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-2917
>                 URL: https://issues.apache.org/jira/browse/YARN-2917
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith
>            Assignee: Rohith
>            Priority: Critical
>         Attachments: 0001-YARN-2917.patch
> I encountered a scenario where the RM hung while shutting down and kept logging 
> {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Waiting for AsyncDispatcher to drain.}}
