[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816939#comment-15816939
 ] 

Karthik Kambatla commented on YARN-6061:
----------------------------------------

[~yufeigu] - thanks for working on this. I must have misunderstood you.

I am in favor of creating an RM-wide UncaughtExceptionHandler that creates and 
sends an RMFatalEvent, so the RM can either shut down or transition to standby 
based on whether HA is enabled. This allows the standby RM to become active and 
keep running, so long as it doesn't run into the same uncaught exception. 
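
As a rough illustration of the handler idea, here is a minimal, self-contained sketch. It uses only the JDK's Thread.UncaughtExceptionHandler; FatalEvent, FatalEventSink, FatalAction and HaAwareSink are hypothetical stand-ins invented for this example, not the actual YARN classes, and the sink only records the action it would take instead of really stopping anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the real YARN types.
enum FatalAction { SHUTDOWN, TRANSITION_TO_STANDBY }

final class FatalEvent {
    final String threadName;
    final Throwable cause;

    FatalEvent(String threadName, Throwable cause) {
        this.threadName = threadName;
        this.cause = cause;
    }
}

interface FatalEventSink {
    void handle(FatalEvent event);
}

// Converts any uncaught exception from a critical thread into a fatal event.
final class CriticalThreadExceptionHandler implements Thread.UncaughtExceptionHandler {
    private final FatalEventSink sink;

    CriticalThreadExceptionHandler(FatalEventSink sink) {
        this.sink = sink;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        sink.handle(new FatalEvent(t.getName(), e));
    }
}

// Assumed policy: with HA enabled, step down to standby; otherwise shut down.
// This sketch records the chosen action rather than performing it.
final class HaAwareSink implements FatalEventSink {
    private final boolean haEnabled;
    final List<FatalAction> actions = new ArrayList<>();

    HaAwareSink(boolean haEnabled) {
        this.haEnabled = haEnabled;
    }

    @Override
    public synchronized void handle(FatalEvent event) {
        actions.add(haEnabled ? FatalAction.TRANSITION_TO_STANDBY : FatalAction.SHUTDOWN);
    }
}
```

A critical thread would opt in with thread.setUncaughtExceptionHandler(new CriticalThreadExceptionHandler(sink)) before it is started. Installing the handler per thread, rather than as the JVM-wide default, keeps the behavior limited to threads the RM considers critical.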

Thinking more about this, on receiving a fatal event, the RM should also 
consult {{yarn.resourcemanager.fail-fast}} to decide whether to shut down or 
transition to standby. That is likely another JIRA though. 
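
One possible reading of that decision, sketched as a pure function. The comment does not spell out the precedence between the fail-fast setting and HA, so the ordering here (fail-fast wins, then HA, then shutdown as the fallback) is an assumption, and RmReaction and FatalEventPolicy are hypothetical names used only for this example.

```java
// Hypothetical names for illustration; not part of YARN.
enum RmReaction { SHUT_DOWN, TRANSITION_TO_STANDBY }

final class FatalEventPolicy {
    private FatalEventPolicy() {
    }

    /**
     * Picks the RM's reaction to a fatal event.
     *
     * Assumed precedence (not stated in the discussion):
     * 1. fail-fast enabled: always shut down;
     * 2. otherwise, HA enabled: transition to standby;
     * 3. otherwise: shut down, since there is no standby to take over.
     */
    static RmReaction decide(boolean failFast, boolean haEnabled) {
        if (failFast) {
            return RmReaction.SHUT_DOWN;
        }
        return haEnabled ? RmReaction.TRANSITION_TO_STANDBY : RmReaction.SHUT_DOWN;
    }
}
```

Keeping the decision separate from the code that reads configuration makes each combination of settings easy to unit test.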

> Add a customized uncaughtexceptionhandler for critical threads
> --------------------------------------------------------------
>
>                 Key: YARN-6061
>                 URL: https://issues.apache.org/jira/browse/YARN-6061
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: YARN-6061.001.patch
>
>
> The fair scheduler runs several threads. If a runtime exception is thrown 
> inside one of them, that thread quits. We should bring down the RM when that 
> happens; otherwise, the RM may behave unpredictably. 


