[
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816939#comment-15816939
]
Karthik Kambatla commented on YARN-6061:
----------------------------------------
[~yufeigu] - thanks for working on this. I must have misunderstood you.

I am in favor of creating an RM-wide UncaughtExceptionHandler that creates and
sends an RMFatalEvent, so the RM can either shut down or transition to standby
based on whether HA is enabled. This allows the standby RM to become active and
keep running, so long as it doesn't run into the same uncaught exception.

Thinking more about this, on receiving a fatal event, the RM should also
consult {{yarn.resourcemanager.failfast}} to decide whether to shut down or
transition to standby. That is likely another JIRA though.
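
To make the proposal concrete, here is a rough Java sketch of the shape I have
in mind. It is not the patch and uses no Hadoop classes: FatalEvent,
FatalEventHandler, Action, decide() and the thread name "update-thread" are all
hypothetical stand-ins. The rule that fail-fast forces a shutdown even when HA
is enabled is my assumption about how the two settings would combine, not
something already decided.

```java
import java.util.function.Consumer;

/** Sketch only: every type here is a hypothetical stand-in, not a Hadoop class. */
final class RmFatalHandlerSketch {

    /** What the RM does after receiving a fatal event. */
    enum Action { SHUTDOWN, TRANSITION_TO_STANDBY }

    /** Hypothetical stand-in for RMFatalEvent. */
    static final class FatalEvent {
        final String threadName;
        final Throwable cause;

        FatalEvent(String threadName, Throwable cause) {
            this.threadName = threadName;
            this.cause = cause;
        }
    }

    /** Turns any uncaught exception into a fatal event and hands it to a dispatcher. */
    static final class FatalEventHandler implements Thread.UncaughtExceptionHandler {
        private final Consumer<FatalEvent> dispatcher;

        FatalEventHandler(Consumer<FatalEvent> dispatcher) {
            this.dispatcher = dispatcher;
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            dispatcher.accept(new FatalEvent(t.getName(), e));
        }
    }

    /**
     * Assumed policy: shut down if fail-fast is set or HA is disabled,
     * otherwise transition to standby so the other RM can take over.
     */
    static Action decide(boolean haEnabled, boolean failFast) {
        if (failFast || !haEnabled) {
            return Action.SHUTDOWN;
        }
        return Action.TRANSITION_TO_STANDBY;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean haEnabled = true;  // assumed to come from the RM configuration
        boolean failFast = false;  // assumed value of the fail-fast setting

        // A critical thread that dies with a runtime exception.
        Thread critical = new Thread(() -> {
            throw new IllegalStateException("simulated scheduler failure");
        }, "update-thread");

        // The handler reports the failure instead of letting the thread die silently.
        critical.setUncaughtExceptionHandler(new FatalEventHandler(event ->
            System.out.println(event.threadName + " failed ("
                + event.cause.getMessage() + "); action: "
                + decide(haEnabled, failFast))));

        critical.start();
        critical.join();
    }
}
```

In the real RM the dispatcher would hand the event to whatever already handles
RMFatalEvent; the Consumer here only keeps the sketch self-contained.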
> Add a customized uncaughtexceptionhandler for critical threads
> --------------------------------------------------------------
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch
>
>
> The fair scheduler runs several threads. Any of them will exit if a runtime
> exception is thrown inside it. We should bring down the RM when that happens;
> otherwise the RM keeps running without that thread and may behave unpredictably.