[ https://issues.apache.org/jira/browse/YARN-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Templeton resolved YARN-4401.
------------------------------------
Resolution: Won't Fix
This JIRA is superseded by YARN-6035, YARN-6036, and YARN-6037, which capture
the same idea in a more supportable way.
> A failed app recovery should not prevent the RM from starting
> -------------------------------------------------------------
>
> Key: YARN-4401
> URL: https://issues.apache.org/jira/browse/YARN-4401
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.7.1
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Priority: Critical
> Attachments: YARN-4401.001.patch
>
>
> There are many different reasons why an app recovery could fail with an
> exception, and any such failure currently aborts the RM start. Presumably,
> the reason the RM is trying to do a recovery is that it's the standby trying
> to fill in for the active, so failing to come up defeats the purpose of the
> HA configuration. Instead of preventing the RM from starting, a failed app
> recovery should log an error and skip the application.
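
To make the proposed behavior concrete, here is a minimal sketch of a
"log and skip" recovery loop. It is not the attached patch and does not use
the actual ResourceManager classes: RecoverySketch, AppRecoverer, and
recoverAll are hypothetical names chosen for illustration, and the stored
state is reduced to a plain string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RecoverySketch {
    private static final Logger LOG =
        Logger.getLogger(RecoverySketch.class.getName());

    /** Hypothetical stand-in for whatever restores one application's state. */
    interface AppRecoverer {
        void recover(String appId, String storedState) throws Exception;
    }

    /**
     * Tries to recover every stored application. A failure is logged and the
     * application is skipped instead of aborting the whole recovery.
     * Returns the IDs of the applications that were skipped.
     */
    static List<String> recoverAll(Map<String, String> storedApps,
                                   AppRecoverer recoverer) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, String> entry : storedApps.entrySet()) {
            try {
                recoverer.recover(entry.getKey(), entry.getValue());
            } catch (Exception ex) {
                // Log the error and move on; one bad app must not block startup.
                LOG.log(Level.SEVERE, "Failed to recover application "
                    + entry.getKey() + "; skipping", ex);
                skipped.add(entry.getKey());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("app_1", "ok");
        stored.put("app_2", "corrupt");
        stored.put("app_3", "ok");

        // Assumed failure mode for the demo: a "corrupt" state throws.
        List<String> skipped = recoverAll(stored, (id, state) -> {
            if ("corrupt".equals(state)) {
                throw new IllegalStateException("bad state for " + id);
            }
        });
        System.out.println("Skipped: " + skipped);
    }
}
```

Returning the skipped IDs, rather than only logging them, would let the
caller decide what to do with the applications that were left behind.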
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)