[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Kambatla updated YARN-2010:
-----------------------------------
Attachment: yarn-2010-7.patch
Thanks for the review, Jian. Sorry for the delay in addressing the comments.
Here is a patch that moves the credential parsing into RMAppRecoveredTransition
itself; it does make the code much cleaner.
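The Java sketch below illustrates the idea under one assumption: moving credential parsing into the per-application recovered transition is meant to confine a parse failure to that one application, so the RM can finish recovery and still become Active. This is not the code in yarn-2010-7.patch. All names (RecoverySketch, StoredApp, parseCredentials, recoverApp, recoverAll) are hypothetical and are not YARN APIs.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: recover each stored app independently, so one app
 * with bad credentials is marked FAILED instead of aborting the whole
 * recovery (and with it the RM's transition to Active).
 * None of these names are real YARN classes or methods.
 */
public class RecoverySketch {

  enum AppState { RUNNING, FAILED }

  /** Hypothetical stand-in for an app record read back from the state store. */
  static final class StoredApp {
    final String appId;
    final String credentials;

    StoredApp(String appId, String credentials) {
      this.appId = appId;
      this.credentials = credentials;
    }
  }

  /** Hypothetical parser: treats null or blank credential blobs as corrupt. */
  static String parseCredentials(String raw) throws IOException {
    if (raw == null || raw.trim().isEmpty()) {
      throw new IOException("corrupt credentials");
    }
    return raw.trim();
  }

  /**
   * Per-app "recovered" transition. Parsing happens here, so an exception
   * is caught and contained to this single application.
   */
  static AppState recoverApp(StoredApp app) {
    try {
      parseCredentials(app.credentials);
      return AppState.RUNNING;
    } catch (IOException e) {
      // Only this application fails; the error does not propagate upward.
      return AppState.FAILED;
    }
  }

  /** Recovers every stored app; a single bad app never makes this throw. */
  static Map<String, AppState> recoverAll(List<StoredApp> apps) {
    Map<String, AppState> result = new LinkedHashMap<>();
    for (StoredApp app : apps) {
      result.put(app.appId, recoverApp(app));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, AppState> states = recoverAll(List.of(
        new StoredApp("app_1", "token-a"),
        new StoredApp("app_2", ""),
        new StoredApp("app_3", "token-c")));
    // Prints {app_1=RUNNING, app_2=FAILED, app_3=RUNNING}
    System.out.println(states);
  }
}
```

The design choice being illustrated is where the exception is caught. If parsing ran in a shared recovery loop with no per-app handling, the first corrupt app would fail recovery for everyone. Catching it inside the per-app transition turns it into a per-app FAILED state.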
> If RM fails to recover an app, it can never transition to active again
> ----------------------------------------------------------------------
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: bc Wong
> Assignee: Karthik Kambatla
> Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch,
> issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch,
> yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch,
> yarn-2010-7.patch
>
>
> Sometimes, the RM fails to recover an application. This could be caused by
> turning security on, token expiry, issues connecting to HDFS, etc. The
> causes can be classified as (1) transient, (2) specific to one
> application, or (3) permanent and affecting multiple (or all) applications.
> Today, the RM fails to transition to Active, ends up in the STOPPED state,
> and can never be transitioned to Active again.
> The initial stacktrace reported is at
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)