[
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192685#comment-14192685
]
Jian He commented on YARN-2010:
-------------------------------
Hi Karthik, thanks for updating. A couple more things:
- Inside the catch, could we just return FAILED instead of throwing?
{code}
} catch (Exception e) {
  LOG.warn("Unable to parse and renew delegation tokens.", e);
  throw new YarnRuntimeException(e);
}
{code}
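A minimal sketch of the "return FAILED" alternative, for illustration only. The `Status` enum, `parseAndRenewTokens`, and `recover` are hypothetical stand-ins and are not taken from the patch or from YARN; the point is only that the catch block logs and reports a per-application failure instead of propagating a runtime exception.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RecoverySketch {
  private static final Logger LOG =
      Logger.getLogger(RecoverySketch.class.getName());

  // Hypothetical outcome of recovering one application; not a YARN type.
  enum Status { RECOVERED, FAILED }

  // Hypothetical stand-in for parsing and renewing an app's delegation tokens.
  static void parseAndRenewTokens(String tokens) throws Exception {
    if (tokens == null || tokens.isEmpty()) {
      throw new IllegalArgumentException("no tokens to parse");
    }
  }

  // Logs the failure and returns FAILED rather than rethrowing, so one bad
  // application does not abort the caller.
  static Status recover(String tokens) {
    try {
      parseAndRenewTokens(tokens);
      return Status.RECOVERED;
    } catch (Exception e) {
      LOG.log(Level.WARNING, "Unable to parse and renew delegation tokens.", e);
      return Status.FAILED;
    }
  }

  public static void main(String[] args) {
    System.out.println(recover("token-bytes"));
    System.out.println(recover(""));
  }
}
```

With this shape, the caller decides what a FAILED application means (skip it, mark it failed, etc.) rather than having the whole recovery unwind.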
- I don’t think we can get a ConnectException here. Could you explain under
what scenario we would get one?
{code}
} catch (ConnectException ce) {
  // Unable to connect to HDFS or ZK. Assuming this is a transient
  // issue, we should gracefully shutdown or transition to standby. If
  // the issue is permanent, there is not much YARN can do.
  rmContext.getDispatcher().getEventHandler().handle(
      new RMFatalEvent(RMFatalEventType.CONNECTION_FAILED, ce));
{code}
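For context on the question above: in Java, a `catch (ConnectException ce)` clause only matches when the thrown exception itself is a `java.net.ConnectException`; it does not match when a connect failure arrives wrapped as the cause of some other exception. Whether that happens on this code path is not established here. The sketch below, with a hypothetical helper name, shows how a wrapped connect failure could be detected by walking the cause chain.

```java
import java.io.IOException;
import java.net.ConnectException;

public class ConnectCauseSketch {
  // Hypothetical helper: returns true if t, or any exception in its cause
  // chain, is a java.net.ConnectException.
  static boolean causedByConnectFailure(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof ConnectException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // A connect failure wrapped twice; a plain catch (ConnectException)
    // would not match this exception.
    Throwable wrapped = new RuntimeException(
        new IOException("state store unavailable",
            new ConnectException("Connection refused")));
    System.out.println(causedByConnectFailure(wrapped));
    System.out.println(causedByConnectFailure(new IOException("other")));
  }
}
```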
> If RM fails to recover an app, it can never transition to active again
> ----------------------------------------------------------------------
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: bc Wong
> Assignee: Karthik Kambatla
> Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch,
> issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch,
> yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch,
> yarn-2010-7.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of
> turning security on, token expiry, issues connecting to HDFS, etc. The
> causes could be classified as (1) transient, (2) specific to one
> application, or (3) permanent and applying to multiple (all) applications.
> Today, the RM fails to transition to Active, ends up in the STOPPED state,
> and can never be transitioned to Active again.
> The initial stacktrace reported is at
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)