[
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802009#comment-13802009
]
Jason Lowe commented on YARN-261:
---------------------------------
Thanks, Andrey. Comments on the latest patch:
ApplicationClientProtocol:
* The javadoc for failedApplicationAttempt still says the request is rejected
if recovery is not supported or the maximum number of attempts has been
reached, which is no longer the case.
RMAppAttemptEvent:
* Many subtypes of this event carry no diagnostics, so I'm not sure putting
them here is appropriate. I think it would be better either to have an
RMAppAttemptFailEvent that corresponds to RMAppAttemptEventType.FAIL and
contains a diagnostic message, or to have a separate subclass of
RMAppAttemptEvent, such as RMAppAttemptDiagnosticEvent, that contains a
diagnostic message and from which RMAppAttemptFailEvent and
RMAppAttemptLaunchFailedEvent would derive.
RMAppAttemptLaunchFailedEvent:
* Normally we prefer to have explicitly-named event classes, so this should not
be removed even if the diagnostics are pushed up into the base class.
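A rough sketch of the second option, to make the suggested hierarchy concrete.
These are simplified stand-ins, not the actual YARN classes: the String attempt
ID, the constructor signatures, the getDiagnostics() accessor, and the
DiagnosticEventSketch driver are all assumptions made for illustration.

```java
// Simplified stand-ins for the classes named in the review; NOT the real
// YARN signatures. The String attempt ID and getDiagnostics() are assumed.
enum RMAppAttemptEventType { START, KILL, FAIL, LAUNCH_FAILED }

// Base event: no diagnostics field, since most subtypes do not need one.
class RMAppAttemptEvent {
  private final String appAttemptId; // hypothetical; real code uses an ID type
  private final RMAppAttemptEventType type;

  RMAppAttemptEvent(String appAttemptId, RMAppAttemptEventType type) {
    this.appAttemptId = appAttemptId;
    this.type = type;
  }

  String getApplicationAttemptId() { return appAttemptId; }
  RMAppAttemptEventType getType() { return type; }
}

// Intermediate class that holds the diagnostic message for the few
// event types that actually carry one.
class RMAppAttemptDiagnosticEvent extends RMAppAttemptEvent {
  private final String diagnostics;

  RMAppAttemptDiagnosticEvent(String appAttemptId,
      RMAppAttemptEventType type, String diagnostics) {
    super(appAttemptId, type);
    this.diagnostics = diagnostics;
  }

  String getDiagnostics() { return diagnostics; }
}

// Explicitly-named event for RMAppAttemptEventType.FAIL.
class RMAppAttemptFailEvent extends RMAppAttemptDiagnosticEvent {
  RMAppAttemptFailEvent(String appAttemptId, String diagnostics) {
    super(appAttemptId, RMAppAttemptEventType.FAIL, diagnostics);
  }
}

// Kept as its own explicitly-named class even though the diagnostics
// now live in the shared base class.
class RMAppAttemptLaunchFailedEvent extends RMAppAttemptDiagnosticEvent {
  RMAppAttemptLaunchFailedEvent(String appAttemptId, String diagnostics) {
    super(appAttemptId, RMAppAttemptEventType.LAUNCH_FAILED, diagnostics);
  }
}

class DiagnosticEventSketch {
  public static void main(String[] args) {
    RMAppAttemptEvent event =
        new RMAppAttemptFailEvent("attempt_0001", "Attempt killed by user");
    // A handler can read diagnostics only from events that carry them.
    if (event instanceof RMAppAttemptDiagnosticEvent) {
      RMAppAttemptDiagnosticEvent diag = (RMAppAttemptDiagnosticEvent) event;
      System.out.println(diag.getType() + ": " + diag.getDiagnostics());
    }
  }
}
```

With this shape, events without diagnostics stay on the plain
RMAppAttemptEvent, and handlers still see a distinct class per event type.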
> Ability to kill AM attempts
> ---------------------------
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: api
> Affects Versions: 2.0.3-alpha
> Reporter: Jason Lowe
> Assignee: Andrey Klochkov
> Attachments: YARN-261--n2.patch, YARN-261--n3.patch,
> YARN-261--n4.patch, YARN-261--n5.patch, YARN-261--n6.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed. This
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the
> AM supports recovery, and a particular AM attempt is stuck. Currently if
> this occurs the user's only recourse is to kill the entire application,
> requiring them to resubmit a new application and potentially breaking
> downstream dependent jobs if it's part of a bigger workflow. Killing the
> attempt would allow a new attempt to be started by the RM without killing the
> entire application, and if the AM supports recovery it could potentially save
> a lot of work. It could also be useful in workflow scenarios where the
> failure of the entire application kills the workflow, but the ability to kill
> an attempt can keep the workflow going if the subsequent attempt succeeds.