[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802076#comment-13802076 ]
Vinod Kumar Vavilapalli commented on YARN-261:
----------------------------------------------
Sorry, I wasn't watching this. At YARN-891, we are making a number of changes
to the state machines w.r.t. RM restart. We also need to look at this JIRA in
light of saving as much state as possible to the state-store, so that things
keep working across RM restarts. Luckily, most of that work is being handled
at YARN-891, which should lessen the burden for this JIRA.
I quickly skimmed through this patch. There are two parts to it: the
client-facing changes and the state-machine changes. Given the surgery that's
happening at YARN-891 w.r.t. the state machines, may I request that this patch
be blocked on YARN-891? We are moving ahead fast on that JIRA and expect to
commit it in a couple of days. Thanks.
> Ability to kill AM attempts
> ---------------------------
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: api
> Affects Versions: 2.0.3-alpha
> Reporter: Jason Lowe
> Assignee: Andrey Klochkov
> Attachments: YARN-261--n2.patch, YARN-261--n3.patch,
> YARN-261--n4.patch, YARN-261--n5.patch, YARN-261--n6.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed. This
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the
> AM supports recovery, and a particular AM attempt is stuck. Currently if
> this occurs the user's only recourse is to kill the entire application,
> requiring them to resubmit a new application and potentially breaking
> downstream dependent jobs if it's part of a bigger workflow. Killing the
> attempt would allow a new attempt to be started by the RM without killing the
> entire application, and if the AM supports recovery it could potentially save
> a lot of work. It could also be useful in workflow scenarios where the
> failure of the entire application kills the workflow, but the ability to kill
> an attempt can keep the workflow going if the subsequent attempt succeeds.