[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802076#comment-13802076
 ] 

Vinod Kumar Vavilapalli commented on YARN-261:
----------------------------------------------

Sorry, I wasn't watching this. In YARN-891, we are making a number of changes 
to the state machines w.r.t. RM restart. We also need to look at this JIRA in 
light of saving as much state as possible to the state-store so that it works 
across RM restarts. Luckily, most of that work is being handled in YARN-891, 
so that should lessen the burden for this JIRA.

I quickly skimmed through this patch. There are two parts to it: the 
client-facing changes and the state-machine changes. Given the surgery that's 
happening in YARN-891 w.r.t. the state machines, may I request that this patch 
be blocked on YARN-891? We are moving fast on that JIRA and expect to commit 
it in a couple of days. Thanks.

> Ability to kill AM attempts
> ---------------------------
>
>                 Key: YARN-261
>                 URL: https://issues.apache.org/jira/browse/YARN-261
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Jason Lowe
>            Assignee: Andrey Klochkov
>         Attachments: YARN-261--n2.patch, YARN-261--n3.patch, 
> YARN-261--n4.patch, YARN-261--n5.patch, YARN-261--n6.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
