[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802009#comment-13802009 ]

Jason Lowe commented on YARN-261:
---------------------------------

Thanks, Andrey.  Comments on the latest patch:

ApplicationClientProtocol:
* The javadoc for failedApplicationAttempt says the request is rejected if
recovery is not supported or the maximum number of attempts has been reached,
which is no longer the case.

RMAppAttemptEvent:
* Many subtypes of this event have no diagnostics, so I'm not sure this is the
right place for them. I think it would be better either to add an
RMAppAttemptFailEvent that corresponds to RMAppAttemptEventType.FAIL and
carries a diagnostic message, or to add a separate subclass of
RMAppAttemptEvent, such as RMAppAttemptDiagnosticEvent, that carries a
diagnostic and from which RMAppAttemptFailEvent and
RMAppAttemptLaunchFailedEvent would derive.
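A minimal sketch of the second option (an intermediate diagnostic base class),
assuming heavily simplified stand-ins rather than the real YARN classes: plain
String attempt IDs instead of ApplicationAttemptId, no AbstractEvent parent,
and an illustrative subset of event types. The AttemptDiagnostics helper is
hypothetical and only shows how a transition could read diagnostics from the
events that actually carry them.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins for the YARN classes discussed above.
// Real YARN events extend AbstractEvent and carry an ApplicationAttemptId;
// plain Strings are used here to keep the sketch self-contained.
enum RMAppAttemptEventType { LAUNCHED, LAUNCH_FAILED, FAIL }

class RMAppAttemptEvent {
    private final String appAttemptId;
    private final RMAppAttemptEventType type;

    RMAppAttemptEvent(String appAttemptId, RMAppAttemptEventType type) {
        this.appAttemptId = Objects.requireNonNull(appAttemptId, "appAttemptId");
        this.type = Objects.requireNonNull(type, "type");
    }

    String getApplicationAttemptId() { return appAttemptId; }
    RMAppAttemptEventType getType() { return type; }
}

// Intermediate base class: only events that actually have a diagnostic carry one.
abstract class RMAppAttemptDiagnosticEvent extends RMAppAttemptEvent {
    private final String diagnostics;

    protected RMAppAttemptDiagnosticEvent(String appAttemptId,
            RMAppAttemptEventType type, String diagnostics) {
        super(appAttemptId, type);
        this.diagnostics = Objects.requireNonNull(diagnostics, "diagnostics");
    }

    String getDiagnostics() { return diagnostics; }
}

// Explicitly named event for RMAppAttemptEventType.FAIL.
final class RMAppAttemptFailEvent extends RMAppAttemptDiagnosticEvent {
    RMAppAttemptFailEvent(String appAttemptId, String diagnostics) {
        super(appAttemptId, RMAppAttemptEventType.FAIL, diagnostics);
    }
}

// Kept as its own explicitly named class even though diagnostics live in the base.
final class RMAppAttemptLaunchFailedEvent extends RMAppAttemptDiagnosticEvent {
    RMAppAttemptLaunchFailedEvent(String appAttemptId, String diagnostics) {
        super(appAttemptId, RMAppAttemptEventType.LAUNCH_FAILED, diagnostics);
    }
}

// Hypothetical helper: returns the diagnostic for events that have one, else "".
final class AttemptDiagnostics {
    private AttemptDiagnostics() {}

    static String diagnosticsOf(RMAppAttemptEvent event) {
        if (event instanceof RMAppAttemptDiagnosticEvent) {
            return ((RMAppAttemptDiagnosticEvent) event).getDiagnostics();
        }
        return "";
    }
}
```

With this layout, event types that never have diagnostics (LAUNCHED in the
sketch) stay plain RMAppAttemptEvent instances, while FAIL and LAUNCH_FAILED
each keep an explicitly named class that shares the diagnostic field.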

RMAppAttemptLaunchFailedEvent:
* Normally we prefer explicitly named event classes, so this class should not
be removed even if the diagnostics are pushed up into the base class.


> Ability to kill AM attempts
> ---------------------------
>
>                 Key: YARN-261
>                 URL: https://issues.apache.org/jira/browse/YARN-261
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Jason Lowe
>            Assignee: Andrey Klochkov
>         Attachments: YARN-261--n2.patch, YARN-261--n3.patch, 
> YARN-261--n4.patch, YARN-261--n5.patch, YARN-261--n6.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
