Jian He updated YARN-2074:

    Attachment: YARN-2074.8.patch

Thanks, Vinod, for the review! Uploaded a new patch.

bq. Not related to the patch, but I think I found a bug: the following doesn't 
take into account whether the finished container is an AM or not. Let's file a ticket.
I checked further. This may be fine, because in the case of work-preserving AM 
restart, the container-finished event is sent to the previous failed 
attempt, which captures all the finished containers.
bq. Why are we making this change? Please explain the why, both in a code comment 
and here. Maybe add a test too?
Added a comment in the code. A test covering this is already included.
bq. Need to think about how this will work when clusters get upgraded.
Added a test case to verify that the default container exit status in protobuf is 
indeed -1000.
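
To illustrate why the -1000 default matters during an upgrade, here is a minimal Java sketch, assuming the exit-status field is a proto2 optional int32 declared with a default of -1000. `ExitStatusDefault` and `getExitStatus` are hypothetical names, and `OptionalInt` only stands in for "field present or absent on the wire"; this is not the generated protobuf API. A message from a component that never sets the field reads back as -1000 rather than as a real exit code.

```java
import java.util.OptionalInt;

/** Hypothetical model of a proto2 optional int field that carries a default. */
public final class ExitStatusDefault {
    // Default value the comment above says the protobuf exit-status field has.
    static final int DEFAULT_EXIT_STATUS = -1000;

    private ExitStatusDefault() {}

    /** Mimics a proto2 getter: returns the declared default when the field is unset. */
    static int getExitStatus(OptionalInt wireValue) {
        return wireValue.orElse(DEFAULT_EXIT_STATUS);
    }

    public static void main(String[] args) {
        // Field never set, e.g. by a component that predates it.
        System.out.println(getExitStatus(OptionalInt.empty())); // prints -1000
        // Field set explicitly.
        System.out.println(getExitStatus(OptionalInt.of(0)));   // prints 0
    }
}
```

Because -1000 is not a real process exit code, a check for a specific status (such as preemption) cannot accidentally match a message in which the field is missing.
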
Also addressed the other comments.

> Preemption of AM containers shouldn't count towards AM failures
> ---------------------------------------------------------------
>                 Key: YARN-2074
>                 URL: https://issues.apache.org/jira/browse/YARN-2074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>         Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, 
> YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch, 
> YARN-2074.7.patch, YARN-2074.7.patch, YARN-2074.8.patch
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.
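
The policy described in the issue can be sketched as a small Java predicate. This is a hypothetical illustration, not the code in the patch: `AmFailurePolicy`, `countsTowardsAmFailures`, and `countedFailures` are made-up names, and `PREEMPTED = -102` is an assumed value meant to mirror `ContainerExitStatus.PREEMPTED`. The sketch also assumes the predicate is consulted only for AM containers whose attempt has already failed.

```java
/** Hypothetical sketch: should a failed AM container consume an attempt? */
public final class AmFailurePolicy {
    // Local stand-ins for exit codes. -1000 is the protobuf default mentioned
    // in the discussion; -102 is assumed to match ContainerExitStatus.PREEMPTED.
    static final int INVALID = -1000;
    static final int PREEMPTED = -102;

    private AmFailurePolicy() {}

    /** Preempted AM containers do not count; every other failure does. */
    static boolean countsTowardsAmFailures(int exitStatus) {
        return exitStatus != PREEMPTED;
    }

    /** Counts how many failed attempts would count against the attempt limit. */
    static int countedFailures(int[] amExitStatuses) {
        int failures = 0;
        for (int status : amExitStatuses) {
            if (countsTowardsAmFailures(status)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Two preemptions and one genuine failure (exit code 1).
        int[] statuses = {PREEMPTED, 1, PREEMPTED};
        System.out.println(countedFailures(statuses)); // prints 1
    }
}
```

Note that an unset status (`INVALID`) still counts as a failure in this sketch, so a message that lacks the field is never mistaken for a preemption.
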

This message was sent by Atlassian JIRA
