[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-06-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.8.patch

Thanks Vinod for the review! uploaded a new patch.

bq. Not related to the patch, but I think I found a bug - the following doesn't 
take into whether the finished container is an AM or not. Let's file a ticket..
Checked more. This may be fine because  in the case of work-preserving AM 
restart, the container-finished event will be sent to the previous failed 
attempt which is capturing all the finished containers.
bq. Why are we making this change? Comment in code as well as here as to the 
why. May be add a test too?
Add comment in the code. Test is already added to cover this.
bq.  Need to think about how this will work when clusters get upgraded.
added a test case to check the default container exit status in protobuf is 
indeed -1000.
Fixed other comments also.


> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, 
> YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch, 
> YARN-2074.7.patch, YARN-2074.7.patch, YARN-2074.8.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-06-16 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.7.patch

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, 
> YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch, 
> YARN-2074.7.patch, YARN-2074.7.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-06-16 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.7.patch

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, 
> YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch, 
> YARN-2074.7.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-06-16 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.6.patch

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, 
> YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch, YARN-2074.6.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-06-16 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.6.patch

thanks for pointing out! fixed it.

One thing to note here is that AM ContainerExitStatus for succeeded app is not 
saved in state store. only containerExitStatus for failed apps is saved.

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, 
> YARN-2074.4.patch, YARN-2074.5.patch, YARN-2074.6.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-06-13 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.5.patch

Fixed the find bug warnings

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, 
> YARN-2074.4.patch, YARN-2074.5.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-06-13 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.4.patch

Updated the patch

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch, 
> YARN-2074.4.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.3.patch

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.2.patch

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch, YARN-2074.2.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-20 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2074:
--

Attachment: YARN-2074.1.patch

Patch to not account AM preemption as AM failure.
Patch checks the diagnostics of the attempt to determine whether this attempt 
is preempted or not.

There's a race condition related to RM restart which is not addressed in this 
patch. If the attempt is preempted and RM restarts before the attempt state is 
saved in the state store. The new RM won't be able to figure out whether the 
previous attempt is preempted or not.
Fixing this may require the NM-RM protocol change to indicate NM whether the AM 
preempted or killed so that when RM recovers NM can notify RM back whether the 
previous AM container is preempted or not. In addition, RMContainer transition 
may also need to be changed accordingly. we may fix it in separate jira.


> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jian He
> Attachments: YARN-2074.1.patch
>
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-05-19 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2074:
--

Fix Version/s: (was: 2.1.0-beta)

> Preemption of AM containers shouldn't count towards AM failures
> ---
>
> Key: YARN-2074
> URL: https://issues.apache.org/jira/browse/YARN-2074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
> containers getting preempted shouldn't count towards AM failures and thus 
> shouldn't eventually fail applications.
> We should explicitly handle AM container preemption/kill as a separate issue 
> and not count it towards the limit on AM failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)