[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043029#comment-14043029
 ] 

Bikas Saha commented on YARN-614:
---------------------------------

Steve, what you want should already happen. AM will suicide and the RM will 
restart it and count it as an AM failure, just like it does today. This jira is 
just making other source of failure like node loss, not count as an AM failure.

> Retry attempts automatically for hardware failures or YARN issues and set 
> default app retries to 1
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-614
>                 URL: https://issues.apache.org/jira/browse/YARN-614
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>             Fix For: 2.5.0
>
>         Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
> YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
> YARN-614.7.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to