[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647750#comment-13647750
 ] 

Chris Riccomini commented on YARN-614:
--------------------------------------

Added a new patch. It resolves items 1 (switching justFinishedContainers to a 
map for O(1) container status lookup) and 3 (adding a shouldIgnoreFailures 
method) from my list above.
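
For readers without the patch at hand, the two changes could look roughly like 
the sketch below. It is an illustration only, not the code in the patch: the 
class name, the String container IDs, the exit-status constants, and the body 
of shouldIgnoreFailures are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the attempt's bookkeeping; not the patch itself.
class FinishedContainers {
    // Placeholder exit statuses meaning "not the application's fault" (assumed values).
    static final int ABORTED_BY_YARN = -100;
    static final int NODE_DISKS_FAILED = -101;

    // Keyed by container ID, so a status lookup is one hash probe instead of a list scan.
    private final Map<String, Integer> justFinishedContainers = new HashMap<>();

    void containerFinished(String containerId, int exitStatus) {
        justFinishedContainers.put(containerId, exitStatus);
    }

    // Returns null when no finished container with that ID has been recorded.
    Integer getExitStatus(String containerId) {
        return justFinishedContainers.get(containerId);
    }

    // Assumed semantics: a failure is ignored when the given container ended
    // for a hardware or YARN reason rather than because of the application.
    boolean shouldIgnoreFailures(String containerId) {
        Integer status = justFinishedContainers.get(containerId);
        return status != null
                && (status == ABORTED_BY_YARN || status == NODE_DISKS_FAILED);
    }
}
```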

Bikas, I think we should leave recovery for another ticket.

Do you want me to update RMAppManager.recover() to have the same "if 
(app.attempts.size() - app.ignoredFailures >= app.maxAppAttempts)" logic as 
RMAppImpl.AttemptFailedTransition?
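
As a rough illustration of that condition, here is a minimal sketch using plain 
ints. The class and method names are hypothetical; only the comparison itself 
comes from the expression quoted above.

```java
// Hypothetical helper; the real check lives inline in RMAppImpl.AttemptFailedTransition.
class RetryLimit {
    // Only failures that were not ignored count against the attempt limit.
    static boolean retriesExhausted(int attempts, int ignoredFailures, int maxAppAttempts) {
        return attempts - ignoredFailures >= maxAppAttempts;
    }
}
```

With maxAppAttempts set to 2, an app with 3 attempts of which 2 failures were 
ignored would still get another attempt, while one with only 1 ignored failure 
would not.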
                
> Retry attempts automatically for hardware failures or YARN issues and set 
> default app retries to 1
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-614
>                 URL: https://issues.apache.org/jira/browse/YARN-614
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>         Attachments: YARN-614-0.patch, YARN-614-1.patch
>
>
> Attempts can fail due to a large number of user errors and they should not be 
> retried unnecessarily. The only reason YARN should retry an attempt is when 
> the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
> errors are the hardware errors that come to mind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
