[ 
https://issues.apache.org/jira/browse/YARN-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501174#comment-13501174
 ] 

Bikas Saha commented on YARN-218:
---------------------------------

Another thing to fix is checking maxRetries before creating a new attempt. 
For example, say maxRetries = 2 and the second retry fails. Just after storing 
the failure of that attempt, the RM dies. When the RM restarts and 
re-populates the app status, it should not create another attempt for that 
app. At present it can, because maxRetries is checked after attempt completion 
but not before attempt creation.
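
A minimal sketch of the check being asked for, under stated assumptions: the 
class, enum, and method names below are hypothetical and are not the 
ResourceManager's own code. It also assumes that maxRetries bounds the number 
of failed attempts, and that killed attempts are not counted, as this issue 
proposes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the YARN code base.
public class AttemptRecovery {

    // Assumed attempt states; KILLED is kept separate from FAILED on purpose.
    enum AttemptState { RUNNING, FINISHED, FAILED, KILLED }

    /** Counts only FAILED attempts (assumption: killed ones do not use up retries). */
    static int countFailed(List<AttemptState> recoveredAttempts) {
        int failed = 0;
        for (AttemptState state : recoveredAttempts) {
            if (state == AttemptState.FAILED) {
                failed++;
            }
        }
        return failed;
    }

    /**
     * Guard to run before creating an attempt, including right after the
     * attempt history has been re-populated from the store on restart.
     */
    static boolean mayCreateNewAttempt(List<AttemptState> recoveredAttempts,
                                       int maxRetries) {
        return countFailed(recoveredAttempts) < maxRetries;
    }

    public static void main(String[] args) {
        // Scenario from the comment: maxRetries = 2, both attempts failed,
        // and the RM died just after the second failure was stored.
        List<AttemptState> recovered =
            Arrays.asList(AttemptState.FAILED, AttemptState.FAILED);
        System.out.println(mayCreateNewAttempt(recovered, 2)); // false
    }
}
```

Running the same guard on both paths (after attempt completion and before 
attempt creation) means a restart cannot hand out an extra attempt.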
                
> Distinguish between "failed" and "killed" app attempts
> ------------------------------------------------------
>
>                 Key: YARN-218
>                 URL: https://issues.apache.org/jira/browse/YARN-218
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Tom White
>            Assignee: Tom White
>
> A "failed" app attempt is one that failed due to an error in the user 
> program, as opposed to one that was "killed" by the system. As with MapReduce 
> task attempts, we should distinguish the two so that killed attempts do not 
> count against the number of retries (yarn.resourcemanager.am.max-retries).
