[ https://issues.apache.org/jira/browse/YARN-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501174#comment-13501174 ]
Bikas Saha commented on YARN-218:
---------------------------------
Another thing to fix is checking maxRetries before creating a new attempt.
For example, say maxRetries = 2 and the second retry fails, and the RM dies
just after storing the failure of that attempt. When the RM restarts and
re-populates the app status, it should not create another attempt for that
app. This can currently happen because maxRetries is checked after attempt
completion but not before attempt creation.
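
A minimal sketch of the guard described above, assuming maxRetries caps the
number of failed attempts. The class and method names here (RecoveredApp,
failedAttempts, mayCreateNewAttempt) are hypothetical and are not the actual
ResourceManager classes; the point is only that the same limit is consulted
before an attempt is created, using the failures recovered from the store.

```java
// Hypothetical, simplified model of an app recovered from the RM state store.
// None of these names come from the YARN code base.
class RecoveredApp {
    private final int maxRetries;
    // One entry per stored attempt; true means that attempt failed.
    private final boolean[] attemptFailed;

    RecoveredApp(int maxRetries, boolean... attemptFailed) {
        this.maxRetries = maxRetries;
        this.attemptFailed = attemptFailed.clone();
    }

    // Count the failures that were persisted before the restart.
    int failedAttempts() {
        int count = 0;
        for (boolean failed : attemptFailed) {
            if (failed) {
                count++;
            }
        }
        return count;
    }

    // Guard evaluated before creating an attempt, not only after one completes.
    // Assumption: maxRetries is the maximum number of failed attempts allowed.
    boolean mayCreateNewAttempt() {
        return failedAttempts() < maxRetries;
    }
}
```

With maxRetries = 2 and two stored failures, mayCreateNewAttempt() returns
false, so recovery would mark the app as failed instead of starting a third
attempt.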
> Distinguish between "failed" and "killed" app attempts
> ------------------------------------------------------
>
> Key: YARN-218
> URL: https://issues.apache.org/jira/browse/YARN-218
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: resourcemanager
> Reporter: Tom White
> Assignee: Tom White
>
> A "failed" app attempt is one that failed due to an error in the user
> program, as opposed to one that was "killed" by the system. As with MapReduce
> task attempts, we should distinguish the two so that killed attempts do not
> count against the number of retries (yarn.resourcemanager.am.max-retries).