[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188681#comment-15188681
 ] 

Jun Gong commented on YARN-3998:
--------------------------------

Yes, it seems we need to deal with platform errors, however it still seems 
useful for user to specify retry policy that retry on what error codes because 
only user knows meaning of every error codes. 

> Add retry-times to let NM re-launch container when it fails to run
> ------------------------------------------------------------------
>
>                 Key: YARN-3998
>                 URL: https://issues.apache.org/jira/browse/YARN-3998
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, YARN-3998.06.patch
>
>
> I'd like to add a field(retry-times) in ContainerLaunchContext. When AM 
> launches containers, it could specify the value. Then NM will re-launch the 
> container 'retry-times' times when it fails to run(e.g.exit code is not 0). 
> It will save a lot of time. It avoids container localization. RM does not 
> need to re-schedule the container. And local files in container's working 
> directory will be left for re-use.(If container have downloaded some big 
> files, it does not need to re-download them when running again.) 
> We find it is useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to