[
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166567#comment-15166567
]
Jun Gong commented on YARN-3998:
--------------------------------
Thanks [~vinodkv] for explaining it.
{quote}
My point was mainly about creating and reusing a common policy-framework even
if the actual policies may not be entirely reused. We should seriously consider
this instead of creating ad hoc APIs for custom hard-coded policies.
{quote}
Yes, it would be better if we could reuse a common policy framework. We might
need to discuss it more.
{quote}
I'm okay creating separate JIRAs under YARN-3998 if you both think of doing so,
but treat (some of the above) as blockers for releasing this feature. Given
that, does it make sense to work on this in a branch?
{quote}
I could address these blocking problems in this issue if needed. [~vvasudev],
could you share your thoughts, please? Thanks.
> Add retry-times to let NM re-launch container when it fails to run
> ------------------------------------------------------------------
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jun Gong
> Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch,
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, YARN-3998.06.patch
>
>
> I'd like to add a field (retry-times) to ContainerLaunchContext. When the AM
> launches containers, it can specify the value. The NM will then re-launch the
> container up to 'retry-times' times when it fails to run (e.g. its exit code
> is not 0).
> This saves a lot of time: it avoids container localization, and the RM does
> not need to re-schedule the container. Local files in the container's working
> directory are also left in place for reuse. (If the container has downloaded
> some big files, it does not need to re-download them when running again.)
> We find this useful in systems like Storm.
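The re-launch behaviour described above could be sketched roughly as follows. This is only an illustration, not the NodeManager code or anything from the attached patches: the class RetryLaunchSketch, the method launchWithRetries, and the Result type are all hypothetical names, and the sketch assumes that 'retry-times' counts re-launches after the first attempt and that any non-zero exit code counts as a failure.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of the proposed behaviour; not actual YARN code. */
class RetryLaunchSketch {

    /** Outcome of a launch sequence: attempts made and the last exit code. */
    static final class Result {
        final int attempts;
        final int exitCode;

        Result(int attempts, int exitCode) {
            this.attempts = attempts;
            this.exitCode = exitCode;
        }
    }

    /**
     * Runs the launcher once, then re-runs it while it exits non-zero,
     * for at most retryTimes additional attempts. The launcher stands in
     * for "start the container process and return its exit code"; nothing
     * is re-localized or re-scheduled between attempts.
     */
    static Result launchWithRetries(IntSupplier launcher, int retryTimes) {
        if (retryTimes < 0) {
            throw new IllegalArgumentException("retryTimes must be >= 0");
        }
        int attempts = 0;
        int exitCode;
        do {
            exitCode = launcher.getAsInt();
            attempts++;
        } while (exitCode != 0 && attempts <= retryTimes);
        return new Result(attempts, exitCode);
    }

    public static void main(String[] args) {
        // Simulated container that fails twice and then succeeds.
        int[] exitCodes = {1, 1, 0};
        int[] next = {0};
        Result r = launchWithRetries(() -> exitCodes[next[0]++], 3);
        System.out.println("attempts=" + r.attempts + ", exitCode=" + r.exitCode);
    }
}
```

With retryTimes set to 0 the launcher runs exactly once, which would match the behaviour without this feature.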
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)