[
https://issues.apache.org/jira/browse/YARN-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245032#comment-15245032
]
Jun Gong commented on YARN-4725:
--------------------------------
[~kasha] Thanks for the comment. As [~vvasudev] said in
https://issues.apache.org/jira/browse/YARN-3998?focusedCommentId=15239448&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15239448,
we will address it in follow-up JIRAs.
> [Umbrella] Auto-restart of containers
> --------------------------------------
>
> Key: YARN-4725
> URL: https://issues.apache.org/jira/browse/YARN-4725
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Vinod Kumar Vavilapalli
>
> See overview doc at YARN-4692, copying the sub-section to track all related
> efforts.
> Today, when a container (process-tree) dies, NodeManager assumes that the
> container’s allocation is also expired, and reports accordingly to the
> ResourceManager, which then releases the allocation. For service containers,
> this is undesirable in many cases. Long-running containers may exit or crash
> for various reasons and need to restart, but forcing them through the
> complete scheduling cycle, resource localization, etc. is both unnecessary
> and expensive. (Task) For services, it would be good to have NodeManagers
> automatically restart containers. This looks a lot like inittab /
> daemontools at the system level.
> We will need to enable app-specific policies (very similar to the handling
> of AM restarts at YARN level) for restarting containers automatically but
> limit such restarts if a container dies too often in a short interval of time.
> YARN-3998 is an existing ticket that looks at some if not all of this
> functionality.
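
For illustration only, the restart limit mentioned in the description above
("limit such restarts if a container dies too often in a short interval of
time") could be sketched as a sliding-window counter. This is not YARN code
and not the design from YARN-3998; the class name RestartLimiter, its
parameters maxRestarts and windowMillis, and the method shouldRestart are all
hypothetical names chosen for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: allow at most maxRestarts restarts of one container
 * within any sliding window of windowMillis. Not part of any YARN API.
 */
public class RestartLimiter {
    private final int maxRestarts;
    private final long windowMillis;
    // Timestamps of restarts that were allowed, oldest first.
    private final Deque<Long> recentRestarts = new ArrayDeque<>();

    public RestartLimiter(int maxRestarts, long windowMillis) {
        if (maxRestarts < 0 || windowMillis <= 0) {
            throw new IllegalArgumentException("invalid restart limits");
        }
        this.maxRestarts = maxRestarts;
        this.windowMillis = windowMillis;
    }

    /**
     * Called when the container exits at time nowMillis. Returns true and
     * records the restart if it is within the limit; returns false if the
     * container has already restarted too often inside the window.
     */
    public synchronized boolean shouldRestart(long nowMillis) {
        // Forget restarts that have aged out of the window.
        while (!recentRestarts.isEmpty()
                && nowMillis - recentRestarts.peekFirst() >= windowMillis) {
            recentRestarts.pollFirst();
        }
        if (recentRestarts.size() >= maxRestarts) {
            return false;
        }
        recentRestarts.addLast(nowMillis);
        return true;
    }
}
```

With maxRestarts = 2 and windowMillis = 1000, a third exit inside the same
second would be refused, and the caller would presumably fall back to today's
behaviour of reporting the container as completed to the ResourceManager.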