[ 
https://issues.apache.org/jira/browse/YARN-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245032#comment-15245032
 ] 

Jun Gong commented on YARN-4725:
--------------------------------

[~kasha] Thanks for the comment. As [~vvasudev] said in 
https://issues.apache.org/jira/browse/YARN-3998?focusedCommentId=15239448&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15239448,
 we will address it in follow-up JIRAs.

> [Umbrella] Auto-restart of containers
> --------------------------------------
>
>                 Key: YARN-4725
>                 URL: https://issues.apache.org/jira/browse/YARN-4725
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>
> See overview doc at YARN-4692, copying the sub-section to track all related 
> efforts.
> Today, when a container (process-tree) dies, NodeManager assumes that the 
> container’s allocation is also expired, and reports accordingly to the 
> ResourceManager which then releases the allocation. For service containers, 
> this is undesirable in many cases. Long-running containers may exit or crash 
> for various reasons and need to restart, but forcing them to go through the 
> complete scheduling cycle, resource localization, etc. is both unnecessary and 
> expensive. (Task) For services it will be good to have NodeManagers 
> automatically restart containers. This looks a lot like inittab / 
> daemontools at the system level.
> We will need to enable app-specific policies (very similar to the handling 
> of AM restarts at YARN level) for restarting containers automatically, but 
> limit such restarts if a container dies too often in a short interval of time.
> YARN-3998 is an existing ticket that looks at some if not all of this 
> functionality.
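
For illustration only, the "limit such restarts if a container dies too often in a short interval" idea from the description could be sketched as a sliding-window counter. This is a minimal sketch under assumptions, not the YARN implementation: the class name RestartLimiter, the method shouldRestart, and the parameters maxRestarts and windowMillis are all hypothetical and do not come from the issue or from any Hadoop API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not part of YARN): caps restarts inside a sliding time window. */
final class RestartLimiter {
    private final int maxRestarts;      // assumed policy knob: restarts allowed per window
    private final long windowMillis;    // assumed policy knob: window length in milliseconds
    private final Deque<Long> restartTimes = new ArrayDeque<>();

    RestartLimiter(int maxRestarts, long windowMillis) {
        this.maxRestarts = maxRestarts;
        this.windowMillis = windowMillis;
    }

    /** Returns true (and records the restart) if a restart is allowed at nowMillis. */
    boolean shouldRestart(long nowMillis) {
        // Drop restarts that are no longer inside the window.
        while (!restartTimes.isEmpty()
                && nowMillis - restartTimes.peekFirst() >= windowMillis) {
            restartTimes.pollFirst();
        }
        // Too many recent restarts: give up and let the failure be reported.
        if (restartTimes.size() >= maxRestarts) {
            return false;
        }
        restartTimes.addLast(nowMillis);
        return true;
    }
}
```

With such a limiter, a caller would ask shouldRestart(System.currentTimeMillis()) each time the container exits, and fall back to the existing behavior (reporting the container as completed) once it returns false.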



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
