[ 
https://issues.apache.org/jira/browse/YARN-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4725:
------------------------------------------
    Description: 
See the overview doc at YARN-4692; the relevant sub-section is copied here to 
track all related efforts.

Today, when a container (process-tree) dies, the NodeManager assumes that the 
container’s allocation has also expired and reports this to the 
ResourceManager, which then releases the allocation. For service containers, 
this is undesirable in many cases. Long-running containers may exit for various 
reasons, or crash and need a restart, but forcing them through the complete 
scheduling cycle, resource localization, etc. is both unnecessary and expensive. 
(Task) For services, it would be good to have NodeManagers automatically 
restart containers. This looks a lot like inittab / daemontools at the system 
level.

We will need to enable app-specific policies (very similar to the handling of 
AM restarts at the YARN level) for restarting containers automatically, while 
limiting such restarts if a container dies too often within a short interval 
of time.
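
One way to picture the "limit restarts if a container dies too often" part is 
a sliding-window counter. The sketch below is only an illustration under 
stated assumptions: the class RestartLimiter, its parameters maxRestarts and 
windowMillis, and the method tryRestart are hypothetical names, not part of 
any YARN API or of the design tracked here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch (not a YARN class): allows at most maxRestarts
 * restarts of one container within any window of windowMillis.
 */
public class RestartLimiter {
    private final int maxRestarts;
    private final long windowMillis;
    // Timestamps of restarts that are still inside the window, oldest first.
    private final Deque<Long> restartTimes = new ArrayDeque<>();

    public RestartLimiter(int maxRestarts, long windowMillis) {
        this.maxRestarts = maxRestarts;
        this.windowMillis = windowMillis;
    }

    /**
     * Called when the container has exited at time nowMillis.
     * Returns true (and records the restart) if a restart is allowed,
     * false if the container has died too often in the current window.
     */
    public boolean tryRestart(long nowMillis) {
        // Forget restarts that have aged out of the sliding window.
        while (!restartTimes.isEmpty()
                && nowMillis - restartTimes.peekFirst() >= windowMillis) {
            restartTimes.pollFirst();
        }
        if (restartTimes.size() >= maxRestarts) {
            return false; // too many recent restarts; give up on auto-restart
        }
        restartTimes.addLast(nowMillis);
        return true;
    }
}
```

In this sketch, a false return is the point at which the container would be 
reported as failed instead of being restarted in place; how an application 
would configure the two limits is left open.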

YARN-3998 is an existing ticket that looks at some if not all of this 
functionality.

  was:
See overview doc at YARN-4692, copying the sub-section to track all related 
efforts.

Today, when a container (process­-tree) dies, NodeManager assumes that the 
container’s allocation is also expired, and reports accordingly to the 
ResourceManager which then releases the allocation. For service containers, 
this is undesirable in many cases. Long running containers may exit for various 
reasons, crash and need to restart but forcing them to go through the complete 
scheduling cycle, resource localization etc is both unnecessary and expensive. 
(​Task) ​For services it will be good to have NodeManagers automatically 
restart containers. This looks a lot like inittab / daemon­tools at the system 
level.

We will need to enable app­-specific policies (very similar to the handling of 
AM restarts at YARN level) for restarting containers automatically but limit 
such restarts if a container dies too often in a short interval of time.

YARN­-3998 is an existing ticket that looks at some if not all of this 
functionality.


> [Umbrella] Auto-restart of containers
> -------------------------------------
>
>                 Key: YARN-4725
>                 URL: https://issues.apache.org/jira/browse/YARN-4725
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>
> See the overview doc at YARN-4692; the relevant sub-section is copied here 
> to track all related efforts.
> Today, when a container (process-tree) dies, the NodeManager assumes that the 
> container’s allocation has also expired and reports this to the 
> ResourceManager, which then releases the allocation. For service containers, 
> this is undesirable in many cases. Long-running containers may exit for 
> various reasons, or crash and need a restart, but forcing them through the 
> complete scheduling cycle, resource localization, etc. is both unnecessary 
> and expensive. (Task) For services, it would be good to have NodeManagers 
> automatically restart containers. This looks a lot like inittab / 
> daemontools at the system level.
> We will need to enable app-specific policies (very similar to the handling 
> of AM restarts at the YARN level) for restarting containers automatically, 
> while limiting such restarts if a container dies too often within a short 
> interval of time.
> YARN-3998 is an existing ticket that looks at some if not all of this 
> functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
