[ 
https://issues.apache.org/jira/browse/YARN-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3607:
-----------------------------------
    Assignee:     (was: Karthik Kambatla)

> Allow users to choose between failing the daemons vs failing the 
> apps/containers
> --------------------------------------------------------------------------------
>
>                 Key: YARN-3607
>                 URL: https://issues.apache.org/jira/browse/YARN-3607
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager, scheduler
>    Affects Versions: 2.7.0
>            Reporter: Karthik Kambatla
>
> We often run into cases where we must choose between failing the daemon
> (fail-fast) and failing the user's work while keeping the cluster running.
> There is no clear right way to handle these situations: some users would
> like to be conservative and let the daemons run, while others would like to
> fail fast.
> Today, we handle these case by case, going by what the people working on
> each issue feel is the right approach. Examples include how we handle app
> recovery failures and queue changes on RM restart.
> Users should be able to choose between these two extremes, and have all these
> situations handled the same way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
