Karthik Kambatla created YARN-3607:
--------------------------------------
Summary: Allow users to choose between failing the daemons vs
failing the apps/containers
Key: YARN-3607
URL: https://issues.apache.org/jira/browse/YARN-3607
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager, resourcemanager, scheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
We often run into cases where we must choose between failing the daemon
(fail-fast) and failing the user's work while keeping the cluster running.
There is no clear right way to handle these situations: some users would like
to be conservative and let the daemons run, while others would like to
fail-fast. Today, we handle these case by case and go by what the people
working on each one feel is the right way to handle things. Examples include
how we handle app recovery failures and queue changes on RM restart.
Users should be able to choose between these two extremes, and have all these
situations handled the same way.
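
As a rough illustration of the idea, here is a minimal sketch of a single
decision point that every such failure site could call. All names here
(FailurePolicy, DaemonFailFastException, handle) are hypothetical and are not
existing YARN classes or configuration keys; how the policy would actually be
configured is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FailurePolicySketch {
    /** Hypothetical cluster-wide choice between the two extremes. */
    enum FailurePolicy { FAIL_DAEMON, FAIL_APP }

    /** Hypothetical exception signalling that the daemon should stop. */
    static class DaemonFailFastException extends RuntimeException {
        DaemonFailFastException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Single decision point shared by all failure sites (for example app
     * recovery or queue changes on restart), so they all behave the same way.
     *
     * @return true if the app was failed and the daemon keeps running
     * @throws DaemonFailFastException if the policy is FAIL_DAEMON
     */
    static boolean handle(FailurePolicy policy, String appId,
                          Exception cause, Consumer<String> failApp) {
        if (policy == FailurePolicy.FAIL_DAEMON) {
            // Fail-fast: surface the error so the daemon shuts down.
            throw new DaemonFailFastException(
                "Fail-fast on error for " + appId, cause);
        }
        // Conservative: fail only the affected app, keep the daemon up.
        failApp.accept(appId);
        return true;
    }

    public static void main(String[] args) {
        List<String> failedApps = new ArrayList<>();
        Exception cause = new IllegalStateException("queue no longer exists");

        // Conservative policy: the app is failed, the daemon continues.
        handle(FailurePolicy.FAIL_APP, "app_0001", cause, failedApps::add);
        System.out.println("Failed apps: " + failedApps);

        // Fail-fast policy: the same error now stops the daemon instead.
        try {
            handle(FailurePolicy.FAIL_DAEMON, "app_0002", cause, failedApps::add);
        } catch (DaemonFailFastException e) {
            System.out.println("Daemon would exit: " + e.getMessage());
        }
    }
}
```

Routing every case through one method like this is what would make the
behavior uniform: the per-case judgment calls described above would be
replaced by one user-selected setting.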