Prashant Golash created YARN-9656:
-------------------------------------
Summary: Plugin to avoid scheduling jobs on nodes that are not in a
"schedulable" state, but are healthy otherwise.
Key: YARN-9656
URL: https://issues.apache.org/jira/browse/YARN-9656
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager, resourcemanager
Affects Versions: 3.1.2
Reporter: Prashant Golash
I am creating this Jira to ask the community whether this would be a useful
addition to YARN. Sometimes nodes go into a bad state, e.g. because of a
hardware problem (bad I/O, a failing fan). In other scenarios, if CGroups are
not enabled, nodes may be running at very high CPU utilization, and the jobs
scheduled on them will suffer.
The idea is three-fold:
# Gather relevant metrics from the NodeManagers and publish them in some form
(e.g. an exclude file).
# The RM loads the file and adds the listed nodes to the blacklist.
# Once a node becomes good again, it can be put back on the whitelist.
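
To make the three steps above concrete, here is a minimal sketch in Python of the part that turns node metrics into an exclude file. Everything in it is an assumption for illustration: the NodeMetrics fields, the thresholds, and the function names are hypothetical and are not an existing YARN API. The output is simply one hostname per line, which is the shape of file the RM could read as an exclude list. A node drops off the list (step 3) as soon as a later sample shows it is good again.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass(frozen=True)
class NodeMetrics:
    """Hypothetical per-node sample; field names are illustrative only."""
    hostname: str
    cpu_utilization: float  # fraction of CPU in use, 0.0 to 1.0
    io_errors: int          # disk I/O errors seen in the sampling window
    fan_ok: bool            # hardware sensor reports the fans as working


# Assumed thresholds; a real deployment would make these configurable.
MAX_CPU_UTILIZATION = 0.95
MAX_IO_ERRORS = 0


def is_schedulable(m: NodeMetrics) -> bool:
    """A node is schedulable only if every check passes."""
    return (m.cpu_utilization <= MAX_CPU_UTILIZATION
            and m.io_errors <= MAX_IO_ERRORS
            and m.fan_ok)


def unschedulable_hosts(samples: Iterable[NodeMetrics]) -> List[str]:
    """Step 1: reduce the gathered metrics to a sorted list of bad hosts."""
    return sorted({m.hostname for m in samples if not is_schedulable(m)})


def write_exclude_file(samples: Iterable[NodeMetrics], path: Path) -> List[str]:
    """Steps 2 and 3: rewrite the exclude file from the latest samples.

    The file is rebuilt from scratch on every call, so a node that has
    recovered is no longer listed and becomes schedulable again.
    """
    hosts = unschedulable_hosts(samples)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("".join(h + "\n" for h in hosts))
    tmp.replace(path)  # rename over the old file so readers never see a partial write
    return hosts
```

In this sketch the RM side is left out on purpose: how the RM picks up the file (periodic reload, admin refresh, or a new plugin interface) is exactly the design question this Jira is asking about.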
Various optimizations can be done here, but I would like to understand if this
is something which could be helpful as an upstream feature in YARN.