[ https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prashant Golash updated YARN-9656:
----------------------------------
Affects Version/s: 2.9.1
> Plugin to avoid scheduling jobs on nodes which are not in a "schedulable" state,
> but are healthy otherwise.
> --------------------------------------------------------------------------------------------------------
>
> Key: YARN-9656
> URL: https://issues.apache.org/jira/browse/YARN-9656
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, resourcemanager
> Affects Versions: 2.9.1, 3.1.2
> Reporter: Prashant Golash
> Priority: Major
>
> I am creating this Jira to ask the community whether this would be a useful
> addition to YARN. Sometimes nodes go into a bad state, e.g. because of a
> hardware problem (bad I/O, a faulty fan). In other scenarios, if CGroups are
> not enabled, nodes may be running at very high CPU utilization, and the jobs
> scheduled on them will suffer.
>
> The idea is three-fold:
> # Gather relevant metrics from the NodeManagers and put them in some form
> (e.g. an exclude file).
> # The RM loads the file and adds the listed nodes to the blacklist.
> # Once a node becomes healthy again, it can be put back on the whitelist.
> Various optimizations can be done here, but I would like to understand whether
> this could be helpful as an upstream feature in YARN.
>
>
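The three steps in the description above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation and not YARN code: `NodeMetrics`, the thresholds, and all function names are hypothetical, and the exclude file is assumed to be a plain text file with one hostname per line.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Set


@dataclass(frozen=True)
class NodeMetrics:
    """Hypothetical per-node sample; the fields are illustrative only."""
    host: str
    cpu_utilization: float  # fraction of CPU in use, 0.0 to 1.0
    io_errors: int          # I/O errors seen since the last sample
    fan_ok: bool            # False when a fan fault is reported


# Assumed thresholds; real values would have to be tuned per cluster.
MAX_CPU_UTILIZATION = 0.95
MAX_IO_ERRORS = 0


def is_schedulable(m: NodeMetrics) -> bool:
    """A node is schedulable only if every check passes."""
    return (m.cpu_utilization <= MAX_CPU_UTILIZATION
            and m.io_errors <= MAX_IO_ERRORS
            and m.fan_ok)


def write_exclude_file(metrics: Iterable[NodeMetrics], path: Path) -> None:
    """Step 1: write one unschedulable hostname per line."""
    bad = sorted(m.host for m in metrics if not is_schedulable(m))
    path.write_text("".join(host + "\n" for host in bad))


def load_exclude_file(path: Path) -> Set[str]:
    """Step 2: read the file back as the set of blacklisted hosts."""
    lines = path.read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def schedulable_nodes(all_hosts: Iterable[str], blacklist: Set[str]) -> List[str]:
    """Hosts the scheduler may still use, given the current blacklist."""
    return sorted(set(all_hosts) - blacklist)


if __name__ == "__main__":
    exclude = Path(tempfile.mkdtemp()) / "exclude"
    samples = [
        NodeMetrics("node1", 0.40, 0, True),
        NodeMetrics("node2", 0.99, 0, True),  # CPU too high
    ]
    write_exclude_file(samples, exclude)
    print(schedulable_nodes(["node1", "node2"], load_exclude_file(exclude)))

    # Step 3: once node2 recovers, the next refresh drops it from the file.
    recovered = [samples[0], NodeMetrics("node2", 0.30, 0, True)]
    write_exclude_file(recovered, exclude)
    print(schedulable_nodes(["node1", "node2"], load_exclude_file(exclude)))
```

In this sketch the whitelist is implicit: the file is rewritten on every refresh, so a node that passes all checks again simply disappears from the blacklist and needs no separate state.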