Prashant Golash created YARN-9656:
-------------------------------------

             Summary: Plugin to avoid scheduling jobs on nodes that are not in a 
"schedulable" state but are otherwise healthy.
                 Key: YARN-9656
                 URL: https://issues.apache.org/jira/browse/YARN-9656
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager, resourcemanager
    Affects Versions: 3.1.2
            Reporter: Prashant Golash


Creating this JIRA to get input from the community on whether this would be a 
useful addition to YARN. Sometimes nodes go into a bad state, e.g. because of a 
hardware problem (bad I/O, a failing fan). In other scenarios, if CGroups are 
not enabled, nodes may run at very high CPU utilization, and the jobs scheduled 
on them will suffer.

The idea is three-fold:
 # Gather relevant metrics from the NodeManagers and persist them in some form 
(e.g., an exclude file).
 # The RM loads the file and adds the listed nodes to a blacklist.
 # Once a node becomes healthy again, it can be moved back to the whitelist.
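
The three steps above can be sketched as follows. This is a minimal, hypothetical illustration in Python, not YARN code: the NodeMetrics fields, the CPU thresholds and the SchedulabilityTracker class are all assumptions made for the example. It uses two thresholds (hysteresis) so that a node hovering near the limit does not flap between the blacklist and the whitelist. The exclude file it writes is a plain list of hostnames, one per line; how the RM would consume it (e.g., as a soft blacklist rather than decommissioning the node) is left open, as in the proposal.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical thresholds; the proposal does not specify any values.
CPU_UNSCHEDULABLE = 0.90  # exclude a node once CPU utilization exceeds 90%
CPU_RECOVERED = 0.70      # re-admit it only after CPU drops below 70%


@dataclass
class NodeMetrics:
    """Hypothetical per-node report gathered from a NodeManager (step 1)."""
    host: str
    cpu_utilization: float  # fraction in [0.0, 1.0]
    disk_io_errors: int     # I/O errors seen since the last report
    fan_ok: bool


@dataclass
class SchedulabilityTracker:
    """Keeps the set of hosts that should not receive new containers."""
    excluded: set[str] = field(default_factory=set)

    def update(self, m: NodeMetrics) -> None:
        hardware_bad = m.disk_io_errors > 0 or not m.fan_ok
        if hardware_bad or m.cpu_utilization > CPU_UNSCHEDULABLE:
            # Step 2: put the node on the blacklist.
            self.excluded.add(m.host)
        elif m.host in self.excluded and m.cpu_utilization < CPU_RECOVERED:
            # Step 3: the node has recovered; move it back to the whitelist.
            self.excluded.discard(m.host)
        # Otherwise keep the current state (hysteresis band between thresholds).

    def write_exclude_file(self, path: Path) -> None:
        """Persist the excluded hosts, one hostname per line."""
        path.write_text("".join(f"{h}\n" for h in sorted(self.excluded)))


tracker = SchedulabilityTracker()
tracker.update(NodeMetrics("nm1.example.com", 0.95, 0, True))  # hot CPU -> excluded
tracker.update(NodeMetrics("nm2.example.com", 0.40, 3, True))  # I/O errors -> excluded
tracker.update(NodeMetrics("nm1.example.com", 0.80, 0, True))  # inside band -> stays excluded
tracker.update(NodeMetrics("nm1.example.com", 0.50, 0, True))  # recovered -> re-admitted
print(sorted(tracker.excluded))  # ['nm2.example.com']
```

In a real deployment the evaluation would more likely run periodically over many reports, and the thresholds would be configurable per cluster.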

Various optimizations are possible here, but first I would like to understand 
whether this could be useful as an upstream feature in YARN.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)