[ 
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293971#comment-14293971
 ] 

Sunil G commented on YARN-2005:
-------------------------------

Hi [~jlowe]

As discussed in YARN-2293, AMs can fail on a few nodes, and such nodes can 
be avoided when launching the next attempt or even new AMs. 

A scoring mechanism based on AM container failures will be the key point here. 
Only container failures that are unrelated to a buggy application should be 
counted as genuine candidates. 

The scoring mechanism also has to be lenient over time. If a node is 
blacklisted and then stays idle for a long time, running only normal 
containers, it can be brought back as a normal node.
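
To make the idea concrete, here is a rough sketch of such a scorer. It is not 
existing YARN code: the class name AmBlacklistSketch, its methods, the failure 
threshold, and the recovery window are all hypothetical. It also assumes that 
"lenient over time" can be approximated by clearing a node's score once a fixed 
period has passed since its last counted AM failure.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in YARN. */
public class AmBlacklistSketch {
    private final int failureThreshold;   // assumed configurable failure count
    private final long recoveryMillis;    // assumed quiet period before a node is forgiven
    private final Map<String, Integer> scores = new HashMap<>();
    private final Map<String, Long> lastFailure = new HashMap<>();

    public AmBlacklistSketch(int failureThreshold, long recoveryMillis) {
        this.failureThreshold = failureThreshold;
        this.recoveryMillis = recoveryMillis;
    }

    /** Counts an AM container failure unless it is blamed on the application itself. */
    public void recordAmFailure(String node, boolean causedByApplication, long nowMillis) {
        if (causedByApplication) {
            return; // a buggy application should not penalize the node
        }
        expireIfRecovered(node, nowMillis);
        scores.merge(node, 1, Integer::sum);
        lastFailure.put(node, nowMillis);
    }

    /** A node is blacklisted for AM launches once its score reaches the threshold. */
    public boolean isBlacklisted(String node, long nowMillis) {
        expireIfRecovered(node, nowMillis);
        return scores.getOrDefault(node, 0) >= failureThreshold;
    }

    /** Clears the score when no AM failure has been counted for the recovery window. */
    private void expireIfRecovered(String node, long nowMillis) {
        Long last = lastFailure.get(node);
        if (last != null && nowMillis - last >= recoveryMillis) {
            scores.remove(node);
            lastFailure.remove(node);
        }
    }
}
```

Timestamps are passed in explicitly so the behaviour is easy to test. A real 
implementation would also need a way to decide whether a failure was caused by 
the application, which this sketch simply takes as an input flag.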

I would like to have the discussion on this here. Please share your thoughts. 

> Blacklisting support for scheduling AMs
> ---------------------------------------
>
>                 Key: YARN-2005
>                 URL: https://issues.apache.org/jira/browse/YARN-2005
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 0.23.10, 2.4.0
>            Reporter: Jason Lowe
>
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
