Steve Loughran commented on YARN-2005:

Sunil: we use the scoring to decide whether to trust nodes for specific 
components (e.g. region servers); we can't do anything in the AM for AM 

As you propose, you can differentiate between some node-related and
node-unrelated problems, though a generic System.exit(-1) has to be treated as
a "both" failure. That is, unless the AM could exit with a specific error code
meaning 'this node doesn't suit us', which it could use to bail out on problems
like a missing keytab, a port in use, no GPU, ...
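A minimal sketch of that idea follows. It assumes a hypothetical dedicated exit code for "this node doesn't suit us". YARN defines no such code, and the values 77 and 78, the FailureScope enum, and the AmExitClassifier class are all illustrative inventions. The point is that only an explicit, agreed-upon code lets the RM blame the node alone. Anything else, including a generic System.exit(-1), has to count against both the node and the application.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Where a failed AM attempt's fault lies (hypothetical classification).
enum FailureScope { NODE, APPLICATION, BOTH }

final class AmExitClassifier {
    // Hypothetical exit code: "this node doesn't suit us" (missing keytab,
    // port in use, no GPU, ...). Not a real YARN constant.
    static final int EXIT_NODE_UNSUITABLE = 77;
    // Hypothetical exit code: the application itself is broken (e.g. bad config).
    static final int EXIT_APP_CONFIG_ERROR = 78;

    private AmExitClassifier() {}

    // RM-side view: map an AM container's exit code to a failure scope.
    static FailureScope classify(int exitCode) {
        if (exitCode == EXIT_NODE_UNSUITABLE) {
            return FailureScope.NODE;          // safe to count against the node only
        }
        if (exitCode == EXIT_APP_CONFIG_ERROR) {
            return FailureScope.APPLICATION;   // moving to another node won't help
        }
        // A generic failure such as System.exit(-1) says nothing about the
        // cause, so it must be treated as both node- and app-related.
        return FailureScope.BOTH;
    }

    // AM-side view: a preflight check that picks the node-specific exit code
    // when a node-local prerequisite (here, a readable keytab) is missing.
    static int preflightExitCode(Path keytab) {
        return Files.isReadable(keytab) ? 0 : EXIT_NODE_UNSUITABLE;
    }

    public static void main(String[] args) {
        int code = preflightExitCode(Paths.get("/nonexistent/hypothetical.keytab"));
        System.out.println("preflight exit code: " + code);
        System.out.println("classified as: " + classify(code));
        System.out.println("System.exit(-1) classified as: " + classify(-1));
    }
}
```

In a real AM the preflight result would be passed to System.exit; the sketch only prints it, so the demo can run without terminating the JVM.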

> Blacklisting support for scheduling AMs
> ---------------------------------------
>                 Key: YARN-2005
>                 URL: https://issues.apache.org/jira/browse/YARN-2005
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 0.23.10, 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Anubhav Dhoot
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.
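The requested behaviour (blacklist a node for AM launches after it fails a configurable number of AM attempts) can be sketched as a per-node failure counter with a threshold. This is a minimal, hypothetical illustration and not the ResourceManager's actual code. The class AmNodeBlacklist, its methods, and the node ID strings are assumptions. Deciding which AM failures should be counted against a node is left to the caller.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical RM-side tracker: blacklist a node for AM launches once it has
// failed maxFailuresPerNode AM attempts. Not part of any YARN API.
final class AmNodeBlacklist {
    private final int maxFailuresPerNode;                 // the configurable threshold
    private final Map<String, Integer> failures = new HashMap<>();
    private final Set<String> blacklisted = new HashSet<>();

    AmNodeBlacklist(int maxFailuresPerNode) {
        if (maxFailuresPerNode < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxFailuresPerNode = maxFailuresPerNode;
    }

    // Records one failed AM attempt on the node; returns true if the node is
    // blacklisted after this failure.
    boolean recordAmFailure(String nodeId) {
        int count = failures.merge(nodeId, 1, Integer::sum);
        if (count >= maxFailuresPerNode) {
            blacklisted.add(nodeId);
        }
        return blacklisted.contains(nodeId);
    }

    // The scheduler would consult this before placing the next AM attempt.
    boolean canLaunchAmOn(String nodeId) {
        return !blacklisted.contains(nodeId);
    }

    public static void main(String[] args) {
        AmNodeBlacklist bl = new AmNodeBlacklist(2);
        bl.recordAmFailure("host1:8041");
        System.out.println("after 1 failure, host1 usable: " + bl.canLaunchAmOn("host1:8041"));
        bl.recordAmFailure("host1:8041");
        System.out.println("after 2 failures, host1 usable: " + bl.canLaunchAmOn("host1:8041"));
        System.out.println("host2 usable: " + bl.canLaunchAmOn("host2:8041"));
    }
}
```

A real implementation would probably also need a safeguard so that blacklisting cannot exclude every node in a small cluster, which would leave the AM unschedulable.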

This message was sent by Atlassian JIRA