[
https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578716#comment-14578716
]
Steve Loughran commented on YARN-2005:
--------------------------------------
Sunil: we use the scoring to decide whether to trust nodes for specific
components (e.g. region servers); we can't do anything in the AM for AM
failures.
As you propose, you can differentiate some node-related problems from
node-unrelated ones, though a generic System.exit(-1) has to be treated as
a "both" failure. That is, unless the AM could exit with a specific error code
meaning 'this node doesn't suit us', which could be used to bail out on a
problem like a missing keytab, a port in use, no GPU, ...
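
To make the exit-code idea concrete, here is a rough sketch of how a
dedicated code could feed per-node failure counting. Everything in it is an
assumption for illustration: the class name, the exit code values 64 and 65,
and the choice to count a "both" failure against the node are hypothetical
and are not part of any YARN API.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: none of these names or exit codes are defined by YARN. */
class AmExitClassifier {
    enum FailureScope { NONE, NODE_RELATED, NODE_UNRELATED, BOTH }

    // Hypothetical code an AM would use to say "this node doesn't suit us"
    // (missing keytab, port in use, no GPU, ...).
    static final int EXIT_NODE_UNSUITABLE = 64;
    // Hypothetical code for a failure the AM knows is not the node's fault.
    static final int EXIT_APP_ERROR = 65;

    static FailureScope classify(int exitCode) {
        switch (exitCode) {
            case 0:
                return FailureScope.NONE;
            case EXIT_NODE_UNSUITABLE:
                return FailureScope.NODE_RELATED;
            case EXIT_APP_ERROR:
                return FailureScope.NODE_UNRELATED;
            default:
                // A generic System.exit(-1) tells us nothing, so treat it as "both".
                return FailureScope.BOTH;
        }
    }

    private final Map<String, Integer> nodeFailures = new HashMap<>();
    private final int threshold;

    AmExitClassifier(int threshold) {
        this.threshold = threshold;
    }

    /** Records an AM attempt's exit; returns true once the node reaches the threshold. */
    boolean recordAndCheckBlacklisted(String node, int exitCode) {
        FailureScope scope = classify(exitCode);
        // Assumption: "both" failures count against the node as well.
        if (scope == FailureScope.NODE_RELATED || scope == FailureScope.BOTH) {
            nodeFailures.merge(node, 1, Integer::sum);
        }
        return nodeFailures.getOrDefault(node, 0) >= threshold;
    }
}
```

Whether a "both" failure should count towards the node's total is a policy
question; the sketch counts it, which is the conservative choice but risks
blacklisting healthy nodes when the application itself is broken.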
> Blacklisting support for scheduling AMs
> ---------------------------------------
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 0.23.10, 2.4.0
> Reporter: Jason Lowe
> Assignee: Anubhav Dhoot
>
> It would be nice if the RM supported blacklisting a node for an AM launch
> after the same node fails a configurable number of AM attempts. This would
> be similar to the blacklisting support for scheduling task attempts in the
> MapReduce AM but for scheduling AM attempts on the RM side.