[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293998#comment-14293998 ]
Jason Lowe commented on YARN-2005:
----------------------------------
My concern with a cluster-wide approach like the one proposed in YARN-2293 is
that a single buggy setup from one user/app could spoil nodes for everyone
else. For example, if a workflow constantly spams the RM with submissions of
jobs whose AMs are broken (the AM either fails instantly or manages to register
and then fails), how does the RM/NM tell a failure that is specific to the node
from one caused by a buggy application?
Even per-application blacklisting could be beneficial, since it would keep
subsequent attempts of the same application from launching on a node where an
earlier attempt failed. We may want to consider per-application blacklisting
if it is simpler to manage/implement in light of buggy apps on a cluster.
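
To make the per-application idea concrete, here is a minimal sketch in Java.
It is only an illustration under stated assumptions: the class name
PerAppAmBlacklist, its methods, and the use of plain strings for application
and node IDs are all hypothetical and are not taken from the YARN code base
or from any patch on this issue. Failures are counted per (application, node)
pair, so a buggy application can only blacklist nodes for itself.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of per-application AM blacklisting.
 * Not part of YARN; names and types are assumptions for illustration.
 */
class PerAppAmBlacklist {
    // Number of failed AM attempts on a node before that node is
    // blacklisted for the application (the "configurable number").
    private final int maxFailuresPerNode;

    // appId -> (nodeId -> count of failed AM attempts on that node)
    private final Map<String, Map<String, Integer>> failures = new HashMap<>();

    PerAppAmBlacklist(int maxFailuresPerNode) {
        if (maxFailuresPerNode < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxFailuresPerNode = maxFailuresPerNode;
    }

    /** Record that an AM attempt of appId failed on nodeId. */
    void recordAmFailure(String appId, String nodeId) {
        failures.computeIfAbsent(appId, k -> new HashMap<>())
                .merge(nodeId, 1, Integer::sum);
    }

    /** True if nodeId should be skipped for the next AM attempt of appId. */
    boolean isBlacklisted(String appId, String nodeId) {
        Map<String, Integer> perNode = failures.get(appId);
        return perNode != null
                && perNode.getOrDefault(nodeId, 0) >= maxFailuresPerNode;
    }

    /** Drop all state for an application once it has finished. */
    void appFinished(String appId) {
        failures.remove(appId);
    }
}
```

Because the counts are keyed by application, an app whose AM is broken
everywhere ends up blacklisting nodes only for its own later attempts, and
other applications are unaffected. A real implementation would also need a
policy for the case where an app has blacklisted most of the cluster, which
this sketch does not address.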
> Blacklisting support for scheduling AMs
> ---------------------------------------
>
> Key: YARN-2005
> URL: https://issues.apache.org/jira/browse/YARN-2005
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 0.23.10, 2.4.0
> Reporter: Jason Lowe
>
> It would be nice if the RM supported blacklisting a node for an AM launch
> after the same node fails a configurable number of AM attempts. This would
> be similar to the blacklisting support for scheduling task attempts in the
> MapReduce AM but for scheduling AM attempts on the RM side.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)