Jason Lowe commented on YARN-2005:

My concern with a cluster-wide approach like the one proposed in YARN-2293 is 
that one buggy setup from a single user/app could spoil nodes for everyone 
else.  For example, if a workflow constantly spams the RM with submissions of 
broken AMs (the AM either fails instantly, or registers and then fails), how 
does the RM/NM tell a failure that is specific to the node apart from one 
caused by a buggy application?

Even per-application blacklisting logic would help by preventing subsequent 
AM attempts of the same application from launching on the same node.  Given 
the risk of buggy apps on a shared cluster, we may want to start with 
per-application blacklisting if that's simpler to manage and implement.
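
A minimal sketch of the per-application idea, to make the isolation property 
concrete.  Everything here is hypothetical and not an existing YARN API: the 
class name `PerAppAmBlacklist`, the threshold `maxFailuresPerNode` (standing in 
for the "configurable number of AM attempts" in the description), and the use 
of plain strings instead of YARN's ApplicationId/NodeId types.  Where the RM 
would call these hooks is left open.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical per-application AM blacklist tracker (not a real YARN class).
 * Failure counts are kept separately per application, so a buggy app can only
 * blacklist nodes for its own later AM attempts, never for other apps.
 */
class PerAppAmBlacklist {
    // Hypothetical threshold: AM failures on one node before that node is
    // avoided for the same application's subsequent AM attempts.
    private final int maxFailuresPerNode;

    // appId -> (nodeId -> number of AM attempt failures seen on that node)
    private final Map<String, Map<String, Integer>> failures = new HashMap<>();

    PerAppAmBlacklist(int maxFailuresPerNode) {
        if (maxFailuresPerNode < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxFailuresPerNode = maxFailuresPerNode;
    }

    /** Record that an AM attempt of appId failed on nodeId. */
    void recordAmFailure(String appId, String nodeId) {
        failures.computeIfAbsent(appId, k -> new HashMap<>())
                .merge(nodeId, 1, Integer::sum);
    }

    /** True if nodeId should be skipped for appId's next AM attempt. */
    boolean isBlacklisted(String appId, String nodeId) {
        Map<String, Integer> perNode = failures.get(appId);
        return perNode != null
                && perNode.getOrDefault(nodeId, 0) >= maxFailuresPerNode;
    }

    /** Nodes currently blacklisted for appId (empty if none). */
    Set<String> blacklistedNodes(String appId) {
        Set<String> result = new HashSet<>();
        Map<String, Integer> perNode =
                failures.getOrDefault(appId, Collections.emptyMap());
        for (Map.Entry<String, Integer> e : perNode.entrySet()) {
            if (e.getValue() >= maxFailuresPerNode) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Drop all state for appId once the application finishes. */
    void removeApplication(String appId) {
        failures.remove(appId);
    }
}
```

With a threshold of 2, two AM failures of app_1 on node-a blacklist node-a for 
app_1 only; app_2 can still launch its AM there, which is exactly the property 
a cluster-wide blacklist lacks.  Dropping the state when the app finishes keeps 
a broken app from leaving anything behind for others.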

> Blacklisting support for scheduling AMs
> ---------------------------------------
>                 Key: YARN-2005
>                 URL: https://issues.apache.org/jira/browse/YARN-2005
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 0.23.10, 2.4.0
>            Reporter: Jason Lowe
> It would be nice if the RM supported blacklisting a node for an AM launch 
> after the same node fails a configurable number of AM attempts.  This would 
> be similar to the blacklisting support for scheduling task attempts in the 
> MapReduce AM but for scheduling AM attempts on the RM side.

This message was sent by Atlassian JIRA

Reply via email to