Craig Welch commented on YARN-1680:

bq. Please leave out the head-room concerns w.r.t node-labels. IIRC, we had 
tickets at YARN-796 tracking that. It is very likely a completely different 
solution, so.

I'm not sure that's so - there is already a process for calculating headroom for 
the labels associated with an application, and the above is an extension of that 
process to blacklisted nodes, to handle the label cases.  If we leave it out, 
then the solution won't work for node-labels, and it can be made to do so, so 
that would be a real loss.
bq. When I said node-labels above, I meant partitions. Clearly the problem and 
the corresponding solution will likely be very similar for node-constraints 
(one type of node-labels). After all, blacklisting is a type of (anti) 

It could be modeled that way, but then it would be qualitatively different from 
the solution for the non-label cases, which is not a good thing...

bq. There is no notion of a cluster-level blacklisting in YARN. We have notions 
of unhealthy/lost/decommissioned nodes in a cluster. 

This is what I am referring to when I say:
bq. addition/removal at the cluster level
I'm not suggesting/referring to anything other than nodes entering/leaving the 
cluster.

bq.  Coming to the app-level blacklisting, clearly, the solution proposed is 
better than dead-locks. But blindly reducing the resources corresponding to 
blacklisted nodes will result in under-utilization (sometimes massively) and 
over-conservative scheduling requests by apps.

So, that's the point of the recommended approach.  The idea is to detect when 
it is necessary to recalculate the impact of blacklisting on app headroom - 
that is, when either the app's blacklist has changed or the node composition 
of the cluster has changed (each of which should be relatively infrequent, 
certainly in relation to headroom calculation) - and at that time to calculate 
the impact accurately, by adding into the deduction only the resource value of 
blacklisted nodes which actually exist in the cluster.  It isn't "blindly 
reducing resources", it's doing so accurately, and it should prevent both 
deadlocks and under-utilization.
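The idea above can be sketched roughly as follows. This is an illustrative sketch only - the class, method names, and the use of a plain `long` for resource values are my own simplifications, not the actual YARN scheduler types:

```java
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the proposed approach: the blacklist deduction is recomputed
 * only when the app's blacklist or the cluster's node membership changes,
 * and only blacklisted nodes that actually exist contribute to it. The
 * per-heartbeat headroom path then just subtracts the cached deduction.
 */
class BlacklistHeadroomSketch {

    /** Sum the capacity of blacklisted nodes that are present in the cluster. */
    static long blacklistDeduction(Set<String> blacklist,
                                   Map<String, Long> clusterNodes /* node -> capacity */) {
        long deduction = 0;
        for (String node : blacklist) {
            Long capacity = clusterNodes.get(node);
            if (capacity != null) {   // ignore blacklist entries for nodes not in the cluster
                deduction += capacity;
            }
        }
        return deduction;
    }

    // Cached deduction, refreshed only on the two (infrequent) trigger events.
    private long cachedDeduction = 0;

    /** Called when the app's blacklist or the cluster's node set changes. */
    void onBlacklistOrClusterChange(Set<String> blacklist,
                                    Map<String, Long> clusterNodes) {
        cachedDeduction = blacklistDeduction(blacklist, clusterNodes);
    }

    /** Cheap per-heartbeat path: subtract the cached deduction, floor at zero. */
    long headroom(long clusterFree) {
        return Math.max(0, clusterFree - cachedDeduction);
    }
}
```

The point of caching is that the expensive part (walking the blacklist against live nodes) runs only on the two change events, not on every headroom calculation.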

bq. One way to resolve this is to get the apps (or optionally in the AMRMClient 
library) to deduct the resource unusable on blacklisted nodes

It could be moved into the AMs or the client library, but then they would have 
to do the same sort of thing, and the logic would either need to be duplicated 
amongst the AMs or would only be available to those which use the library (do 
they all?).  It's worth considering whether it can be made to cover them all 
via the library, but I'm not sure this isn't something which should be handled 
as part of the headroom calculation in the RM, as the RM is meant to provide 
headroom accurately and is otherwise aware of the blacklist.  Which suggested 
to me that we already have the blacklist for the application in the RM, 
available to the scheduler (I'm not sure why that wasn't obvious to me 
before...), which does appear to be the case and which therefore drops the 
concerns about adding it - it's already there...

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> ------------------------------------------------------------------------------------------------------
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3 
>            Reporter: Rohith
>            Assignee: Craig Welch
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
> slow start is set to 1.
> A job is running whose reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster 
> blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in 
> the cluster now.
> The MRAppMaster does not preempt the reducers because, for the 
> reducer-preemption calculation, the headroom still counts the blacklisted 
> node's memory. This makes the job hang forever (the ResourceManager does not 
> assign any new containers on blacklisted nodes, but returns an 
> availableResources value computed from the whole cluster's free memory).
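To make the reported numbers concrete, here is a hypothetical worked example. The assumption that the remaining free memory sits on the blacklisted NM-4 is my inference from the report, not stated in it:

```java
// Arithmetic from the report above: 4 NMs x 8GB, reducers holding 29GB,
// NM-4 blacklisted. The RM's availableResources counts NM-4's free memory,
// so the AM sees headroom it cannot actually place containers against,
// and reducer preemption is never triggered.
class Yarn1680Numbers {

    /** Headroom as the RM currently reports it. */
    static long reportedHeadroomGB() {
        long clusterCapacity = 4 * 8;   // 32GB total
        long used = 29;                 // GB held by reducers
        return clusterCapacity - used;  // 3GB, includes blacklisted NM-4
    }

    /** Headroom actually usable, excluding free memory on the blacklisted node. */
    static long usableHeadroomGB(long freeOnBlacklistedGB) {
        // Assumption for illustration: the free memory sits on NM-4.
        return Math.max(0, reportedHeadroomGB() - freeOnBlacklistedGB);
    }
}
```

With the free 3GB on the blacklisted node, the usable headroom is 0, but the AM is told 3GB, so it waits instead of preempting a reducer.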

This message was sent by Atlassian JIRA