Craig Welch commented on YARN-1680:

I've been looking over [~airbots]'s prior patches, the discussion, etc., and this 
was what I was going to suggest as an approach.  As I mentioned before, I think 
that accuracy will unfortunately require holding on to the blacklist in the 
scheduler app.  I think this is OK because these lists should be relatively small, 
but it is still a drawback.  We could impose a size limit as a mitigating 
factor, but that could affect accuracy in some cases as well.

In any event, this is the approach I'm suggesting:
- Retain a node/rack blacklist in the scheduler application based on 
additions/removals from the application master.
- Add a "last change" timestamp or incrementing counter to track node 
addition/removal at the cluster level (which is what exists for "cluster 
black/white" listing, afaict), updated when those events occur.
- Add a "last change" timestamp/counter to the application to track blacklist 
additions/removals, and have "last updated" values on the application to track 
the above two "last change" values, updated when the blacklist deduction is 
recalculated.
- On headroom calculation, the app checks whether it has any entries in the 
blacklist or a "blacklist deduction" value in its ResourceUsage entry (see 
below), to determine if the blacklist must be taken into account.
- If the blacklist must be taken into account, check the "last updated" values 
for both cluster and app blacklist changes; if and only if either is stale (last 
updated != last change), recalculate the blacklist deduction.
- When calculating the blacklist deduction, use [~airbots]'s basic logic from 
the existing patches.  Place the deduction value into a new enumeration index 
type in ResourceUsage.  NodeLabels could be taken into account as well; there is 
some logic about "label(s) of interest" on the application, so in addition to a 
"no label" value which is generally applicable, a value for the "label(s) of 
interest" could be generated.
- Whenever the headroom is handed out by the provider, add a step which applies 
the proper blacklist deduction if present.
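To make the staleness-guarded recalculation concrete, here is a minimal, self-contained sketch of the idea. The class and member names (BlacklistAwareHeadroom, nodeCapacitiesMB, blacklistDeductionMB, etc.) are illustrative assumptions, not the actual YARN scheduler types, and memory is simplified to a single int instead of a Resource:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed approach: cache a "blacklist deduction"
// and only recompute it when either the cluster-level or app-level
// "last change" counter has moved past the "last updated" snapshot.
public class BlacklistAwareHeadroom {
  // Cluster-level "last change" counter, bumped on node add/remove.
  private long clusterChangeCounter = 0;
  // App-level "last change" counter, bumped on blacklist add/remove.
  private long appBlacklistChangeCounter = 0;
  // "Last updated" snapshots taken when the deduction was last recalculated.
  private long seenClusterCounter = -1;
  private long seenAppCounter = -1;

  private final Map<String, Integer> nodeCapacitiesMB = new HashMap<>();
  private final Set<String> blacklist = new HashSet<>();
  private int blacklistDeductionMB = 0;

  public void addNode(String node, int memMB) {
    nodeCapacitiesMB.put(node, memMB);
    clusterChangeCounter++;
  }

  public void removeNode(String node) {
    nodeCapacitiesMB.remove(node);
    clusterChangeCounter++;
  }

  public void updateBlacklist(Set<String> additions, Set<String> removals) {
    blacklist.addAll(additions);
    blacklist.removeAll(removals);
    appBlacklistChangeCounter++;
  }

  // Recalculate the deduction only if either counter is stale.
  private void maybeRecalculate() {
    if (seenClusterCounter == clusterChangeCounter
        && seenAppCounter == appBlacklistChangeCounter) {
      return; // cached deduction is still valid
    }
    int deduction = 0;
    for (String node : blacklist) {
      deduction += nodeCapacitiesMB.getOrDefault(node, 0);
    }
    blacklistDeductionMB = deduction;
    seenClusterCounter = clusterChangeCounter;
    seenAppCounter = appBlacklistChangeCounter;
  }

  // Step applied when headroom is handed out by the provider.
  public int headroomMB(int rawHeadroomMB) {
    if (blacklist.isEmpty() && blacklistDeductionMB == 0) {
      return rawHeadroomMB; // fast path: nothing to deduct
    }
    maybeRecalculate();
    return Math.max(0, rawHeadroomMB - blacklistDeductionMB);
  }
}
```

With the scenario from this JIRA (4 NMs at 8GB, NM-4 blacklisted), any raw headroom of 8GB or less would be reduced to zero, which is what would let the MRAppMaster decide to preempt reducers instead of hanging.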

Thoughts on the approach?  

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> ------------------------------------------------------------------------------------------------------
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3 
>            Reporter: Rohith
>            Assignee: Craig Welch
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
> There are 4 NodeManagers with 8GB each.  Total cluster capacity is 32GB.  
> Cluster slow start is set to 1.
> A job is running; its reducer tasks occupy 29GB of the cluster.  One 
> NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster 
> blacklisted the unstable NodeManager (NM-4).  All reducer tasks are running 
> in the cluster now.
> The MRAppMaster does not preempt the reducers because, for the reducer 
> preemption calculation, the headroom includes the blacklisted nodes' memory.  
> This makes the job hang forever (the ResourceManager does not assign any new 
> containers on blacklisted nodes, but returns an availableResources value that 
> considers the whole cluster's free memory).

This message was sent by Atlassian JIRA