[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533684#comment-14533684
 ] 

Craig Welch commented on YARN-1680:
-----------------------------------

bq. This requires when a node doing heartbeat with changed available resource, 
all apps blacklisted the node need to be notified

Well, that's not quite so.  From what we were talking about, it means that the 
blacklist deduction can't be a fixed amount; it needs to be calculated by 
looking at the unused resource of the blacklisted nodes during headroom 
calculation.  The rest of the above proposal for detecting changes, etc., still 
works, but instead of a static deduction value we would need a reference to the 
blacklisted nodes for the app and would look at their unused resources during 
the app's headroom calculation.  So there is that cost, but it's not related to 
the heartbeat or a notification as such.
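
To make the idea concrete, here is a minimal sketch of a headroom calculation 
that derives the blacklist deduction at calculation time instead of storing a 
fixed value.  It is not the actual scheduler code: the class name, the method 
names and the per-node map of unused memory are all hypothetical, and memory 
in MB stands in for the real resource type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: names and types are assumptions, not YARN classes.
class BlacklistHeadroomSketch {
    // Unused memory (MB) per node, kept current by whatever already
    // tracks node usage (hypothetical stand-in for scheduler state).
    private final Map<String, Long> unusedMbByNode = new HashMap<>();

    // Record the latest unused memory for a node.
    void updateNode(String nodeId, long unusedMb) {
        unusedMbByNode.put(nodeId, unusedMb);
    }

    // Headroom for one app: the unadjusted headroom minus the memory
    // currently unused on the nodes this app has blacklisted.  The
    // deduction is recomputed on every call, so no notification is
    // needed when a blacklisted node's usage changes.
    long headroomMb(long baseHeadroomMb, Set<String> blacklistedNodes) {
        long deduction = 0L;
        for (String nodeId : blacklistedNodes) {
            deduction += unusedMbByNode.getOrDefault(nodeId, 0L);
        }
        return Math.max(0L, baseHeadroomMb - deduction);
    }
}
```

The cost is one lookup per blacklisted node on each headroom calculation, 
which is the cost referred to above.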

bq. headroom for app could be under estimated

I think, generally, we should not take an approach which will 
underestimate/underutilize if we have MAPREDUCE-6302 to fall back on.  If we don't, then 
we might want to do it only if we decide not to do the accurate calculation in 
some cases based on limits (see immediately below), but not as a matter of 
course.

bq. Only do accurate headroom calculation when there're not too much 
blacklisted nodes as well as apps with blacklisted nodes.

I think if we put a limit on it, it should be a purely local decision, to only 
do the calculation with < x blacklisted nodes for an app, which we would expect 
to rarely be an issue.  There is a potential for performance issues here, but 
we don't really know how great a concern it is.
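
As a rough sketch of what such a purely local limit could look like, assuming 
a hypothetical threshold parameter and a map of unused memory per node 
(neither is an existing YARN setting or API):

```java
import java.util.Map;
import java.util.Set;

// Illustrative only: names and the threshold are assumptions.
class LimitedBlacklistHeadroom {
    // Do the accurate calculation only when the app has fewer than
    // maxBlacklistedNodes blacklisted nodes; otherwise return the
    // unadjusted headroom and rely on some fallback mechanism.
    static long headroomMb(long baseHeadroomMb,
                           Set<String> blacklistedNodes,
                           Map<String, Long> unusedMbByNode,
                           int maxBlacklistedNodes) {
        if (blacklistedNodes.size() >= maxBlacklistedNodes) {
            return baseHeadroomMb; // too many nodes: skip the adjustment
        }
        long deduction = 0L;
        for (String nodeId : blacklistedNodes) {
            deduction += unusedMbByNode.getOrDefault(nodeId, 0L);
        }
        return Math.max(0L, baseHeadroomMb - deduction);
    }
}
```

The decision depends only on the app's own blacklist size, so it needs no 
cluster-wide coordination.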

bq.  MAPREDUCE-6302 is targeting to preempt reducer even if we reported 
inaccurate headroom for apps. I think the approach looks good to me

I think that may work as a fallback option for MR, assuming it works out 
without issue, if we decide not to do the proper headroom calculation in some 
cases.  But it is MR-specific, so it won't help non-MR apps, and it has the 
issues I brought up before with performance degradation vs. the proper headroom 
calculation.  For these reasons I don't think it's a substitute for fixing this 
issue overall; it may be a fallback option if we limit the cases where we do 
the proper adjustment.

bq. Move headroom calculation to application side, I think now we cannot do it 
at least for now...Application will only receive updated NodeReport from when 
node changes heathy status instead of regular heartbeat

Well, in some sense that works OK for this, because we really only need to know 
about those changes in node status wrt the blacklist to detect when a 
recalculation is needed with the approach proposed above.  The problem is that 
we will also need a way to query for current usage per node while doing the 
calculation, and I don't know if an efficient call for that exists (it would 
ideally be a batch call for N nodes, where we would ask for all the blacklisted 
nodes at once).  There is also the broader issue that we don't seem to have a 
single entry point client-side for doing this right now, so we would need to 
touch a few points to add a library or something of that nature to do this.  
AMs that we may not be aware of, or that are not part of the core, would 
potentially have to do some integration to get this.

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3 
>            Reporter: Rohith
>            Assignee: Craig Welch
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster 
> slow start is set to 1.
> A job is running with reducer tasks occupying 29GB of the cluster. One 
> NodeManager (NM-4) became unstable (3 maps got killed), so MRAppMaster 
> blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in 
> the cluster now.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, the headroom includes the blacklisted node's memory. This makes 
> jobs hang forever (ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResources it returns still counts the 
> cluster's free memory). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
