[
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533684#comment-14533684
]
Craig Welch commented on YARN-1680:
-----------------------------------
bq. This requires that when a node heartbeats with a changed available
resource, all apps that have blacklisted the node need to be notified
Well, that's not quite so. From what we were talking about, it means that the
blacklist deduction can't be a fixed amount; it needs to be calculated by
looking at the unused resources of the blacklisted nodes during headroom
calculation. The rest of the above proposal for detecting changes, etc., still
works, but instead of a static deduction value we would keep a reference to
the app's blacklisted nodes and look at their unused resources during the
app's headroom calculation. So there is that cost, but it's not tied to the
heartbeat or to a notification as such.
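To make that concrete, here's a minimal sketch of the shape I have in mind
(illustrative only, not the attached patch; the class and method names are
mine, though Resource/Resources and SchedulerNode#getAvailableResource are the
existing scheduler types):
{code:java}
import java.util.Collection;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.apache.hadoop.yarn.util.resource.Resources;

public class BlacklistHeadroomSketch {
  /**
   * At headroom-calculation time, subtract the unused capacity of every node
   * the app has blacklisted, since the app can never be allocated that space.
   * Not a fixed deduction: it is recomputed from the nodes' current unused
   * resources each time the app's headroom is calculated.
   */
  public static Resource adjustHeadroom(Resource queueHeadroom,
      Collection<SchedulerNode> appBlacklistedNodes) {
    Resource deduction = Resources.createResource(0, 0);
    for (SchedulerNode node : appBlacklistedNodes) {
      Resources.addTo(deduction, node.getAvailableResource());
    }
    Resource adjusted = Resources.subtract(queueHeadroom, deduction);
    // clamp so we never report negative headroom
    return Resources.createResource(
        Math.max(adjusted.getMemory(), 0),
        Math.max(adjusted.getVirtualCores(), 0));
  }
}
{code}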
bq. headroom for the app could be underestimated
I think, generally, we should not take an approach that will
underestimate/underutilize if we have MAPREDUCE-6302 to fall back on. If we
don't, then we might want to do it only if we decide not to do the accurate
calculation in some cases based on limits (see immediately below), but not as
a matter of course.
bq. Only do the accurate headroom calculation when there are not too many
blacklisted nodes, or too many apps with blacklisted nodes.
I think if we put a limit on it, it should be a purely local decision: only do
the calculation when an app has fewer than x blacklisted nodes, which we would
expect to rarely be an issue; a sketch of that guard follows. There is a
potential for performance issues here, but we don't really know how great a
concern it is.
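Something like this is all I mean by a local decision (the config key and
default here are hypothetical, not existing YARN properties; adjustHeadroom is
the sketch above):
{code:java}
import java.util.Collection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;

public class HeadroomLimitSketch {
  // Hypothetical property name and default, purely for illustration.
  static final String BLACKLIST_NODE_LIMIT =
      "yarn.scheduler.capacity.headroom.blacklist-node-limit";
  static final int DEFAULT_BLACKLIST_NODE_LIMIT = 20;

  /**
   * Purely local decision: only pay for the accurate per-node deduction when
   * the app's blacklist is small.
   */
  public static Resource headroomFor(Configuration conf,
      Resource queueHeadroom, Collection<SchedulerNode> appBlacklistedNodes) {
    int limit = conf.getInt(BLACKLIST_NODE_LIMIT, DEFAULT_BLACKLIST_NODE_LIMIT);
    if (appBlacklistedNodes.size() < limit) {
      return BlacklistHeadroomSketch.adjustHeadroom(
          queueHeadroom, appBlacklistedNodes);
    }
    // too many blacklisted nodes: keep the cheap, possibly overestimated value
    return queueHeadroom;
  }
}
{code}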
bq. MAPREDUCE-6302 is targeting preempting reducers even if we reported
inaccurate headroom for apps. I think the approach looks good to me
I think that may work as a fallback option for MR, assuming it works out
without issue, if we decide not to do the proper headroom calculation in some
cases. But that's MR-specific, so it won't help non-MR apps, and it has the
issues I brought up before with performance degradation versus the proper
headroom calculation. For these reasons I don't think it's a substitute for
fixing this issue overall, though it may be a fallback option if we limit the
cases where we do the proper adjustment.
bq. Move the headroom calculation to the application side, I think we cannot
do it at least for now...Application will only receive an updated NodeReport
when a node changes healthy status, not on the regular heartbeat
Well, in some sense that works OK for this, because we really only need to
know about changes in node status wrt the blacklist to detect when
recalculation is needed with the approach proposed above. The problem is that
we would also need a way to query current usage per node while doing the
calculation, and I don't know if an efficient call for that exists (ideally it
would be a batch call for N nodes, where we ask for all the blacklisted nodes
at once). There is also the broader issue that we don't seem to have a single
client-side entry point for doing this right now, so we would need to touch a
few points to add a library or something of that nature, and AMs we may not be
aware of / that are not part of the core would potentially have to do some
integration work to get this.
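For illustration, the closest existing path I can see goes through
YarnClient#getNodeReports, which fetches reports for every node rather than a
targeted batch, which is exactly the efficiency concern above (the wrapper
class here is mine, but the YarnClient/NodeReport calls are the existing
client API):
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ClientSideHeadroomSketch {
  /**
   * Deduct the unused capacity of blacklisted nodes from the RM-reported
   * headroom. Note the cost: getNodeReports returns every running node,
   * since no batch per-node usage query exists today.
   */
  public static Resource adjust(Configuration conf, Resource reportedHeadroom,
      Set<String> blacklistedHosts) throws IOException, YarnException {
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      Resource adjusted = Resources.clone(reportedHeadroom);
      List<NodeReport> reports = client.getNodeReports(NodeState.RUNNING);
      for (NodeReport report : reports) {
        if (!blacklistedHosts.contains(report.getNodeId().getHost())) {
          continue;
        }
        Resource used = report.getUsed() != null
            ? report.getUsed() : Resources.none();
        // this node's unused capacity is unreachable for the app
        Resources.subtractFrom(adjusted,
            Resources.subtract(report.getCapability(), used));
      }
      // clamp so we never report negative headroom
      return Resources.createResource(
          Math.max(adjusted.getMemory(), 0),
          Math.max(adjusted.getVirtualCores(), 0));
    } finally {
      client.stop();
    }
  }
}
{code}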
> availableResources sent to applicationMaster in heartbeat should exclude
> blacklistedNodes free memory.
> ------------------------------------------------------------------------------------------------------
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3
> Reporter: Rohith
> Assignee: Craig Welch
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch,
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB.
> Cluster slow start is set to 1.
> A job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4)
> became unstable (3 map tasks got killed), so the MRAppMaster blacklisted
> the unstable NodeManager (NM-4). All reducer tasks are now running in the
> cluster.
> The MRAppMaster does not preempt the reducers, because the headroom used in
> the reducer-preemption calculation includes the blacklisted node's memory:
> of the 32GB cluster, 29GB is occupied, so the reported headroom is about
> 3GB, but that headroom includes free memory on the blacklisted NM-4, which
> the job can never use. This makes the job hang forever (the ResourceManager
> does not assign any new containers on blacklisted nodes, but the
> availableResources it returns reflects the whole cluster's free memory).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)