[
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068791#comment-14068791
]
Craig Welch commented on YARN-1198:
-----------------------------------
[~wangda] I concur with [~jlowe] and [~airbots] that these headroom fixes
(including [YARN-2008]) should happen. I don't think that this is a
redefinition of headroom: "headroom" remains "the maximum resource of an
application can get". The application can't get resources that are unavailable
because they are in use, which is what the change addresses. I think of this
change as really only
being a fix for a missed case. It will in fact return the same value as it
does today except in some specific cases of higher cluster utilization, where
the value it returns will actually be better than its current behavior in
terms of helping the AM work accurately and preventing some known deadlock
conditions. This kind of behavior is a necessary consequence of
allowing oversubscription of cluster resources vis-a-vis the "maximum"
allocation, which is greater than the baseline (and which in aggregate can be >
100%). This oversubscription is a reasonable design choice that allows
applications to burst above their guaranteed level when other queues are less
utilized. As I mentioned on [YARN-2008], since the aggregate maximum can be >
100%, it's not possible to solve this solely with preemption: without this
correction, AMs will still be given headroom values higher than what is
actually available. Given that we are retaining the "max" behavior for the
reasons above, this kind of approach is going to be the way to go.
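
To illustrate the point above, here is a minimal sketch of the difference
between a limit-only headroom and one that is also capped by what is actually
free. This is not the CapacityScheduler code: the class and method names are
hypothetical, resources are reduced to a single dimension (memory in MB), and
"free" is assumed to be simply cluster total minus cluster used.

```java
// Hypothetical sketch only; names and the single-dimension resource model
// are assumptions for illustration, not the actual scheduler implementation.
final class HeadroomSketch {
    private HeadroomSketch() {}

    /** Headroom derived from the limit alone; ignores what others are using. */
    static long limitOnlyHeadroom(long userLimitMb, long userUsedMb) {
        return Math.max(0L, userLimitMb - userUsedMb);
    }

    /** Headroom additionally capped by the resources that are actually free. */
    static long cappedHeadroom(long userLimitMb, long userUsedMb,
                               long clusterTotalMb, long clusterUsedMb) {
        long byLimit = limitOnlyHeadroom(userLimitMb, userUsedMb);
        long clusterFree = Math.max(0L, clusterTotalMb - clusterUsedMb);
        // Same value as the limit-only result unless the cluster is busy
        // enough that less than byLimit is free.
        return Math.min(byLimit, clusterFree);
    }
}
```

At low utilization the two methods agree; they diverge only once the free
resources drop below the limit-based figure, which is the case where an AM
would otherwise wait for resources it cannot get.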
> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Attachments: YARN-1198.1.patch
>
>
> Today headroom calculation (for the app) takes place only when
> * A node is added to or removed from the cluster
> * A new container is assigned to the application.
> However, there are potentially a lot of situations which are not considered
> for this calculation
> * If a container finishes then headroom for that application will change and
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1 or
> app2), then both AMs should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom
> per User per LeafQueue so that everyone gets the same picture (apps belonging
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications
> submitted by all users in that queue should be notified of the headroom
> change.
> * Also, today headroom is an absolute number (I think it should be
> normalized, but then this would not be backward compatible).
> * Also, when an admin user refreshes the queue, headroom has to be updated.
> These are all potential bugs in the headroom calculation.