[
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155764#comment-14155764
]
Craig Welch commented on YARN-1198:
-----------------------------------
[~john.jian.fang] I took a look at implementing the change with the tweaked .7
approach per your suggestion above, but it seemed to just trade some
complexities for others, so I set it aside; I think the current .7 approach
is as good as any. I uploaded a .10 patch, which is the .7 patch fixed to apply
cleanly to current trunk (.7 no longer quite does for me). I took a look at
incorporating [YARN-1857] into this change but chose not to, as I think they
should be committed independently. The .10 (.7) patch factors the change for
[YARN-1857] up into a separate method, getHeadroom(). If you replace it with
the below:
{code}
private Resource getHeadroom(User user, Resource queueMaxCap,
    Resource clusterResource, Resource userLimit) {
  Resource headroom =
      Resources.min(resourceCalculator, clusterResource,
          Resources.subtract(
              Resources.min(resourceCalculator, clusterResource,
                  userLimit, queueMaxCap),
              user.getConsumedResources()),
          Resources.subtract(queueMaxCap, usedResources));
  return headroom;
}
{code}
then you should have the combined logic. Note that not all of the LeafQueue
tests will then pass, I believe because the expected results changed when that
patch was applied. I have not tried the two in combination before, on the
assumption that we would apply one at a time and then address the impact on
the other.
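As a rough illustration of what the combined getHeadroom() computes, the
sketch below reduces each Resource to a single memory figure in MB. That
simplification, the class name HeadroomSketch, and the sample numbers are all
assumptions made for illustration; none of them come from the patch.

```java
// Simplified sketch: resources are modeled as memory in MB (a plain long),
// not as YARN Resource objects. All names and numbers are hypothetical.
final class HeadroomSketch {

    /**
     * headroom = min(min(userLimit, queueMaxCap) - userConsumed,
     *                queueMaxCap - queueUsed)
     */
    static long headroom(long userLimit, long queueMaxCap,
                         long userConsumed, long queueUsed) {
        // What the user may still take before hitting the user limit
        // (itself capped by the queue's maximum capacity).
        long userRemaining = Math.min(userLimit, queueMaxCap) - userConsumed;
        // What is still free in the queue as a whole.
        long queueRemaining = queueMaxCap - queueUsed;
        // The tighter of the two bounds is the headroom.
        return Math.min(userRemaining, queueRemaining);
    }

    public static void main(String[] args) {
        // Queue is the tighter bound: min(6144 - 2048, 8192 - 5120) = 3072
        System.out.println(headroom(6144, 8192, 2048, 5120));
        // User limit is the tighter bound: min(6144 - 2048, 8192 - 3072) = 4096
        System.out.println(headroom(6144, 8192, 2048, 3072));
    }
}
```

The second subtraction is the part contributed by the replacement method: it
keeps the reported headroom from exceeding what is actually left in the queue
when other users have already consumed most of it.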
> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch,
> YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch,
> YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today, the headroom calculation (for the app) takes place only when:
> * A node is added to or removed from the cluster.
> * A new container is assigned to the application.
> However, there are potentially a lot of situations that are not considered
> in this calculation:
> * If a container finishes then headroom for that application will change and
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1 or
> app2), then both AMs should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom
> per User per LeafQueue so that everyone gets the same picture (apps belonging
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications
> submitted by all users in that queue should be notified of the headroom
> change.
> * Also, today headroom is an absolute number (I think it should be
> normalized, but that would not be backward compatible).
> * Also, when an admin user refreshes the queues, headroom has to be updated.
> These are all potential bugs in the headroom calculation.
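The "headroom per User per LeafQueue" idea in the description above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the class SharedHeadroomSketch, its methods, and the use of a
single memory figure in MB are hypothetical and are not taken from the YARN
code base or from any attached patch. The sketch is also not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the YARN code base.
final class SharedHeadroomSketch {

    /** Mutable holder shared by every application of one user in one queue. */
    static final class Headroom {
        private long memoryMb;

        long get() { return memoryMb; }

        void set(long memoryMb) { this.memoryMb = memoryMb; }
    }

    /** One holder per user; a real leaf queue would own one such map. */
    private final Map<String, Headroom> headroomByUser = new HashMap<>();

    /** Returns the holder for the user, creating it on first use. */
    Headroom forUser(String user) {
        return headroomByUser.computeIfAbsent(user, u -> new Headroom());
    }

    /** Called whenever the user's headroom is recomputed. */
    void update(String user, long memoryMb) {
        forUser(user).set(memoryMb);
    }
}
```

Because every application of the same user holds a reference to the same
holder, one update (for example, after a container of app1 finishes) is
visible to app2 as well, which is the "same picture" the description asks for.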