[
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143301#comment-14143301
]
Wangda Tan commented on YARN-1198:
----------------------------------
Hi [~cwelch],
Sorry for the late response. I've just looked at your ver.8 patch and comments.
My reply:
bq. -re "we don't need write HeadroomProvider for each scheduler"
And
bq. Provider vs Reference
I agree with this. I think we need to write different Headroom Providers, and
it's better to keep Provider since it's more general.
bq. -re "As mentioned by Jason, currently ...
Agreed, this can be done in a separate JIRA.
bq. -re the cost of the calculation
Agreed, it's only a small amount of computation.
Previously, I suggested doing what I described in
https://issues.apache.org/jira/browse/YARN-1198?focusedCommentId=14108991&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14108991
because I thought it would make the code cleaner.
But after looking at your ver.8 patch, I realized that may not be doable.
LeafQueue#computeUserLimit uses required to compute the user limit. In your
patch, you save lastRequired in the User class. However, we need a different
required for each app under the same user. We can only do the calculation when
the app heartbeats. (We could also loop over all apps and set each one's
headroom, but that's an approach we abandoned before.)
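To illustrate the point about required, here is a rough sketch. It is not code from any patch: HeadroomSketch, QueueState and headroomProvider are hypothetical names, computeUserLimit here is a toy stand-in rather than the real LeafQueue formula, and plain long values in MB stand in for Resource. It only shows that two apps of the same user with a different required can see different headroom, and that evaluating at heartbeat time picks up the current queue state.

```java
import java.util.function.LongSupplier;

public class HeadroomSketch {
    /** Toy queue/user state shared by all apps of one user (values in MB). */
    static final class QueueState {
        long queueCapacity;
        long queueUsed;
        long userUsed;
        int activeUsers;

        QueueState(long queueCapacity, long queueUsed, long userUsed, int activeUsers) {
            this.queueCapacity = queueCapacity;
            this.queueUsed = queueUsed;
            this.userUsed = userUsed;
            this.activeUsers = activeUsers;
        }
    }

    /** Toy stand-in: a user limit that depends on the app's 'required'. */
    static long computeUserLimit(QueueState q, long required) {
        long current = q.queueUsed < q.queueCapacity
                ? q.queueCapacity
                : q.queueUsed + required;
        // Ceiling division: share the current capacity among active users.
        return (current + q.activeUsers - 1) / q.activeUsers;
    }

    /** Per-app provider: keeps that app's own 'required', evaluated lazily. */
    static LongSupplier headroomProvider(QueueState q, long required) {
        return () -> Math.max(0, computeUserLimit(q, required) - q.userUsed);
    }

    public static void main(String[] args) {
        QueueState q = new QueueState(8192, 8192, 4096, 2);
        LongSupplier app1 = headroomProvider(q, 1024);
        LongSupplier app2 = headroomProvider(q, 2048);

        // Same user, same queue, different 'required' per app.
        System.out.println(app1.getAsLong()); // 512
        System.out.println(app2.getAsLong()); // 1024

        // One of the user's containers finishes; the next "heartbeat"
        // (call to getAsLong) sees the updated state without any loop over apps.
        q.queueUsed = 7168;
        q.userUsed = 3072;
        System.out.println(app1.getAsLong()); // 1024
        System.out.println(app2.getAsLong()); // 1024
    }
}
```

In this toy model, a single lastRequired stored per user would force both apps to report the same value in the first case, which is the problem described above.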
So basically, IMHO, your ver.7 is the more correct way to go, since it keeps
complexity and efficiency balanced.
Any thoughts? [~jianhe], [~cwelch].
Wangda
> Capacity Scheduler headroom calculation does not work as expected
> -----------------------------------------------------------------
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch,
> YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch,
> YARN-1198.8.patch
>
>
> Today headroom calculation (for the app) takes place only when
> * A node is added to or removed from the cluster
> * A new container is assigned to the application.
> However, there are potentially a lot of situations which are not considered
> for this calculation
> * If a container finishes then headroom for that application will change and
> should be notified to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the
> same queue then
> ** If app1's container finishes then not only app1's but also app2's AM
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1 or
> app2) then both AMs should be notified about their headroom.
> ** To simplify the whole communication process it is ideal to keep headroom
> per User per LeafQueue so that everyone gets the same picture (apps belonging
> to same user and submitted in same queue).
> * If a new user submits an application to the queue then all applications
> submitted by all users in that queue should be notified of the headroom
> change.
> * Also today headroom is an absolute number (I think it should be normalized,
> but then this would not be backward compatible).
> * Also when an admin user refreshes the queue, headroom has to be updated.
> These are all potential bugs in the headroom calculation.