[
https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155619#comment-16155619
]
Sunil G commented on YARN-7149:
-------------------------------
Thanks [~eepayne] for the detailed analysis, and thanks [~jlowe] for the details; that makes sense. I think the optimization made in YARN-5889 had two parts: the first is on the allocation side, to share resources gradually across all users (Jason has given a very detailed explanation of this); the second is on the preemption side. For the preemption calculation, we have to consider all users' resources, since some users could be non-active as well. In the example given here, I think app1 would have become deactivated once all of app1's resource requests had been served. [~eepayne], please correct me if I am wrong.
In this case I think {{getTotalPendingResourcesConsideringUserLimit}} is not correct.
{code}
if (!userNameToHeadroom.containsKey(userName)) {
  User user = getUser(userName);
  Resource headroom = Resources.subtract(
      getResourceLimitForActiveUsers(app.getUser(), clusterResources,
          partition, SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY),
      user.getUsed(partition));
  // Make sure headroom is not negative.
  headroom = Resources.componentwiseMax(headroom, Resources.none());
  userNameToHeadroom.put(userName, headroom);
}
{code}
Here I think we have to use {{getResourceLimitForAllUsers}} instead of
{{getResourceLimitForActiveUsers}}. I will wait for Eric to confirm whether
app1 was still active or not.
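To illustrate why the choice of limit matters, here is a toy sketch (all class names, numbers, and the simple capacity-divided-by-user-count limit are hypothetical simplifications, not actual YARN code): when one user has been deactivated, a limit computed over active users only redistributes the deactivated user's share to the remaining active users, which inflates their headroom and therefore the pending total that the preemption monitor sees; a limit computed over all users gives a smaller per-user limit.

```java
// Toy illustration only, NOT YARN code: shows how the pending-resources
// total changes depending on whether the user limit is divided among
// active users only or among all users. Numbers are hypothetical.
public class UserLimitToy {
  // Per-user headroom = limit - used, floored at zero (mirrors the
  // componentwiseMax(headroom, none()) clamp in the snippet above).
  static long headroom(long userLimit, long used) {
    return Math.max(userLimit - used, 0L);
  }

  // Pending resources counted for a user, capped by that user's headroom.
  static long countedPending(long userLimit, long used, long pending) {
    return Math.min(pending, headroom(userLimit, used));
  }

  public static void main(String[] args) {
    long queueCapacity = 10; // GB, hypothetical queue
    // User A is active (still has outstanding requests); user B was
    // deactivated after its last request was served.
    long usedA = 2, pendingA = 8;
    long usedB = 8, pendingB = 0;

    // Limit divided among active users only: 1 active user -> 10 GB.
    long activeUsersLimit = queueCapacity / 1;
    // Limit divided among all users: 2 users -> 5 GB each.
    long allUsersLimit = queueCapacity / 2;

    long totalActive = countedPending(activeUsersLimit, usedA, pendingA)
        + countedPending(activeUsersLimit, usedB, pendingB);
    long totalAll = countedPending(allUsersLimit, usedA, pendingA)
        + countedPending(allUsersLimit, usedB, pendingB);

    System.out.println("active-users limit counts " + totalActive + " GB pending");
    System.out.println("all-users limit counts " + totalAll + " GB pending");
  }
}
```

In this toy, the active-users limit counts 8 GB of pending resources while the all-users limit counts only 3 GB, so the two choices can drive the preemption calculation to very different answers.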
Apart from this,
bq.Do we really need to keep the assignments balanced as users grow to their
limit?
I think we were trying to achieve a uniform allocation pattern here. I can try
to share more test results showing how slow this is, and will wait for more
discussion here in the meantime.
> Cross-queue preemption sometimes starves an underserved queue
> -------------------------------------------------------------
>
> Key: YARN-7149
> URL: https://issues.apache.org/jira/browse/YARN-7149
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.9.0, 3.0.0-alpha3
> Reporter: Eric Payne
> Assignee: Eric Payne
>
> In branch 2 and trunk, I am consistently seeing some use cases where
> cross-queue preemption does not happen when it should. I do not see this in
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far
> underserved_
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)