[
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074249#comment-14074249
]
Wangda Tan commented on YARN-2069:
----------------------------------
Hi [~mayank_bansal],
Thanks for working on this again. I've taken a brief look at your patch, and I
think its general approach is:
- Compute a target-user-limit for a given queue,
- Preempt containers according to a user's current consumption and
target-user-limit,
- If more resources need to be preempted, consider preempting AM
containers.
I think there are a couple of rules we need to respect (please let me know if you
don't agree with any of them):
# After preemption, the used resources of the users in a queue should be as
even as possible
# Before we start preempting AM containers, all task containers should be
preempted (per YARN-2022, AM containers get the lowest priority for
preemption)
# If we have to preempt AM containers, we should respect #1 too
For #1,
If we want to quantify the result, it should be:
{code}
Let rp_i = used-resource-after-preemption of user_i, for i ∈ {user}
Minimize sqrt( Σ_i ( rp_i - (Σ_j rp_j) / #{user} )^2 )
{code}
In other words, we should minimize the standard deviation of
used-resource-after-preemption.
Since containers are not all equal in size, the
used-resource-after-preemption of a given user cannot always exactly equal
target-user-limit. In our current logic, we make
used-resource-after-preemption <= target-user-limit. Consider the following
example:
{code}
qA: has user {V, W, X, Y, Z}; each user has one application
V: app5: {4, 4, 4, 4}, //means V has 4 containers, each one has memory=4G,
minimum_allocation=1G
W: app4: {4, 4, 4, 4},
X: app3: {4, 4, 4, 4},
Y: app2: {4, 4, 4, 4, 4, 4},
Z: app1: {4}
target-user-limit=11,
resource-to-obtain=23
After preemption:
V: {4, 4}
W: {4, 4}
X: {4, 4}
Y: {4, 4, 4, 4, 4, 4}
Z: {4}
{code}
This imbalance happens because every application we preempt from may overshoot
the user limit (bias), and the more users we process, the more bias can
accumulate. In other words, the imbalance is linearly correlated with
number-of-users-in-a-queue multiplied by average-container-size.
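To make the accumulated bias concrete, here is a minimal Python sketch of the behaviour described above. It is not the patch's code: the function name preempt_to_target and its parameters are hypothetical, and it assumes users are processed one at a time in a fixed order, each one being preempted until its usage is <= target-user-limit or enough has been obtained.

```python
def preempt_to_target(containers_by_user, order, target_user_limit, res_to_obtain):
    """Hypothetical model: visit users in `order`, preempting each one's
    containers until its usage is <= target_user_limit or enough is obtained."""
    remaining = {user: list(sizes) for user, sizes in containers_by_user.items()}
    for user in order:
        while (res_to_obtain > 0 and remaining[user]
               and sum(remaining[user]) > target_user_limit):
            # Preempt one container and count its size toward the goal.
            res_to_obtain -= remaining[user].pop()
    return remaining

# Container sizes (in GB) taken from the example above.
queue_a = {"V": [4] * 4, "W": [4] * 4, "X": [4] * 4, "Y": [4] * 6, "Z": [4]}

result = preempt_to_target(queue_a, ["V", "W", "X", "Y", "Z"], 11, 23)
used = {user: sum(sizes) for user, sizes in result.items()}
print(used)
```

Under these assumptions V, W and X each drop to 8 (3 below the limit of 11), which already covers the 23 to obtain, so Y keeps all 24.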
And we cannot solve this problem by preempting from the user with the most
usage first. Taking the same example:
{code}
qA: has user {V, W, X, Y, Z}; each user has one application
V: app5: {4, 4, 4, 4}, //means V has 4 containers, each one has memory=4G,
minimum_allocation=1G
W: app4: {4, 4, 4, 4},
X: app3: {4, 4, 4, 4},
Y: app2: {4, 4, 4, 4, 4, 4},
Z: app1: {4}
target-user-limit=11,
resource-to-obtain=23
After preemption (starting from the user with the most usage; the sequence is Y->V->W->X->Z):
V: {4, 4}
W: {4, 4, 4, 4}
X: {4, 4, 4, 4}
Y: {4, 4}
Z: {4}
{code}
This is still not very balanced; the ideal result should be:
{code}
V: {4, 4, 4}
W: {4, 4, 4}
X: {4, 4, 4}
Y: {4, 4, 4}
Z: {4}
{code}
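As a rough check of rule #1, the three outcomes above can be compared by the standard deviation of per-user usage. This is only a sketch: the imbalance helper is a hypothetical name, and it uses the population standard deviation, which differs from the sum-of-squares form written earlier by a constant factor only, so the ranking is the same.

```python
import statistics

def imbalance(used_after_preemption):
    """Population standard deviation of per-user used resource (GB)."""
    return statistics.pstdev(used_after_preemption)

# Per-user totals (V, W, X, Y, Z) read off the three results above.
target_limit_in_order = [8, 8, 8, 24, 4]
most_usage_first = [8, 16, 16, 8, 4]
ideal = [12, 12, 12, 12, 4]

for name, usage in [("target-user-limit, in order", target_limit_in_order),
                    ("most usage first", most_usage_first),
                    ("ideal", ideal)]:
    print(f"{name}: {imbalance(usage):.2f}")
```

The ideal result has the smallest deviation of the three, and the in-order result has the largest.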
In addition, this approach cannot satisfy rules #2/#3 either if
target-user-limit is not computed appropriately.
So I propose doing it another way:
We should recompute (used-resource - marked-preempted-resource) for a user
every time we decide to preempt a container. Maybe we can use a
priority queue to store (used-resource - marked-preempted-resource).
And we don't need to compute a target user limit.
The pseudo code for preempting resource of a queue might look like:
{code}
compute resToObtain first;
// first preempt task containers
while (resToObtain > 0 && some task container is not yet marked) {
  pick user-x, which has the most (used-resource - marked-preempted-resource)
  among the users that still have an unmarked task container
  pick one task container-y from user-x and mark it to be preempted
  resToObtain -= container-y.resource
}
if (resToObtain <= 0) {
  return;
}
// if more resources need to be preempted, we should preempt AM containers
while (resToObtain > 0 && total-am-resource - marked-preempted-am-resource >
    max-am-percentage) {
  // do the same thing again, this time with AM containers:
  pick user-x, which has the most (used-resource - marked-preempted-resource)
  pick one AM container-y from user-x and mark it to be preempted
  resToObtain -= container-y.resource
}
{code}
With this, the imbalance is linearly correlated with
average-container-size only, and rules #2/#3 that I mentioned above are
satisfied as well.
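A minimal Python sketch of the task-container phase of this idea, run on the earlier example. The name preempt_most_used_first is hypothetical, every listed container is assumed to be a task container, ties between users are broken by user name, and the AM phase is left out; it would repeat the same loop over AM containers.

```python
import heapq

def preempt_most_used_first(containers_by_user, res_to_obtain):
    """Repeatedly take one container from the user whose
    (used-resource - marked-preempted-resource) is currently the largest."""
    remaining = {user: list(sizes) for user, sizes in containers_by_user.items()}
    # heapq is a min-heap, so store negated usage to pop the largest first.
    heap = [(-sum(sizes), user) for user, sizes in remaining.items() if sizes]
    heapq.heapify(heap)
    while res_to_obtain > 0 and heap:
        neg_used, user = heapq.heappop(heap)
        size = remaining[user].pop()  # mark one container for preemption
        res_to_obtain -= size
        if remaining[user]:
            # Re-insert the user with its usage recomputed.
            heapq.heappush(heap, (neg_used + size, user))
    return remaining

queue_a = {"V": [4] * 4, "W": [4] * 4, "X": [4] * 4, "Y": [4] * 6, "Z": [4]}
balanced = preempt_most_used_first(queue_a, 23)
balanced_used = {user: sum(sizes) for user, sizes in balanced.items()}
print(balanced_used)
```

On this input the sketch leaves V, W, X and Y at 12 each and Z at 4, which matches the ideal result given earlier.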
Mayank, do you think this looks like a reasonable suggestion? Any other
thoughts, [~vinodkv], [~curino], [~sunilg]?
Thanks,
Wangda
> CS queue level preemption should respect user-limits
> ----------------------------------------------------
>
> Key: YARN-2069
> URL: https://issues.apache.org/jira/browse/YARN-2069
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Mayank Bansal
> Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch,
> YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch,
> YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch
>
>
> This is different from (even if related to, and likely to share code with)
> YARN-2113.
> YARN-2113 focuses on making sure that even if a queue has its guaranteed
> capacity, its individual users are treated in line with their limits
> irrespective of when they join in.
> This JIRA is about respecting user-limits while preempting containers to
> balance queue capacities.