[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074249#comment-14074249
 ] 

Wangda Tan commented on YARN-2069:
----------------------------------

Hi [~mayank_bansal],
Thanks for working on this again. I've taken a brief look at your patch, and I 
think the general approach in it is:
- Compute a target-user-limit for a given queue,
- Preempt containers according to each user's current consumption and the 
target-user-limit,
- If more resources need to be preempted, consider preempting AM containers.

I think there are a couple of rules we need to respect (please let me know if 
you don't agree with any of them),
# Used resources of users in a queue after preemption should be as even as 
possible
# Before we start preempting AM containers, all task containers should be 
preempted (according to YARN-2022, preempting AM containers should have the 
lowest priority)
# If we do have to preempt AM containers, we should still respect #1

For #1,
If we want to quantize the result, it should be:
{code}
Let rp_i = used-resource-after-preemption of user_i,  i ∈ {user}
Minimize  sqrt( Σ_i ( rp_i - (Σ_j rp_j) / #{user} )^2 )
{code}
In other words, we should minimize the standard deviation of 
used-resource-after-preemption across users.
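
Just to pin the objective down as code, a minimal sketch in Java (the 
imbalance helper and its plain double[] input are mine, purely to illustrate 
the formula above):
{code}
// Illustration only: given each user's used-resource-after-preemption rp_i,
// return sqrt( sum_i (rp_i - mean)^2 ), the quantity a balanced preemption
// policy should try to minimize.
static double imbalance(double[] rp) {
  double mean = 0;
  for (double r : rp) {
    mean += r;
  }
  mean /= rp.length;
  double sumSq = 0;
  for (double r : rp) {
    sumSq += (r - mean) * (r - mean);
  }
  return Math.sqrt(sumSq);
}
{code}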

Since not all containers are equal in size, it is possible that the 
used-resource-after-preemption of a given user cannot precisely equal the 
target-user-limit. In our current logic, we make 
used-resource-after-preemption <= target-user-limit. Consider the following 
example,
{code}
qA: has user {V, W, X, Y, Z}; each user has one application
V: app5: {4, 4, 4, 4}, //means V has 4 containers, each one has memory=4G, 
minimum_allocation=1G
W: app4: {4, 4, 4, 4},
X: app3: {4, 4, 4, 4},
Y: app2: {4, 4, 4, 4, 4, 4},
Z: app1: {4}
target-user-limit=11,
resource-to-obtain=23

After preemption:
V: {4, 4}
W: {4, 4}
X: {4, 4}
Y: {4, 4, 4, 4, 4, 4}
Z: {4}
{code}
This imbalance happens because, for every application we preempt from, the 
remaining usage may fall below the user-limit by up to a container's worth (a 
per-user bias), and the more users we process, the more of this bias 
accumulates: in the example, V, W and X each end up 3G below the 11G limit, 
which already frees 24G (more than the 23G required), so Y is never touched. 
In other words, the imbalance is linearly correlated with 
number-of-users-in-a-queue multiplied by average-container-size.
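
To make the bias accumulation concrete, here is a small, self-contained replay 
of the example in plain Java (this is my reading of the per-user 
target-limit logic, not the actual patch code; all class and variable names 
are made up):
{code}
import java.util.*;

// Hypothetical replay of the "trim each user down to target-user-limit"
// logic described above; NOT the actual patch code, just an illustration.
public class TargetLimitExample {
  public static void main(String[] args) {
    // user -> container sizes in GB, taken from the example in this comment
    Map<String, List<Integer>> usage = new LinkedHashMap<>();
    usage.put("V", new ArrayList<>(Arrays.asList(4, 4, 4, 4)));
    usage.put("W", new ArrayList<>(Arrays.asList(4, 4, 4, 4)));
    usage.put("X", new ArrayList<>(Arrays.asList(4, 4, 4, 4)));
    usage.put("Y", new ArrayList<>(Arrays.asList(4, 4, 4, 4, 4, 4)));
    usage.put("Z", new ArrayList<>(Collections.singletonList(4)));

    int targetUserLimit = 11;
    int resToObtain = 23;

    for (List<Integer> containers : usage.values()) {
      int used = containers.stream().mapToInt(Integer::intValue).sum();
      // trim this user to <= target-user-limit, one container at a time
      while (resToObtain > 0 && used > targetUserLimit && !containers.isEmpty()) {
        int size = containers.remove(containers.size() - 1);
        used -= size;
        resToObtain -= size;
      }
      if (resToObtain <= 0) {
        break;
      }
    }

    // Prints V/W/X with {4, 4} (3G below the limit each) and Y untouched at
    // {4, 4, 4, 4, 4, 4}: the per-user bias accumulates and Y is never reached.
    usage.forEach((user, cs) -> System.out.println(user + ": " + cs));
  }
}
{code}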

And we cannot solve this problem by preempting from the user with the most 
usage. Using the same example:
{code}

qA: has user {V, W, X, Y, Z}; each user has one application
V: app5: {4, 4, 4, 4}, //means V has 4 containers, each one has memory=4G, 
minimum_allocation=1G
W: app4: {4, 4, 4, 4},
X: app3: {4, 4, 4, 4},
Y: app2: {4, 4, 4, 4, 4, 4},
Z: app1: {4}
target-user-limit=11,
resource-to-obtain=23

After preemption (from the user with the most usage, the sequence is Y->V->W->X->Z):
V: {4, 4}
W: {4, 4, 4, 4}
X: {4, 4, 4, 4}
Y: {4, 4}
Z: {4} 
{code}
Still not very balanced; the ideal result should be:
{code}

V: {4, 4, 4}
W: {4, 4, 4}
X: {4, 4, 4}
Y: {4, 4, 4}
Z: {4} 
{code}

In addition, this approach cannot satisfy rules #2/#3 either if the 
target-user-limit is not computed appropriately.

So I propose to do it another way:
We should recompute (used-resource - marked-preempted-resource) for a user 
every time after deciding to preempt a container from that user. Maybe we can 
use a priority queue to store (used-resource - marked-preempted-resource), and 
then we don't need to compute a target user limit at all.
The pseudo code for preempting resources of a queue might look like:
{code}
compute resToObtain first;

// first preempt task containers
while (resToObtain > 0 && there are task containers left to preempt) {
  pick the user-x with the most (used-resource - marked-preempted-resource)
  pick one task container-y from user-x to preempt
  resToObtain -= container-y.resource
}

if (resToObtain <= 0) {
  return;
}

// if more resources need to be preempted, preempt AM containers as well
while (resToObtain > 0 &&
       total-am-resource - marked-preempted-am-resource > max-am-percentage) {
  // do the same thing again:
  pick the user-x with the most (used-resource - marked-preempted-resource)
  pick one AM container-y from user-x to preempt
  resToObtain -= container-y.resource
}
{code}
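
And a rough, self-contained Java sketch of that loop, just to illustrate the 
priority-queue idea (the UserUsage class and all names are hypothetical, the 
AM pass is omitted, and this is not meant to match the actual preemption 
policy classes):
{code}
import java.util.*;

// Hypothetical sketch of the proposed approach: always preempt from the user
// with the largest (used-resource - marked-preempted-resource), recomputed
// after every preempted container via a priority queue.
public class MostUsageFirstPreemption {

  static class UserUsage {
    final String name;
    final Deque<Integer> containers; // task container sizes in GB
    int used;
    int markedPreempted;

    UserUsage(String name, List<Integer> sizes) {
      this.name = name;
      this.containers = new ArrayDeque<>(sizes);
      this.used = sizes.stream().mapToInt(Integer::intValue).sum();
    }

    int remaining() {
      return used - markedPreempted;
    }
  }

  // Preempts task containers until resToObtain is met or nothing is left;
  // returns what is still to obtain (an AM pass would follow the same shape).
  static int preempt(Collection<UserUsage> users, int resToObtain) {
    PriorityQueue<UserUsage> pq = new PriorityQueue<>(
        Comparator.comparingInt(UserUsage::remaining).reversed());
    pq.addAll(users);

    while (resToObtain > 0 && !pq.isEmpty()) {
      UserUsage u = pq.poll();            // user with most remaining usage
      if (u.containers.isEmpty()) {
        continue;                         // nothing left to preempt from u
      }
      int size = u.containers.pollLast(); // mark one container for preemption
      u.markedPreempted += size;
      resToObtain -= size;
      pq.offer(u);                        // re-insert with the updated value
    }
    return resToObtain;
  }

  public static void main(String[] args) {
    List<UserUsage> users = Arrays.asList(
        new UserUsage("V", Arrays.asList(4, 4, 4, 4)),
        new UserUsage("W", Arrays.asList(4, 4, 4, 4)),
        new UserUsage("X", Arrays.asList(4, 4, 4, 4)),
        new UserUsage("Y", Arrays.asList(4, 4, 4, 4, 4, 4)),
        new UserUsage("Z", Collections.singletonList(4)));
    int left = preempt(users, 23);
    // Prints V/W/X/Y at 12G each and Z at 4G: the balanced result above.
    users.forEach(u -> System.out.println(u.name + ": " + u.remaining() + "G"));
    System.out.println("still to obtain: " + left + "G");
  }
}
{code}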

With this, the imbalance becomes linearly correlated with 
average-container-size only, and the rules #2/#3 I mentioned above are 
respected as well.
Mayank, does this look like a reasonable suggestion to you? Any other 
thoughts? [~vinodkv], [~curino], [~sunilg].

Thanks,
Wangda

> CS queue level preemption should respect user-limits
> ----------------------------------------------------
>
>                 Key: YARN-2069
>                 URL: https://issues.apache.org/jira/browse/YARN-2069
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Mayank Bansal
>         Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
> YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
> YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch
>
>
> This is different from (even if related to, and likely share code with) 
> YARN-2113.
> YARN-2113 focuses on making sure that even if queue has its guaranteed 
> capacity, its individual users are treated in-line with their limits 
> irrespective of when they join in.
> This JIRA is about respecting user-limits while preempting containers to 
> balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
