Eric Payne commented on YARN-4945:

bq. Have a Map of username to headroom inside the method can compute user limit at most once for different user. And this logic can be reused to compute user-limit preemption
Maybe we are talking about the same thing, but I just want to clarify that I am 
not advocating preemption based on headroom (user-limit-factor). I am 
advocating preemption based on minimum user limit percent (MULP), which is the 
minimum guaranteed resource amount per user per queue.
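
For reference, this is roughly how I think about the difference between the two knobs. The sketch below is simplified and uses made-up names (it is not the actual {{LeafQueue}} user-limit code); it just treats everything as container counts: MULP acts as a floor per active user, while user-limit-factor acts as a ceiling on what a single user may take.
{code:java}
// Simplified sketch of the per-user limit, in container counts (made-up names).
public class UserLimitSketch {

  static int userLimit(int queueCapacity, int queueUsed, int activeUsers,
                       double minUserLimitPercent, double userLimitFactor) {
    int currentCapacity = Math.max(queueCapacity, queueUsed);
    // Balance the queue among active users, but never drop below the MULP floor.
    int limit = Math.max(
        (int) Math.ceil((double) currentCapacity / activeUsers),
        (int) Math.ceil(currentCapacity * minUserLimitPercent));
    // user-limit-factor caps how much of the queue one user may take.
    return Math.min(limit, (int) (queueCapacity * userLimitFactor));
  }

  public static void main(String[] args) {
    // The queue from the experiment below: 24 containers, MULP=0.25, user-limit-factor=1.0.
    System.out.println(userLimit(24, 24, 1, 0.25, 1.0)); // 24: one user may take the whole queue
    System.out.println(userLimit(24, 24, 2, 0.25, 1.0)); // 12: two users are balanced evenly
    System.out.println(userLimit(24, 24, 4, 0.25, 1.0)); //  6: each of 4 users gets at least 25%
  }
}
{code}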

bq. To be honest, I haven't thought of a good way that a list of policies can better solve the priority + user-limit preemption problem. Could you share some ideas about it? For example, how to better consider both in the final decision?
I believe that the two preemption policies (priority and 
minimum-user-limit-percent) are _mostly_ (but not completely) separate. I would 
say that priority preemption only considers apps from the same user, and MULP 
preemption only considers apps from different users.

Looking at the behavior of the capacity scheduler, I was surprised to find 
that it mostly ignores priority when assigning resources between apps of 
different users. I conducted the following experiment without turning on preemption:

# The cluster has only 1 queue, and it takes up all of the resources.
||Queue Name||Total Containers||User Limit Factor||Minimum User Limit Percent||
|default|24|1.0 (each user can take up the whole queue if no other users are present)|0.25 (if other users are present, each user is guaranteed at least 25% of the queue's resources; at most, 4 users can have apps in the queue at once; if fewer than 4 users are present, the scheduler tries to balance resources evenly between them)|
# {{user1}} starts {{app1}} in {{default}} at {{priority1}} and consumes all of the queue's resources.
# {{user2}} starts {{app2}} in {{default}} at {{priority2}} (a higher priority than {{priority1}}).
||User Name||App Name||App Priority||Used Containers||Pending Containers||
# I kill 12 containers from {{app1}}, and the capacity scheduler assigns them to 
{{app2}}. This is not because {{app2}} has a higher priority than {{app1}}, but 
because {{user2}} is using fewer resources than {{user1}} (the capacity scheduler 
tries to balance resources between users).
||User Name||App Name||App Priority||Used Containers||Pending Containers||
# At this point, what should happen if I kill another container from {{app1}}? 
Since {{app2}} has a higher priority than {{app1}}, and since MULP is 25% (so 
{{user2}}'s minimum guarantee is only 6 containers), you might think that the 
capacity scheduler would give the container to {{app2}} (that's what I thought it 
would do). _But it doesn't!_ The capacity scheduler gives the container back to 
{{app1}} because it wants to balance the resources between all users. And the 
table remains the same:
||User Name||App Name||App Priority||Used Containers||Pending Containers||

Once the users are balanced, no matter how many times I kill a container from 
{{app1}}, it always goes back to {{app1}}. From a priority perspective, this 
could be considered an inversion, since {{app2}} is asking for more resources 
and {{app1}} is well above its MULP. But the capacity scheduler does not 
consider priority in this case.
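
That observed behavior boils down to something like the following selection rule. This is only a rough sketch with made-up names (it is not the actual scheduler code): the freed container is offered to the most under-served user with pending demand, and app priority is at best a tie-breaker.
{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Rough sketch of the cross-user behavior observed above: a freed container
// goes to the user who is furthest below the user limit, regardless of app
// priority. Priority only breaks ties; it never overrides the user balancing.
class FreedContainerSketch {

  static class App {
    final String user; final int priority; final int used; final int pending;
    App(String user, int priority, int used, int pending) {
      this.user = user; this.priority = priority; this.used = used; this.pending = pending;
    }
  }

  static Optional<App> nextAppToServe(List<App> apps, int userLimit) {
    return apps.stream()
        .filter(a -> a.pending > 0)
        // Users already at or above their user limit are not given the container.
        .filter(a -> usedByUser(apps, a.user) < userLimit)
        // Serve the most under-served user first; higher priority only breaks ties.
        .min(Comparator.<App>comparingInt(a -> usedByUser(apps, a.user))
                       .thenComparingInt(a -> -a.priority));
  }

  static int usedByUser(List<App> apps, String user) {
    return apps.stream().filter(a -> a.user.equals(user)).mapToInt(a -> a.used).sum();
  }
}
{code}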

If I try the same experiment, but with both apps owned by {{user1}}, then I can 
kill all of {{app1}}'s containers (except the AM) and they all get assigned to {{app2}}.

Because the capacity scheduler behaves this way, I would recommend that the 
MULP preemption policy run first and try to balance each user toward its 
ideal-assigned value. The MULP policy would preempt from the lowest-priority apps 
first, so it would consider the priority of apps owned by other, over-served 
users when deciding what to preempt, and the priority of apps owned by the 
current, under-served user when deciding ideal-assigned values.
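
Something along these lines is what I have in mind for the MULP stage. Again, this is only a sketch with made-up names and types, not a patch:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the MULP stage: bring over-served users down toward their
// ideal-assigned share, taking containers from their lowest-priority apps first.
class MulpPreemptionSketch {

  static class App {
    final String user; final int priority; final int used;
    App(String user, int priority, int used) {
      this.user = user; this.priority = priority; this.used = used;
    }
  }

  // idealAssigned: per-user share computed from MULP, queue capacity, and pending demand.
  static List<App> selectCandidates(List<App> apps, Map<String, Integer> idealAssigned) {
    List<App> candidates = new ArrayList<>();
    for (Map.Entry<String, Integer> e : idealAssigned.entrySet()) {
      String user = e.getKey();
      int used = apps.stream()
          .filter(a -> a.user.equals(user)).mapToInt(a -> a.used).sum();
      int toPreempt = used - e.getValue();
      if (toPreempt <= 0) {
        continue;  // this user is at or below its ideal share; leave it alone
      }
      // Within an over-served user, take from the lowest-priority apps first.
      List<App> lowestFirst = apps.stream()
          .filter(a -> a.user.equals(user))
          .sorted(Comparator.comparingInt((App a) -> a.priority))
          .collect(Collectors.toList());
      for (App a : lowestFirst) {
        if (toPreempt <= 0) {
          break;
        }
        candidates.add(a);
        toPreempt -= a.used;
      }
    }
    return candidates;
  }
}
{code}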

Then, I would run the priority policy, but only consider apps within each user. 
As shown above, once the users are balanced with each other with regard to MULP, 
killing containers from one user's apps on behalf of another user's 
higher-priority apps will only cause preemption churn.
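
The second stage would then look something like this (again, made-up names, just to show the scoping): victims are only ever selected from lower-priority apps of the same user as the starved, higher-priority app.
{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of the priority stage, scoped to a single user: a container is only
// preempted from a lower-priority app of the SAME user to satisfy a
// higher-priority app's pending demand, so the scheduler will not simply hand
// the container straight back to the victim.
class IntraUserPrioritySketch {

  static class App {
    final String user; final int priority; final int used; final int pending;
    App(String user, int priority, int used, int pending) {
      this.user = user; this.priority = priority; this.used = used; this.pending = pending;
    }
  }

  static Optional<App> pickVictim(List<App> apps, App starvedApp) {
    return apps.stream()
        .filter(a -> a.user.equals(starvedApp.user))    // same user only
        .filter(a -> a.priority < starvedApp.priority)  // strictly lower priority
        .filter(a -> a.used > 0)
        .min(Comparator.comparingInt((App a) -> a.priority)); // lowest priority first
  }
}
{code}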

[~leftnoteasy], as you said, we may be able to combine the two into one policy, 
and you may be right that this can be done without being too complicated. The 
thing I want to ensure is that the priority preemption policy doesn't kill 
containers from a different user's apps on behalf of a higher-priority app, only 
to have them reassigned back to the original user and cause preemption churn.

> [Umbrella] Capacity Scheduler Preemption Within a queue
> -------------------------------------------------------
>                 Key: YARN-4945
>                 URL: https://issues.apache.org/jira/browse/YARN-4945
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>         Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch, 
> YARN-2009.v1.patch, YARN-2009.v2.patch
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.
