[ 
https://issues.apache.org/jira/browse/YARN-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363215#comment-17363215
 ] 

Eric Payne commented on YARN-10802:
-----------------------------------

[~bteke], thanks for raising this issue and for working on it. I have a 
question and an observation.
{quote}Capacity Scheduler's minimum-user-limit-percent only accepts integers, 
which means at most 100 users can use a single queue fairly
{quote}
This isn't exactly accurate.

Minimum user limit percent is only enforced when a queue's max capacity is 
reached _AND_ (100 / {{min-user-limit-pct}}) users are both using resources and 
asking for more resources. As long as the queue's max capacity has not been 
reached _AND_ there are more resources available in the system, the 101st, 
102nd, 103rd, etc., users will be assigned resources.
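The enforcement condition above can be illustrated with two hypothetical Java helpers. MulpSketch, usersWithGuaranteedShare and mulpIsBinding are made-up names, not Capacity Scheduler APIs. The percentage is taken as a float, assuming decimal values are accepted as this issue proposes, and the helpers encode only the rule stated in this comment, not the scheduler's full logic.

```java
// Hypothetical helpers illustrating when minimum-user-limit-percent binds,
// per the rule described in the comment (not Capacity Scheduler code).
final class MulpSketch {

    /** Number of users that can each be guaranteed minUserLimitPct of the queue. */
    static int usersWithGuaranteedShare(float minUserLimitPct) {
        // Written this way so NaN is rejected too.
        if (!(minUserLimitPct > 0 && minUserLimitPct <= 100)) {
            throw new IllegalArgumentException("expected 0 < pct <= 100");
        }
        return (int) Math.floor(100.0 / minUserLimitPct);
    }

    /**
     * The percentage only matters once the queue has hit its max capacity AND
     * at least 100 / pct users are both using resources and asking for more.
     */
    static boolean mulpIsBinding(boolean queueAtMaxCapacity,
                                 int usersUsingAndRequesting,
                                 float minUserLimitPct) {
        return queueAtMaxCapacity
                && usersUsingAndRequesting >= usersWithGuaranteedShare(minUserLimitPct);
    }

    public static void main(String[] args) {
        System.out.println(usersWithGuaranteedShare(1.0f));   // 100
        System.out.println(usersWithGuaranteedShare(0.5f));   // 200, if decimals are accepted
        // A 101st user in a queue below max capacity is not held back by the percentage.
        System.out.println(mulpIsBinding(false, 101, 1.0f));  // false
    }
}
```

In this model an integer-only setting caps the "guaranteed" population at 100 users, while a value such as 0.5 would raise it to 200; below max capacity the percentage does not restrict anyone.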

So, my question is, do you have a use case where
 1. 100 users are using up the max capacity in the queue
 2. All 100 users are active (that is, requesting more resources)
 3. The 101st user comes in and is starved because, as containers are released, 
they are assigned to one of the first 100 (again, because they are all asking 
for resources)?

We have several very heavily used multi-tenant queues that often have 100 or 
more users running, but only a subset of them are actively requesting resources.

My observation is that when we have set the min-user-limit-pct to 1 in a 
very highly used multi-tenant queue, the user limit grows way too slowly. The 
min-user-limit-pct is used in calculating the user limit (shown as "Max 
Resources" in the queue's pull-down menu in the RM GUI). When the queue grows 
above its capacity but is still below its max capacity, the user limit 
calculation in {{UsersManager#computeUserLimit}} uses the min-user-limit-pct to 
limit how fast the user limit can grow. The smaller the min-user-limit-pct is, 
the slower the user limit grows. What ends up happening is that a few users 
want to grow larger, but several smaller users come in, request resources, and 
leave without ever reaching the current user limit. This process repeats 
because there are new active users all the time, so the longer-running, larger 
users can't grow beyond a certain limit even though queue and cluster 
resources are still available.
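The growth limit described above can be sketched with a simplified, single-resource Java model. It assumes the user limit is roughly max(an even split of the active users' usage, current capacity * min-user-limit-pct / 100), capped at queue capacity * user-limit-factor, where "current capacity" is the queue capacity while below capacity and usage plus the next request once above it. That formula is an assumption based on a reading of {{UsersManager#computeUserLimit}}, not its source; the class, method and parameter names are hypothetical, and user weights, node labels and multi-resource handling are omitted.

```java
// Simplified, single-resource model (e.g. memory in GB) of a user-limit
// calculation. Hypothetical: NOT the Hadoop source; weights, node labels and
// multi-resource (DRF) handling are deliberately left out.
final class UserLimitSketch {

    /** Ceiling of numerator / denominator for non-negative inputs. */
    static long divideAndCeil(long numerator, double denominator) {
        return (long) Math.ceil(numerator / denominator);
    }

    /**
     * @param queueCapacity   configured (guaranteed) queue capacity
     * @param consumed        resources currently used in the queue
     * @param required        size of the next request (minimum allocation)
     * @param activeUsersUsed resources held by users that are still asking
     * @param activeUsers     number of users that are still asking
     * @param minUserLimitPct minimum-user-limit-percent, as a float
     * @param userLimitFactor user-limit-factor
     */
    static long computeUserLimit(long queueCapacity, long consumed, long required,
                                 long activeUsersUsed, int activeUsers,
                                 float minUserLimitPct, float userLimitFactor) {
        // Let queues whose capacity is smaller than one request make progress.
        long capacity = Math.max(queueCapacity, required);
        // Below capacity the base is the queue capacity; above it, the base is
        // what is already consumed plus the next request, so it grows with usage.
        long currentCapacity = consumed < capacity ? capacity : consumed + required;
        // Term 1: an even split of what the active users hold (plus one request).
        long evenSplit = divideAndCeil(activeUsersUsed + required, activeUsers);
        // Term 2: the floor set by minimum-user-limit-percent.
        long pctFloor = divideAndCeil(
                (long) Math.floor(currentCapacity * (double) minUserLimitPct), 100.0);
        long userLimit = Math.max(evenSplit, pctFloor);
        // Never let one user exceed capacity * user-limit-factor.
        long maxUserLimit = (long) Math.floor(capacity * (double) userLimitFactor);
        long capped = Math.min(userLimit, maxUserLimit);
        // Round up to a multiple of the minimum allocation.
        return divideAndCeil(capped, required) * required;
    }

    public static void main(String[] args) {
        // Queue is 200 GB over its 1000 GB capacity; 20 active users hold 400 GB.
        for (float pct : new float[] {1.0f, 25.0f}) {
            long ul = computeUserLimit(1000, 1200, 1, 400, 20, pct, 2.0f);
            System.out.println("min-user-limit-pct=" + pct + " -> user limit " + ul + " GB");
        }
        // Prints:
        // min-user-limit-pct=1.0 -> user limit 21 GB
        // min-user-limit-pct=25.0 -> user limit 301 GB
    }
}
```

In this model, with many lightly loaded active users, the 1% floor (13 GB) loses to the even split (21 GB), so a larger user is held near the average usage of the active users; a 25% floor would put the limit at 301 GB for the same queue state.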

> Change Capacity Scheduler minimum-user-limit-percent to accept decimal values
> -----------------------------------------------------------------------------
>
>                 Key: YARN-10802
>                 URL: https://issues.apache.org/jira/browse/YARN-10802
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-10802.001.patch, YARN-10802.002.patch, 
> YARN-10802.003.patch, YARN-10802.004.patch
>
>
> Capacity Scheduler's minimum-user-limit-percent only accepts integers, which 
> means at most 100 users can use a single queue fairly. Using decimal values 
> could solve this problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
