[
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747056#comment-15747056
]
Wangda Tan commented on YARN-5889:
----------------------------------
Thanks [~jlowe] for such detailed suggestions. After looking at the existing
logic and your solution, I finally understand why we have different proposals.
The first thing we need to decide is what the semantics of
minimum-user-limit-percent (MULP) should be.
From the doc
(https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html),
MULP is not a "minimum guarantee" at all; it is actually an upper limit on each
user's usage (which of course will be capped by user-limit-factor and
queue-maximum-capacity):
bq. "... If a third user submits an application, no single user can use more
than 33% of the queue resources ..."
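To make the quoted behavior concrete, here is a minimal sketch of the per-user cap as the doc describes it. The function name is hypothetical, and it ignores user-limit-factor and queue-maximum-capacity, so it only illustrates the doc's wording and is not the scheduler's actual code.

```python
def per_user_cap_percent(num_users: int, mulp: float) -> float:
    """Per-user upper bound (percent of the queue) as worded in the doc.

    Assumption: the queue is shared equally among the users, but the
    per-user share never drops below MULP.
    """
    if num_users < 1:
        raise ValueError("num_users must be at least 1")
    return max(100.0 / num_users, float(mulp))


for n in (1, 2, 3, 4, 5):
    print(n, round(per_user_cap_percent(n, 25), 2))
```

With MULP=25 the cap stops shrinking once there are 4 users (100 / MULP); what should happen beyond that point is the open question.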
These semantics are clear enough when #users <= (100 / MULP), but they are
unclear when #users > (100 / MULP). For example:
{code}
Let's say we have 5 apps belonging to 5 users:
App: a1, a2, a3, a4, a5
Usr: u1, u2, u3, u4, u5
MULP set to 25.
At the time=T, resource usage: a1=25,a2=20,a3=30,a4=20,a5=5
Assume a4 and a5 are now the active applications. Which one should get the
next available resource?
{code}
I believe it should be u4, but how about the next example:
{code}
Let's say we have 6 apps belonging to 5 users:
App: a1, a2, a3, a4, a5, a6
Usr: u1, u2, u3, u4, u5, u4
MULP set to 25.
At the time=T, resource usage: a1=25,a2=20,a3=30,a4=15,a5=5,a6=5
Assume a5 and a6 are now the active applications. Which one should get the
next available resource?
{code}
The existing behavior in the scheduler is that a5 gets the resource first. But
with this, we can no longer guarantee capacity for a user: MULP is set to 25,
yet user u5 gets new resources before u4 reaches the configured MULP.
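The numbers in the second example can be checked with a small sketch. It assumes the list order below is the submission (FIFO) order and that usage figures are percentages of the queue; the helper names are hypothetical and this is not the scheduler's code.

```python
from collections import defaultdict

MULP = 25  # percent, as in the example

# (app, user, usage), listed in the assumed submission (FIFO) order.
APPS = [
    ("a1", "u1", 25),
    ("a2", "u2", 20),
    ("a3", "u3", 30),
    ("a4", "u4", 15),
    ("a5", "u5", 5),
    ("a6", "u4", 5),
]
ACTIVE = {"a5", "a6"}


def usage_by_user(apps):
    """Sum the usage of every app per owning user."""
    totals = defaultdict(int)
    for _app, user, used in apps:
        totals[user] += used
    return dict(totals)


def next_app_fifo(apps, active):
    """Pick the first active app in list order (plain FIFO, no user-limit check)."""
    for app, _user, _used in apps:
        if app in active:
            return app
    return None


usage = usage_by_user(APPS)
print(usage["u4"], usage["u5"])     # 20 5
print(next_app_fifo(APPS, ACTIVE))  # a5
```

Both u4 (20) and u5 (5) are below MULP, so FIFO order alone decides, and a5 is served while u4 is still short of 25.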
If we all agree on this behavior (FIFO order comes before user limit), we can
use the approach from Jason. On the other hand, if we want the first (100/MULP)
users to have guaranteed (100/MULP) capacity, the order of applications matters.
_TL;DR_
I have now changed my mind: it's better to make the behavior consistent :). So
we don't have to calculate the user limit in a separate thread. As proposed by
Jason, we can calculate two user limits:
1) Active user limit for allocation: We will use resource-used-by-active-users
and #active-users to calculate active user limit.
{code}
active-user-limit = min(
    max(resource-used-by-active-users,
        queue-configured-resource - resource-used-by-non-active-users)
      / min(#active-users, 100 / MULP),
    queue-configured-resource * ULP)
{code}
This looks very similar to how we compute the user limit today; the only
difference is that it uses resource-used-by-active-users instead of
total-resource, because we should fairly divide the available resource among
the active users in the queue.
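A direct transcription of the active-user-limit formula into Python may help when checking numbers by hand. The function and parameter names are hypothetical, ULP is treated simply as a multiplier on the queue's configured resource exactly as written in the formula, and the sample values are made up for illustration.

```python
def active_user_limit(used_by_active, used_by_non_active,
                      queue_configured, num_active_users, mulp, ulp):
    """Transcription of the proposed active-user-limit formula.

    All resource arguments share one unit; mulp is in percent.
    """
    # Resource that can be shared among the active users.
    shareable = max(used_by_active, queue_configured - used_by_non_active)
    # Never divide among more than (100 / MULP) users.
    divisor = min(num_active_users, 100 / mulp)
    return min(shareable / divisor, queue_configured * ulp)


# Hypothetical values: queue of 100, active users hold 40, non-active hold 30.
print(active_user_limit(40, 30, 100, 2, 25, 1))  # 35.0
```

Here the shareable amount is max(40, 100 - 30) = 70, split between 2 active users, which gives 35 each.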
2) User limit for preemption:
When doing preemption, we will use:
{code}
all-user-limit = min(
    max(resource-used-by-all-users,
        queue-configured-resource)
      / min(#users, 100 / MULP),
    queue-configured-resource * ULP)
{code}
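The all-user-limit formula can be sketched the same way. The names are again hypothetical, and the sample call assumes the queue's configured resource is 100 and ULP is 1, neither of which is stated in the examples above.

```python
def all_user_limit(used_by_all, queue_configured, num_users, mulp, ulp):
    """Transcription of the proposed all-user-limit formula (mulp in percent)."""
    # Never divide among more than (100 / MULP) users.
    divisor = min(num_users, 100 / mulp)
    share = max(used_by_all, queue_configured) / divisor
    return min(share, queue_configured * ulp)


# Usage from the 5-user examples: total used = 100, MULP = 25.
print(all_user_limit(100, 100, 5, 25, 1))  # 25.0
```

Under these assumptions the limit is 25, so only u3 (usage 30) is above it.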
Thoughts?
> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Sunil G
> Assignee: Sunil G
> Attachments: YARN-5889.v0.patch, YARN-5889.v1.patch,
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with
> a write lock. To improve performance, this ticket focuses on moving
> user-limit calculation out of the heartbeat allocation flow.