[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747056#comment-15747056 ]

Wangda Tan commented on YARN-5889:
----------------------------------

Thanks [~jlowe] for such detailed suggestions. After looking at the existing 
logic and your solution, I finally understand why we have different proposals:

The first thing we need to decide is what the semantics of 
minimum-user-limit-percent (MULP) should be.
From the doc 
(https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html), 
MULP is not a "minimum guarantee" at all; it is actually an upper limit per 
user (which of course will be capped by user-limit-factor and 
queue-maximum-capacity):
bq. "... If a third user submits an application, no single user can use more 
than 33% of the queue resources ..."

This semantic is clear enough when #users <= (100 / MULP), but it is unclear 
when #users > (100 / MULP). For example:

{code}
Let's say we have 5 apps belonging to 5 users:

App: a1, a2, a3, a4, a5 
Usr: u1, u2, u3, u4, u5

MULP set to 25.
At the time=T, resource usage: a1=25,a2=20,a3=30,a4=20,a5=5 

Assume a4/a5 are now the active applications; which one should get the next 
available resource?
{code}

I believe it should be u4, but how about the next example:
{code}
Let's say we have 6 apps belonging to 5 users:

App: a1, a2, a3, a4, a5, a6
Usr: u1, u2, u3, u4, u5, u4

MULP set to 25.
At the time=T, resource usage: a1=25,a2=20,a3=30,a4=15,a5=5,a6=5

Assume a5/a6 are now the active applications; which one should get the next 
available resource?
{code}

The existing behavior in the scheduler is that a5 gets the resource first. But 
with this, we cannot guarantee capacity per user any more: MULP is set to 25, 
yet user u5 gets new resource before u4 (whose total usage is a4 + a6 = 20) 
reaches the configured MULP.

If we all agree on this behavior (FIFO order comes before user limit), we can 
use Jason's approach. On the other hand, if we want the first (100/MULP) users 
to each have a guaranteed MULP percent of the capacity, the order of 
applications matters.

_TL;DR_

Now I have changed my mind; it's better to make the behavior consistent :). So 
we don't have to calculate the user limit in a separate thread. As proposed by 
Jason, we can calculate two user limits:

1) Active user limit for allocation: we will use resource-used-by-active-users 
and #active-users to calculate the active user limit.

{code}
active-user-limit = min(
        max(resource-used-by-active-users, 
            queue-configured-resource - resource-used-by-non-active-users)
             / min(#active-users, 100 / MULP),
        queue-configured-resource * ULP)
{code}

This looks very similar to how we compute the user limit today; the only 
difference is that it uses resource-used-by-active-users instead of 
total-resource, because we should fairly divide the available resource among 
active users in the queue.
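
To make the arithmetic concrete, here is a minimal sketch of this formula, 
assuming a single-dimension resource modeled as a double (the real scheduler 
works with Resource objects and a ResourceCalculator); the class and parameter 
names here are hypothetical, not the scheduler's actual API:

{code}
// Minimal sketch of the proposed active-user-limit formula, assuming
// single-dimension resources expressed as doubles. All names are hypothetical.
public final class UserLimitSketch {

  /**
   * @param usedByActiveUsers    resource-used-by-active-users
   * @param usedByNonActiveUsers resource-used-by-non-active-users
   * @param queueConfigured      queue-configured-resource
   * @param activeUsers          #active-users
   * @param mulp                 minimum-user-limit-percent, e.g. 25
   * @param ulp                  the per-user cap factor from the formula above
   */
  static double activeUserLimit(double usedByActiveUsers,
                                double usedByNonActiveUsers,
                                double queueConfigured,
                                int activeUsers,
                                double mulp,
                                double ulp) {
    // Divide the resource not held by non-active users among at most
    // (100 / MULP) active users, then apply the ULP cap.
    double divisor = Math.min(activeUsers, 100.0 / mulp);
    double share = Math.max(usedByActiveUsers,
                            queueConfigured - usedByNonActiveUsers) / divisor;
    return Math.min(share, queueConfigured * ulp);
  }

  public static void main(String[] args) {
    // Second example above, assuming queue-configured-resource = 100 and
    // ULP = 1: active users u4/u5 use 20 + 5 = 25, non-active users use
    // 25 + 20 + 30 = 75. The queue is already full, so neither active user
    // is entitled to more than its current fair division.
    System.out.println(activeUserLimit(25, 75, 100, 2, 25, 1)); // prints 12.5
  }
}
{code}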

2) User limit for preemption:

When we do preemption, what we will do is:

{code}
all-user-limit = min(
        max(resource-used-by-all-users, 
            queue-configured-resource) / min(#users, 100 / MULP),
        queue-configured-resource * ULP)
{code}
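
Again just a sketch under the same assumptions (scalar resources, hypothetical 
names); the only differences from the allocation-side method are that it 
divides among all users and compares usage against the full 
queue-configured-resource instead of subtracting non-active usage:

{code}
// Preemption-side variant of the hypothetical sketch above: divide among all
// users (active or not) and take the max against the full queue resource.
public final class AllUserLimitSketch {

  static double allUserLimit(double usedByAllUsers,
                             double queueConfigured,
                             int users,
                             double mulp,
                             double ulp) {
    double divisor = Math.min(users, 100.0 / mulp);
    double share = Math.max(usedByAllUsers, queueConfigured) / divisor;
    return Math.min(share, queueConfigured * ulp);
  }

  public static void main(String[] args) {
    // All six apps from the second example, again assuming a 100-unit queue
    // and ULP = 1: total usage is 100 across 5 users, so each of the first
    // (100 / MULP) = 4 users is entitled to 25 for preemption purposes.
    System.out.println(allUserLimit(100, 100, 5, 25, 1)); // prints 25.0
  }
}
{code}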

Thoughts?

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket focuses on moving 
> user-limit calculation out of the heartbeat allocation flow.


