[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730055#comment-15730055
 ] 

Wangda Tan commented on YARN-5889:
----------------------------------

Thanks [~sunilg] for working on the patch, and thanks [~jlowe] and 
[~eepayne] for the suggestions. 

Personally, I think the title and description are a little confusing.

First of all, I think the most important goal of this JIRA is not improving 
performance. It is to make user-limit preemption correct. Currently we compute 
a single user-limit value for each leaf queue. This is enough for allocation 
but not enough for preemption. Here is an example.

Consider a queue with cap=max-cap=100, min-user-limit-percent=50, and 
user-limit-factor=1. At time T, two users are using resources:
{code}
u1.used = 75, u2.used = 25
{code}
Only u2 is an active user.

According to the existing user-limit computation:
{code}
user_limit =
  round_up(
    min(
        max(current_capacity / #active_user,
             current_capacity * user_limit_percent),
        queue_capacity * user_limit_factor),
    minimum_allocation)
{code} 
The computed user-limit is 100, which is more than any user's usage, so 
nothing will be preempted.
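
As a quick check of the arithmetic, here is a minimal sketch of the formula 
above. The function name, the integer-percent argument, and 
minimum_allocation=1 are assumptions made for illustration; this is not the 
CapacityScheduler code.

```python
import math


def compute_user_limit(current_capacity, active_users, user_limit_percent,
                       queue_capacity, user_limit_factor, minimum_allocation):
    """Illustrative version of the existing per-queue user-limit formula.

    user_limit_percent is an integer percentage (e.g. 50), as in the
    queue configuration. All names here are hypothetical.
    """
    # Equal split of the current capacity among active users only.
    equal_share = current_capacity / active_users
    # Guaranteed minimum share from min-user-limit-percent.
    guaranteed = current_capacity * user_limit_percent / 100
    # Cap the larger of the two by queue_capacity * user_limit_factor.
    capped = min(max(equal_share, guaranteed),
                 queue_capacity * user_limit_factor)
    # Round up to a multiple of the minimum allocation.
    return math.ceil(capped / minimum_allocation) * minimum_allocation


# The example above: only u2 is active, so #active_user = 1.
limit = compute_user_limit(
    current_capacity=100, active_users=1, user_limit_percent=50,
    queue_capacity=100, user_limit_factor=1, minimum_allocation=1)
usage = {"u1": 75, "u2": 25}
over_limit = {user: used - limit for user, used in usage.items() if used > limit}
```

With these inputs `limit` is 100 and `over_limit` is empty, which matches the 
observation that no user is a preemption candidate.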

We can give many other examples like:
{code}
minimum-user-limit-percent = 33
3 users:
u1.used = 50, u2.used = 20, u3.used = 30
u2/u3 are active users 
{code}
The computed user-limit is 50, which prevents preemption from kicking in.

This problem can happen whenever #active-user < #total-user. At the 
allocation stage we only need to check active users, but in preemption we 
also need to preempt resources from non-active users.

To solve the problem, we need to compute the user limit with non-active 
users taken into account. If a non-active user uses less than the 
minimum-user-limit, we can continue to distribute its unused quota to the 
active users; on the other hand, if a non-active user uses more than the 
minimum-user-limit, we can also reclaim resources from that user. This 
computation is more expensive: it should be O(N), where N is the number of 
applications in the queue.
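
One possible reading of this idea, as a rough sketch only: the function name, 
the (user, used, has_pending_requests) tuple layout, and the exact 
redistribution rule below are assumptions, not what the patch does.

```python
from collections import defaultdict


def user_limits_with_inactive(apps, queue_capacity, min_user_limit_percent):
    """Hypothetical user-limit computation that accounts for non-active users.

    apps: iterable of (user, used, has_pending_requests) tuples.
    Returns (limit for each active user, preemptable amount per non-active user).
    """
    used = defaultdict(int)
    active = set()
    # Single pass over all applications: this is the O(N) part.
    for user, app_used, has_pending in apps:
        used[user] += app_used
        if has_pending:
            active.add(user)

    min_limit = queue_capacity * min_user_limit_percent / 100
    preemptable = {}
    held_by_inactive = 0
    for user, user_used in used.items():
        if user in active:
            continue
        # A non-active user keeps at most the minimum user limit;
        # anything above it is a candidate for preemption.
        preemptable[user] = max(0, user_used - min_limit)
        held_by_inactive += min(user_used, min_limit)

    # Whatever the non-active users do not hold is split among active users.
    active_limit = None
    if active:
        active_limit = (queue_capacity - held_by_inactive) / len(active)
    return active_limit, preemptable


# First example from this comment: u1 is non-active, u2 is active.
active_limit, preemptable = user_limits_with_inactive(
    [("u1", 75, False), ("u2", 25, True)],
    queue_capacity=100, min_user_limit_percent=50)
```

Under these assumptions the first example yields an active-user limit of 50 
and marks 25 of u1's usage as preemptable, so u2 can grow from 25 to 50.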

That is why we need an async thread to do all this work: we cannot put an 
O(N) computation on the allocation thread. To me, the computation of the 
(actual) user-limit and of fair share (FS) have these things in common: 
- Both are too expensive to do when checking every application.
- Both are instantaneous limits, and users are not expected to reason about 
the computed instantaneous value. The limit and the usage can keep changing, 
but they converge to a balance over a period of time.
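
The general shape of the async approach could look like the sketch below: a 
background thread periodically recomputes the limits and publishes a 
snapshot, and the allocation path only reads the latest snapshot. The class 
and method names are hypothetical, and this is not taken from the patch.

```python
import threading


class UserLimitCache:
    """Hypothetical cache: recompute limits off the allocation path."""

    def __init__(self, compute_fn, interval_seconds=1.0):
        self._compute_fn = compute_fn      # the expensive O(N) computation
        self._interval = interval_seconds
        self._snapshot = None
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh(self):
        # Run the expensive computation outside the lock, then publish it.
        value = self._compute_fn()
        with self._lock:
            self._snapshot = value

    def get(self):
        # Cheap read for the allocation path; the value may be slightly stale.
        with self._lock:
            return self._snapshot

    def _run(self):
        while True:
            self.refresh()
            # wait() returns True once stop() has been called.
            if self._stop.wait(self._interval):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


# Usage with a stand-in computation that returns fixed limits.
cache = UserLimitCache(lambda: {"u2": 50}, interval_seconds=0.01)
cache.start()
cache.stop()
limits = cache.get()
```

The trade-off is that readers may see a slightly stale limit, which fits the 
point above that the instantaneous value only needs to converge over time.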

I haven't checked the patch implementation yet. Please let us know your 
thoughts on these overall points. I also don't want this change to block the 
user-limit preemption effort, so it would be helpful if you could share 
ideas about how we can achieve user-limit preemption without the async 
thread approach.

Thanks,

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
