[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573997#comment-16573997 ]
Zian Chen edited comment on YARN-8509 at 8/10/18 9:21 PM:
----------------------------------------------------------
Hi Eric, thanks for the comments. After discussing with Wangda, I realized the
patch I uploaded earlier was not correct because I had misunderstood the
original problem.
I have changed the Jira title. The intention of this Jira is to fix the
calculation of pending resources with respect to user-limit in the preemption
scenario. Currently, the pending resource calculation in preemption uses the
same algorithm as scheduling, which is:
{code:java}
user_limit = min(max(current_capacity / #active_users,
                     current_capacity * user_limit_percent / 100),
                 queue_capacity * user_limit_factor)
{code}
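To make the formula concrete, here is a rough Java sketch of the scheduling-time user limit. It is not the actual CapacityScheduler code: resources are plain GB doubles, the names SchedulingUserLimit and userLimit are hypothetical, and I assume minimum-user-limit-percent is a percentage (hence the division by 100).

{code:java}
// Hypothetical sketch of the scheduling-time user limit; not the actual YARN code.
// Resources are plain GB values instead of YARN Resource objects.
class SchedulingUserLimit {

    static double userLimit(double currentCapacity, int activeUsers,
                            double userLimitPercent, double queueCapacity,
                            double userLimitFactor) {
        // Equal share of the current capacity among the active users.
        double equalShare = currentCapacity / activeUsers;
        // Minimum share from minimum-user-limit-percent (assumed to be a percentage).
        double minimumShare = currentCapacity * userLimitPercent / 100.0;
        // Hard cap from user-limit-factor.
        double hardCap = queueCapacity * userLimitFactor;
        // The larger of the two shares acts as a lower bound, capped by user-limit-factor.
        return Math.min(Math.max(equalShare, minimumShare), hardCap);
    }

    public static void main(String[] args) {
        // queue-b from the example below: current capacity taken as its 40G usage.
        System.out.println(userLimit(40, 1, 50, 30, 3.0)); // prints 40.0
    }
}
{code}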
This works well for scheduling, because we want to make sure each user can get
at least "minimum-user-limit-percent" of the resources, which acts as a lower
bound on the user limit. However, we should not use minimum-user-limit-percent
to cap the total pending resource a leaf queue can get; instead, we want to use
user-limit-factor, which is the upper bound, to capture pending resource in
preemption. If we use minimum-user-limit-percent to capture pending resource,
resources will be under-utilized in the preemption scenario. Thus, we suggest
that the pending resource calculation for preemption use this formula:
{code:java}
total_pending(partition, queue) = min { Q_max(partition) - Q_used(partition),
                                        Σ_user min { User.ulf(partition) - User.used(partition),
                                                     User.pending(partition) } }
{code}
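A minimal Java sketch of the proposed formula for a single partition. UserState, totalPending and the GB-as-double representation are hypothetical, and clamping negative headroom to zero is my own addition for a user who is already above the user-limit-factor cap; the formula itself does not say how to handle that case.

{code:java}
// Hypothetical sketch of the proposed preemption-time total pending; not the
// actual LeafQueue code. All values are GB for a single partition.
class PreemptionPendingSketch {

    // One user's state in a leaf queue; ulfLimit stands for User.ulf(partition).
    static final class UserState {
        final double ulfLimit;
        final double used;
        final double pending;

        UserState(double ulfLimit, double used, double pending) {
            this.ulfLimit = ulfLimit;
            this.used = used;
            this.pending = pending;
        }
    }

    static double totalPending(double queueMax, double queueUsed, UserState... users) {
        double sum = 0;
        for (UserState u : users) {
            // Headroom under the user-limit-factor cap, clamped at zero (an added
            // safeguard) so a user above the cap cannot reduce the queue total.
            double headroom = Math.max(0, u.ulfLimit - u.used);
            sum += Math.min(headroom, u.pending);
        }
        // Never report more than the queue can still grow under Q_max.
        return Math.min(Math.max(0, queueMax - queueUsed), sum);
    }

    public static void main(String[] args) {
        // One user: ulf cap 90G, 40G used, 30G pending; queue max 100G, 40G used.
        System.out.println(totalPending(100, 40, new UserState(90, 40, 30))); // prints 30.0
    }
}
{code}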
Let me give an example. The number under each queue is its configured capacity
(% of the cluster):
{noformat}
        root
     /   |    |   \
    a    b    c    d
   30   30   30   10

1) There is only one node (n1) in the cluster, with 100G.
2) app1 is submitted to queue-a: 10G used, 6G pending.
3) app2 is submitted to queue-b: 40G used, 30G pending.
4) app3 is submitted to queue-c: 50G used, 30G pending.
{noformat}
There is only one user, and the user-limit settings for the queues are:
||Queue name||minimum-user-limit-percent||user-limit-factor||
|a|50|1.0|
|b|50|3.0|
|c|50|3.0|
|d|50|2.0|
With the old calculation, the user limit for queue-a is 30G, which allows app1's
6G pending to be counted. But the user limit for queue-b becomes 40G, so after
subtracting the 40G already used the headroom is zero, and none of the 30G
pending request can be counted. The same happens with queue-c.
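Worked through in code (a hypothetical sketch, not the real implementation): assuming current_capacity is the larger of the queue's capacity and its usage, which reproduces the 30G and 40G user limits above, the old formula counts only queue-a's pending. The names OldPendingExample and oldPending are illustrative.

{code:java}
// Hypothetical worked version of the old calculation for the example above.
// Assumption: current_capacity = max(queue capacity, queue used); one active user.
class OldPendingExample {

    static double oldPending(double queueCapacity, double used, double pending,
                             double userLimitPercent, double userLimitFactor) {
        double currentCapacity = Math.max(queueCapacity, used);
        // With a single active user, current_capacity / #active_users = current_capacity.
        double userLimit = Math.min(
                Math.max(currentCapacity, currentCapacity * userLimitPercent / 100.0),
                queueCapacity * userLimitFactor);
        double headroom = Math.max(0, userLimit - used);
        // Pending is capped by the user's headroom under the scheduling user limit.
        return Math.min(headroom, pending);
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "c"};
        double[] used = {10, 40, 50};
        double[] pending = {6, 30, 30};
        double[] ulf = {1.0, 3.0, 3.0};
        for (int i = 0; i < names.length; i++) {
            // Every queue has 30G capacity and minimum-user-limit-percent 50.
            System.out.printf("queue-%s: pending counted %.0fG%n",
                    names[i], oldPending(30, used[i], pending[i], 50, ulf[i]));
        }
    }
}
{code}

Running main prints 6G for queue-a and 0G for queue-b and queue-c.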
However, from the preemption point of view, we should allow queue-b and queue-c
to take more pending resources: even though queue-a has 30G guaranteed, it is
under-utilized. With the pending resources captured by the old algorithm,
queue-b and queue-c cannot obtain the available resources through preemption,
so the cluster resources are not used effectively.
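The same example under the proposed formula, again as a hypothetical sketch: I take User.ulf as queue capacity times user-limit-factor and assume Q_max is 100G (max-capacity of 100%), which is not stated above. NewPendingExample and newPending are illustrative names.

{code:java}
// Hypothetical worked version of the proposed calculation for the example above.
// Assumptions: User.ulf = queue capacity * user-limit-factor, and Q_max = 100G
// (queue max-capacity of 100%, not stated in the example).
class NewPendingExample {

    static final double Q_MAX = 100;

    static double newPending(double queueCapacity, double used, double pending,
                             double userLimitFactor) {
        double ulfLimit = queueCapacity * userLimitFactor;
        // Single user, so the sum over users has one term.
        double userTerm = Math.min(Math.max(0, ulfLimit - used), pending);
        // Queue usage equals the single user's usage here.
        return Math.min(Q_MAX - used, userTerm);
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "c"};
        double[] used = {10, 40, 50};
        double[] pending = {6, 30, 30};
        double[] ulf = {1.0, 3.0, 3.0};
        for (int i = 0; i < names.length; i++) {
            // Every queue has 30G capacity.
            System.out.printf("queue-%s: pending counted %.0fG%n",
                    names[i], newPending(30, used[i], pending[i], ulf[i]));
        }
    }
}
{code}

With these assumptions it prints 6G, 30G and 30G for queue-a, queue-b and queue-c, so queue-b and queue-c expose their full 30G pending to preemption.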
To summarize, since user-limit-factor sets the hard limit on how much resource
a user can use, we should calculate pending resource using user-limit-factor
instead of minimum-user-limit-percent.
Could you share your opinion on this, [~eepayne]?
> Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
> -------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Zian Chen
> Assignee: Zian Chen
> Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch,
> YARN-8509.003.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the
> total pending resource based on user-limit percent and user-limit factor,
> which caps the pending resource for each user at the minimum of the user-limit
> pending and the actual pending. This prevents a queue from taking more pending
> resource to achieve queue balance after all queues are satisfied with their
> ideal allocations.
>
> We need to change the logic so that queue pending can go beyond the user limit.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)