[
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578867#comment-16578867
]
Eric Payne commented on YARN-8509:
----------------------------------
[~Zian Chen], thanks for your reply.
Let's take a step back.
{{LeafQueue#getTotalPendingResourcesConsideringUserLimit}} is eventually
calling {{UsersManager#computeUserLimit}} to determine each user's headroom
during preemption processing. This is the same thing that is used to calculate
a user's headroom during scheduling allocation. So, I think it is very
important to keep these the same so that the preemption monitor won't preempt
more than necessary. If these algorithms are not kept the same, preemption will
preempt a container but the scheduler will decide to give that container right
back to the same app.
{quote}this configuration should be able to happen if we set user_limit_percent to
50 and user_limit_factor to 1.0f, 3.0f, 3.0f and 2.0f respectively. But within
the current equation, this initial state won't happen.
{quote}
I don't think this is accurate. The minimum-user-limit-percent is part of the
calculation in order to ensure that each user can get up to its minimum
guarantee. That is, it ensures a user's share never drops below the floor, so
(queue_used / #active_users) < (queue_used * minimum_user_limit_percent) should
never hold. But that is guaranteeing a minimum boundary per user, not capping
any maximum boundary. So, the initial state can certainly happen for any number
of reasons.
{quote}So the point is, we should let user-limit to reach at most
queue_capacity * user_limit_factor
{quote}
I think that's one of the things {{UsersManager#computeUserLimit}} already
does. The algorithm in {{UsersManager#computeUserLimit}} is at the heart of
the headroom calculations, and one of the things it does is ensure that a
user's headroom stays below (guaranteed_capacity * user_limit_factor).
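To make the floor-versus-cap distinction concrete, here is a hedged sketch of the user-limit arithmetic described above. It is illustrative only, not the actual {{UsersManager#computeUserLimit}} code; all names and the simplified single-number resource model are assumptions.

```java
// Hedged sketch of the user-limit math discussed above; NOT the actual
// UsersManager#computeUserLimit implementation. Names and the scalar
// resource model are illustrative assumptions.
public class UserLimitSketch {

    /**
     * Approximate per-user limit: the larger of an equal share of the
     * queue's current usage and the minimum-user-limit-percent floor,
     * capped at guaranteed_capacity * user_limit_factor.
     */
    static double computeUserLimit(double queueUsed,
                                   int activeUsers,
                                   double guaranteedCapacity,
                                   double minUserLimitPercent, // e.g. 0.5 for 50%
                                   double userLimitFactor) {
        // Floor: minimum-user-limit-percent guarantees each user at least this.
        double floor = guaranteedCapacity * minUserLimitPercent;
        // Equal share of what the queue is actually using.
        double share = queueUsed / activeUsers;
        double limit = Math.max(share, floor);
        // Cap: a single user may not exceed guaranteed * user-limit-factor.
        double cap = guaranteedCapacity * userLimitFactor;
        return Math.min(limit, cap);
    }

    public static void main(String[] args) {
        // Queue guaranteed 30, using 40, 2 active users, MULP 50%, ULF 1.0:
        // share = 20, floor = 15, cap = 30 -> limit = 20.
        System.out.println(computeUserLimit(40, 2, 30, 0.5, 1.0));
    }
}
```

The point of the sketch is that minimum-user-limit-percent only appears on the floor side, while the user-limit-factor cap is applied separately on top.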
{quote}
| |*queue-a*|*queue-b*|*queue-c*|*queue-d*|
|*Guaranteed*|30|30|30|10|
|*Used*|10|40|50|0|
|*Pending*|6|30|30|0|
{quote}
I don't think the updated use case documents a problem. I have reproduced the
use case in a 7-node mini-cluster, and I have demonstrated that even when the
queues are set up as described above, with the apps having the described used
and pending resources, the preemption monitor preempts just the right amount
and re-balances the queues as below:
| |*queue-a*|*queue-b*|*queue-c*|*queue-d*|
|*Guaranteed*|30|30|30|10|
|*Used*|16|42|42|0|
|*Pending*|0|28|38|0|
|*Preempted*|0|0|8|0|
This is because {{UsersManager#computeUserLimit}} leaves a buffer of 1 minimum
container size.
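For what it's worth, the rebalanced numbers above also fall out of straightforward ideal-allocation arithmetic: queue-a is capped by its own demand (10 used + 6 pending = 16), and the remaining 100 - 16 = 84 splits evenly between queue-b and queue-c, whose guarantees are equal, giving 42 each, so queue-c gives up 50 - 42 = 8. A hedged sketch of that arithmetic (illustrative only, not the actual {{ProportionalCapacityPreemptionPolicy}} code):

```java
// Hedged sketch of the ideal-allocation arithmetic behind the table above;
// illustrative only, NOT the actual ProportionalCapacityPreemptionPolicy code.
public class RebalanceSketch {
    public static void main(String[] args) {
        double total = 100;                 // cluster resource (sum of guarantees)
        // queue-a is capped by its own demand: used 10 + pending 6 = 16.
        double aIdeal = 10 + 6;
        // queue-b (wants 40+30) and queue-c (wants 50+30) both exceed the
        // remainder; with equal guarantees (30 each) it splits evenly.
        double remainder = total - aIdeal;              // 84
        double bIdeal = remainder * 30.0 / (30 + 30);   // 42
        double cIdeal = remainder * 30.0 / (30 + 30);   // 42
        double cPreempted = 50 - cIdeal;                // queue-c: 50 -> 42, preempt 8
        System.out.println(aIdeal + " " + bIdeal + " " + cIdeal + " " + cPreempted);
        // prints: 16.0 42.0 42.0 8.0
    }
}
```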
> Total pending resource calculation in preemption should use user-limit factor
> instead of minimum-user-limit-percent
> -------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Zian Chen
> Assignee: Zian Chen
> Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch,
> YARN-8509.003.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the
> total pending resource based on user-limit percent and user-limit factor,
> which caps the pending resource for each user to the minimum of the user-limit
> pending and the actual pending. This prevents the queue from taking on more
> pending resource to achieve queue balance after every queue is satisfied with
> its ideal allocation.
>
> We need to change the logic to let queue pending go beyond the user limit.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)