[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578867#comment-16578867
 ] 

Eric Payne commented on YARN-8509:
----------------------------------

[~Zian Chen], thanks for your reply.

Let's take a step back. 
{{LeafQueue#getTotalPendingResourcesConsideringUserLimit}} is eventually 
calling {{UsersManager#computeUserLimit}} to determine each user's headroom 
during preemption processing. This is the same calculation used to determine a 
user's headroom during scheduling allocation. So, I think it is very important 
to keep these the same so that the preemption monitor won't preempt more than 
necessary. If the two algorithms diverge, the preemption monitor will preempt 
a container only for the scheduler to give that container right back to the 
same app.
{quote}this configuration should able to happen if we set user_limit_percent to 
50 and user_limit_factor to 1.0f, 3.0f, 3.0f and 2.0f respectively. But within 
current equation, this initial state won't happen.
{quote}
I don't think this is accurate. The minimum-user-limit-percent is part of the 
calculation in order to ensure that each user can get up to its minimum 
guarantee. That is, it keeps the per-user limit from ever dropping below 
(queue_used * minimum_user_limit_percent), even when (queue_used / #active 
users) would be smaller. But that is guaranteeing a minimum boundary per user, 
not capping any maximum boundary. So, the initial state can certainly happen 
for any number of reasons.
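To make the floor concrete, here is a simplified numeric sketch of that max(...) relationship. This is NOT the actual Hadoop implementation ({{UsersManager#computeUserLimit}} works on {{Resource}} objects, accounts for required resources, and rounds to the minimum allocation); the names and numbers here are purely illustrative:

```java
// Simplified illustration of the minimum-user-limit-percent floor.
// NOT the actual Hadoop code; a sketch of the relationship described above.
public class MulpFloorSketch {
    // Per-user limit: the larger of an equal share and the MULP guarantee,
    // so MULP acts as a floor, not a cap.
    static double userLimitFloor(double queueUsed, int activeUsers,
                                 double minUserLimitPct) {
        double equalShare = queueUsed / activeUsers;      // e.g. 100 / 4 = 25
        double mulpFloor  = queueUsed * minUserLimitPct;  // e.g. 100 * 0.5 = 50
        return Math.max(equalShare, mulpFloor);
    }

    public static void main(String[] args) {
        // 4 active users, 100 units used, MULP = 50%: the per-user limit is
        // 50 even though an equal split would give only 25.
        System.out.println(userLimitFloor(100, 4, 0.5)); // prints 50.0
    }
}
```

Note that nothing in this floor prevents more users from being active than the floor "allows"; it only raises each user's limit, which is why the initial state above is reachable.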
{quote}So the point is, we should let user-limit to reach at most 
queue_capacity * user_limit_factor
{quote}
I think that's one of the things {{UsersManager#computeUserLimit}} already 
does. At the heart of the headroom calculations is the algorithm in 
{{UsersManager#computeUserLimit}}, and one of the things it does is ensure 
that a user's headroom stays below (guaranteed_capacity * user_limit_factor).
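As a hedged sketch of that cap (again simplified; the real {{UsersManager#computeUserLimit}} operates on {{Resource}} objects and factors in required resources), the computed user limit is clipped so it can never exceed the queue's guaranteed capacity times user-limit-factor:

```java
// Simplified illustration of the user-limit-factor cap.
// NOT the actual Hadoop code; names and numbers are illustrative.
public class UlfCapSketch {
    // A user's limit may never exceed guaranteed_capacity * user_limit_factor.
    static double cappedUserLimit(double rawUserLimit,
                                  double guaranteedCapacity,
                                  double userLimitFactor) {
        return Math.min(rawUserLimit, guaranteedCapacity * userLimitFactor);
    }

    public static void main(String[] args) {
        // Queue guaranteed 30, ULF = 1.0: even if the raw limit works out
        // to 45, the user is capped at 30.
        System.out.println(cappedUserLimit(45, 30, 1.0)); // prints 30.0
        // With ULF = 2.0 the cap rises to 60, so 45 passes through.
        System.out.println(cappedUserLimit(45, 30, 2.0)); // prints 45.0
    }
}
```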
{quote}
| |*queue-a*|*queue-b*|*queue-c*|*queue-d*|
|*Guaranteed*|30|30|30|10|
|*Used*|10|40|50|0|
|*Pending*|6|30|30|0|
{quote}
I don't think the updated use case documents a problem. I have reproduced the 
use case in a 7-node mini-cluster, and I have demonstrated that even when the 
queues are set up as described above, with the apps having the described used 
and pending resources, the preemption monitor will preempt just the right 
amount and re-balance the queues as below:
| |*queue-a*|*queue-b*|*queue-c*|*queue-d*|
|*Guaranteed*|30|30|30|10|
|*Used*|16|42|42|0|
|*Pending*|0|28|38|0|
|*Preempted*|0|0|8|0|

This is because {{UsersManager#computeUserLimit}} leaves a buffer of 1 minimum 
container size.
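To illustrate the effect of that buffer, here is a hedged sketch: the preemption monitor tolerates an overage of up to one minimum-container-size around a queue's computed ideal before it preempts anything. This is NOT the actual Hadoop preemption logic; the helper and numbers below are purely illustrative:

```java
// Illustrative sketch of the one-minimum-container dead zone the
// preemption monitor leaves around a queue's ideal allocation.
// NOT the actual Hadoop code.
public class PreemptionBufferSketch {
    // Preempt a queue's overage only when it exceeds one minimum
    // container size; smaller imbalances are left alone.
    static double toPreempt(double used, double ideal, double minContainer) {
        double over = used - ideal;
        return (over > minContainer) ? over : 0.0;
    }

    public static void main(String[] args) {
        double minContainer = 2.0;
        // Like queue-c above: used 50, ideal 42, overage 8 > 2 => preempt 8.
        System.out.println(toPreempt(50, 42, minContainer)); // prints 8.0
        // Overage of 1 is within the buffer => preempt nothing.
        System.out.println(toPreempt(43, 42, minContainer)); // prints 0.0
    }
}
```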

> Total pending resource calculation in preemption should use user-limit factor 
> instead of minimum-user-limit-percent
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8509
>                 URL: https://issues.apache.org/jira/browse/YARN-8509
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Major
>         Attachments: YARN-8509.001.patch, YARN-8509.002.patch, 
> YARN-8509.003.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total 
> pending resource based on user-limit percent and user-limit factor, which will 
> cap pending resource for each user to the minimum of user-limit pending and 
> actual pending. This will prevent the queue from taking more pending resource 
> to achieve queue balance after all queues are satisfied with their ideal 
> allocation.
>   
>  We need to change the logic to let queue pending go beyond the user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
