[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578867#comment-16578867 ]
Eric Payne commented on YARN-8509:
----------------------------------

[~Zian Chen], thanks for your reply. Let's take a step back.

{{LeafQueue#getTotalPendingResourcesConsideringUserLimit}} eventually calls {{UsersManager#computeUserLimit}} to determine each user's headroom during preemption processing. This is the same calculation used to determine a user's headroom during scheduling allocation, so I think it is very important to keep the two consistent so that the preemption monitor won't preempt more than necessary. If these algorithms are not kept the same, the preemption monitor will preempt a container only for the scheduler to give that container right back to the same app.

{quote}this configuration should able to happen if we set user_limit_percent to 50 and user_limit_factor to 1.0f, 3.0f, 3.0f and 2.0f respectively. But within current equation, this initial state won't happen.
{quote}
I don't think this is accurate. The minimum-user-limit-percent is part of the calculation in order to ensure that each user can get up to its minimum guarantee, that is, to ensure that the user resource is never this: (queue_used / #active_users) < (queue_used * minimum_user_limit_percent). But that guarantees a minimum boundary per user; it does not cap any maximum boundary. So the initial state can certainly happen, for any number of reasons.

{quote}So the point is, we should let user-limit to reach at most queue_capacity * user_limit_factor
{quote}
I think that's one of the things {{UsersManager#computeUserLimit}} already does. At the heart of the headroom calculations is the algorithm in {{UsersManager#computeUserLimit}}, and one of the things it does is ensure that a user's headroom stays below (guaranteed_capacity * user_limit_factor).

{quote}
| |*queue-a*|*queue-b*|*queue-c*|*queue-d*|
|*Guaranteed*|30|30|30|10|
|*Used*|10|40|50|0|
|*Pending*|6|30|30|0|
{quote}
I don't think the updated use case documents a problem.
I have reproduced the use case in a 7-node mini-cluster, and I have demonstrated that even when the queues are set up as described above, with the apps having the described used and pending resources, the preemption monitor will preempt just the right amount and re-balance the queues as below:

| |*queue-a*|*queue-b*|*queue-c*|*queue-d*|
|*Guaranteed*|30|30|30|10|
|*Used*|16|42|42|0|
|*Pending*|0|28|38|0|
|*Preempted*|0|0|8|0|

This is because {{UsersManager#computeUserLimit}} leaves a buffer of 1 minimum container size.

> Total pending resource calculation in preemption should use user-limit factor
> instead of minimum-user-limit-percent
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8509
>                 URL: https://issues.apache.org/jira/browse/YARN-8509
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Major
>         Attachments: YARN-8509.001.patch, YARN-8509.002.patch, YARN-8509.003.patch
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the
> total pending resource based on user-limit-percent and user-limit-factor,
> which caps the pending resource for each user to the minimum of the
> user-limit pending and the actual pending. This prevents the queue from
> taking on more pending resource to achieve queue balance after every queue
> is satisfied with its ideal allocation.
> We need to change the logic to let queue pending go beyond the user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)