[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497037#comment-15497037 ]
Eric Payne commented on YARN-4945:
----------------------------------

[~leftnoteasy]
bq. Have a Map of username to headroom inside the method can compute user limit at most once for different user. And this logic can be reused to compute user-limit preemption

Maybe we are talking about the same thing, but I just want to clarify that I am not advocating preemption based on headroom (user-limit-factor). I am advocating preemption based on minimum user limit percent (MULP), which is the minimum guaranteed resource amount per user per queue.

{quote}
To be honest, I haven't thought a good way that a list of policies can better solve the priority + user-limit preemption problem. Could you share some ideas about it. For example, how to better consider both in the final decision
{quote}

I believe that the two preemption policies (priority and minimum-user-limit-percent) are _mostly_ (but not completely) separate. I would say that priority preemption should only consider apps from the same user, and MULP preemption should only consider apps from different users.

If you look at the behavior of the capacity scheduler, I was surprised to find that it mostly ignores priority when assigning resources between apps of different users. I conducted the following experiment, without turning on preemption:
# The cluster has only 1 queue, and it takes up all of the resources.
||Queue Name||Total Containers||{{user-limit-factor}}||{{minimum-user-limit-percent}}||Priority Enabled||
|default|24|1.0 (each user can take up the whole queue if no other users are present)|0.25 (if other users are present, each user is guaranteed at least 25% of the queue's resources; at most 4 users can have apps in the queue at once; if fewer than 4 users, the scheduler tries to balance resources evenly between users)|false|
# {{user1}} starts {{app1}} in {{default}} at {{priority1}} and consumes all resources.
# {{user2}} starts {{app2}} in {{default}} at {{priority2}}.
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|24|76|
|user2|app2|2|0|100|
# I kill 12 containers from {{app1}}, and the capacity scheduler assigns them to {{app2}}. Not because {{app2}} has a higher priority than {{app1}}, but because {{user2}} is using fewer resources than {{user1}} (the capacity scheduler tries to balance resources between users).
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|12|76|
|user2|app2|2|12|76|
# At this point, what should happen if I kill another container from {{app1}}? Since {{app2}} has a higher priority than {{app1}}, and since MULP is 25% (so {{user2}}'s minimum guarantee is only 6 containers), you might think that the capacity scheduler will give the container to {{app2}} (that's what I thought it would do). _But it doesn't!_ The capacity scheduler gives the container back to {{app1}} because it wants to balance the resources between all users, and the table remains the same:
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|12|76|
|user2|app2|2|12|76|

Once the users are balanced, no matter how many times I kill a container from {{app1}}, it always goes back to {{app1}}. From a priority perspective, this could be considered an inversion, since {{app2}} is asking for more resources and {{app1}} is well above its MULP guarantee. But the capacity scheduler does not consider priority in this case.

If I try the same experiment with both apps owned by {{user1}}, then I can kill all of {{app1}}'s containers (except the AM) and they all get assigned to {{app2}}.

Because the capacity scheduler behaves this way, I would recommend that the MULP preemption policy run first and try to balance each user's ideal-assigned value.
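The experiment above can be modeled with a small sketch (Python, purely illustrative; {{assign_freed_container}} and the data shapes are invented here, not Hadoop's actual CapacityScheduler code). Under the assumption that the queue hands each freed container to the active user with the fewest used containers, the numbers in the tables above fall out, including the container flowing back to {{app1}} once the users are balanced:

```python
# Toy model of the experiment above, assuming the queue assigns a freed
# container to the active user with the fewest used containers (user
# balancing), ignoring app priority across users. All names are invented
# for illustration; this is not Hadoop code.

def assign_freed_container(used_by_user):
    """Return the user who receives the next free container."""
    # The scheduler tries to balance resources between users, so the
    # most under-served user wins, regardless of app priority.
    return min(used_by_user, key=used_by_user.get)

# Step 3 of the experiment: user1 holds 24, user2 holds 0; kill 12 from app1.
used = {"user1": 24, "user2": 0}
for _ in range(12):
    used["user1"] -= 1                    # container killed from app1
    winner = assign_freed_container(used)
    used[winner] += 1                     # scheduler reassigns it
assert used == {"user1": 12, "user2": 12}

# Step 4: the users are now balanced. Killing one more container from app1
# makes user1 the smaller user, so the container goes straight back to
# user1, despite app2's higher priority.
used["user1"] -= 1
assert assign_freed_container(used) == "user1"
```

Under this model, priority never enters the cross-user decision at all, which matches the observed behavior.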
The MULP policy would preempt from the lowest-priority apps first, so it would consider the priority of apps owned by other, over-served users when deciding what to preempt, and the priority of apps owned by the current, under-served user when deciding ideal-assigned values. Then I would run the priority policy, but only consider apps within each user. As shown above, once the users are balanced between each other with regard to MULP, trying to kill containers from higher-priority apps of other users will only cause preemption churn.

[~leftnoteasy], as you said, we may be able to combine the two into one policy, and you may be right that this can be done without being too complicated. The thing I want to ensure is that the priority preemption policy doesn't try to kill high-priority containers from different users that will only be reassigned back to the original user and cause preemption churn.

> [Umbrella] Capacity Scheduler Preemption Within a queue
> -------------------------------------------------------
>
>                 Key: YARN-4945
>                 URL: https://issues.apache.org/jira/browse/YARN-4945
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>         Attachments: Intra-Queue Preemption Use Cases.pdf, IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch, YARN-2009.v1.patch, YARN-2009.v2.patch
>
> This is umbrella ticket to track efforts of preemption within a queue to
> support features like: YARN-2009, YARN-2113, YARN-4781.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
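The two-phase ordering proposed above (a cross-user MULP phase that preempts lowest-priority first from over-served users, followed by a priority phase confined to each user's own apps) could be sketched as follows. This is a Python sketch under stated assumptions, not YARN's actual preemption code; {{mulp_preemption}}, {{priority_preemption}}, and the app-dictionary shape are all invented for illustration:

```python
# Sketch of the proposed two-phase intra-queue preemption ordering.
# All names and data structures here are hypothetical, for illustration only.

def mulp_preemption(apps, ideal_assigned):
    """Cross-user phase: preempt from over-served users, lowest priority
    first, until each user is back at its ideal-assigned share."""
    to_preempt = []
    used_by_user = {}
    for app in apps:
        used_by_user[app["user"]] = used_by_user.get(app["user"], 0) + app["used"]
    for app in sorted(apps, key=lambda a: a["priority"]):  # lowest priority first
        over = used_by_user[app["user"]] - ideal_assigned[app["user"]]
        if over > 0:
            take = min(over, app["used"])
            to_preempt.append((app["name"], take))
            used_by_user[app["user"]] -= take
    return to_preempt

def priority_preemption(apps):
    """Per-user phase: within each user ONLY, preempt from the lowest-priority
    app when a higher-priority app of the same user has pending demand.
    Never preempting across users here is what avoids the churn described
    above."""
    to_preempt = []
    by_user = {}
    for app in apps:
        by_user.setdefault(app["user"], []).append(app)
    for mine in by_user.values():
        mine.sort(key=lambda a: a["priority"])             # lowest priority first
        lowest, highest = mine[0], mine[-1]
        if highest is not lowest and highest["pending"] > 0 and lowest["used"] > 0:
            take = min(lowest["used"], highest["pending"])
            to_preempt.append((lowest["name"], take))
    return to_preempt

# State after step 3 of the experiment: users balanced at 12 containers each,
# so each user's ideal-assigned value is 12.
apps = [
    {"name": "app1", "user": "user1", "priority": 1, "used": 12, "pending": 76},
    {"name": "app2", "user": "user2", "priority": 2, "used": 12, "pending": 76},
]
ideal = {"user1": 12, "user2": 12}
# Phase 1 preempts nothing: neither user is over its ideal-assigned share.
assert mulp_preemption(apps, ideal) == []
# Phase 2 also preempts nothing: each user has only one app, so app2's higher
# priority cannot trigger cross-user preemption churn.
assert priority_preemption(apps) == []
```

In this sketch, the higher-priority {{app2}} never causes a kill of {{app1}}'s containers once users are balanced, which is exactly the churn-avoidance property argued for above.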