[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Payne updated YARN-3769:
-----------------------------
    Attachment: YARN-3769.001.branch-2.8.patch
                YARN-3769.001.branch-2.7.patch

{quote}
One thing I've thought for a while is adding a "lazy preemption" mechanism, which is: when a container is marked preempted and wait for max_wait_before_time, it becomes a "can_be_killed" container. If there's another queue can allocate on a node with "can_be_killed" container, such container will be killed immediately to make room the new containers. I will upload a design doc shortly for review.
{quote}

[~leftnoteasy], since it has been a couple of months since the last activity on this JIRA, would it be better to use this JIRA for making the preemption monitor "user-limit" aware, and open a separate JIRA to address a redesign? Towards that end, I am uploading a couple of patches:

- {{YARN-3769.001.branch-2.7.patch}} applies to 2.7 (and also 2.6), and is the fix we have been using internally. It has dramatically reduced the instances of "ping-pong"-ing that I outlined in [the comment above|https://issues.apache.org/jira/browse/YARN-3769?focusedCommentId=14573619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14573619].
- {{YARN-3769.001.branch-2.8.patch}} is similar to the 2.7 fix, but also takes node label partitions into consideration.

Thanks for your help, and please let me know what you think.
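To illustrate the user-limit-aware idea the patches aim at, here is a minimal, hypothetical sketch (this is not the patch code; the class, method, and parameter names are invented). The intuition: if the preemption monitor reclaims resources that users in the over-capacity queue are still entitled to under their user limit, the capacity scheduler just hands them straight back, causing the ping-pong churn described in the issue. Capping the preemptable amount by how far users are *above* their user limit avoids that:

```java
// Hypothetical sketch of a user-limit-aware preemption cap.
// Names and the memory-only model are simplifications for illustration.
final class UserLimitAwarePreemption {

    /**
     * Compute how much memory can safely be preempted from a queue.
     * Preempting resources that users are still entitled to under
     * their user limit would be immediately reallocated back by the
     * scheduler ("ping-pong"), so the preemptable amount is capped by
     * the total usage above the per-user limit.
     */
    static long preemptableMemory(long queueUsed, long queueGuaranteed,
                                  long[] perUserUsed, long userLimit) {
        // Resources the queue holds beyond its guaranteed capacity.
        long overCapacity = Math.max(0, queueUsed - queueGuaranteed);

        // Portion of each user's usage above the user limit; only this
        // much can be reclaimed without the scheduler giving it back.
        long aboveUserLimit = 0;
        for (long used : perUserUsed) {
            aboveUserLimit += Math.max(0, used - userLimit);
        }
        return Math.min(overCapacity, aboveUserLimit);
    }

    public static void main(String[] args) {
        // Queue uses 100 GB against a 60 GB guarantee, but both users
        // are at a 50 GB user limit: preempting anything would ping-pong.
        System.out.println(preemptableMemory(100, 60, new long[]{50, 50}, 50)); // 0

        // One user is 20 GB over the limit: at most 20 GB is preemptable.
        System.out.println(preemptableMemory(100, 60, new long[]{70, 30}, 50)); // 20
    }
}
```

Without the user-limit cap, the first case would preempt up to 40 GB (the over-capacity amount) and the scheduler would immediately re-grant it, which is exactly the churn reported in this issue.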
> Preemption occurring unnecessarily because preemption doesn't consider user limit
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3769
>                 URL: https://issues.apache.org/jira/browse/YARN-3769
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0
>            Reporter: Eric Payne
>            Assignee: Wangda Tan
>        Attachments: YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch
>
> We are seeing the preemption monitor preempting containers from queue A and
> then seeing the capacity scheduler giving them immediately back to queue A.
> This happens quite often and causes a lot of churn.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)