[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959698#comment-14959698 ]

Wangda Tan commented on YARN-3769:
----------------------------------

[~eepayne], some quick comments:
- Why is {{MAX_PENDING_OVER_CAPACITY}} needed? I think it could be 
problematic: for example, if a queue has capacity = 50, its usage is 10, 
and it has 55 pending resources, then with {{MAX_PENDING_OVER_CAPACITY=0.1}} 
the queue cannot preempt resources from other queues.
- In LeafQueue, {{getHeadroom()}} is used to compute how many resources the 
user can use, but I think that may not be correct: getHeadroom is computed as
{code}
     * Headroom is:
     *    min(
     *        min(userLimit, queueMaxCap) - userConsumed,
     *        queueMaxLimit - queueUsedResources
     *       )
{code}
(Note that the actual code differs slightly from the original comment: it 
uses the queue's MaxLimit instead of the queue's max resource.)
One negative example is:
{code}
a  (max=100, used=100, configured=100)
a.a1 (max=100, used=30, configured=40)
a.a2 (max=100, used=70, configured=60)
{code}
For the above queue status, the headroom for a.a1 is 0, since queue-a's 
{{currentResourceLimit}} is 0.
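To make the example concrete, here is a minimal, hypothetical sketch of the headroom formula above with plain ints (not Resource objects; the user-limit value of 40 for a.a1's single user is an assumption, and the actual CapacityScheduler code differs):

```java
public class HeadroomSketch {
    // Headroom per the formula quoted above:
    // min(min(userLimit, queueMaxCap) - userConsumed,
    //     queueMaxLimit - queueUsedResources)
    static int headroom(int userLimit, int queueMaxCap, int userConsumed,
                        int queueMaxLimit, int queueUsedResources) {
        return Math.min(Math.min(userLimit, queueMaxCap) - userConsumed,
                        queueMaxLimit - queueUsedResources);
    }

    public static void main(String[] args) {
        // a.a1 from the example: assume a single user with userLimit = 40
        // (its configured capacity), userConsumed = 30, queueMaxCap = 100.
        // Parent queue a is at used = 100 of max = 100, so the limit that
        // flows down to a.a1 is 100 - 100 = 0, and the headroom collapses
        // to 0 even though the user is still under its own limit.
        System.out.println(headroom(40, 100, 30, 100, 100)); // prints 0
    }
}
```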
So instead of using headroom, I think you can use {{computed-user-limit - 
user.usage(partition)}} as the headroom. You don't need to consider the 
queue's max capacity here, since it is already considered in the subsequent 
PCPP logic.
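The suggested alternative could be sketched as follows (hypothetical names and plain ints; the clamp to zero is an assumption, not something stated above):

```java
public class UserLimitHeadroomSketch {
    // Suggested alternative: ignore the queue's max capacity here (PCPP
    // applies it later) and use only the user's remaining room under its
    // computed user limit for the given partition.
    static int preemptionHeadroom(int computedUserLimit, int userUsageInPartition) {
        // Assumed clamp: never report negative headroom.
        return Math.max(0, computedUserLimit - userUsageInPartition);
    }

    public static void main(String[] args) {
        // The same a.a1 user as in the example: computed user limit 40,
        // usage 30. The user still has room, so preempting on its behalf
        // is justified, even though the parent queue is at its limit.
        System.out.println(preemptionHeadroom(40, 30)); // prints 10
    }
}
```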

Thoughts?

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3769
>                 URL: https://issues.apache.org/jira/browse/YARN-3769
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, 
> YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch, 
> YARN-3769.003.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)