[
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959698#comment-14959698
]
Wangda Tan commented on YARN-3769:
----------------------------------
[~eepayne], some quick comments:
- Why is {{MAX_PENDING_OVER_CAPACITY}} needed? I think it could be
problematic: for example, if a queue has capacity = 50, its usage is 10,
and it has 55 pending resources, then with MAX_PENDING_OVER_CAPACITY=0.1 the
queue cannot preempt resources from other queues.
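To make the concern concrete, here is a hypothetical reading of that check (the threshold rule below is an assumption for illustration, not the patch's actual logic): suppose a queue's pending demand stops counting toward preemption once it exceeds its unused guarantee by more than MAX_PENDING_OVER_CAPACITY.

```java
// Hypothetical sketch of the MAX_PENDING_OVER_CAPACITY concern; the
// threshold rule is an assumption, not the patch's actual logic.
public class PendingCapSketch {
    static final double MAX_PENDING_OVER_CAPACITY = 0.1;

    // Assumed rule: pending demand counts toward preemption only while it
    // stays within (unused guarantee) * (1 + MAX_PENDING_OVER_CAPACITY).
    static boolean pendingCounts(int capacity, int used, int pending) {
        double threshold = (capacity - used) * (1 + MAX_PENDING_OVER_CAPACITY);
        return pending <= threshold;
    }

    public static void main(String[] args) {
        // The example above: capacity=50, used=10, pending=55.
        // Threshold = 40 * 1.1 = 44, so the 55 pending is ignored and the
        // queue cannot drive preemption despite being under its guarantee.
        System.out.println(pendingCounts(50, 10, 55)); // prints false
    }
}
```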
- In LeafQueue, getHeadroom() is used to compute how many resources the
user can use. But I think that may not be correct: getHeadroom is computed as
{code}
* Headroom is:
* min(
* min(userLimit, queueMaxCap) - userConsumed,
* queueMaxLimit - queueUsedResources
* )
{code}
(Please note the actual code is slightly different from the original comment:
it uses the queue's MaxLimit instead of the queue's max resource.)
One negative example is:
{code}
a (max=100, used=100, configured=100)
a.a1 (max=100, used=30, configured=40)
a.a2 (max=100, used=70, configured=60)
{code}
For the above queue status, the headroom for a.a1 is 0, since queue-a's
{{currentResourceLimit}} is 0.
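Plugging that example into the quoted formula makes the problem visible. This is a simplified sketch with plain ints (not the real YARN classes), with an assumed floor at zero since negative headroom is not meaningful; a.a1's queueMaxLimit is 0 because parent queue a is fully used.

```java
// The quoted headroom formula, simplified to ints; a floor at zero is
// assumed (negative headroom is not meaningful).
public class HeadroomSketch {
    static int headroom(int userLimit, int queueMaxCap, int userConsumed,
                        int queueMaxLimit, int queueUsed) {
        return Math.max(0,
            Math.min(Math.min(userLimit, queueMaxCap) - userConsumed,
                     queueMaxLimit - queueUsed));
    }

    public static void main(String[] args) {
        // a.a1: userLimit=40 (assuming a single user gets the configured 40),
        // queueMaxCap=100, userConsumed=30, queueMaxLimit=0 (parent queue a
        // is fully used, so its currentResourceLimit is 0), queueUsed=30.
        // min(min(40,100)-30, 0-30) = -30, floored to 0.
        System.out.println(headroom(40, 100, 30, 0, 30)); // prints 0
    }
}
```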
So instead of using headroom, I think you can use {{computed-user-limit -
user.usage(partition)}} as the headroom. You don't need to consider the
queue's max capacity here, since we will consider the queue's max capacity in
the following logic of PCPP.
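The suggested alternative can be sketched as follows (names are illustrative, not the actual YARN APIs): headroom is the computed user limit minus the user's usage in the partition, with queue max capacity deliberately left out because PCPP applies it later.

```java
// Sketch of the suggested replacement for getHeadroom(); illustrative
// names, not the actual YARN APIs.
public class UserLimitHeadroomSketch {
    static int userLimitHeadroom(int computedUserLimit, int userUsage) {
        // Queue max capacity is intentionally ignored here; PCPP enforces
        // it in its own later logic.
        return Math.max(0, computedUserLimit - userUsage);
    }

    public static void main(String[] args) {
        // Same a.a1 example: user limit 40, user usage 30 -> headroom 10,
        // so a.a1's pending demand can still drive preemption from a.a2.
        System.out.println(userLimitHeadroom(40, 30)); // prints 10
    }
}
```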
Thoughts?
> Preemption occurring unnecessarily because preemption doesn't consider user
> limit
> ---------------------------------------------------------------------------------
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 2.6.0, 2.7.0, 2.8.0
> Reporter: Eric Payne
> Assignee: Eric Payne
> Attachments: YARN-3769-branch-2.002.patch,
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch,
> YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch,
> YARN-3769.003.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and
> then seeing the capacity scheduler giving them immediately back to queue A.
> This happens quite often and causes a lot of churn.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)