[
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988238#comment-14988238
]
Wangda Tan commented on YARN-3769:
----------------------------------
[~eepayne], thanks for the update.
bq. If you want, we can pull this out and put it as part of a different JIRA so
we can document and discuss that particular flapping situation separately.
I would prefer to make it a separate JIRA, since it is not a directly
related fix. I will review PCPP after you separate those changes (since you're
OK with separating them :))
bq. Yes, you are correct. getHeadroom could be calculating zero headroom when
we don't want it to. And, I agree that we don't need to limit pending resources
to max queue capacity when calculating pending resources. The concern for this
fix is that user limit factor should be considered and limit the pending value.
The max queue capacity will be considered during the offer stage of the
preemption calculations.
I agree with your existing approach; user-limit should be capped by max queue
capacity as well.
One nit for LeafQueue changes:
{code}
minPendingAndPreemptable =
    Resources.componentwiseMax(Resources.none(),
        Resources.subtract(
            userNameToHeadroom.get(userName),
            minPendingAndPreemptable));
{code}
You don't need componentwiseMax here, since minPendingAndPreemptable <=
headroom, and you can use subtractFrom to make the code simpler.
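To illustrate why the max is redundant, here is a minimal standalone sketch. It
does not use the real Resources class: Res and the static helpers below are
hypothetical stand-ins, the numbers are made up, and it assumes the value was
already capped by headroom (minPendingAndPreemptable <= headroom) before the
subtraction.

```java
// Standalone sketch. Res is a hypothetical stand-in for a YARN Resource,
// not the real org.apache.hadoop.yarn.api.records.Resource.
final class HeadroomSketch {
  static final class Res {
    final long memory;
    final long vcores;

    Res(long memory, long vcores) {
      this.memory = memory;
      this.vcores = vcores;
    }

    boolean sameAs(Res o) {
      return memory == o.memory && vcores == o.vcores;
    }
  }

  static final Res NONE = new Res(0, 0);

  static Res subtract(Res a, Res b) {
    return new Res(a.memory - b.memory, a.vcores - b.vcores);
  }

  static Res componentwiseMax(Res a, Res b) {
    return new Res(Math.max(a.memory, b.memory), Math.max(a.vcores, b.vcores));
  }

  static Res componentwiseMin(Res a, Res b) {
    return new Res(Math.min(a.memory, b.memory), Math.min(a.vcores, b.vcores));
  }

  public static void main(String[] args) {
    Res headroom = new Res(4096, 4);
    Res pending = new Res(6144, 2);

    // Assumption: the value is capped by headroom first, so every
    // component of 'capped' is <= the matching component of 'headroom'.
    Res capped = componentwiseMin(pending, headroom);

    // With that guarantee, the difference can never go negative...
    Res plain = subtract(headroom, capped);

    // ...so clamping against zero changes nothing.
    Res clamped = componentwiseMax(NONE, plain);

    System.out.println(plain.sameAs(clamped)); // prints "true"
  }
}
```

The same reasoning holds for any values as long as the cap is applied before
the subtraction; without the cap, the max would still be needed.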
> Preemption occurring unnecessarily because preemption doesn't consider user
> limit
> ---------------------------------------------------------------------------------
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 2.6.0, 2.7.0, 2.8.0
> Reporter: Eric Payne
> Assignee: Eric Payne
> Attachments: YARN-3769-branch-2.002.patch,
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch,
> YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch,
> YARN-3769.003.patch, YARN-3769.004.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and
> then seeing the capacity scheduler giving them immediately back to queue A.
> This happens quite often and causes a lot of churn.