[
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Payne updated YARN-3769:
-----------------------------
Attachment: YARN-3769-branch-2.7.006.patch
[~leftnoteasy], thanks for your comments.
{quote}
The problem is that getUserResourceLimit is not always updated by the scheduler. If a
queue is not traversed by the scheduler, or if the apps of a queue user have a long
heartbeat interval, the user resource limit could be stale.
{quote}
Got it.
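Just to restate the concern with a toy sketch (hypothetical names, not the actual {{LeafQueue}}/{{User}} code): the cached per-user limit is refreshed only on the allocation path, so a reader on an independent timer, like the preemption monitor, can observe an old value.
{code:java}
// Toy illustration of the staleness concern -- hypothetical names,
// not the actual LeafQueue/User code.
public class StaleUserLimitSketch {
  private volatile long cachedUserLimitMB;   // analogous to the cached user resource limit

  // Runs only when the scheduler traverses the queue and allocates for this user.
  void onAllocation(long freshlyComputedLimitMB) {
    cachedUserLimitMB = freshlyComputedLimitMB;   // the only place the cache is refreshed
  }

  // Runs on an independent timer (e.g. the preemption monitor).
  long readCachedLimit() {
    // If the queue was not traversed recently, or the user's apps heartbeat
    // slowly, this value may not reflect the current cluster state.
    return cachedUserLimitMB;
  }
}
{code}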
{quote}
I found that the 0005 patch for trunk computes the user limit every time, while the 0005
patch for 2.7 uses getUserResourceLimit.
{quote}
Yes, I was concerned about using the 2.7 version of {{computeUserLimit}}. It is
different from the branch-2 and trunk versions, and it expects a {{required}}
parameter which, in 2.7, is calculated in {{assignContainers}} based on an
app's requested capability for a given container priority. I noticed that in
branch-2 and trunk, it looks like this {{required}} parameter is just given the
value of {{minimumAllocation}}.
So, in {{YARN-3769-branch-2.7.006.patch}} I pass {{minimumAllocation}} as the
{{required}} parameter of {{computeUserLimit}}.
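To convince myself that substituting {{minimumAllocation}} for {{required}} is benign, here is a simplified, standalone sketch of the user-limit arithmetic (memory only, ignoring node labels, DRF, and the {{Resource}}/{{Resources}} utilities; this is an illustration, not the actual {{computeUserLimit}} code). {{required}} only enters where it is max'ed against the queue capacity and added to used resources when the queue is over capacity, so passing {{minimumAllocation}} shifts the result by at most one minimum allocation.
{code:java}
// Simplified sketch of the CS user-limit arithmetic (memory only).
// Illustration only -- not the actual LeafQueue#computeUserLimit code.
public class UserLimitSketch {

  static long roundUp(long value, long step) {
    return ((value + step - 1) / step) * step;
  }

  /**
   * @param queueGuaranteedMB guaranteed (absolute) capacity of the queue
   * @param queueUsedMB       resources currently used by the queue
   * @param requiredMB        the "required" resource; the 2.7.006 patch passes
   *                          minimumAllocation here, as branch-2/trunk effectively do
   * @param activeUsers       number of active users in the queue
   * @param userLimitPercent  minimum-user-limit-percent for the queue
   * @param userLimitFactor   user-limit-factor for the queue
   * @param minAllocMB        scheduler minimum allocation
   */
  static long computeUserLimit(long queueGuaranteedMB, long queueUsedMB,
      long requiredMB, int activeUsers, int userLimitPercent,
      float userLimitFactor, long minAllocMB) {

    // Queue capacity is never smaller than a single request, so tiny queues
    // can still make progress.
    long queueCapacity = Math.max(queueGuaranteedMB, requiredMB);

    // Below capacity: divide the capacity itself among users.
    // Over capacity: divide what would be used after this allocation.
    long currentCapacity = queueUsedMB < queueCapacity
        ? queueCapacity
        : queueUsedMB + requiredMB;

    long perActiveUserShare = (currentCapacity + activeUsers - 1) / activeUsers;
    long minUserPercentShare = currentCapacity * userLimitPercent / 100;
    long userLimitFactorCap = (long) (queueCapacity * userLimitFactor);

    long limit = Math.min(
        Math.max(perActiveUserShare, minUserPercentShare),
        userLimitFactorCap);

    return roundUp(limit, minAllocMB);
  }

  public static void main(String[] args) {
    // Example: 40 GB queue, 50 GB used (over capacity), 2 active users,
    // minimum-user-limit-percent=25, user-limit-factor=2, minAlloc=1 GB.
    System.out.println(computeUserLimit(40960, 51200, 1024, 2, 25, 2.0f, 1024));
  }
}
{code}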
> Preemption occurring unnecessarily because preemption doesn't consider user
> limit
> ---------------------------------------------------------------------------------
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 2.6.0, 2.7.0, 2.8.0
> Reporter: Eric Payne
> Assignee: Eric Payne
> Attachments: YARN-3769-branch-2.002.patch,
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch,
> YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch,
> YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch,
> YARN-3769.003.patch, YARN-3769.004.patch, YARN-3769.005.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and
> then seeing the capacity scheduler giving them immediately back to queue A.
> This happens quite often and causes a lot of churn.