[
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735952#comment-14735952
]
Xianyin Xin commented on YARN-4120:
-----------------------------------
Hi [~kasha], there's another issue in the current preemption logic, in
{{FSParentQueue.java}} and {{FSLeafQueue.java}}:
{code}
public RMContainer preemptContainer() {
  RMContainer toBePreempted = null;

  // Find the childQueue which is most over fair share
  FSQueue candidateQueue = null;
  Comparator<Schedulable> comparator = policy.getComparator();
  readLock.lock();
  try {
    for (FSQueue queue : childQueues) {
      if (candidateQueue == null ||
          comparator.compare(queue, candidateQueue) > 0) {
        candidateQueue = queue;
      }
    }
  } finally {
    readLock.unlock();
  }

  // Let the selected queue choose which of its container to preempt
  if (candidateQueue != null) {
    toBePreempted = candidateQueue.preemptContainer();
  }
  return toBePreempted;
}
{code}
{code}
public RMContainer preemptContainer() {
  RMContainer toBePreempted = null;

  // If this queue is not over its fair share, reject
  if (!preemptContainerPreCheck()) {
    return toBePreempted;
  }
{code}
If the queue hierarchy is like the one in the *Description*, suppose queue1 and
queue2 have the same weight and the cluster has 8 containers, 4 occupied by
queue1.1 and 4 occupied by queue2. If a new app is added to queue1.2, 2
containers should be preempted from queue1.1. However, according to the above
code, queue1 and queue2 are both at their fair share, so the preemption will
not happen.
So if all of the child queues at any level are at their fair share, preemption
will not happen even though there is a resource deficit in some leaf queues.
I think we have to drop this logic in this case. As an alternative, we could
calculate an ideal preemption distribution by traversing the queues. Any
thoughts?
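
To make the scenario concrete, here is a minimal sketch of what traversing down to the leaf queues would reveal. {{ToyQueue}} and {{PreemptionSketch}} are hypothetical stand-ins, not the Hadoop classes, and the fair shares (4/4 for queue1/queue2, 2/2 for queue1.1/queue1.2) are assumed from the equal weights in the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy queue model for illustration only; not the Hadoop FSQueue API. */
class ToyQueue {
    final String name;
    final int fairShare;   // fair share in containers (assumed values)
    final int leafUsage;   // containers held; meaningful for leaf queues only
    final List<ToyQueue> children = new ArrayList<>();

    ToyQueue(String name, int fairShare, int leafUsage) {
        this.name = name;
        this.fairShare = fairShare;
        this.leafUsage = leafUsage;
    }

    ToyQueue add(ToyQueue child) {
        children.add(child);
        return this;
    }

    /** Usage of a parent queue is the sum over its children. */
    int usage() {
        if (children.isEmpty()) {
            return leafUsage;
        }
        int total = 0;
        for (ToyQueue child : children) {
            total += child.usage();
        }
        return total;
    }

    /** Walks down to the leaves and records how far each leaf is over its fair share. */
    void collectLeafSurplus(Map<String, Integer> out) {
        if (children.isEmpty()) {
            int surplus = usage() - fairShare;
            if (surplus > 0) {
                out.put(name, surplus);
            }
            return;
        }
        for (ToyQueue child : children) {
            child.collectLeafSurplus(out);
        }
    }
}

class PreemptionSketch {
    /** Builds the 8-container scenario from the comment (fair shares are assumptions). */
    static ToyQueue scenario() {
        ToyQueue queue1 = new ToyQueue("queue1", 4, 0)
            .add(new ToyQueue("queue1.1", 2, 4))
            .add(new ToyQueue("queue1.2", 2, 0));
        ToyQueue queue2 = new ToyQueue("queue2", 4, 4);
        return new ToyQueue("root", 8, 0).add(queue1).add(queue2);
    }

    public static void main(String[] args) {
        ToyQueue root = scenario();
        // At the top level both queues sit exactly at their fair share.
        for (ToyQueue q : root.children) {
            System.out.println(q.name + " over fair share by " + (q.usage() - q.fairShare));
        }
        // Looking at the leaves exposes the imbalance inside queue1.
        Map<String, Integer> surplus = new LinkedHashMap<>();
        root.collectLeafSurplus(surplus);
        System.out.println("leaf surplus: " + surplus);
    }
}
```

In this toy model, queue1 and queue2 each report a surplus of 0, while the leaf traversal reports a surplus of 2 for queue1.1, which matches the 2 containers that should be preempted. A real implementation would also need to account for demand, weights, and min shares, which this sketch ignores.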
> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> ------------------------------------------------------------------------------
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code is involved,
> {{FSAppAttempt.getResourceUsage}},
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated to FSLeafQueues and FSParentQueues. In my
> opinion, taking {{preemptedResource}} into account here is not reasonable,
> for two main reasons:
> # it concerns the future, i.e., even though these resources are marked as
> preempted, they are currently used by the app, and they will be
> subtracted from {{currentConsumption}} once the preemption is finished. It's
> not reasonable to account for them ahead of time.
> # there's another problem here; consider the following case,
> {code}
>            root
>           /    \
>      queue1    queue2
>      /    \
> queue1.3  queue1.4
> {code}
> suppose queue1.3 needs resources and can preempt them from queue1.4, so
> the preemption happens inside queue1. But when computing the resource
> usage of queue1, {{queue1.resourceUsage = its_current_resource_usage -
> preemption}} according to the current code, which is unfair to queue2
> during resource allocation.
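
The second point in the quoted description can be illustrated with a small arithmetic sketch. The container counts below are hypothetical, and {{UsageSketch}} is an illustrative helper, not part of Hadoop; it only mirrors the subtraction performed by the quoted {{getResourceUsage()}}.

```java
/** Toy arithmetic for illustration; the container counts are hypothetical. */
class UsageSketch {
    /** Mirrors the quoted getResourceUsage(): consumption minus resources marked as preempted. */
    static int usageMinusPreempted(int currentConsumption, int markedForPreemption) {
        return currentConsumption - markedForPreemption;
    }

    public static void main(String[] args) {
        int queue1Held = 4;    // all held by queue1.4 (hypothetical)
        int queue1Marked = 2;  // marked for preemption in favor of queue1.3
        int queue2Held = 4;    // hypothetical
        int queue2Marked = 0;  // nothing is being preempted in queue2

        // Both queues actually hold 4 containers, but queue1 reports only 2.
        System.out.println("queue1 reported usage: "
            + usageMinusPreempted(queue1Held, queue1Marked));
        System.out.println("queue2 reported usage: "
            + usageMinusPreempted(queue2Held, queue2Marked));
    }
}
```

Under these assumed numbers, queue1 appears to use fewer resources than queue2 even though the preempted containers stay inside queue1, so a scheduler comparing the two reported values would tend to favor queue1.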