Xianyin Xin commented on YARN-4120:

Hi [~kasha], there's another issue in the current preemption logic. It is in 
{{FSParentQueue.java}} and {{FSLeafQueue.java}}:
{code}
  // FSParentQueue.java
  public RMContainer preemptContainer() {
    RMContainer toBePreempted = null;

    // Find the childQueue which is most over fair share
    FSQueue candidateQueue = null;
    Comparator<Schedulable> comparator = policy.getComparator();

    try {
      for (FSQueue queue : childQueues) {
        if (candidateQueue == null ||
            comparator.compare(queue, candidateQueue) > 0) {
          candidateQueue = queue;
        }
      }
    } finally {
      // ...
    }

    // Let the selected queue choose which of its container to preempt
    if (candidateQueue != null) {
      toBePreempted = candidateQueue.preemptContainer();
    }
    return toBePreempted;
  }

  // FSLeafQueue.java
  public RMContainer preemptContainer() {
    RMContainer toBePreempted = null;

    // If this queue is not over its fair share, reject
    if (!preemptContainerPreCheck()) {
      return toBePreempted;
    }
    // ...
  }
{code}
Suppose the queue hierarchy is like the one in the *Description*, queue1 and 
queue2 have the same weight, and the cluster has 8 containers, 4 occupied by 
queue1.1 and 4 occupied by queue2. If a new app is added to queue1.2, 2 
containers should be preempted from queue1.1. However, according to the above 
code, queue1 and queue2 are both at their fair share, so the preemption will not 
happen.

So if all of the child queues at any level are at their fair share, preemption 
will not happen even though there is a resource deficit in some leaf queues.
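
To make the scenario concrete, here is a toy model of it. This is not FairScheduler code: {{ToyQueue}}, {{findVictim}} and {{buildCluster}} are hypothetical names, fair shares are plain container counts, and the rule that the walk only follows a child that is strictly over its fair share is a simplification of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

class TopDownPreemptionDemo {
    /** Toy stand-in for a scheduler queue; hypothetical, not the real FSQueue API. */
    static final class ToyQueue {
        final String name;
        final int fairShare;   // containers the queue is entitled to
        final int leafUsage;   // containers held directly (only meaningful for leaves)
        final List<ToyQueue> children = new ArrayList<>();

        ToyQueue(String name, int fairShare, int leafUsage) {
            this.name = name;
            this.fairShare = fairShare;
            this.leafUsage = leafUsage;
        }

        ToyQueue add(ToyQueue child) {
            children.add(child);
            return this;
        }

        /** A parent's usage is the sum of its children's usage. */
        int usage() {
            if (children.isEmpty()) {
                return leafUsage;
            }
            int total = 0;
            for (ToyQueue child : children) {
                total += child.usage();
            }
            return total;
        }

        int overShare() {
            return usage() - fairShare;
        }
    }

    /**
     * Assumed rule: walk down from the given queue, at each level following the
     * child that is most over its fair share, and give up if no child is over.
     * Returns the leaf to preempt from, or null if the walk is cut short.
     */
    static ToyQueue findVictim(ToyQueue queue) {
        if (queue.children.isEmpty()) {
            return queue; // a leaf reached by the walk is the victim
        }
        ToyQueue candidate = null;
        for (ToyQueue child : queue.children) {
            if (child.overShare() > 0
                    && (candidate == null || child.overShare() > candidate.overShare())) {
                candidate = child;
            }
        }
        return candidate == null ? null : findVictim(candidate);
    }

    /** The 8-container cluster from the comment, equal weights assumed at every level. */
    static ToyQueue buildCluster() {
        ToyQueue queue1 = new ToyQueue("queue1", 4, 0)
                .add(new ToyQueue("queue1.1", 2, 4))
                .add(new ToyQueue("queue1.2", 2, 0));
        ToyQueue queue2 = new ToyQueue("queue2", 4, 4);
        return new ToyQueue("root", 8, 0).add(queue1).add(queue2);
    }

    static String nameOf(ToyQueue queue) {
        return queue == null ? "none" : queue.name;
    }

    public static void main(String[] args) {
        ToyQueue root = buildCluster();
        System.out.println("walk from root:   " + nameOf(findVictim(root)));
        System.out.println("walk from queue1: " + nameOf(findVictim(root.children.get(0))));
    }
}
```

Under these assumptions the walk from root finds no victim, because queue1 and queue2 each use exactly 4 of their 4 containers, while the same walk started inside queue1 would pick queue1.1.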

I think we have to drop this logic in this case. As an alternative, we could 
calculate an ideal preemption distribution by traversing the queues. Any 

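A rough sketch of what such a traversal could look like, as one possible reading only. The comment above does not specify the algorithm, so everything here is an assumption: {{QueueNode}} and {{plan}} are hypothetical names, resources are plain container counts, a leaf's deficit is taken as min(fair share, demand) minus usage, and surplus is taken from over-share leaves in traversal order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PreemptionPlanner {
    /** Hypothetical queue node for illustration; not a FairScheduler class. */
    static final class QueueNode {
        final String name;
        final int fairShare;   // containers the queue is entitled to
        final int usage;       // containers held (leaves only)
        final int demand;      // containers wanted (leaves only)
        final List<QueueNode> children = new ArrayList<>();

        QueueNode(String name, int fairShare, int usage, int demand) {
            this.name = name;
            this.fairShare = fairShare;
            this.usage = usage;
            this.demand = demand;
        }

        QueueNode add(QueueNode child) {
            children.add(child);
            return this;
        }
    }

    /** Depth-first traversal that gathers every leaf under the given queue. */
    static void collectLeaves(QueueNode queue, List<QueueNode> out) {
        if (queue.children.isEmpty()) {
            out.add(queue);
            return;
        }
        for (QueueNode child : queue.children) {
            collectLeaves(child, out);
        }
    }

    /** Returns leaf name -> number of containers to preempt from that leaf. */
    static Map<String, Integer> plan(QueueNode root) {
        List<QueueNode> leaves = new ArrayList<>();
        collectLeaves(root, leaves);

        // Total shortfall of leaves that are below what they could use.
        int deficit = 0;
        for (QueueNode leaf : leaves) {
            int target = Math.min(leaf.fairShare, leaf.demand);
            deficit += Math.max(0, target - leaf.usage);
        }

        // Cover the shortfall from leaves that are above their fair share.
        Map<String, Integer> plan = new LinkedHashMap<>();
        for (QueueNode leaf : leaves) {
            if (deficit == 0) {
                break;
            }
            int surplus = Math.max(0, leaf.usage - leaf.fairShare);
            int take = Math.min(surplus, deficit);
            if (take > 0) {
                plan.put(leaf.name, take);
                deficit -= take;
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Same 8-container example as above, equal weights assumed.
        QueueNode queue1 = new QueueNode("queue1", 4, 0, 0)
                .add(new QueueNode("queue1.1", 2, 4, 4))
                .add(new QueueNode("queue1.2", 2, 0, 2));
        QueueNode root = new QueueNode("root", 8, 0, 0)
                .add(queue1)
                .add(new QueueNode("queue2", 4, 4, 4));
        System.out.println(plan(root));
    }
}
```

For the example cluster this plans 2 containers from queue1.1 and nothing from queue2, which matches the expectation stated above. Because it looks at leaves directly, it does not depend on whether the parent queues are at their fair share.
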
> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> ------------------------------------------------------------------------------
>                 Key: YARN-4120
>                 URL: https://issues.apache.org/jira/browse/YARN-4120
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Xianyin Xin
> When computing resource usage for Schedulables, the following code in 
> {{FSAppAttempt.getResourceUsage}} is involved:
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated to FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, 
> for two main reasons:
> # It describes something in the future, i.e., even though these resources are 
> marked as preempted, they are still in use by the app, and they will be 
> subtracted from {{currentConsumption}} once the preemption is finished. It's 
> not reasonable to account for them ahead of time. 
> # There's another problem here. Consider the following case:
> {code}
>             root
>            /    \
>       queue1   queue2
>       /    \
> queue1.3, queue1.4
> {code}
> Suppose queue1.3 needs resources and can preempt them from queue1.4, so 
> the preemption happens inside queue1. But when computing the resource 
> usage of queue1, {{queue1.resourceUsage = its_current_resource_usage - 
> preemption}} according to the current code, which is unfair to queue2 when 
> allocating resources.
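
A small arithmetic sketch of the second point in the quoted description. The container counts are hypothetical and {{reportedUsage}} is an illustrative helper that only mirrors the subtraction in the quoted {{getResourceUsage()}}; it is not scheduler code.

```java
class UsageAccountingDemo {
    /** Same arithmetic as the quoted getResourceUsage(): consumption minus preempted. */
    static int reportedUsage(int currentConsumption, int preempted) {
        return currentConsumption - preempted;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: queue1.4 holds 4 containers, 2 of them marked for
        // preemption in favour of queue1.3, which holds none yet.
        int queue14 = reportedUsage(4, 2);
        int queue13 = reportedUsage(0, 0);

        int queue1Reported = queue13 + queue14; // parent aggregates its children
        int queue1Held = 4 + 0;                 // containers still running in queue1

        System.out.println("queue1 reported usage: " + queue1Reported);
        System.out.println("queue1 containers held: " + queue1Held);
    }
}
```

With these numbers queue1 reports a usage of 2 while still holding 4 containers, so it looks further below its share than it really is when compared with queue2, even though the preemption only moves resources around inside queue1.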

