Tao Yang created YARN-9043:
------------------------------

             Summary: Inter-queue preemption sometimes starves an underserved 
queue when using DominantResourceCalculator
                 Key: YARN-9043
                 URL: https://issues.apache.org/jira/browse/YARN-9043
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 3.3.0
            Reporter: Tao Yang
            Assignee: Tao Yang


To reproduce this problem in a unit test, set up a cluster with resource
<40,18> and create three queues and apps:
 * queue a: guaranteed=<10,10>, used=<6,10> by app1
 * queue b: guaranteed=<20,6>, used=<20,8> by app2
 * queue c: guaranteed=<10,2>, used=<0,0>, pending=<1,1>

Queue c is underserved and queue b exceeds its guarantee by 2 CPU, so we
expect app2 in queue b to be preempted, but nothing happens.

This problem is related to Resources#greaterThan/lessThan. The comparison
between two resources is based on the resource/cluster-resource ratio computed
inside DominantResourceCalculator#compare, so a resource type with a lower
ratio may be ignored. For the unit-test scenario, consider the comparison
between the ideal assigned resource and the used resource of queue b:
 * cluster resource is <40,18>
 * ideal assigned resource of queue b is <20,6>:
ideal-assigned-resource / cluster-resource = <20,6> / <40,18>
= max(20/40, 6/18) = 0.5
 * used resource of queue b is <20,8>:
used-resource / cluster-resource = <20,8> / <40,18>
= max(20/40, 8/18) = 0.5
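
The arithmetic above can be reproduced with a small standalone sketch. It
models only the dominant-share comparison as described in this report and does
not attempt to mirror the actual Hadoop implementation, which may apply further
rules. The class DominantShareDemo, its methods, and the use of long[] for
resources are illustrative assumptions, not Hadoop APIs.

```java
// Illustrative stand-in for a dominant-share comparison; NOT the Hadoop code.
final class DominantShareDemo {

    // Dominant share: the largest per-type ratio of a resource to the cluster.
    static double dominantShare(long[] resource, long[] cluster) {
        double share = 0.0;
        for (int i = 0; i < resource.length; i++) {
            share = Math.max(share, (double) resource[i] / cluster[i]);
        }
        return share;
    }

    // Hypothetical greaterThan that looks only at the dominant share.
    static boolean greaterThan(long[] cluster, long[] lhs, long[] rhs) {
        return dominantShare(lhs, cluster) > dominantShare(rhs, cluster);
    }

    public static void main(String[] args) {
        long[] cluster = {40, 18};
        long[] idealAssigned = {20, 6};  // queue b, ideal assigned
        long[] used = {20, 8};           // queue b, used

        System.out.println(dominantShare(idealAssigned, cluster)); // 0.5
        System.out.println(dominantShare(used, cluster));          // 0.5
        // The extra 2 CPU is invisible because 8/18 is still below 20/40.
        System.out.println(greaterThan(cluster, used, idealAssigned)); // false
    }
}
```

Both shares collapse to 0.5 because the first resource type dominates in each
case, so the difference in the second type never reaches the comparison.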

As a result, {{Resources.greaterThan(rc, clusterResource, used,
idealAssigned)}} returns false instead of true. Several other places have the
same problem, so preemption cannot happen with the current logic.

To solve this problem, I propose adding a
ResourceCalculator#isAnyMajorResourceGreaterThan method. The
DominantResourceCalculator implementation compares every resource type of the
two resources and returns true if any major resource type of the left resource
is greater than that of the right resource. This method would then replace
Resources#greaterThan in the inter-queue preemption code paths affected by
this problem.
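
A minimal sketch of the per-type check proposed above, assuming resources are
plain long[] vectors and that every listed type counts as a major resource
type. Only the method name comes from this proposal; the signature, the class
AnyMajorResourceDemo, and the array representation are hypothetical and may
differ from any eventual patch.

```java
// Hypothetical sketch of the proposed check; not the actual Hadoop patch.
final class AnyMajorResourceDemo {

    // Returns true if lhs exceeds rhs in at least one resource type.
    // Assumption: all entries are "major" types and both arrays align by index.
    static boolean isAnyMajorResourceGreaterThan(long[] lhs, long[] rhs) {
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] > rhs[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] idealAssigned = {20, 6};  // queue b, ideal assigned
        long[] used = {20, 8};           // queue b, used

        // The second type (8 > 6) is now detected.
        System.out.println(isAnyMajorResourceGreaterThan(used, idealAssigned)); // true
        // A queue within its ideal assignment in every type is not flagged.
        System.out.println(isAnyMajorResourceGreaterThan(idealAssigned, used)); // false
    }
}
```

Unlike a dominant-share comparison, this check does not need the cluster
resource, since it compares absolute values type by type.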



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
