[
https://issues.apache.org/jira/browse/YARN-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tao Yang updated YARN-9043:
---------------------------
Description:
To reproduce this problem in a UT, set up a cluster with resource <40,18>
and create 3 queues and apps:
* queue a: guaranteed=<10,10>, used=<6,10> by app1
* queue b: guaranteed=<20,6>, used=<20,8> by app2
* queue c: guaranteed=<10,2>, used=<0,0>, pending=<1,1>
Queue c is underserved and queue b overuses 2 CPU resources, so we expect
app2 in queue b to be preempted, but nothing happens.
This problem is related to Resources#greaterThan/lessThan. The comparison
between two resources is based on the resource/cluster-resource ratio inside
DominantResourceCalculator#compare, so the low-weight resource may be ignored.
For the scenario in the UT, take the comparison between the ideal assigned
resource and the used resource:
* cluster resource is <40,18>
* ideal assigned resource of queue b is <20,6>,
ideal-assigned-resource / cluster-resource = <20,6> / <40,18>
= max(20/40, 6/18) = 0.5
* used resource of queue b is <20,8>,
used-resource / cluster-resource = <20,8> / <40,18>
= max(20/40, 8/18) = 0.5
The result of {{Resources.greaterThan(rc, clusterResource, used,
idealAssigned)}} will be false instead of true, and some other similar places
have the same problem, so preemption can't happen with the current logic.
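The arithmetic above can be checked with a small standalone sketch. It is a
simplified model of the ratio-based comparison as described in this issue, not
the actual Hadoop code: the class DominantShareDemo, its methods, and the use
of plain long arrays for resources are all hypothetical names chosen for
illustration.

```java
public class DominantShareDemo {

    // Dominant share: the largest per-type ratio of a resource to the
    // cluster resource, e.g. max(20/40, 6/18) for <20,6> in <40,18>.
    static double dominantShare(long[] resource, long[] cluster) {
        double share = 0.0;
        for (int i = 0; i < resource.length; i++) {
            share = Math.max(share, (double) resource[i] / cluster[i]);
        }
        return share;
    }

    // Simplified stand-in (assumption) for the comparison described above:
    // only the dominant shares of the two resources are compared.
    static boolean greaterThan(long[] cluster, long[] lhs, long[] rhs) {
        return dominantShare(lhs, cluster) > dominantShare(rhs, cluster);
    }

    public static void main(String[] args) {
        long[] cluster = {40, 18};
        long[] idealAssigned = {20, 6};
        long[] used = {20, 8};

        System.out.println(dominantShare(idealAssigned, cluster));     // 0.5
        System.out.println(dominantShare(used, cluster));              // 0.5
        System.out.println(greaterThan(cluster, used, idealAssigned)); // false
    }
}
```

Both dominant shares come out at 0.5, so in this model the 2 extra CPU used by
queue b never shows up in the comparison.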
To solve this problem, I propose adding a
ResourceCalculator#isAnyMajorResourceGreaterThan method. The
DominantResourceCalculator implementation will compare every resource type
between two resources and return true if any major resource type of the left
resource is greater than that of the right resource. Resources#greaterThan can
then be replaced with it in the places in inter-queue preemption that have
this problem.
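A minimal sketch of the proposed per-type check follows. It only illustrates
the idea and is not the patch itself: the class AnyResourceGreaterDemo and the
long-array representation are hypothetical, and every entry in the array is
assumed to count as a major resource type.

```java
public class AnyResourceGreaterDemo {

    // Returns true if any resource type of lhs is strictly greater than the
    // same type of rhs. Assumption: both arrays list the same types in the
    // same order, and all of them are treated as major resource types.
    static boolean isAnyMajorResourceGreaterThan(long[] lhs, long[] rhs) {
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] > rhs[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] idealAssigned = {20, 6};
        long[] used = {20, 8};

        // Queue b uses 8 of the second resource against an ideal of 6.
        System.out.println(isAnyMajorResourceGreaterThan(used, idealAssigned)); // true
        System.out.println(isAnyMajorResourceGreaterThan(idealAssigned, used)); // false
    }
}
```

With this check, the over-use by queue b in the UT scenario is detected even
though the first resource type is equal on both sides.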
Other places that call Resources#greaterThan, and other comparisons in the
scheduler and other preemption processes, may encounter the same problem. We
may need to check all resource comparison places in YARN; this needs further
discussion.
> Inter-queue preemption sometimes starves an underserved queue when using
> DominantResourceCalculator
> ---------------------------------------------------------------------------------------------------
>
> Key: YARN-9043
> URL: https://issues.apache.org/jira/browse/YARN-9043
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.3.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major