[ 
https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405933#comment-16405933
 ] 

kyungwan nam commented on YARN-8020:
------------------------------------

[~eepayne]

Sorry for the late response.

I've seen this problem in branch-2.8 and HDP-2.6.4.

Cluster
 * Cluster total resources: <405 GB, 240 VCores>
 * default queue: 50% capacity, 100% max capacity
 * pri queue: 50% capacity, 100% max capacity
 * label1 queue: 0% capacity, 0% max capacity
 * The cluster has a non-exclusive node label 'label1', but all nodes are
included in the default node label.
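
As a quick cross-check of the cluster description, here is a minimal sketch that derives each queue's guaranteed resources from the cluster totals and the 50% capacity. The class and method names are hypothetical and 1 GB is assumed to be 1024 MB; this is plain arithmetic, not CapacityScheduler code.

```java
public class QueueGuaranteeSketch {
    // Hypothetical helper: a queue's guaranteed amount of one resource type,
    // given the cluster total and the queue's configured capacity percentage.
    static long guaranteed(long clusterTotal, int capacityPercent) {
        return clusterTotal * capacityPercent / 100;
    }

    public static void main(String[] args) {
        long clusterMemoryMb = 405L * 1024; // 405 GB expressed in MB (assumed 1 GB = 1024 MB)
        long clusterVcores = 240;
        int capacityPercent = 50;           // capacity of both the default and pri queues
        System.out.printf("<memory:%d, vCores:%d>%n",
            guaranteed(clusterMemoryMb, capacityPercent),
            guaranteed(clusterVcores, capacityPercent));
    }
}
```

The result, <memory:207360, vCores:120>, is the GAR value reported for both default and pri in the preemption logs further down.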

capacity-scheduler configuration
{code}
yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled=true
yarn.scheduler.capacity.reservations-continue-look-all-nodes=true
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn.scheduler.capacity.root.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue= 
yarn.scheduler.capacity.root.acl_submit_applications= 
yarn.scheduler.capacity.root.acl_submit_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.accessible-node-labels= 
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-applications=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.default.priority=1
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=3
yarn.scheduler.capacity.root.label1.accessible-node-labels=label1
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.acl_submit_applications=*
yarn.scheduler.capacity.root.label1.capacity=0
yarn.scheduler.capacity.root.label1.default-node-label-expression=label1
yarn.scheduler.capacity.root.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.maximum-applications=100
yarn.scheduler.capacity.root.label1.maximum-capacity=0
yarn.scheduler.capacity.root.label1.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.label1.priority=1
yarn.scheduler.capacity.root.label1.state=RUNNING
yarn.scheduler.capacity.root.label1.user-limit-factor=3
yarn.scheduler.capacity.root.ordering-policy=priority-utilization
yarn.scheduler.capacity.root.pri.accessible-node-labels= 
yarn.scheduler.capacity.root.pri.acl_submit_applications=*
yarn.scheduler.capacity.root.pri.capacity=50
yarn.scheduler.capacity.root.pri.maximum-capacity=100
yarn.scheduler.capacity.root.pri.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.pri.priority=1
yarn.scheduler.capacity.root.pri.state=RUNNING
yarn.scheduler.capacity.root.pri.user-limit-factor=3
yarn.scheduler.capacity.root.queues=default,pri,label1
{code}


How to reproduce
 * app1, which requests a <1GB, 1 VCore> AM container and 29 * <1GB, 8 VCores>
containers, is submitted to the default queue.
 * After all containers for app1 have been allocated, app2, which requests a
<1GB, 1 VCore> AM container and 14 * <1GB, 8 VCores> containers, is submitted
to the pri queue.
 * As expected, some containers of app1 are preempted:

{code:java}
2018-03-19 21:51:50,270 DEBUG capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:containerBasedPreemptOrKill(428)) - 
Trying to use 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector
 to select preemption candidates

2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: 
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: 
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: 
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, 
vCores:0>



2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: 
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: 
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: 
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, 
vCores:0>



2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: pri CUR: <memory:1024, vCores:1> PEN: <memory:14336, vCores:112> 
RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 0.5 
IDEAL_ASSIGNED: <memory:15360, vCores:113> IDEAL_PREEMPT: <memory:0, vCores:0> 
ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> 
PREEMPTABLE: <memory:0, vCores:0>



2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: pri CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: 
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: 
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: 
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, 
vCores:0>



2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(209))
 - Queue=default partition= resource-to-obtain=<memory:-83465, vCores:24>

2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: default CUR: <memory:30720, vCores:233> PEN: <memory:0, vCores:0> 
RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 1.0 
IDEAL_ASSIGNED: <memory:399360, vCores:127> IDEAL_PREEMPT: <memory:-83465, 
vCores:24> ACTUAL_PREEMPT: <memory:-83465, vCores:24> UNTOUCHABLE: <memory:0, 
vCores:0> PREEMPTABLE: <memory:-176640, vCores:113>



2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: default CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: 
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: 
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: 
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, 
vCores:0>



2018-03-19 21:51:50,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:logToCSV(549)) -  QUEUESTATE: 
1521463910271, default, 30720, 233, 0, 0, 207360, 120, 399360, 127, -83465, 24, 
-83465, 24, label1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, pri, 1024, 1, 14336, 
112, 207360, 120, 15360, 113, 0, 0, 0, 0

2018-03-19 21:51:50,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:preemptOrkillSelectedContainerAfterWait(300))
 - Starting to preempt containers for selectedCandidates and size:1{code}
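
The resource figures in the log above follow directly from the two submissions in the reproduction steps. A small sketch of that arithmetic, with hypothetical helper names and 1 GB assumed to be 1024 MB:

```java
public class RequestTotalsSketch {
    // Total {memoryMb, vcores} for one AM container plus `count` task containers.
    static long[] total(long amMemMb, long amVcores, int count, long memMb, long vcores) {
        return new long[] {amMemMb + count * memMb, amVcores + count * vcores};
    }

    // Fraction of the cluster that `amount` represents for one resource type.
    static double share(long amount, long clusterTotal) {
        return (double) amount / clusterTotal;
    }

    public static void main(String[] args) {
        long clusterMemMb = 405L * 1024;
        long clusterVcores = 240;
        long[] app1 = total(1024, 1, 29, 1024, 8); // submitted to the default queue
        long[] app2 = total(1024, 1, 14, 1024, 8); // submitted to the pri queue
        System.out.printf("app1 <memory:%d, vCores:%d>%n", app1[0], app1[1]);
        System.out.printf("app2 <memory:%d, vCores:%d>%n", app2[0], app2[1]);
        System.out.printf("app1 memory share %.3f, vcore share %.3f%n",
            share(app1[0], clusterMemMb), share(app1[1], clusterVcores));
    }
}
```

app1 comes to <memory:30720, vCores:233>, the CUR value of the default queue, and app2 comes to <memory:15360, vCores:113>, which equals CUR + PEN of the pri queue. app1 holds roughly 7% of cluster memory but about 97% of the vcores, so vcores are the dominant resource in this scenario.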

 * But shortly after that, preemption stops happening:

{code:java}
2018-03-19 21:51:52,771 INFO  rmcontainer.RMContainerImpl 
(RMContainerImpl.java:handle(451)) - container_e49_1521339603918_0013_01_000006 
Container Transitioned from ALLOCATED to ACQUIRED

2018-03-19 21:51:53,267 INFO  rmcontainer.RMContainerImpl 
(RMContainerImpl.java:handle(451)) - container_e49_1521339603918_0013_01_000006 
Container Transitioned from ACQUIRED to RUNNING

2018-03-19 21:51:53,270 DEBUG capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:containerBasedPreemptOrKill(428)) - 
Trying to use 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector
 to select preemption candidates

2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: 
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: 
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: 
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, 
vCores:0>



2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: 
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: 
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: 
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, 
vCores:0>



2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: pri CUR: <memory:4096, vCores:25> PEN: <memory:11264, vCores:88> 
RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 0.5 
IDEAL_ASSIGNED: <memory:15360, vCores:113> IDEAL_PREEMPT: <memory:0, vCores:0> 
ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> 
PREEMPTABLE: <memory:0, vCores:0>



2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: pri CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: 
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: 
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: 
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, 
vCores:0>



2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: default CUR: <memory:27648, vCores:209> PEN: <memory:3072, vCores:1> 
RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 1.0 
IDEAL_ASSIGNED: <memory:30720, vCores:210> IDEAL_PREEMPT: <memory:0, vCores:0> 
ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> 
PREEMPTABLE: <memory:-179712, vCores:89>



2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: default CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: 
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: 
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: 
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, 
vCores:0>



2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:logToCSV(549)) -  QUEUESTATE: 
1521463913271, default, 27648, 209, 3072, 1, 207360, 120, 30720, 210, 0, 0, 0, 
0, label1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, pri, 4096, 25, 11264, 88, 
207360, 120, 15360, 113, 0, 0, 0, 0

2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:preemptOrkillSelectedContainerAfterWait(300))
 - Starting to preempt containers for selectedCandidates and size:0

2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:editSchedule(293)) - Total time 
used=1 ms.{code}
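
One way to read the two QUEUESTATE snapshots is to add up IDEAL_ASSIGNED over the queues and compare the sum with the cluster total. The sketch below does only that, using the values copied from the logs above; the class and helper names are hypothetical.

```java
public class IdealAssignedSumSketch {
    // Component-wise sum of {memoryMb, vcores} pairs.
    static long[] sum(long[]... resources) {
        long[] total = {0, 0};
        for (long[] r : resources) {
            total[0] += r[0];
            total[1] += r[1];
        }
        return total;
    }

    // True if every component of `r` is within the cluster total.
    static boolean fits(long[] r, long[] cluster) {
        return r[0] <= cluster[0] && r[1] <= cluster[1];
    }

    public static void main(String[] args) {
        long[] cluster = {405L * 1024, 240};
        // IDEAL_ASSIGNED of default and pri (label1 is zero in both rounds).
        long[] first = sum(new long[] {399360, 127}, new long[] {15360, 113});  // 21:51:50
        long[] second = sum(new long[] {30720, 210}, new long[] {15360, 113});  // 21:51:53
        System.out.printf("first  <memory:%d, vCores:%d> fits=%b%n",
            first[0], first[1], fits(first, cluster));
        System.out.printf("second <memory:%d, vCores:%d> fits=%b%n",
            second[0], second[1], fits(second, cluster));
    }
}
```

In the first round the sum is exactly the cluster total, <memory:414720, vCores:240>. In the second round it is <memory:46080, vCores:323>, which is 83 vcores more than the cluster has: default's IDEAL_ASSIGNED equals its CUR + PEN, and no preemption candidates are selected (size:0).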
 
> when DRF is used, preemption does not trigger due to incorrect idealAssigned
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8020
>                 URL: https://issues.apache.org/jira/browse/YARN-8020
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: kyungwan nam
>            Priority: Major
>
> I've found a case where inter-queue preemption does not work.
> It happens when DRF is used and an application requesting a large number of
> vcores is submitted.
> IMHO, idealAssigned can be set incorrectly by the following code.
> {code}
> // This function "accepts" all the resources it can (pending) and return
> // the unused ones
> Resource offer(Resource avail, ResourceCalculator rc,
>     Resource clusterResource, boolean considersReservedResource) {
>   Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
>       Resources.subtract(getMax(), idealAssigned),
>       Resource.newInstance(0, 0));
>   // accepted = min{avail,
>   //               max - assigned,
>   //               current + pending - assigned,
>   //               # Make sure a queue will not get more than max of its
>   //               # used/guaranteed, this is to make sure preemption won't
>   //               # happen if all active queues are beyond their guaranteed
>   //               # This is for leaf queue only.
>   //               max(guaranteed, used) - assigned}
>   // remain = avail - accepted
>   Resource accepted = Resources.min(rc, clusterResource,
>       absMaxCapIdealAssignedDelta,
>       Resources.min(rc, clusterResource, avail, Resources
>           /*
>            * When we're using FifoPreemptionSelector (considerReservedResource
>            * = false).
>            *
>            * We should deduct reserved resource from pending to avoid excessive
>            * preemption:
>            *
>            * For example, if an under-utilized queue has used = reserved = 20.
>            * Preemption policy will try to preempt 20 containers (which is not
>            * satisfied) from different hosts.
>            *
>            * In FifoPreemptionSelector, there's no guarantee that preempted
>            * resource can be used by pending request, so policy will preempt
>            * resources repeatly.
>            */
>           .subtract(Resources.add(getUsed(),
>               (considersReservedResource ? pending : pendingDeductReserved)),
>               idealAssigned)));
> {code}
> let’s say,
> * cluster resource : <Memory:200GB, VCores:20>
> * idealAssigned(assigned): <Memory:100GB, VCores:10>
> * avail: <Memory:181GB, Vcores:1>
> * current: <Memory:19GB, Vcores:19>
> * pending: <Memory:0, Vcores:0>
> current + pending - assigned: <Memory:-81GB, Vcores:9> (memory: 19 + 0 - 100)
> min ( avail, (current + pending - assigned) ) : <Memory:-81GB, Vcores:9>
> accepted: <Memory:-81GB, Vcores:9>
> as a result, idealAssigned will be <Memory:19GB, VCores:19>, which does not
> trigger preemption.
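
The example can be recomputed from the listed inputs with a small stand-alone sketch. It is not the Hadoop implementation: `Res`, `drfMin` and the other names are hypothetical stand-ins, the DRF comparison is simplified to "the lower dominant share wins", and the queue's max capacity is assumed to equal the cluster resource because the example does not state it.

```java
public class DrfOfferSketch {
    // Hypothetical two-dimensional resource: memory in GB and vcores.
    static final class Res {
        final long memGb;
        final long vcores;
        Res(long memGb, long vcores) { this.memGb = memGb; this.vcores = vcores; }
        Res plus(Res o) { return new Res(memGb + o.memGb, vcores + o.vcores); }
        Res minus(Res o) { return new Res(memGb - o.memGb, vcores - o.vcores); }
        @Override public String toString() { return "<Memory:" + memGb + "GB, VCores:" + vcores + ">"; }
    }

    // Dominant share: the larger of the per-resource fractions of the cluster.
    static double dominantShare(Res r, Res cluster) {
        return Math.max((double) r.memGb / cluster.memGb, (double) r.vcores / cluster.vcores);
    }

    // Simplified DRF min: returns the argument with the lower dominant share.
    static Res drfMin(Res cluster, Res a, Res b) {
        return dominantShare(a, cluster) <= dominantShare(b, cluster) ? a : b;
    }

    static Res componentwiseMax(Res a, Res b) {
        return new Res(Math.max(a.memGb, b.memGb), Math.max(a.vcores, b.vcores));
    }

    // Mirrors the shape of the quoted offer(): min{max - assigned, avail, current + pending - assigned}.
    static Res accepted(Res cluster, Res max, Res avail, Res current, Res pending, Res assigned) {
        Res headroom = componentwiseMax(max.minus(assigned), new Res(0, 0));
        Res wanted = current.plus(pending).minus(assigned);
        return drfMin(cluster, headroom, drfMin(cluster, avail, wanted));
    }

    public static void main(String[] args) {
        Res cluster = new Res(200, 20);
        Res assigned = new Res(100, 10);
        Res accepted = accepted(cluster, cluster /* assumed max */, new Res(181, 1),
            new Res(19, 19), new Res(0, 0), assigned);
        System.out.println("accepted      = " + accepted);
        System.out.println("idealAssigned = " + assigned.plus(accepted));
    }
}
```

Because vcores dominate, the whole vector <Memory:-81GB, VCores:9> (memory: 19 + 0 - 100 = -81) is picked as the minimum even though its memory component is negative. Adding it to the previous idealAssigned gives <Memory:19GB, VCores:19>, which is the queue's full current usage, so nothing is left to preempt.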


