[
https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405933#comment-16405933
]
kyungwan nam commented on YARN-8020:
------------------------------------
[~eepayne]
Sorry for the late response.
I've seen this problem in branch-2.8 and HDP-2.6.4.
Cluster
* Cluster total resources : <405 GB, 240 VCores>
* default Queue: 50% capacity, 100% max capacity
* pri Queue: 50% capacity, 100% max capacity
* label1 Queue: 0% capacity, 0% max capacity
* there is a non-exclusive node label 'label1' in my cluster, but all nodes
belong to the default node label.
capacity-scheduler
{code}
yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled=true
yarn.scheduler.capacity.reservations-continue-look-all-nodes=true
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn.scheduler.capacity.root.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=
yarn.scheduler.capacity.root.acl_submit_applications=
yarn.scheduler.capacity.root.acl_submit_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.accessible-node-labels=
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-applications=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.default.priority=1
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=3
yarn.scheduler.capacity.root.label1.accessible-node-labels=label1
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.acl_submit_applications=*
yarn.scheduler.capacity.root.label1.capacity=0
yarn.scheduler.capacity.root.label1.default-node-label-expression=label1
yarn.scheduler.capacity.root.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.maximum-applications=100
yarn.scheduler.capacity.root.label1.maximum-capacity=0
yarn.scheduler.capacity.root.label1.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.label1.priority=1
yarn.scheduler.capacity.root.label1.state=RUNNING
yarn.scheduler.capacity.root.label1.user-limit-factor=3
yarn.scheduler.capacity.root.ordering-policy=priority-utilization
yarn.scheduler.capacity.root.pri.accessible-node-labels=
yarn.scheduler.capacity.root.pri.acl_submit_applications=*
yarn.scheduler.capacity.root.pri.capacity=50
yarn.scheduler.capacity.root.pri.maximum-capacity=100
yarn.scheduler.capacity.root.pri.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.pri.priority=1
yarn.scheduler.capacity.root.pri.state=RUNNING
yarn.scheduler.capacity.root.pri.user-limit-factor=3
yarn.scheduler.capacity.root.queues=default,pri,label1
{code}
how to reproduce
* app1, which asks for a <1GB, 1 VCore> AM container and 29 * <1GB, 8 VCores>
containers, is submitted to the default queue.
* after all containers for app1 have been allocated, submit app2, which asks
for a <1GB, 1 VCore> AM container and 14 * <1GB, 8 VCores> containers, to the
pri queue
* as expected, some containers for app1 are preempted
{code:java}
2018-03-19 21:51:50,270 DEBUG capacity.ProportionalCapacityPreemptionPolicy
(ProportionalCapacityPreemptionPolicy.java:containerBasedPreemptOrKill(428)) -
Trying to use
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector
to select preemption candidates
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED:
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED:
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT:
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0,
vCores:0>
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED:
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED:
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT:
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0,
vCores:0>
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: pri CUR: <memory:1024, vCores:1> PEN: <memory:14336, vCores:112>
RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 0.5
IDEAL_ASSIGNED: <memory:15360, vCores:113> IDEAL_PREEMPT: <memory:0, vCores:0>
ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0>
PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: pri CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED:
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED:
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT:
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0,
vCores:0>
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(209))
- Queue=default partition= resource-to-obtain=<memory:-83465, vCores:24>
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: default CUR: <memory:30720, vCores:233> PEN: <memory:0, vCores:0>
RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 1.0
IDEAL_ASSIGNED: <memory:399360, vCores:127> IDEAL_PREEMPT: <memory:-83465,
vCores:24> ACTUAL_PREEMPT: <memory:-83465, vCores:24> UNTOUCHABLE: <memory:0,
vCores:0> PREEMPTABLE: <memory:-176640, vCores:113>
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: default CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED:
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED:
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT:
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0,
vCores:0>
2018-03-19 21:51:50,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy
(ProportionalCapacityPreemptionPolicy.java:logToCSV(549)) - QUEUESTATE:
1521463910271, default, 30720, 233, 0, 0, 207360, 120, 399360, 127, -83465, 24,
-83465, 24, label1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, pri, 1024, 1, 14336,
112, 207360, 120, 15360, 113, 0, 0, 0, 0
2018-03-19 21:51:50,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy
(ProportionalCapacityPreemptionPolicy.java:preemptOrkillSelectedContainerAfterWait(300))
- Starting to preempt containers for selectedCandidates and size:1{code}
* but shortly after that, preemption no longer happens
{code:java}
2018-03-19 21:51:52,771 INFO rmcontainer.RMContainerImpl
(RMContainerImpl.java:handle(451)) - container_e49_1521339603918_0013_01_000006
Container Transitioned from ALLOCATED to ACQUIRED
2018-03-19 21:51:53,267 INFO rmcontainer.RMContainerImpl
(RMContainerImpl.java:handle(451)) - container_e49_1521339603918_0013_01_000006
Container Transitioned from ACQUIRED to RUNNING
2018-03-19 21:51:53,270 DEBUG capacity.ProportionalCapacityPreemptionPolicy
(ProportionalCapacityPreemptionPolicy.java:containerBasedPreemptOrKill(428)) -
Trying to use
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector
to select preemption candidates
2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED:
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED:
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT:
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0,
vCores:0>
2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED:
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED:
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT:
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0,
vCores:0>
2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: pri CUR: <memory:4096, vCores:25> PEN: <memory:11264, vCores:88>
RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 0.5
IDEAL_ASSIGNED: <memory:15360, vCores:113> IDEAL_PREEMPT: <memory:0, vCores:0>
ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0>
PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: pri CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED:
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED:
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT:
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0,
vCores:0>
2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: default CUR: <memory:27648, vCores:209> PEN: <memory:3072, vCores:1>
RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 1.0
IDEAL_ASSIGNED: <memory:30720, vCores:210> IDEAL_PREEMPT: <memory:0, vCores:0>
ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0>
PREEMPTABLE: <memory:-179712, vCores:89>
2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
- NAME: default CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED:
<memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED:
<memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT:
<memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0,
vCores:0>
2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy
(ProportionalCapacityPreemptionPolicy.java:logToCSV(549)) - QUEUESTATE:
1521463913271, default, 27648, 209, 3072, 1, 207360, 120, 30720, 210, 0, 0, 0,
0, label1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, pri, 4096, 25, 11264, 88,
207360, 120, 15360, 113, 0, 0, 0, 0
2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy
(ProportionalCapacityPreemptionPolicy.java:preemptOrkillSelectedContainerAfterWait(300))
- Starting to preempt containers for selectedCandidates and size:0
2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy
(ProportionalCapacityPreemptionPolicy.java:editSchedule(293)) - Total time
used=1 ms.{code}
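The resource figures in the two logs above can be cross-checked against the cluster setup and the two applications. The following is only a sketch of that arithmetic; the class and method names are hypothetical and none of it is Hadoop code.

```java
// Cross-check of the figures in the setup and the DEBUG logs.
// Class and method names are hypothetical; nothing here is a Hadoop API.
final class ReproNumbers {
    /** Total {memory MB, vcores} for one AM container plus n identical containers. */
    static long[] demand(long amMb, long amVcores, long n, long mb, long vcores) {
        return new long[] {amMb + n * mb, amVcores + n * vcores};
    }

    /** Part of the cluster guaranteed to a queue with the given capacity percentage. */
    static long[] guaranteed(long[] cluster, long percent) {
        return new long[] {cluster[0] * percent / 100, cluster[1] * percent / 100};
    }

    static String show(long[] r) {
        return "<memory:" + r[0] + ", vCores:" + r[1] + ">";
    }

    public static void main(String[] args) {
        long[] cluster = {405L * 1024, 240};        // <405 GB, 240 VCores>
        long[] gar = guaranteed(cluster, 50);       // default and pri are both 50%
        long[] app1 = demand(1024, 1, 29, 1024, 8); // submitted to default
        long[] app2 = demand(1024, 1, 14, 1024, 8); // submitted to pri

        System.out.println("GAR (default, pri) = " + show(gar));
        System.out.println("app1 total demand  = " + show(app1));
        System.out.println("app2 total demand  = " + show(app2));

        // IDEAL_ASSIGNED vcores in the second log: default 210, pri 113.
        long idealVcores = 210 + 113;
        System.out.println("sum of IDEAL_ASSIGNED vcores = " + idealVcores
            + ", cluster vcores = " + cluster[1]);
    }
}
```

The guaranteed value matches GAR in the logs, app1's total matches CUR of default in the first log, and app2's total matches IDEAL_ASSIGNED of pri. In the second log, IDEAL_ASSIGNED of default equals its CUR plus PEN, and the two queues together are assigned 323 vcores against a cluster total of 240, which is consistent with idealAssigned being overestimated for default.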
> when DRF is used, preemption does not trigger due to incorrect idealAssigned
> ----------------------------------------------------------------------------
>
> Key: YARN-8020
> URL: https://issues.apache.org/jira/browse/YARN-8020
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: kyungwan nam
> Priority: Major
>
> I've found that inter-queue preemption does not work.
> It happens when DRF is used and an application requesting a large number of
> vcores is submitted.
> IMHO, idealAssigned can be set incorrectly by the following code.
> {code}
>   // This function "accepts" all the resources it can (pending) and return
>   // the unused ones
>   Resource offer(Resource avail, ResourceCalculator rc,
>       Resource clusterResource, boolean considersReservedResource) {
>     Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
>         Resources.subtract(getMax(), idealAssigned),
>         Resource.newInstance(0, 0));
>     // accepted = min{avail,
>     //               max - assigned,
>     //               current + pending - assigned,
>     //               # Make sure a queue will not get more than max of its
>     //               # used/guaranteed, this is to make sure preemption won't
>     //               # happen if all active queues are beyond their guaranteed
>     //               # This is for leaf queue only.
>     //               max(guaranteed, used) - assigned}
>     // remain = avail - accepted
>     Resource accepted = Resources.min(rc, clusterResource,
>         absMaxCapIdealAssignedDelta,
>         Resources.min(rc, clusterResource, avail, Resources
>             /*
>              * When we're using FifoPreemptionSelector (considerReservedResource
>              * = false).
>              *
>              * We should deduct reserved resource from pending to avoid excessive
>              * preemption:
>              *
>              * For example, if an under-utilized queue has used = reserved = 20.
>              * Preemption policy will try to preempt 20 containers (which is not
>              * satisfied) from different hosts.
>              *
>              * In FifoPreemptionSelector, there's no guarantee that preempted
>              * resource can be used by pending request, so policy will preempt
>              * resources repeatedly.
>              */
>             .subtract(Resources.add(getUsed(),
>                 (considersReservedResource ? pending : pendingDeductReserved)),
>                 idealAssigned)));
> {code}
> let’s say,
> * cluster resource : <Memory:200GB, VCores:20>
> * idealAssigned(assigned): <Memory:100GB, VCores:10>
> * avail: <Memory:181GB, Vcores:1>
> * current: <Memory:19GB, Vcores:19>
> * pending: <Memory:0, Vcores:0>
> current + pending - assigned: <Memory:-81GB, Vcores:9>
> min ( avail, (current + pending - assigned) ) : <Memory:-81GB, Vcores:9>
> accepted: <Memory:-81GB, Vcores:9>
> as a result, the queue accepts 9 vcores although only 1 is available, and
> idealAssigned becomes <Memory:19GB, VCores:19>, which equals current and so
> does not trigger preemption.
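The example in the quoted description can be replayed with a small standalone model. This is a sketch, not the Hadoop classes: `OfferSketch`, `Res` and `drfMin` are hypothetical names, the queue's max capacity is assumed to equal the cluster resource (the example does not state it), and the min is assumed to return whichever operand has the smaller dominant share, without the tie-breaking a real calculator would apply. With the inputs listed above, the memory component of current + pending - assigned is 19GB + 0 - 100GB = -81GB.

```java
// Simplified stand-in for the offer() arithmetic; not the Hadoop classes.
final class OfferSketch {
    /** A <memory GB, vcores> pair; hypothetical helper type for this sketch. */
    static final class Res {
        final long memGb;
        final long vcores;
        Res(long memGb, long vcores) { this.memGb = memGb; this.vcores = vcores; }
        Res plus(Res o) { return new Res(memGb + o.memGb, vcores + o.vcores); }
        Res minus(Res o) { return new Res(memGb - o.memGb, vcores - o.vcores); }
        Res floorAtZero() { return new Res(Math.max(memGb, 0), Math.max(vcores, 0)); }
        @Override public String toString() {
            return "<Memory:" + memGb + "GB, VCores:" + vcores + ">";
        }
    }

    /** Largest per-resource share of the cluster (the "dominant" share). */
    static double dominantShare(Res cluster, Res r) {
        return Math.max((double) r.memGb / cluster.memGb,
                        (double) r.vcores / cluster.vcores);
    }

    /** Assumed DRF-style min: returns the whole operand with the smaller dominant share. */
    static Res drfMin(Res cluster, Res a, Res b) {
        return dominantShare(cluster, a) <= dominantShare(cluster, b) ? a : b;
    }

    /** accepted = min(max - assigned, min(avail, used + pending - assigned)) */
    static Res accepted(Res cluster, Res max, Res avail,
                        Res used, Res pending, Res assigned) {
        Res maxDelta = max.minus(assigned).floorAtZero();
        Res wanted = used.plus(pending).minus(assigned);
        return drfMin(cluster, maxDelta, drfMin(cluster, avail, wanted));
    }

    public static void main(String[] args) {
        Res cluster = new Res(200, 20);
        Res max = cluster;               // assumption: max capacity is 100%
        Res assigned = new Res(100, 10); // idealAssigned before the offer
        Res avail = new Res(181, 1);
        Res used = new Res(19, 19);      // "current" in the example
        Res pending = new Res(0, 0);

        Res acc = accepted(cluster, max, avail, used, pending, assigned);
        System.out.println("accepted      = " + acc);
        System.out.println("idealAssigned = " + assigned.plus(acc));
    }
}
```

Because the comparison looks only at the dominant share, the operand with negative memory and 9 vcores wins over `avail`, so more vcores are accepted than the single vcore on offer. A componentwise minimum would instead cap the vcores at 1.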