[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445528#comment-16445528 ]

kyungwan nam commented on YARN-8020:
------------------------------------

[~eepayne] I don't think this is the same as YARN-8179. In YARN-8179, the to-be-preempted resources are calculated correctly, and the problem happens when the natural_termination_factor is applied. In this issue, the idealAssigned resources themselves are calculated incorrectly, so the to-be-preempted resources are wrong as a result.

> when DRF is used, preemption does not trigger due to incorrect idealAssigned
> -----------------------------------------------------------------------------
>
>                 Key: YARN-8020
>                 URL: https://issues.apache.org/jira/browse/YARN-8020
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: kyungwan nam
>            Priority: Major
>
> I've run into a case where Inter-Queue Preemption does not work.
> It happens when DRF is used and an application is submitted with a large
> number of vcores.
> IMHO, idealAssigned can be set incorrectly by the following code.
> {code}
>   // This function "accepts" all the resources it can (pending) and return
>   // the unused ones
>   Resource offer(Resource avail, ResourceCalculator rc,
>       Resource clusterResource, boolean considersReservedResource) {
>     Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
>         Resources.subtract(getMax(), idealAssigned),
>         Resource.newInstance(0, 0));
>     // accepted = min{avail,
>     //               max - assigned,
>     //               current + pending - assigned,
>     //               # Make sure a queue will not get more than max of its
>     //               # used/guaranteed, this is to make sure preemption won't
>     //               # happen if all active queues are beyond their guaranteed
>     //               # This is for leaf queue only.
>     //               max(guaranteed, used) - assigned}
>     // remain = avail - accepted
>     Resource accepted = Resources.min(rc, clusterResource,
>         absMaxCapIdealAssignedDelta,
>         Resources.min(rc, clusterResource, avail, Resources
>             /*
>              * When we're using FifoPreemptionSelector (considerReservedResource
>              * = false).
>              *
>              * We should deduct reserved resource from pending to avoid
>              * excessive preemption:
>              *
>              * For example, if an under-utilized queue has used = reserved = 20.
>              * Preemption policy will try to preempt 20 containers (which is not
>              * satisfied) from different hosts.
>              *
>              * In FifoPreemptionSelector, there's no guarantee that preempted
>              * resource can be used by pending request, so policy will preempt
>              * resources repeatedly.
>              */
>             .subtract(Resources.add(getUsed(),
>                 (considersReservedResource ? pending : pendingDeductReserved)),
>                 idealAssigned)));
> {code}
> let's say,
> * cluster resource :
> * idealAssigned(assigned):
> * avail:
> * current:
> * pending:
> current + pending - assigned:
> min ( avail, (current + pending - assigned) ) :
> accepted:
> as a result, idealAssigned will be , which does not trigger preemption.
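The componentwise arithmetic inside offer() is what allows accepted to go negative in one dimension while staying positive in the other. Below is a minimal, self-contained sketch using the concrete numbers from the walkthrough later in this thread (current <27648, 209>, pending <3072, 1>, idealAssigned <207360, 120>); it is illustrative only, not the actual preemption-monitor code:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ComponentwiseDeltaDemo {
  public static void main(String[] args) {
    Resource used = Resource.newInstance(27648, 209);           // current
    Resource pending = Resource.newInstance(3072, 1);
    Resource idealAssigned = Resource.newInstance(207360, 120);

    // current + pending - assigned, computed componentwise as in offer():
    Resource delta = Resources.subtract(
        Resources.add(used, pending), idealAssigned);
    System.out.println(delta);          // <memory:-176640, vCores:90>

    // If that delta is ever chosen as "accepted", adding it back moves
    // idealAssigned *down* in memory while raising vcores:
    Resources.addTo(idealAssigned, delta);
    System.out.println(idealAssigned);  // <memory:30720, vCores:210>
  }
}
{code}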
[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444127#comment-16444127 ]

Eric Payne commented on YARN-8020:
-----------------------------------

[~kyungwan nam], is this issue the same as YARN-8179?
[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412106#comment-16412106 ]

Wangda Tan commented on YARN-8020:
-----------------------------------

Thanks [~eepayne]/[~kyungwan nam] for the comments. I haven't checked the above comments in much detail, but I believe we have some issues in the existing DRF preemption logic. I plan to spend some time adding unit tests to YARN-8004 in the next several weeks.
[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407717#comment-16407717 ]

kyungwan nam commented on YARN-8020:
------------------------------------

I'm thinking the reason it happens is as follows.

{code:java}
    // assign all cluster resources until no more demand, or no resources are
    // left
    while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant,
        unassigned, Resources.none())) {
      Resource wQassigned = Resource.newInstance(0, 0);
      // we compute normalizedGuarantees capacity based on currently active
      // queues
      resetCapacity(unassigned, orderedByNeed, ignoreGuarantee);

      // For each underserved queue (or set of queues if multiple are equally
      // underserved), offer its share of the unassigned resources based on its
      // normalized guarantee. After the offer, if the queue is not satisfied,
      // place it back in the ordered list of queues, recalculating its place
      // in the order of most under-guaranteed to most over-guaranteed. In this
      // way, the most underserved queue(s) are always given resources first.
      Collection<TempQueuePerPartition> underserved = getMostUnderservedQueues(
          orderedByNeed, tqComparator);
      for (Iterator<TempQueuePerPartition> i = underserved.iterator(); i
          .hasNext();) {
        TempQueuePerPartition sub = i.next();
        Resource wQavail = Resources.multiplyAndNormalizeUp(rc, unassigned,
            sub.normalizedGuarantee, Resource.newInstance(1, 1));
        Resource wQidle = sub.offer(wQavail, rc, totGuarant,
            isReservedPreemptionCandidatesSelector);
        Resource wQdone = Resources.subtract(wQavail, wQidle);

        if (Resources.greaterThan(rc, totGuarant, wQdone, Resources.none())) {
          // The queue is still asking for more. Put it back in the priority
          // queue, recalculating its order based on need.
          orderedByNeed.add(sub);
        }
        Resources.addTo(wQassigned, wQdone);
      }
      Resources.subtractFrom(unassigned, wQassigned);
    }
{code}

{quote}
default, 27648, 209, 3072, 1, 207360, 120, 30720, 210, 0, 0, 0, 0,
label1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
pri, 4096, 25, 11264, 88, 207360, 120, 15360, 113, 0, 0, 0, 0
{quote}

'unassigned' is handed out in most-underserved order, so most of the vcores in 'unassigned' have already been allocated to the pri queue. Therefore, by the time offer() is called for the default queue, 'unassigned' holds a large amount of memory but only a few vcores.

Let's assume 'avail' is <20, 7>. Normally, min( avail, (current + pending - assigned) ) should be 'avail', because the available vcores are not enough. But it came out as (current + pending - assigned), due to memory:

min ( <20, 7>, ( <27648, 209> + <3072, 1> - <207360, 120> ) )
= min ( <20, 7>, <-176640, 90> )
= <-176640, 90>

As a result, idealAssigned for the default queue becomes <-176640, 90> + <207360, 120> = <30720, 210>.
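Note that Resources.min with the DominantResourceCalculator does not take a componentwise minimum: it compares the two Resources by dominant share and returns one of them whole. A small sketch of that selection behavior (the cluster and container sizes here are made up for illustration, not taken from the cluster above):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DominantMinDemo {
  public static void main(String[] args) {
    ResourceCalculator drc = new DominantResourceCalculator();
    Resource cluster = Resource.newInstance(16384, 16);

    Resource a = Resource.newInstance(8192, 1);  // dominant share: 8192/16384 = 0.50
    Resource b = Resource.newInstance(2048, 4);  // dominant share: 4/16 = 0.25

    // min() picks the Resource with the smaller dominant share -- b -- even
    // though b has *more* vcores than a. The result is not a componentwise
    // lower bound of its inputs, which is how a component of "accepted" in
    // offer() can end up above (or below) the corresponding component of avail.
    Resource min = Resources.min(drc, cluster, a, b);
    System.out.println(min);  // <memory:2048, vCores:4>
  }
}
{code}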
[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405933#comment-16405933 ]

kyungwan nam commented on YARN-8020:
------------------------------------

[~eepayne] Sorry for the late response. I've seen this problem on branch-2.8 and HDP-2.6.4.

Cluster
* cluster total resources: <405 GB, 240 VCores>
* default queue: 50% capacity, 100% max capacity
* pri queue: 50% capacity, 100% max capacity
* label1 queue: 0% capacity, 0% max capacity
* there is a 'label1' non-exclusive node-label in my cluster, but all nodes belong to the default node-label

capacity-scheduler
{code}
yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled=true
yarn.scheduler.capacity.reservations-continue-look-all-nodes=true
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn.scheduler.capacity.root.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=
yarn.scheduler.capacity.root.acl_submit_applications=
yarn.scheduler.capacity.root.acl_submit_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.accessible-node-labels=
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-applications=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.default.priority=1
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=3
yarn.scheduler.capacity.root.label1.accessible-node-labels=label1
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.acl_submit_applications=*
yarn.scheduler.capacity.root.label1.capacity=0
yarn.scheduler.capacity.root.label1.default-node-label-expression=label1
yarn.scheduler.capacity.root.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.maximum-applications=100
yarn.scheduler.capacity.root.label1.maximum-capacity=0
yarn.scheduler.capacity.root.label1.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.label1.priority=1
yarn.scheduler.capacity.root.label1.state=RUNNING
yarn.scheduler.capacity.root.label1.user-limit-factor=3
yarn.scheduler.capacity.root.ordering-policy=priority-utilization
yarn.scheduler.capacity.root.pri.accessible-node-labels=
yarn.scheduler.capacity.root.pri.acl_submit_applications=*
yarn.scheduler.capacity.root.pri.capacity=50
yarn.scheduler.capacity.root.pri.maximum-capacity=100
yarn.scheduler.capacity.root.pri.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.pri.priority=1
yarn.scheduler.capacity.root.pri.state=RUNNING
yarn.scheduler.capacity.root.pri.user-limit-factor=3
yarn.scheduler.capacity.root.queues=default,pri,label1
{code}

How to reproduce
* app1, which asks for a <1GB, 1 VCore> AM container and 29 * <1GB, 8 VCores> containers, is submitted to the default queue
* after all containers for app1 have been allocated, submit app2, which asks for a <1GB, 1 VCore> AM container and 14 * <1GB, 8 VCores> containers, to the pri queue
* as expected, some containers for app1 are preempted

{code:java}
2018-03-19 21:51:50,270 DEBUG capacity.ProportionalCapacityPreemptionPolicy (ProportionalCapacityPreemptionPolicy.java:containerBasedPreemptOrKill(428)) - Trying to use org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector to select preemption candidates
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: label1 CUR: PEN: RESERVED: GAR: NORM: NaN IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: UNTOUCHABLE: PREEMPTABLE:
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: label1 CUR: PEN: RESERVED: GAR: NORM: NaN IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: UNTOUCHABLE: PREEMPTABLE:
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: pri CUR: PEN: RESERVED: GAR: NORM: 0.5 IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: UNTOUCHABLE: PREEMPTABLE:
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: pri CUR: PEN: RESERVED: GAR: NORM: NaN IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: UNTOUCHABLE: PREEMPTABLE:
2018-03-19 21:51:50,271 DEBUG
{code}
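For reference, the DEBUG lines above come from the preemption-monitor classes under org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity. Assuming the ResourceManager uses a standard log4j.properties (an assumption; adjust to your logging setup), an entry like the following should surface them:

{code}
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity=DEBUG
{code}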
[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405166#comment-16405166 ]

Eric Payne commented on YARN-8020:
-----------------------------------

[~leftnoteasy], sorry for the delay.
{quote}explain why preemption doesn't happen for the case you mentioned{quote}
As it turns out, the corner case I'm running into is not related to DRF. It shows the same behavior with the default resource calculator. The use case is this:
- QueueA is preemptable and is running App1, which is consuming the entire cluster
- App2 is submitted to QueueB with container requests where each container is larger than the user limit for QueueB

In this case, preemption will not occur.

DETAILS:
- Cluster size: 20G
- Cluster min container size: 1G
- QueueA capacity: 10G
- QueueB capacity: 10G
- QueueB MULP: 10%

ACTIONS:
- App1 running in QueueA consumes 20G.
- App2 is submitted to QueueB with an AM size of 1G and map container sizes of 4G.
- App2's Max Resource is 1G while it is requesting the AM container (10% of 10G == 1G). The preemption monitor sees that the pending request is 1G and that App2's headroom is 1G, so it preempts 1G from App1 in QueueA.
- The Capacity Scheduler assigns 1G to App2 in QueueB. App2 begins running the AM container.
- App2 requests several map containers at 4G each. App2's Max Resource is now computed to be 2G ((active user's used resources / # active users) + min container size == 1G/1 + 1G == 2G). This leaves 1G of headroom for App2.
- The preemption monitor sees that the requested container size for App2 is 4G, which is larger than the 1G headroom, so it does not preempt.

Technically, this behavior is slightly out of sync with the way the Capacity Scheduler assigns containers. As long as an app's headroom is 0 or more, the Capacity Scheduler will assign one more container, no matter how big the container is, so the preemption monitor should go ahead and preempt in this case. I'm not sure I want it to, though, because it's better to be conservative than to preempt when it should not.

[~kyungwan nam], on what version of YARN are you seeing this problem? I am not seeing any DRF-related issues in 2.8 or 3.x.
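A simplified sketch of the headroom arithmetic in the walkthrough above. This is illustrative only; the real user-limit computation (in the Capacity Scheduler's LeafQueue) has more inputs, and the max() formulation here is an assumption made for clarity:

{code:java}
public class UserLimitHeadroomDemo {
  public static void main(String[] args) {
    // Numbers from the scenario above, in GB.
    int queueCapacity = 10;         // QueueB capacity
    int mulp = 10;                  // QueueB minimum-user-limit-percent
    int activeUserUsed = 1;         // App2's running AM container
    int activeUsers = 1;
    int minContainerSize = 1;       // cluster minimum allocation

    // User limit: at least MULP% of the queue, but it grows with usage.
    int userLimit = Math.max(
        queueCapacity * mulp / 100,                        // 10% of 10G = 1G
        activeUserUsed / activeUsers + minContainerSize);  // 1G/1 + 1G = 2G

    int headroom = userLimit - activeUserUsed;             // 2G - 1G = 1G
    int pendingContainer = 4;                              // requested map size

    // The preemption monitor compares the 4G request against 1G of headroom
    // and declines to preempt, while the scheduler would still assign one
    // more container as long as headroom >= 0.
    System.out.println("headroom=" + headroom
        + "G, willPreempt=" + (pendingContainer <= headroom));
  }
}
{code}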
[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398575#comment-16398575 ]

Eric Payne commented on YARN-8020:
-----------------------------------

bq. explain why preemption doesn't happen for the case you mentioned
I don't know why yet. I'm still investigating.
[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398088#comment-16398088 ]

Wangda Tan commented on YARN-8020:
-----------------------------------

[~eepayne], could you explain why preemption doesn't happen for the case you mentioned:
bq. The place where it seems to get stuck is when the containers in the preemptable queue are using one or more smaller Resource elements than the containers in the asking queue. For example, it will sometimes not preempt if the preemptable queue has containers using and the asking queue has containers using .

[~sunilg] mentioned one case before, YARN-6538, which also causes preemption not to happen.
[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned
[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397262#comment-16397262 ]

Eric Payne commented on YARN-8020:
-----------------------------------

[~kyungwan nam], on what version of YARN are you seeing this problem?

My experience with DRF is different from what is described above. I have investigated this on both 2.8 and 3.2 snapshot builds. We are using the DRF calculator in large preemptable queues with various sizes of containers using large memory, large vcores, or both. Cross-queue preemption seems to be working well in general. I do see a corner case, but first I want to address your comments above.

bq. as a result, idealAssigned will be , which does not trigger preemption.
If one of the elements in the idealAssigned Resource is 0 or less than 0, preemption will not occur. This is so that preemption won't bring the queue too far below its guarantee for one of the elements. Having said that, it will preempt to a large extent even if doing so brings one of the elements below its guarantee; but once one of the elements in the idealAssigned Resource goes to 0 or below, it stops preempting.

bq. avail:
Cross-queue preemption will not preempt if there are available resources in the cluster or queue. It depends on how many resources are being requested by the other queue, but even with 1 available vcore, preemption may choose not to preempt in this case as well.

Now on to my corner case. I do not see a problem using DRF if the containers in the preemptable queue have a larger Resource element and the containers in the asking queue have smaller Resource elements. For example, it seems to work fine if the preemptable queue is using containers and the asking queue is using smaller containers.

The place where it seems to get stuck is when the containers in the preemptable queue are using one or more smaller Resource elements than the containers in the asking queue. For example, it will sometimes not preempt if the preemptable queue has containers using and the asking queue has containers using . Even in the latter case, preemption will sometimes still occur, depending on the ratio of the sizes of each element to the ones in the other queue.

It would be helpful if you could provide a more detailed use case describing exactly what you are seeing so I can try to reproduce it.
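The concrete container shapes in the corner case above were lost in formatting, but the ratio effect can be sketched with hypothetical shapes. Under DRF, each usage's dominant share is max(memory share, vcore share), so small changes in either element can flip which queue looks more underserved. Illustrative arithmetic only, not the preemption-monitor code:

{code:java}
public class DominantShareShapes {
  public static void main(String[] args) {
    // Hypothetical cluster and container shapes (the original values were
    // lost); dominant share under DRF = max(memory share, vcore share).
    double clusterMem = 100 * 1024, clusterVcores = 100;

    // Preemptable queue: memory-heavy containers.
    double preemptableDom = Math.max(8192 / clusterMem, 1 / clusterVcores);  // 0.08
    // Asking queue: vcore-heavy containers.
    double askingDom = Math.max(1024 / clusterMem, 8 / clusterVcores);       // 0.08

    // Here the two shapes have identical dominant shares, so small changes in
    // either element flip which queue the policy considers more underserved --
    // consistent with preemption sometimes firing and sometimes not,
    // depending on the element ratios.
    System.out.printf("preemptable=%.2f asking=%.2f%n", preemptableDom, askingDom);
  }
}
{code}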