[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-04-20 Thread kyungwan nam (JIRA)

[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445528#comment-16445528 ]

kyungwan nam commented on YARN-8020:


[~eepayne]
I don’t think this is the same as YARN-8179.
In YARN-8179, the to-be-preempted resources are calculated correctly; the problem
happens when the natural_termination_factor is applied.
In this issue, however, the idealAssigned resources are calculated incorrectly, and
as a result the to-be-preempted resources are wrong.

> when DRF is used, preemption does not trigger due to incorrect idealAssigned
> 
>
> Key: YARN-8020
> URL: https://issues.apache.org/jira/browse/YARN-8020
> Project: Hadoop YARN
>  Issue Type: Sub-task
>    Reporter: kyungwan nam
>    Priority: Major
>
> I’ve encountered a case where Inter Queue Preemption does not work.
> It happens when DRF is used and an application requesting a large number of
> vcores is submitted.
> IMHO, idealAssigned can be set incorrectly by the following code.
> {code}
> // This function "accepts" all the resources it can (pending) and return
> // the unused ones
> Resource offer(Resource avail, ResourceCalculator rc,
>     Resource clusterResource, boolean considersReservedResource) {
>   Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
>       Resources.subtract(getMax(), idealAssigned),
>       Resource.newInstance(0, 0));
>   // accepted = min{avail,
>   //               max - assigned,
>   //               current + pending - assigned,
>   //               # Make sure a queue will not get more than max of its
>   //               # used/guaranteed, this is to make sure preemption won't
>   //               # happen if all active queues are beyond their guaranteed
>   //               # This is for leaf queue only.
>   //               max(guaranteed, used) - assigned}
>   // remain = avail - accepted
>   Resource accepted = Resources.min(rc, clusterResource,
>       absMaxCapIdealAssignedDelta,
>       Resources.min(rc, clusterResource, avail, Resources
>           /*
>            * When we're using FifoPreemptionSelector (considerReservedResource
>            * = false).
>            *
>            * We should deduct reserved resource from pending to avoid excessive
>            * preemption:
>            *
>            * For example, if an under-utilized queue has used = reserved = 20.
>            * Preemption policy will try to preempt 20 containers (which is not
>            * satisfied) from different hosts.
>            *
>            * In FifoPreemptionSelector, there's no guarantee that preempted
>            * resource can be used by pending request, so policy will preempt
>            * resources repeatedly.
>            */
>           .subtract(Resources.add(getUsed(),
>               (considersReservedResource ? pending : pendingDeductReserved)),
>               idealAssigned)));
> {code}
> let’s say,
> * cluster resource : 
> * idealAssigned(assigned): 
> * avail: 
> * current: 
> * pending: 
> current + pending - assigned: 
> min ( avail, (current + pending - assigned) ) : 
> accepted: 
> as a result, idealAssigned will be , which does not 
> trigger preemption.
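
A note on the min in the snippet above: Resources.min(rc, ...) returns whichever of
its two Resource arguments compares smaller as a whole under the given calculator;
unlike the Resources.componentwiseMax call in the same snippet, it is not a
per-component operation. The standalone sketch below is only an illustration (the
Res type, the comparator, and the numbers are made up, not Hadoop code): it shows
how a pick-one-operand minimum can carry more of one resource into accepted than
avail actually offers once another component has gone negative. The 2018-03-21
comment further down in this thread works through the same effect with the real
numbers from the reporter's cluster.

{code:java}
// Standalone illustration -- not Hadoop code. Res, the comparator, and the numbers
// are invented for the example; only the "pick one whole operand" behaviour mirrors
// Resources.min(rc, ...).
import java.util.Comparator;

public class PickOneMinSketch {

  /** <memory in MB, vcores> */
  record Res(long mem, long vcores) {}

  /** Per-component minimum: each dimension is capped by both operands. */
  static Res componentwiseMin(Res a, Res b) {
    return new Res(Math.min(a.mem(), b.mem()), Math.min(a.vcores(), b.vcores()));
  }

  /** "Pick one operand" minimum, the shape of Resources.min(rc, ...): whichever
   *  whole Resource compares smaller wins, negative components included. */
  static Res pickOneMin(Res a, Res b, Comparator<Res> cmp) {
    return cmp.compare(a, b) <= 0 ? a : b;
  }

  public static void main(String[] args) {
    Res avail = new Res(8192, 4);    // what this round of the loop offers
    Res delta = new Res(-1024, 12);  // current + pending - assigned, memory already negative

    // Arbitrary stand-in comparator that happens to rank the negative-memory operand
    // as smaller; the real DominantResourceCalculator compares dominant shares.
    Comparator<Res> byMemory = Comparator.comparingLong(Res::mem);

    System.out.println(componentwiseMin(avail, delta));      // Res[mem=-1024, vcores=4]
    System.out.println(pickOneMin(avail, delta, byMemory));  // Res[mem=-1024, vcores=12]
    // The picked operand carries 12 vcores into accepted although only 4 vcores were
    // offered this round -- the kind of idealAssigned inflation described in this issue.
  }
}
{code}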






[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-04-19 Thread Eric Payne (JIRA)

[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444127#comment-16444127 ]

Eric Payne commented on YARN-8020:
--

 [~kyungwan nam], is this issue the same as YARN-8179?




[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-23 Thread Wangda Tan (JIRA)

[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412106#comment-16412106 ]

Wangda Tan commented on YARN-8020:
--

Thanks [~eepayne]/[~kyungwan nam] for the comments.

I haven't checked the above comments in much detail, but I believe we have some
issues in the existing DRF preemption logic. I plan to spend some time adding
unit tests to YARN-8004 in the next several weeks.




[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-21 Thread kyungwan nam (JIRA)

[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407717#comment-16407717 ]

kyungwan nam commented on YARN-8020:


I think the reason this happens is as follows.
{code:java}
// assign all cluster resources until no more demand, or no resources are
// left
while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant,
    unassigned, Resources.none())) {
  Resource wQassigned = Resource.newInstance(0, 0);
  // we compute normalizedGuarantees capacity based on currently active
  // queues
  resetCapacity(unassigned, orderedByNeed, ignoreGuarantee);

  // For each underserved queue (or set of queues if multiple are equally
  // underserved), offer its share of the unassigned resources based on its
  // normalized guarantee. After the offer, if the queue is not satisfied,
  // place it back in the ordered list of queues, recalculating its place
  // in the order of most under-guaranteed to most over-guaranteed. In this
  // way, the most underserved queue(s) are always given resources first.
  Collection<TempQueuePerPartition> underserved = getMostUnderservedQueues(
      orderedByNeed, tqComparator);
  for (Iterator<TempQueuePerPartition> i = underserved.iterator();
      i.hasNext();) {
    TempQueuePerPartition sub = i.next();
    Resource wQavail = Resources.multiplyAndNormalizeUp(rc, unassigned,
        sub.normalizedGuarantee, Resource.newInstance(1, 1));
    Resource wQidle = sub.offer(wQavail, rc, totGuarant,
        isReservedPreemptionCandidatesSelector);
    Resource wQdone = Resources.subtract(wQavail, wQidle);

    if (Resources.greaterThan(rc, totGuarant, wQdone, Resources.none())) {
      // The queue is still asking for more. Put it back in the priority
      // queue, recalculating its order based on need.
      orderedByNeed.add(sub);
    }
    Resources.addTo(wQassigned, wQdone);
  }
  Resources.subtractFrom(unassigned, wQassigned);
}
{code}
{quote}default, 27648, 209, 3072, 1, 207360, 120, 30720, 210, 0, 0, 0, 0, 
label1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, pri, 4096, 25, 11264, 88, 207360, 
120, 15360, 113, 0, 0, 0, 0
{quote}
'unassigned' is handed out in most-underserved order, so most of the vcores in
'unassigned' had already been allocated to the pri queue.
Therefore, when offer() is called for the default queue, 'unassigned' holds a lot
of memory but only a few vcores.
Let’s assume 'avail' is <20, 7>.
Normally, min(avail, (current + pending - assigned)) should be ‘avail’ in this
case, because the available vcores are not enough.
But it was (current + pending - assigned), because of the memory component.

min ( <20, 7>, ( <27648, 209> + <3072, 1> - <207360, 120> ) )
 min ( <20, 7>, <-176640, 90> ) = <-176640, 90>

as a result, idealAssigned for the default queue is <-176640, 90> + <207360, 120> = <30720, 210>
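
For readability, the same arithmetic replayed as a small standalone program (Res and
its helpers are illustrative stand-ins, not Hadoop's Resource/Resources classes; the
input values are the ones quoted above):

{code:java}
// Replays the arithmetic from the comment above with plain componentwise maths.
public class DefaultQueueIdealAssignedReplay {

  /** <memory in MB, vcores> */
  record Res(long mem, long vcores) {
    Res plus(Res o)  { return new Res(mem + o.mem, vcores + o.vcores); }
    Res minus(Res o) { return new Res(mem - o.mem, vcores - o.vcores); }
  }

  public static void main(String[] args) {
    Res avail    = new Res(20, 7);        // offered to the default queue in this round
    Res current  = new Res(27648, 209);   // CUR of the default queue
    Res pending  = new Res(3072, 1);      // PEN of the default queue
    Res assigned = new Res(207360, 120);  // idealAssigned before this offer()

    Res delta = current.plus(pending).minus(assigned);
    System.out.println(delta);            // Res[mem=-176640, vcores=90]

    // Per the comment above, Resources.min under DRF picked delta, not avail.
    Res accepted = delta;
    Res idealAssigned = assigned.plus(accepted);
    System.out.println(idealAssigned);    // Res[mem=30720, vcores=210]

    // idealAssigned now covers the queue's usage of <27648, 209> in both components,
    // so the policy finds nothing to preempt from the default queue, which matches
    // the reported behaviour.
  }
}
{code}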


[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-20 Thread kyungwan nam (JIRA)

[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405933#comment-16405933 ]

kyungwan nam commented on YARN-8020:


[~eepayne]

Sorry for the late response.

I've seen this problem in branch-2.8 and HDP-2.6.4.

Cluster
 * Cluster total resources : <405 GB, 240 VCores>
 * default Queue: 50% capacity, 100% max capacity
 * pri Queue: 50% capacity, 100% max capacity
 * label1 Queue: 0% capacity, 0% max capacity
 * there is a non-exclusive node-label ’label1’ in my cluster, but all nodes are
included in the default node-label.

capacity-scheduler configuration
{code}
yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled=true
yarn.scheduler.capacity.reservations-continue-look-all-nodes=true
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn.scheduler.capacity.root.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue= 
yarn.scheduler.capacity.root.acl_submit_applications= 
yarn.scheduler.capacity.root.acl_submit_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.accessible-node-labels= 
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-applications=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.default.priority=1
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=3
yarn.scheduler.capacity.root.label1.accessible-node-labels=label1
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.acl_submit_applications=*
yarn.scheduler.capacity.root.label1.capacity=0
yarn.scheduler.capacity.root.label1.default-node-label-expression=label1
yarn.scheduler.capacity.root.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.maximum-applications=100
yarn.scheduler.capacity.root.label1.maximum-capacity=0
yarn.scheduler.capacity.root.label1.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.label1.priority=1
yarn.scheduler.capacity.root.label1.state=RUNNING
yarn.scheduler.capacity.root.label1.user-limit-factor=3
yarn.scheduler.capacity.root.ordering-policy=priority-utilization
yarn.scheduler.capacity.root.pri.accessible-node-labels= 
yarn.scheduler.capacity.root.pri.acl_submit_applications=*
yarn.scheduler.capacity.root.pri.capacity=50
yarn.scheduler.capacity.root.pri.maximum-capacity=100
yarn.scheduler.capacity.root.pri.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.pri.priority=1
yarn.scheduler.capacity.root.pri.state=RUNNING
yarn.scheduler.capacity.root.pri.user-limit-factor=3
yarn.scheduler.capacity.root.queues=default,pri,label1
{code}


how to reproduce
 * app1, which asks for a <1GB, 1 VCore> AM container and 29 * <1GB, 8 VCores>
containers, has been submitted to the default Queue.
 * after all containers for app1 have been allocated, submit app2, which asks for
a <1GB, 1 VCore> AM container and 14 * <1GB, 8 VCores> containers, to the pri
queue.
 * as expected, some containers for app1 are preempted 

{code:java}
2018-03-19 21:51:50,270 DEBUG capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:containerBasedPreemptOrKill(428)) - 
Trying to use 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector
 to select preemption candidates

2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: label1 CUR:  PEN:  RESERVED: 
 GAR:  NORM: NaN IDEAL_ASSIGNED: 
 IDEAL_PREEMPT:  ACTUAL_PREEMPT: 
 UNTOUCHABLE:  PREEMPTABLE: 



2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: label1 CUR:  PEN:  RESERVED: 
 GAR:  NORM: NaN IDEAL_ASSIGNED: 
 IDEAL_PREEMPT:  ACTUAL_PREEMPT: 
 UNTOUCHABLE:  PREEMPTABLE: 



2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: pri CUR:  PEN:  
RESERVED:  GAR:  NORM: 0.5 
IDEAL_ASSIGNED:  IDEAL_PREEMPT:  
ACTUAL_PREEMPT:  UNTOUCHABLE:  
PREEMPTABLE: 



2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator 
(PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219))
 -  NAME: pri CUR:  PEN:  RESERVED: 
 GAR:  NORM: NaN IDEAL_ASSIGNED: 
 IDEAL_PREEMPT:  ACTUAL_PREEMPT: 
 UNTOUCHABLE:  PREEMPTABLE: 



2018-03-19 21:51:50,271 DEBUG

[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-19 Thread Eric Payne (JIRA)

[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405166#comment-16405166 ]

Eric Payne commented on YARN-8020:
--

[~leftnoteasy], sorry for the delay.
{quote}explain why preemption doesn't happen for the case you mentioned
{quote}
As it turns out, the corner case I'm running into is not related to DRF. It has 
the same behavior with the default resource calculator.

The use case is this:
 - QueueA is preemptable and is running App1 which is consuming the entire 
cluster
 - App2 is submitted to QueueB with container requests where each container is
larger than the user limit for QueueB.

In this case, preemption will not occur.

DETAILS:
 - Cluster size: 20G
 - Cluster Min container size: 1G
 - QueueA capacity: 10G
 - QueueB capacity: 10G
 - QueueB MULP: 10%

ACTIONS:
 - App1 running in QueueA consumes 20G
 - App2 is submitted to QueueB with AM size 1G and map container sizes 4G.
 - App2's Max Resource is 1G when it is requesting the AM container (10% of 10G 
== 1G). The preemption monitor sees that the pending request is 1G and that 
App2's headroom is 1G, so it preempts 1G from App1 in QueueA.
 - The Capacity Scheduler assigns 1G to App2 in QueueB. App2 begins running the 
AM container.
 - App2 requests several map containers at 4G each. App2's Max Resource is
computed to be 2G: ((active user's used resources / # active users) + min
container size) == (1G/1 + 1G) == 2G. This leaves 1G of headroom for App2 (see
the sketch after this list).
 - The preemption monitor sees that the requested container size for App2 is 4G 
which is larger than the 1G headroom, so the preemption monitor does not 
preempt.
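
A rough worked version of that headroom arithmetic, as referenced above (the variable
names and the simplified formula only restate the steps in the list; this is not the
actual LeafQueue user-limit code):

{code:java}
// Rough sketch of the headroom arithmetic described above; the formula fragment and
// the names are taken from the comment, not from the Capacity Scheduler source.
public class QueueBHeadroomSketch {
  public static void main(String[] args) {
    long minContainerGB = 1;   // cluster minimum container size
    long activeUsers    = 1;   // active users in QueueB
    long userUsedGB     = 1;   // App2's AM container is already running
    long mapContainerGB = 4;   // App2's pending map container size

    // ((active user's used resources / # active users) + min container size)
    long userLimitGB = (userUsedGB / activeUsers) + minContainerGB;  // == 2
    long headroomGB  = userLimitGB - userUsedGB;                     // == 1

    System.out.println("user limit = " + userLimitGB + "G, headroom = " + headroomGB + "G");

    // The preemption monitor sees a 4G request against 1G of headroom, so it does not
    // preempt, even though the scheduler itself would still assign one more container
    // while headroom >= 0.
    System.out.println("preempt? " + (mapContainerGB <= headroomGB));  // false
  }
}
{code}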

Technically, this behavior is slightly out of sync with the way the capacity 
scheduler assigns containers. As long as the headroom for an app is 0 or more, 
the capacity scheduler will assign one more container, no matter how big the 
container is, so the preemption monitor should go ahead and preempt in this 
case. I'm not sure I want it to, though, because it's better to be conservative 
than to preempt when it should not.

[~kyungwan nam], on what version of YARN are you seeing this problem? I am not 
seeing any DRF-related issues in 2.8 or 3.x.


[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-14 Thread Eric Payne (JIRA)

[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398575#comment-16398575 ]

Eric Payne commented on YARN-8020:
--

bq. explain why preemption doesn't happen for the case you mentioned
I don't know why yet. I'm still investigating.




[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398088#comment-16398088
 ] 

Wangda Tan commented on YARN-8020:
--

[~eepayne], could you explain why preemption doesn't happen for the case you 
mentioned:

bq. The place where it seems to get stuck is when the containers in the
preemptable queue are using one or more smaller Resource elements than the
containers in the asking queue. For example, it will sometimes not preempt if
the preemptable queue has containers using  and the asking queue has containers
using .

[~sunilg] mentioned one case before, YARN-6538, which also causes preemption not
to happen.




[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-13 Thread Eric Payne (JIRA)

[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397262#comment-16397262 ]

Eric Payne commented on YARN-8020:
--

[~kyungwan nam], on what version of YARN are you seeing this problem? My
experience with DRF is different from what is described above. I have
investigated this on both 2.8 and 3.2 snapshot builds.

We are using the DRF calculator in large preemptable queues with various
container sizes, some with large memory, some with large vcores, and some with
both. Cross-queue preemption seems to be working well in general. I do see a
corner case, but first I want to address your comments above.

bq. as a result, idealAssigned will be , which does 
not trigger preemption.
If one of the elements in the idealAssigned Resource is 0 or less than 0, 
preemption will not occur. This is so that preemption won't bring the queue too 
far below its guarantee for one of the elements. Having said that, it will 
preempt to a large extent even if it brings one of the elements below its 
guarantee, but if one of them goes to 0 or below in the idealAssigned Resource, 
it will stop preempting.
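
As a tiny hedged paraphrase of that rule (illustrative only, not the policy's actual
code):

{code:java}
// Paraphrase of the behaviour described above: once any component of the
// idealAssigned Resource computes to 0 or below, the policy stops preempting.
public class IdealAssignedGuardSketch {

  /** <memory in MB, vcores> */
  record Res(long mem, long vcores) {}

  static boolean keepPreempting(Res idealAssigned) {
    return idealAssigned.mem() > 0 && idealAssigned.vcores() > 0;
  }

  public static void main(String[] args) {
    System.out.println(keepPreempting(new Res(10240, 8)));  // true
    System.out.println(keepPreempting(new Res(10240, 0)));  // false: one element hit 0
  }
}
{code}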

bq. avail: 
Cross-queue preemption will not preempt if there are available resources in the 
cluster or queue. It depends on how many resources are being requested by the 
other queue, but even with 1 available vcore, preemption may choose not to 
preempt in this case as well.

Now on to my corner case.

I do not see a problem using DRF if the containers in the preemptable queue
have a larger Resource element and the containers in the asking queue have
smaller Resource elements. For example, it seems to work fine if the
preemptable queue is using  containers and the asking queue is using smaller
containers, for example  containers.

The place where it seems to get stuck is when the containers in the preemptable
queue are using one or more smaller Resource elements than the containers in
the asking queue. For example, it will sometimes not preempt if the preemptable
queue has containers using  and the asking queue has containers using .

Even in the latter case, preemption will sometimes still occur, depending on
the ratio of the sizes of each element to the ones in the other queue.

It would be helpful if you can provide a more detailed use case to describe 
exactly what you are seeing so I can try to reproduce it.
