[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503776#comment-15503776 ]

Sunil G commented on YARN-4945:
---

As we are getting into more detailed reviews, I think we can do them in YARN-2009 itself, as this is an umbrella jira.

> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch, YARN-2009.v1.patch, YARN-2009.v2.patch, YARN-2009.v3.patch
>
> This is an umbrella ticket to track efforts of preemption within a queue to support features like:
> YARN-2009. YARN-2113. YARN-4781.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502794#comment-15502794 ]

Sunil G commented on YARN-4945:
---

My bad too. I also ran only the UT cases after this change. Thank you very much [~eepayne] for pointing this out.

I think I know the problem here. We try to merge {{pendingOrderingPolicy.getSchedulableEntities()}} and {{orderingPolicy.getSchedulableEntities()}}. Even though both are {{TreeSet}}s, they use different comparators. So the {{HashSet}} change will be fine, as we are not looking for any particular order in the target data structure. I will try some more optimization here in the next patch.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499359#comment-15499359 ]

Eric Payne commented on YARN-4945:
---

[~sunilg], I'm afraid I gave you bad advice [in my comment above|https://issues.apache.org/jira/browse/YARN-4945?focusedCommentId=15495060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15495060] regarding the fix for {{LeafQueue#getAllApplications()}}.

My original suggestion was to create a new {{TreeSet}} object for {{apps}}:
{code}
Collection apps = new TreeSet(
    pendingOrderingPolicy.getSchedulableEntities());
{code}

But that causes the {{SchedulingMonitor}} thread to crash with the following exception:
{noformat}
2016-09-17 17:07:31,156 [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] ERROR yarn.YarnUncaughtExceptionHandler: Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception.
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp cannot be cast to java.lang.Comparable
  at java.util.TreeMap.compare(TreeMap.java:1290)
  at java.util.TreeMap.put(TreeMap.java:538)
  at java.util.TreeSet.add(TreeSet.java:255)
  at java.util.AbstractCollection.addAll(AbstractCollection.java:344)
  at java.util.TreeSet.addAll(TreeSet.java:312)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getAllApplications(LeafQueue.java:1859)
{noformat}

I originally suggested using {{TreeSet}} because that is what is returned by {{getSchedulableEntities()}}. But, since that causes an exception, I tried using {{HashSet}} instead. That seems to work (but I'm not sure if that's the best solution):
{code}
Collection apps = new HashSet(
    pendingOrderingPolicy.getSchedulableEntities());
{code}
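The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual YARN code: {{App}} is a hypothetical stand-in for {{FiCaSchedulerApp}} (deliberately not {{Comparable}}), and the two comparators are invented for illustration. It assumes the pending set is empty when the merge runs, which would be consistent with the stack trace showing the failure inside the explicit {{addAll}} call.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.TreeSet;

final class MergeSchedulableEntitiesDemo {
  /** Hypothetical stand-in for FiCaSchedulerApp: deliberately NOT Comparable. */
  static final class App {
    final String id;
    final int priority;

    App(String id, int priority) {
      this.id = id;
      this.priority = priority;
    }
  }

  /** Builds a comparator-backed TreeSet but exposes it only as a Collection. */
  static Collection<App> ordered(Comparator<App> cmp, App... apps) {
    TreeSet<App> set = new TreeSet<>(cmp);
    for (App a : apps) {
      set.add(a);
    }
    return set;
  }

  /** Copy into a TreeSet: the copy uses natural ordering, not the source comparator. */
  static Collection<App> mergeWithTreeSet(Collection<App> pending, Collection<App> active) {
    Collection<App> apps = new TreeSet<>(pending);
    apps.addAll(active); // throws ClassCastException once a non-Comparable element is added
    return apps;
  }

  /** Copy into a HashSet: no ordering is needed, so no comparison is attempted. */
  static Collection<App> mergeWithHashSet(Collection<App> pending, Collection<App> active) {
    Collection<App> apps = new HashSet<>(pending);
    apps.addAll(active);
    return apps;
  }

  public static void main(String[] args) {
    // Two sets with different (illustrative) comparators; the pending set is empty.
    Collection<App> pending = ordered(Comparator.comparing((App a) -> a.id));
    Collection<App> active = ordered(
        Comparator.comparingInt((App a) -> a.priority),
        new App("app1", 1), new App("app2", 2));

    try {
      mergeWithTreeSet(pending, active);
      System.out.println("TreeSet merge succeeded");
    } catch (ClassCastException e) {
      System.out.println("TreeSet merge failed with ClassCastException");
    }
    System.out.println("HashSet merge size: " + mergeWithHashSet(pending, active).size());
  }
}
```

Because the argument has the static type {{Collection}}, the {{TreeSet(Collection)}} constructor is chosen and the copy falls back to natural ordering, which is why elements that are not {{Comparable}} cannot be inserted.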
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497081#comment-15497081 ]

Sunil G commented on YARN-4945:
---

Thanks [~eepayne], I think the scenario makes sense to me. And as discussed earlier, an ideal order of "preemption to help better resource distribution" can be as follows: {{InterQueue => FiFoIntraQueue (User-limit => priority) => etc }}. So within a queue, user-limit will have the first say. And I think it can be incorporated into the existing design with minimal refactoring.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497037#comment-15497037 ]

Eric Payne commented on YARN-4945:
---

[~leftnoteasy]
bq. Have a Map of username to headroom inside the method can compute user limit at most once for different user. And this logic can be reused to compute user-limit preemption

Maybe we are talking about the same thing, but I just want to clarify that I am not advocating preemption based on headroom (user-limit-factor). I am advocating based on minimum user limit percent (MULP), which is the minimum guaranteed resource amount per user per queue.

{quote}
To be honest, I haven't thought a good way that a list of policies can better solve the priority + user-limit preemption problem. Could you share some ideas about it. For example, how to better consider both in the final decision
{quote}

I believe that the two preemption policies (priority and minimum-user-limit-percent) are _mostly_ (but not completely) separate. I would say that priority preemption only considers apps from the same user, and MULP preemption only considers apps from different users.

If you look at the behavior of the capacity scheduler, I was surprised to find that it mostly ignores priority when assigning resources between apps of different users. I conducted the following experiment, without turning on preemption:

# The cluster has only 1 queue, and it takes up all of the resources.
||Queue Name||Total Containers||{{user-limit-factor}}||{{minimum-user-limit-percent}}||Priority Enabled||
|default|24|1.0 (each user can take up the whole queue if no other users are present)|0.25 (if other users are present, each user is guaranteed at least 25% of the queue's resources; at max, 4 users can have apps in the queue at once; if less than 4 users, the scheduler tries to balance resources evenly between users)|false|
# {{user1}} starts {{app1}} in {{default}} of {{priority1}} and consumes all resources
# {{user2}} starts {{app2}} in {{default}} of {{priority2}}.
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|24|76|
|user2|app2|2|0|100|
# I kill 12 containers from {{app1}} and the capacity scheduler assigns them to {{app2}}. Not because {{app2}} has a higher priority than {{app1}}, but because {{user2}} is using less resources than {{user1}} (the capacity scheduler tries to balance resources between users).
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|12|76|
|user2|app2|2|12|76|
# At this point, what should happen if I kill another container from {{app1}}? Since {{app2}} is higher priority than {{app1}}, and since MULP is 25% (so {{user2}}'s minimum guarantee is only 6), you might think that the capacity scheduler will give it to {{app2}} (that's what I thought it would do). _But it doesn't!_ The capacity scheduler gives the container back to {{app1}} because it wants to balance the resources between all users. And the table remains the same:
||User Name||App Name||App Priority||Used Containers||Pending Containers||
|user1|app1|1|12|76|
|user2|app2|2|12|76|

Once the users are balanced, no matter how many times I kill a container from {{app1}}, it always goes back to {{app1}}. From a priority perspective, this could be considered an inversion, since {{app2}} is asking for more resources and {{app1}} is well above its MULP. But the capacity scheduler does not consider priority in this case.

If I try the same experiment, but with both apps owned by {{user1}}, then I can kill all of {{app1}}'s containers (except the AM) and they all get assigned to {{app2}}.

Because the capacity scheduler behaves this way, I would recommend that the MULP preemption policy run first and try to balance each user's ideal assigned. The MULP policy would preempt from lowest priority first, so it would consider priority of apps owned by other, over-served users when deciding what to preempt, and consider priority of apps owned by the current, under-served user, when deciding ideal-assigned values.

Then, I would run the priority policy but only consider apps within each user. As shown above, once the users are balanced between each other with regard to MULP, trying to kill containers from higher priority apps of other users will only cause preemption churn.

[~leftnoteasy], as you said, we may be able to combine the two into one policy, and you may be right that this can be done without being too complicated. The thing I want to ensure is that the priority preemption policy doesn't try to kill high priority containers from different users that will only be reassigned back to the original user and cause preemption churn.
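The arithmetic in the experiment above can be checked with a toy model. This is only a sketch under stated assumptions: it is not the capacity scheduler's real allocation logic, the class and method names are hypothetical, both users are assumed to always have pending demand, and the "freed container goes to the user with the fewest used containers" rule is a simplification of the balancing behavior described in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UserBalanceDemo {
  /** Minimum per-user guarantee in containers, rounded up: 25% of 24 is 6. */
  static int minimumGuarantee(int queueContainers, int mulpPercent) {
    return (queueContainers * mulpPercent + 99) / 100;
  }

  /**
   * Toy balancing rule (an assumption, not the real scheduler): pick the user
   * with the fewest used containers; ties go to the first user in map order.
   * App priority is intentionally ignored, mirroring the observed behavior.
   */
  static String pickUser(Map<String, Integer> used) {
    String best = null;
    for (Map.Entry<String, Integer> e : used.entrySet()) {
      if (best == null || e.getValue() < used.get(best)) {
        best = e.getKey();
      }
    }
    return best;
  }

  /** Kills one container of the victim, reassigns it, and returns the receiver. */
  static String killAndReassign(Map<String, Integer> used, String victim) {
    used.merge(victim, -1, Integer::sum);
    String receiver = pickUser(used);
    used.merge(receiver, 1, Integer::sum);
    return receiver;
  }

  public static void main(String[] args) {
    Map<String, Integer> used = new LinkedHashMap<>();
    used.put("user1", 24); // app1, priority 1
    used.put("user2", 0);  // app2, priority 2

    System.out.println("MULP guarantee: " + minimumGuarantee(24, 25));
    for (int i = 0; i < 12; i++) {
      killAndReassign(used, "user1");
    }
    System.out.println("After 12 kills: " + used);
    System.out.println("13th kill goes to: " + killAndReassign(used, "user1"));
  }
}
```

In this model the first 12 freed containers move to {{user2}}, and every later container killed from {{user1}} returns to {{user1}}, which is the churn that a priority-only preemption policy would trigger across users.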
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495060#comment-15495060 ]

Eric Payne commented on YARN-4945:
---

[~sunilg], I noticed in the resourcemanager log that the metrics were not as I would expect after running applications. For example, after 1 application has completed running, the {{#queue-active-applications}} metric remains 1 instead of 0:
{code}
2016-09-16 01:11:10,189 [SchedulerEventDispatcher:Event Processor] INFO capacity.LeafQueue: Application removed - appId: application_1473988192446_0001 user: hadoop1 queue: glamdring #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 1
{code}

After 3 applications have run, the metrics are even more unexpected:
{code}
2016-09-16 01:12:34,622 [SchedulerEventDispatcher:Event Processor] INFO capacity.LeafQueue: Application removed - appId: application_1473988192446_0003 user: hadoop1 queue: glamdring #user-pending-applications: -4 #user-active-applications: 4 #queue-pending-applications: 0 #queue-active-applications: 3
{code}

I believe the cause of this is in {{LeafQueue#getAllApplications}}:
{code}
public Collection getAllApplications() {
  Collection apps =
      pendingOrderingPolicy.getSchedulableEntities();
  apps.addAll(orderingPolicy.getSchedulableEntities());
  return Collections.unmodifiableCollection(apps);
}
{code}

The call to {{pendingOrderingPolicy.getSchedulableEntities()}} returns the {{AbstractComparatorOrderingPolicy#schedulableEntities}} object itself, and then the call to {{apps.addAll(orderingPolicy.getSchedulableEntities())}} adds additional {{FiCaSchedulerApp}}s to {{schedulableEntities}}.

By creating a copy of the return value of {{pendingOrderingPolicy.getSchedulableEntities()}}, I have been able to verify that {{schedulableEntities}} does not have extra entries. For example:
{code}
public Collection getAllApplications() {
  Collection apps = new TreeSet(
      pendingOrderingPolicy.getSchedulableEntities());
  apps.addAll(orderingPolicy.getSchedulableEntities());
  return Collections.unmodifiableCollection(apps);
}
{code}
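The aliasing problem described above can be shown without any YARN classes. In this minimal sketch, {{Policy}} is a hypothetical stand-in for an ordering policy whose getter returns its internal collection, application ids are plain strings, and the copy is a {{HashSet}} purely for simplicity; none of these names come from the Hadoop code base.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class AliasingDemo {
  /** Hypothetical stand-in for an ordering policy that exposes its internal set. */
  static final class Policy {
    private final Set<String> schedulableEntities = new HashSet<>();

    Policy(String... ids) {
      Collections.addAll(schedulableEntities, ids);
    }

    /** Returns the internal set itself, not a copy. */
    Collection<String> getSchedulableEntities() {
      return schedulableEntities;
    }

    int size() {
      return schedulableEntities.size();
    }
  }

  /** Buggy shape: addAll mutates the pending policy's internal set. */
  static Collection<String> getAllApplicationsAliased(Policy pending, Policy active) {
    Collection<String> apps = pending.getSchedulableEntities();
    apps.addAll(active.getSchedulableEntities());
    return Collections.unmodifiableCollection(apps);
  }

  /** Fixed shape: merge into a defensive copy, leaving both policies untouched. */
  static Collection<String> getAllApplicationsCopied(Policy pending, Policy active) {
    Collection<String> apps = new HashSet<>(pending.getSchedulableEntities());
    apps.addAll(active.getSchedulableEntities());
    return Collections.unmodifiableCollection(apps);
  }

  public static void main(String[] args) {
    Policy pending = new Policy();
    Policy active = new Policy("app1", "app2");

    getAllApplicationsCopied(pending, active);
    System.out.println("pending size after copied merge: " + pending.size());

    getAllApplicationsAliased(pending, active);
    System.out.println("pending size after aliased merge: " + pending.size());
  }
}
```

Wrapping the result in {{Collections.unmodifiableCollection}} protects callers from modifying the result, but it does nothing about the mutation that already happened inside the method.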
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494825#comment-15494825 ]

Wangda Tan commented on YARN-4945:
---

1) YarnConfiguration:
- Instead of having a separate SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION, should we only have a "queue.intra-queue-preemption-enabled"? I cannot clearly see what the separate option means semantically. For example, after we have user-limit preemption support, what happens if we enable only user-limit preemption (without priority preemption enabled)?

2) PCPP:
- Unused imports / methods
- getPartitionResource: avoid cloning resources? Because we will clone the resource twice for every app. If you are concerned about consistency, you can clone it once before starting the preemption calculation
- It seems to me, partitionToUnderServedQueues can be kept in AbstractPreemptableResourceCalculator. In addition, Map> could be Map>. (LinkedHashSet is not necessarily needed, because we won't have two TempQueuePerPartition with the same queueName and same partition)

3) CapacitySchedulerPreemptionUtils:
- deductPreemptableResourcePerApp: is the following a valid comment?
bq. // High priority app is coming first
- Remove unnecessary params in methods and unnecessary generic types in {{new}} expressions (like new HashMap(...)), better to move to Intellij? :p
- {{getResToObtainByPartitionForApps}} can be removed, we can directly use policy.getResourceDemandFromAppsPerQueue

4) FiCaSchedulerApp: Move getTotalPendingRequestsPerPartition to ResourceUsage? I can see we could have requirements for getUsedResourceByPartition, getReservedResourceByPartition, etc. in the future

5) PreemptionCandidatesSelector:
- All non-abstract methods can be static, correct?
- All TODOs in comments are done, correct?

6) IntraQueuePreemptionPolicy and PriorityIntraQueuePreemptionPolicy:
- Overall: Do you think the name -Policy is too big? What it essentially does is compute how much resource to preempt from each app; how about calling it something like IntraQueuePreemptionComputePlugin? Would like to hear thoughts from you and Eric on this as well.
- Rename the PriorityIntraQueuePreemptionPolicy to FifoIntraQueuePreemptionPolicy if you agree with [my comment|https://issues.apache.org/jira/browse/YARN-4945?focusedCommentId=15494454&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15494454]
- PriorityIntraQueuePreemptionPolicy#getResourceDemandFromAppsPerQueue:
a. resToObtainByPartition can be removed from the parameters
b. IIUC, it gets resourceToObtain for each app instead of resourceDemand for each app, rename it properly?
c. This logic is not correct:
{code}
// If demand is from same priority level, skip the same.
if (!tq.isPendingDemandForHigherPriorityApps(a1.getPriority())) {
  continue;
}
{code}
It only prevents applications at the highest priority in a queue from preempting each other, but it cannot prevent applications at the 2nd-highest priority from preempting each other. And the performance can be improved as well; I believe in some settings, maxAppPriority can be as much as MAX_INT. Please see the comments/pseudo code below for details.
- computeAppsIdealAllocation:
a. Calling getUserLimitHeadRoomPerApp is too expensive, instead we can add one method in LeafQueue to get UserLimit by userName. Have a Map of username to headroom inside the method can compute user limit at most once for different user. And this logic can be reused to compute user-limit preemption
b. {{tq.addPendinResourcePerPriority(tmpApp.getPriority(), tmpApp.pending);}} could be changed if you agree with c. above
c. I think we should move the {{skip the same priority demand}} logic into this method. One approach in my mind is:
{code}
// General idea:
// Use two pointers, one from the most prioritized app, one from the least prioritized app
// Each app has two quotas, one is how much resource is required (ideal - used),
// Another one is how much resource can be preempted
// Move the two pointers and update the two quotas to get:
// For application X, is there any app with higher priority that needs the resource?
p1 = most-prioritized-app.iterator
p2 = least-prioritized-app.iterator

// For each app, we have:
// - "toBePreemptFromOther" which is initialized to (ideal - (used - selected)).
// - "actuallyToBePreempted" initialized to 0

while (p1.getPriority() > p2.getPriority() && p1 != p2) {
  Resource rest = p2.toBePreempt - p2.actuallyToBePreempted;
  if (rest > 0) {
    if (p1.toBePreemptFromOther > 0) {
      Resource toPreempt = min(p1.toBePreemptFromOther, rest);
      p1.toBePreemptFromOther -= toPreempt
      p2.actuallyToBePreempted += toPreempt
    }
  }

  if (p2.toBePreempt - p2.actuallyToBePreempted == 0) {
    // Nothing more can be preempted from p2, move to next
    p2 --;
  }

  if (p1.toBePreemptFromOther == 0) {
    // p1 is satis
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494454#comment-15494454 ]

Wangda Tan commented on YARN-4945:
---

[~eepayne],
bq. I need to understand what it would mean to combine all intra-queue priority policies into one.

To clarify, we may not combine *all* intra-queue policies into one, but if you look at queue-internal policies, there are two major groups:
1) Fair + user-limit + priority
2) Fifo + user-limit + priority

User-limit and priority will always be on, and the ordering policy (Fair/Fifo) is a changeable config. So it makes sense to me to have two different policies, one for Fifo (plus priority/UL) and one for Fair (likewise plus priority/UL).

bq. If they are combined, then is it still necessary to make IntraQueuePreemptionPolicy an interface?

As I mentioned above, we can have a fair intra-queue policy.

To be honest, I haven't thought a good way that a list of policies can better solve the priority + user-limit preemption problem. Could you share some ideas about it. For example, how to better consider both in the final decision
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494394#comment-15494394 ]

Eric Payne commented on YARN-4945:
---

bq. I would say it may not be necessarily to have two separate policies to consider priority and user-limit.

[~leftnoteasy], I'm not sure how I feel about that yet. I need to understand what it would mean to combine all intra-queue priority policies into one. Whatever the design, I want to make sure it is not cumbersome to solve the user-limit-percent inversion that we often see.

If they are combined, then is it still necessary to make {{IntraQueuePreemptionPolicy}} an interface? Wouldn't this just be the implementation class and then there would be no need for {{PriorityIntraQueuePreemptionPolicy}} or other derivative classes?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494187#comment-15494187 ]

Wangda Tan commented on YARN-4945:
---

[~eepayne] / [~sunilg],

For the suggestion from [~eepayne]:
bq. I think that the objects that implement the IntraQueuePreemptionPolicy interface should be in a List, and then IntraQueueCandidatesSelector#selectCandidates should loop over the list to process the different policies.

I would say it may not be necessarily to have two separate policies to consider priority and user-limit. In my current rough thinking, only minor changes are required to support FIFO + Priority + user-limit intra-queue preemption. If it is really required, we can refactor this part when we move to user-limit preemption.

The other reason is that the two intra-queue preemption policies (user-limit / priority) can affect each other: we cannot do priority preemption without considering user-limit, and vice versa. So if we can consider both with reasonable code complexity, why not :)?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494103#comment-15494103 ]

Sunil G commented on YARN-4945:
---

I think I missed the previous comment from [~eepayne]. Let me share another patch after checking the comments.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493893#comment-15493893 ]

Eric Payne commented on YARN-4945:
---

Thanks again, [~sunilg]. I will look closely at the patch, but one thing I wanted to bring out before too much time passes is that some of the IntraQueue classes seem priority-centric and do not lend themselves to adding multiple intra-queue policies.
- The constructor for {{IntraQueueCandidatesSelector}} passes {{priorityBasedPolicy}} as a parameter directly to the constructor for {{IntraQueuePreemptableResourceCalculator}}
- {{IntraQueueCandidatesSelector#selectCandidates}} passes {{priorityBasedPolicy}} as a parameter directly to {{CapacitySchedulerPreemptionUtils.getResToObtainByPartitionForApps}}.

I think that the objects that implement the {{IntraQueuePreemptionPolicy}} interface should be in a {{List}}, and then {{IntraQueueCandidatesSelector#selectCandidates}} should loop over the list to process the different policies.

Please change the names of variables in classes that need to be independent of the specific intra-queue policy:
- {{CapacitySchedulerPreemptionUtils#getResToObtainByPartitionForApps}} has a parameter named {{priorityBasedPolicy}}, but this should be generic, like {{intraQueuePolicy}}
- {{IntraQueuePreemptableResourceCalculator}} also has a variable named {{priorityBasedPolicy}}, which I think should be more generic.
- {{CapacitySchedulerConfiguration#SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION}}: since the value for this property is the switch to turn on intra-queue preemption, the name should be something more generic. Currently, it is {{yarn.resourcemanager.monitor.capacity.preemption.select_based_on_priority_of_applications}}, but it should be something like {{enable_intra_queue_preemption}}.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492609#comment-15492609 ]

Wangda Tan commented on YARN-4945:
---

[~sunilg], I took a quick look at the patch; the overall approach looks good.

For the TODO items, I think reservation logic support can be moved to a separate ticket: for apps running inside the same queue, it is more likely that resources are homogeneous. The other two TODOs are better addressed in the same patch.

And one minor comment:
- Definition and initialization of IntraQueuePreemptionPolicy is in the IntraQueueCandidatesSelector now, but I think it might be better to move them to IntraQueuePreemptableResourceCalculator. And I think we might not need the userLimitBasedPolicy; it could be a part of the existing IntraQueuePreemptionPolicy.

I will include more detailed reviews for the final patch :).

Thanks,
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488403#comment-15488403 ]

Eric Payne commented on YARN-4945:
---

bq. Does this make sense?

[~sunilg], Thanks for the reply. Yes. I was misreading the code. Sorry about that.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478614#comment-15478614 ]
Sunil G commented on YARN-4945:

Thanks [~eepayne] for the detailed comments. I mostly agree with them and will address them in the next patch. A few clarifications on the comment below.

bq. It doesn't make sense to me to limit intra-queue preemption based on how much of the queue's guaranteed resources are used.

I will try to explain. If a queue still has plenty of unused resources (guaranteed - used > 70%), then pending apps can mostly get some amount of resources without preemption (subject to priority, user-limit, etc.). We can think about a safe default limit here. This is still per-queue only. Does this make sense?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478599#comment-15478599 ]
Sunil G commented on YARN-4945:

Thanks [~leftnoteasy] for the offline discussion. Updating the summary here.

1. All the comments other than those on IntraQueuePreemptableResourceCalculator & IntraQueueCandidatesSelector will be addressed in the next patch.

IntraQueuePreemptableResourceCalculator & IntraQueueCandidatesSelector:

2. bq. This is not correct for future policies, for example, fairness policy can have a minimum resource allocated to each application, existing logic will preempt all containers from the application with maximum fair sharing.

The current patch does most of the priority-related handling in the calculator. This needs to be refined to follow the generalized preemption flow given in the pseudo code.

3. {{app.ideal = max(Q.unallocated, app.used + app.pending - app.selected)}}
Here it will be *min* instead of max.

4. Pseudo code
- We will find each queue's unallocated share, which is calculated as {{Q.used - Q.selected;}}. Here Q.selected means the containers already selected from this queue by earlier policies.
- Then we will calculate each app's ideal share iteratively, starting from the highest priority app. Here Q.ideal will help to calculate app.ideal.
- The {{intra_q_preemptable}} calculation is straightforward from the code.
- The last loop will try to find the preemptable resource per app. It DOES NOT select candidates here. {{app.preemptable = min(max(app.used - app.selected - app.ideal, 0), intra_q_preemptable)}} will help to achieve the same.
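The flow summarized above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in any YARN-2009 patch: resources are modeled as {{long}} values instead of YARN {{Resource}} objects, the class and field names ({{IntraQueuePrioritySketch}}, {{App}}, {{compute}}) are hypothetical, the bare {{selected}} in the quota formula is read as {{Q.selected}}, and {{min}} is used for {{app.ideal}} as corrected in point 3. The loops stop at {{<= 0}} rather than {{< 0}}, which gives the same values once {{min}} is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IntraQueuePrioritySketch {

  /** Hypothetical per-app record; resources are plain longs for brevity. */
  public static final class App {
    public final String name;
    public final int priority;
    public final long used;
    public final long pending;
    public final long selected; // already marked by earlier policies
    public long ideal;          // starts at 0
    public long preemptable;    // starts at 0

    public App(String name, int priority, long used, long pending, long selected) {
      this.name = name;
      this.priority = priority;
      this.used = used;
      this.pending = pending;
      this.selected = selected;
    }
  }

  /** Fills in ideal and preemptable for every app of one queue/partition. */
  public static void compute(List<App> apps, long qUsed, long qSelected,
      long qMaxPreemptable) {
    List<App> byPriority = new ArrayList<>(apps);
    byPriority.sort(Comparator.comparingInt((App a) -> a.priority).reversed());

    // Pass 1: highest priority first, hand out the queue's share.
    long unallocated = qUsed - qSelected;
    for (App app : byPriority) {
      if (unallocated <= 0) {
        break;
      }
      app.ideal = Math.min(unallocated, app.used + app.pending - app.selected);
      unallocated -= app.ideal;
    }

    // Pass 2: lowest priority first, decide how much each app may lose.
    // Assumption: "selected" in the quota formula means Q.selected.
    long quota = qMaxPreemptable - qSelected;
    for (int i = byPriority.size() - 1; i >= 0 && quota > 0; i--) {
      App app = byPriority.get(i);
      long overIdeal = Math.max(app.used - app.selected - app.ideal, 0);
      app.preemptable = Math.min(overIdeal, quota);
      quota -= app.preemptable;
    }
  }

  public static void main(String[] args) {
    List<App> apps = Arrays.asList(
        new App("App1", 1, 200, 0, 0),
        new App("App2", 10, 0, 50, 0));
    compute(apps, 200, 0, 20);
    for (App a : apps) {
      System.out.println(a.name + " ideal=" + a.ideal
          + " preemptable=" + a.preemptable);
    }
  }
}
```

With the two-app example used elsewhere in this thread (App1 at priority 1 using 200, App2 at priority 10 pending 50) and an assumed per-round quota of 20, the sketch gives App2 an ideal of 50 and App1 an ideal of 150, and marks 20 of App1's resources as preemptable. Container selection itself is left to the selector, as noted above.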
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478391#comment-15478391 ]
Eric Payne commented on YARN-4945:

Thanks [~sunilg]. I have a few review comments for patch v0:

- {{IntraQueuePreemptableResourceCalculator#computeIntraQueuePreemptionDemand}}:
-- Neither of the parameters is used ({{clusterResource}}, {{totalPreemptedResourceAllowed}}).
-- {{queueNames}} can be null, which causes an NPE in {{for (String queueName : queueNames)}}.
-- {{leafQueue}} will be null if {{tq}} represents a parent queue, which causes an NPE when it is dereferenced later.
-- {{CapacitySchedulerConfiguration.USED_CAPACITY_THRESHOLD_FOR_PREEMPTION}}: [~leftnoteasy] indicated above that this property is similar to {{MAX_IGNORED_OVER_CAPACITY}}, but I'm not sure I understand how that applies to intra-queue preemption. The comparison is not between queues at this point; it's between apps or users. In patch v0, the following code has the effect of only allowing preemption if the queue's used resources are below {{USED_CAPACITY_THRESHOLD_FOR_PREEMPTION}}, which defaults to 30%. It doesn't make sense to me to limit intra-queue preemption based on how much of the queue's guaranteed resources are used.
{code}
if (leafQueue.getUsedCapacity() < context
    .getUsedCapThresholdForPreemptionPerQueue()) {
  continue;
}
{code}

- {{IntraQueueCandidatesSelector#selectCandidates}}:
-- If {{queueName}} is not a leaf queue, {{leafQueue}} will be null and cause an NPE when dereferenced later:
{code}
// 4. Iterate from most under-served queue in order.
for (String queueName : queueNames) {
  LeafQueue leafQueue = preemptionContext.getQueueByPartition(queueName,
      RMNodeLabelsManager.NO_LABEL).leafQueue;
{code}
-- Very tiny nit: Remove the word {{get}} from the following:
{code}
// 3. Loop through all partitions to get calculate demand
{code}

- {{AbstractPreemptableResourceCalculator}}:
-- Since {{TAComparator}} is specifically comparing app priority, can it be renamed to something like {{TAPriorityComparator}}?

- {{TempAppPerQueue#toString}}:
-- Small nit: Can the {{toString}} method print the ApplicationID and rename {{NAME}} to {{QUEUENAME}}?
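The two NPE cases above (a null {{queueNames}}, and a null {{leafQueue}} for parent queues) can be guarded before the loop body touches the queue. The following is a minimal sketch with stand-in types, not the YARN classes: {{TempQueue}}, {{preemptableLeafQueues}} and the map lookup are hypothetical and only mirror the shape of the code quoted in the review.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeafQueueGuardSketch {

  /** Stand-in for the per-queue record; leafQueue is null for parent queues. */
  public static final class TempQueue {
    public final String name;
    public final Object leafQueue;

    public TempQueue(String name, Object leafQueue) {
      this.name = name;
      this.leafQueue = leafQueue;
    }
  }

  /** Returns only the queue names that are safe to dereference as leaf queues. */
  public static List<String> preemptableLeafQueues(Collection<String> queueNames,
      Map<String, TempQueue> queuesByName) {
    List<String> result = new ArrayList<>();
    if (queueNames == null) {
      return result; // nothing to iterate, avoids the NPE in the for-each
    }
    for (String queueName : queueNames) {
      TempQueue tq = queuesByName.get(queueName);
      if (tq == null || tq.leafQueue == null) {
        continue; // unknown or parent queue: skip instead of dereferencing
      }
      result.add(queueName);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, TempQueue> queues = new HashMap<>();
    queues.put("root", new TempQueue("root", null));
    queues.put("QUEUE1", new TempQueue("QUEUE1", new Object()));
    List<String> names = new ArrayList<>();
    names.add("root");
    names.add("QUEUE1");
    System.out.println(preemptableLeafQueues(names, queues));
    System.out.println(preemptableLeafQueues(null, queues));
  }
}
```

Whether skipping is the right reaction, as opposed to logging or failing fast, is a design choice the review does not settle.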
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477484#comment-15477484 ]
Sunil G commented on YARN-4945:

Hi [~leftnoteasy],

Thanks for the comments. I generally understand the overall picture. However, one question to start with:
{{Q.unallocated = Q.used - Q.selected;}}
Here, does *selected* mean guaranteed?

bq. In each leaf queue, according to intra-queue preemption quota and other queue status, such as queue-policy, decide ideal-allocation and how much to preempt for each app

This makes sense for other policies. For priority, we can take resources from lower priority apps only if there is demand from high priority apps. So while computing the ideal allocation, we can consider all of an app's resources as preemptable. This decision can be taken in preemptionPolicy. Does this match your thinking?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475092#comment-15475092 ]
Wangda Tan commented on YARN-4945:

Thanks [~sunilg],

Comments for v0:

*AbstractPreemptableResourceCalculator:*
- priorityBasedPolicy should not be a part of this class
- getMostUnderservedQueues/resetCapacity/isReservedPreemptionCandidatesSelector should be private
- TAComparator: move to IntraQueueCalculator

*Configuration:*
- USED_CAPACITY_THRESHOLD_FOR_PREEMPTION: should it be something like MAX_IGNORED_OVER_CAPACITY_FOR_INTRA_QUEUE for consistency? If you agree, all related fields, such as CapacitySchedulerPreemptionContext#getUsedCapThresholdForPreemptionPerQueue, should be updated.

*IntraQueuePreemptableResourceCalculator & IntraQueueCandidatesSelector:*

The biggest issue I can see is that the logic of the calculator is incomplete: we should include all ideal-allocation/preemptable resource calculation in this class, but I found much of it in PriorityIntraQueuePreemptionPolicy, such as getResourceDemandFromAppsPerQueue.

In addition, the calculation of ideal-allocation & preemptable resource is also incomplete: the v0 patch computes resource-to-obtain for intra-queue preemption and preempts from the least starved app. This is not correct for future policies. For example, a fairness policy can have a minimum resource allocated to each application, and the existing logic will preempt all containers from the application with the maximum fair sharing.

So to make the logic complete, the IntraQueuePreemptableResourceCalculator should be:
1) Inter-queue preemptable resource will be calculated; it could be computed by IntraQueueCalculator or a previous calculator.
2) In each leaf queue, according to the intra-queue preemption quota and other queue status, such as queue-policy, decide ideal-allocation and how much to preempt for each app.
3) And we need to deduct the selected resource for both queue and app (and even for user).

For example, the preemptable resource calculation for priority will be:
{code}
For each partition:
  Q.unallocated = Q.used - Q.selected;

  # initially, app.ideal = 0
  # From highest priority to lowest priority app to calculate ideal
  for app in sorted-by(priority):
    if Q.unallocated < 0:
      break;
    app.ideal = max(Q.unallocated, app.used + app.pending - app.selected)
    Q.unallocated -= app.ideal

  # Intra queue preemptable quota
  intra_q_preemptable = Q.maximum-preemptable - selected

  # For lowest priority to highest priority to calculate preemptable
  for app in reverse-sorted-by(priority):
    if intra_q_preemptable < 0:
      break;
    app.preemptable = min(max(app.used - app.selected - app.ideal, 0), intra_q_preemptable)
    intra_q_preemptable -= app.preemptable
{code}

Some additional notes for the pseudo code above:
- The fairness policy needs different logic to calculate ideal and preemptable, which is similar to Algorithm 2 described in: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.pdf
- If we need to consider user-limit, we should deduct user.selected as well.

The responsibility of the calculator should be 1)-3); after that, the selector will decide what to preempt from each app. Of course, we will skip already selected containers while selecting intra-queue to-preempt containers.

*Unit test*

I found that TestProportionalCapacityPreemptionPolicyForIntraQueue duplicates logic from TestProportionalCapacityPreemptionPolicy. Instead, could you take a look at ProportionalCapacityPreemptionPolicyMockFramework, which is used by TestProportionalCapacityPreemptionPolicyForReservedContainers and TestProportionalCapacityPreemptionPolicyForNodePartitions? You will be able to mock intra-queue preemption scenarios easily with the new test framework.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472262#comment-15472262 ]
Wangda Tan commented on YARN-4945:

[~eepayne],

bq. I actually would like to have the priority policy and the minimum-user-limit-percent policy be turned on separately. I'm not sure of the best way to do that, but our users don't use application priority very much.

Application priority is not very popular now because it has not been included in any release yet, but I bet it will be a popular feature in the future. :)

I would prefer to have a unified preemption policy to handle all intra-queue preemption, because:
- We have different combinations of intra-queue preemption criteria, like priority, fairness, user-limit, fifo. We cannot have a separate policy for each combination, for example fairness + user-limit and priority + user-limit.
- Intra-queue preemption policies have a lot in common; in particular, once the ideal-allocation resource is calculated for each app, the logic to select containers is shared.
- We may need different implementations of the ideal-allocation resource calculator, one for fairness and one for fifo, both considering user-limit and priority.

bq. Perhaps CapacitySchedulerConfiguration could have something like:

As I mentioned above, we should enable intra-queue preemption by default for all limits. If we really need some parameters for better control, we can add them in the future. Otherwise it causes trouble; for example, if we consider priority without considering user-limit, excessive preemption could happen.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472235#comment-15472235 ]
Eric Payne commented on YARN-4945:

[~sunilg], thanks again for all of the great work you are doing on this issue.

- Separate switches for priority and user-limit-percent preemption?

{{ProportionalCapacityPreemptionPolicy#init}} uses {{SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION}} to turn on all intra-queue preemption policies, but the config property name for {{SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION}} is {{select_based_on_priority_of_applications}}.

I actually would like to have the priority policy and the minimum-user-limit-percent policy be turned on separately. I'm not sure of the best way to do that, but our users don't use application priority very much. Perhaps {{CapacitySchedulerConfiguration}} could have something like:
{code}
/**
 * For intra-queue preemption, priority based selector can help to preempt
 * containers of lowest priority apps to find resources for high priority
 * apps.
 */
public static final String PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY =
    PREEMPTION_CONFIG_PREFIX + "select_based_on_priority_of_applications";
public static final boolean DEFAULT_PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY = false;

/**
 * For intra-queue preemption, minimum-user-limit-percent based selector can
 * help to preempt containers to ensure users are not starved of their
 * guaranteed percentage of a queue.
 */
public static final String PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE =
    PREEMPTION_CONFIG_PREFIX + "select_based_on_user_percentage_guarantee";
public static final boolean DEFAULT_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE = false;
{code}

And then {{ProportionalCapacityPreemptionPolicy#init}} can turn on intra-queue preemption if either one is set:
{code}
boolean selectIntraQueuePreemptCandidatesByPriority = csConfig.getBoolean(
    CapacitySchedulerConfiguration.PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY,
    CapacitySchedulerConfiguration.DEFAULT_PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_APP_PRIORITY);
boolean selectIntraQueuePreemptCandidatesByUserPercentGuarantee = csConfig.getBoolean(
    CapacitySchedulerConfiguration.PREEMPTION_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE,
    CapacitySchedulerConfiguration.DEFAULT_SELECT_INTRAQUEUE_CANDIDATES_BY_USER_PERCENT_GUARANTEE);
if (selectIntraQueuePreemptCandidatesByPriority
    || selectIntraQueuePreemptCandidatesByUserPercentGuarantee) {
  candidatesSelectionPolicies.add(new IntraQueueCandidatesSelector(this));
}
{code}

Then, in {{IntraQueueCandidatesSelector}}, logic could be added to apply either one or both intra-queue preemption policies. What do you think?

- Could the headroom check allow priority inversion?

{{PriorityIntraQueuePreemptionPolicy#getResourceDemandFromAppsPerQueue}}:
{code}
// Can skip apps which are already crossing user-limit.
// For this, Get the userlimit from scheduler and ensure that app is
// not crossing userlimit here. Such apps can be skipped.
Resource userHeadroom = leafQueue.getUserLimitHeadRoomPerApp(
    a1.getFiCaSchedulerApp(), context.getPartitionResource(partition),
    partition);
if (Resources.lessThanOrEqual(rc,
    context.getPartitionResource(partition), userHeadroom,
    Resources.none())) {
  continue;
}
{code}

I think this code will allow a priority inversion when a user has apps of different priorities. For example, in a situation like the following, {{App1}} from {{User1}} is already taking up all of the resources, so its headroom is 0. But, since {{App2}} is also from {{User1}}, the above code will never allow preemption to occur. Is that correct?

||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||
|QUEUE1|User1|App1|1|200|0|
|QUEUE1|User1|App2|10|0|50|
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472191#comment-15472191 ]
Eric Payne commented on YARN-4945:

Thank you [~leftnoteasy]. I see now that {{IntraQueueCandidatesSelector#tryPreemptContainerAndDeductResToObtain}} is checking {{totalPreemptionAllowed}} before selecting each container.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471794#comment-15471794 ]
Wangda Tan commented on YARN-4945:

Hi [~eepayne],

Maybe there are some misunderstandings. First of all, I'm not quite sure why, in Step 5, the priority-intra-queue preemption policy can select 5 more resources.

Step 7 is reasonable to me: imbalance between queues has higher priority than priority inversion within a queue.

In my mind, the whole preemption process will be as follows (using your example):

Assume each queue has a total-preemption-per-round, which is the total preemption allowed for inter-queue + intra-queue preemption.

Steps 1-4 will be the same as what you described.
Step 5 will not happen because there are 10 containers marked for preemption already. So Step 4 will be repeated until:
{code}
Queue 1:
  User1, Used=100, Pending=50
  User2, Used=0, Pending=50
Queue 2:
  Used=100,
{code}

Once the inter-queue resource usage is back in balance, the intra-queue preemption policy can start to preempt resources. So Step 5 will be:
{code}
10 containers are marked for preemption for User1 from Queue 1, and after these
containers are preempted, they will be picked up by User2 from Queue 1.
{code}

[~sunilg], please add your thoughts if you think differently.

Thanks,
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471536#comment-15471536 ]
Eric Payne commented on YARN-4945:

[~leftnoteasy] and [~sunilg], I'm still concerned about having the intra-queue preemption policies adding containers to the {{selectedCandidates}} list if the inter-queue policies have already added containers. In that case, the containers selected by the intra-queue policies may not go back to the correct queue. Consider this use case:

Queues (all are preemptable):
||Queue Name||Guaranteed Resources||Max Resources||{{total_preemption_per_round}}||
|root|200|200|0.1|
|QUEUE1|100|200|0.1|
|QUEUE2|100|200|0.1|

# {{User1}} starts {{App1}} on {{QUEUE1}} and uses all 200 resources. These containers are long-running and will not be released any time soon:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
# {{User2}} starts {{App2}} on {{QUEUE2}} and requests 100 resources:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
|QUEUE2|User2|App2|1|0|100|0|
# {{User1}} starts {{App3}} at a high priority on {{QUEUE1}} and requests 50 resources:
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|0|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# Since {{total_preemption_per_round}} is 0.1, only 10% of the needed resources will be selected per round. So, the inter-queue preemption policies select 10 resources to be preempted from {{App1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|10|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# Then, the priority-intra-queue preemption policy selects 5 more resources to be preempted from {{App1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|200|0|15|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|0|100|0|
# At this point, 15 resources are preempted from {{App1}}.
# Since {{QUEUE2}} is asking for 100 resources, and is extremely underserved (from an inter-queue point of view), the capacity scheduler gives all 15 resources to {{QUEUE2}}, and the priority inversion remains in {{QUEUE1}}.
||Queue Name||User Name||App Name||App Priority||Used Resources||Pending Resources||Selected For Preemption||
|QUEUE1|User1|App1|1|185|15|0|
|QUEUE1|User1|App3|10|0|50|0|
|QUEUE2|User2|App2|1|15|85|0|

This is why I am concerned that when containers are already selected by the inter-queue preemption policies, it may not be beneficial to have the intra-queue policies preempt containers as well.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468165#comment-15468165 ]
Eric Payne commented on YARN-4945:

Thanks [~sunilg].

{quote}
bq. if it's already in selectedCandidates, it's because an inter-queue preemption policy put it there

I think I must give some more clarity on what I am trying to do here. It's possible that some containers selected by the priority/user-limit policy have already been selected by the inter-queue policies. In that case, we need not mark them again. Rather, we can deduct the resource directly, as the container is already marked for preemption.
{quote}

OK. I think I see what you are saying. In {{IntraQueueCandidatesSelector#preemptFromLeastStarvedApp}}:
{code}
if (CapacitySchedulerPreemptionUtils.isContainerAlreadySelected(c,
    selectedCandidates)) {
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  continue;
}
{code}

IIUC, you are saying that at this point, {{toObtainByPartition}} contains requested resources from _both_ inter-queue _and_ intra-queue preemption policies. So, since this container has already been selected by the inter-queue policies, skip it, stop tracking its resources in {{toObtainByPartition}} (by subtracting out the container's size), and keep looking for another container to mark as preemptable. Is that correct?

- Also, I think that the priority and user-limit-percent preemption policies should be separate policies. Do you agree? If so, can we please rename {{IntraQueueCandidatesSelector}} to something like {{IntraQueuePriorityCandidatesSelector}}?
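The skip-and-deduct reading described above can be illustrated with a small stand-alone sketch. It is an assumption-laden simplification, not the patch code: containers are a map from a hypothetical container id to a {{long}} size, {{toObtain}} is a single number rather than a per-partition {{Resource}}, and the names {{AlreadySelectedSketch}} and {{select}} are invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class AlreadySelectedSketch {

  /**
   * Walks the candidate containers in order and returns the ones newly
   * marked for preemption. Containers that an earlier policy already
   * selected are not marked again, but their size still counts toward
   * the amount to obtain.
   */
  public static Set<String> select(Map<String, Long> containers,
      Set<String> alreadySelected, long toObtain) {
    Set<String> newlySelected = new LinkedHashSet<>();
    for (Map.Entry<String, Long> c : containers.entrySet()) {
      if (toObtain <= 0) {
        break; // demand is covered
      }
      if (alreadySelected.contains(c.getKey())) {
        toObtain -= c.getValue(); // deduct once, do not re-mark
        continue;
      }
      newlySelected.add(c.getKey());
      toObtain -= c.getValue();
    }
    return newlySelected;
  }

  public static void main(String[] args) {
    Map<String, Long> containers = new LinkedHashMap<>();
    containers.put("c1", 4L);
    containers.put("c2", 4L);
    containers.put("c3", 4L);
    Set<String> already = new HashSet<>(Arrays.asList("c1"));
    System.out.println(select(containers, already, 8L));
  }
}
```

In this example, c1 was already selected by an earlier policy, so it covers 4 of the 8 units to obtain and only c2 is newly marked. Without the deduction, c3 would be marked as well, which is the excessive preemption the deduction is meant to avoid.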
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466489#comment-15466489 ]
Sunil G commented on YARN-4945:

Thanks [~eepayne]

bq. LeafQueue#getApplications returns an unmodifiable Collection

Yes, I have made changes to handle this scenario.

bq. if it's already in selectedCandidates, it's because an inter-queue preemption policy put it there

I think I must give some more clarity on what I am trying to do here. It's possible that some containers selected by the priority/user-limit policy have already been selected by the inter-queue policies. In that case, we need not mark them again. Rather, we can deduct the resource directly, as the container is already marked for preemption.

bq. container's resources twice from toObtainByPartition

It's a mistake; I corrected it in the second patch.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15465592#comment-15465592 ]
Eric Payne commented on YARN-4945:

[~leftnoteasy] and [~sunilg],

bq. Using logic similar to {{deductPreemptableResourcesBasedSelectedCandidates}} should be able to achieve this, and I think it doesn't bring too many complexities to the implementation.

I'm sorry, but I'm still not understanding how this can work. In {{PriorityCandidatesSelector#preemptFromLeastStarvedApp}}:
{code}
if (CapacitySchedulerPreemptionUtils.isContainerAlreadySelected(c,
    selectedCandidates)) {
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  continue;
}
{code}

This code seems to indicate that if a container is already in {{selectedCandidates}}, it will be preempted and then given back to apps in this queue. But if it's already in {{selectedCandidates}}, it's because an inter-queue preemption policy put it there, so it's not likely to end up back in this queue. Please help me understand what I'm missing.

Also, why is it subtracting the container's resources twice from {{toObtainByPartition}}? Should one of those be {{totalPreemptedResourceAllowed}}?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458695#comment-15458695 ] Eric Payne commented on YARN-4945: --
[~sunilg], just one quick note: I am getting an {{UnsupportedOperationException}} (a RuntimeException) in {{IntraQueuePreemptableResourceCalculator#computeIntraQueuePreemptionDemand}}:
{code}
Collection apps = leafQueue.getApplications();
apps.addAll(leafQueue.getPendingApplications());
{code}
{{LeafQueue#getApplications}} returns an unmodifiable Collection.
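The failure reported above is the usual behavior of a read-only collection view: {{addAll}} on it throws. One way around it is to copy the running apps into a new mutable collection before merging in the pending ones. The sketch below is only an illustration under stated assumptions: the two accessor methods are hypothetical stand-ins for {{LeafQueue#getApplications}} and {{LeafQueue#getPendingApplications}}, app names are plain strings, and a {{HashSet}} is used because the follow-up comment in this thread notes that no ordering is needed in the merged collection.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;

class MergeApplicationsSketch {
  // Hypothetical stand-in for LeafQueue#getApplications: a read-only view.
  static Collection<String> getApplications() {
    return Collections.unmodifiableCollection(Arrays.asList("app1", "app2"));
  }

  // Hypothetical stand-in for LeafQueue#getPendingApplications.
  static Collection<String> getPendingApplications() {
    return Collections.unmodifiableCollection(Arrays.asList("app3"));
  }

  // Copy into a fresh mutable set first, then merge; neither view is modified.
  static Collection<String> allApplications() {
    Collection<String> apps = new HashSet<>(getApplications());
    apps.addAll(getPendingApplications());
    return apps;
  }

  public static void main(String[] args) {
    try {
      // Reproduces the reported problem: mutating the unmodifiable view.
      getApplications().addAll(getPendingApplications());
    } catch (UnsupportedOperationException e) {
      System.out.println("addAll on the unmodifiable view failed");
    }
    System.out.println("merged size: " + allApplications().size());
  }
}
```

Copying costs one extra pass over the running apps per call, which is usually acceptable for a periodic preemption computation.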
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456727#comment-15456727 ] Wangda Tan commented on YARN-4945: --
bq. Trying to do this coordination seems to me to be quite complicated.
Using logic similar to {{deductPreemptableResourcesBasedSelectedCandidates}} should be able to achieve this, and I think it doesn't bring too many complexities to the implementation.
bq. Would it be sufficient to just avoid preempting during the intra-queue policies if there are already containers in the selectedContainers list?
If we want to avoid excessive preemption, I don't think that is sufficient. We also need to adjust the ideal / to-preempt resources properly.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456717#comment-15456717 ] Wangda Tan commented on YARN-4945: --
bq. I think we do need several intra-queue configs that are separate from the existing (inter-queue) ones. For inter-queue vs. intra-queue, I think we need a separate one at least for total_preemption_per_round and max_ignored_over_capacity, and maybe even for natural_termination_factor and max_wait_before_kill.
We definitely need some parameters for per-queue preemption settings. The must-haves in my mind are:
- Minimum queue's used capacity to trigger preemption
- Total preemption per round
- Max ignored over capacity (for user limit)
I suggest adding only the must-have parameters; more options make a feature harder to use. So I would prefer not to add things like natural-termination-factor / max-wait-before-kill for now.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455570#comment-15455570 ] Eric Payne commented on YARN-4945: --
Thanks very much [~sunilg] and [~leftnoteasy].
{quote} 1. I think we might need to come with a limit on how much resource can be preempted from over-utilizing users' apps. We do have max-preemption-per-round. But sometimes it may be more as it may be configured for inter-queue. Since we are sharing this config, i think we can have a config to limit the preemption for user-limit. For priority, i have considered a certain limit to control this scenario. Thoughts? {quote}
I think we do need several intra-queue configs that are separate from the existing (inter-queue) ones. For inter-queue vs. intra-queue, I think we need a separate one at least for {{total_preemption_per_round}} and {{max_ignored_over_capacity}}, and maybe even for {{natural_termination_factor}} and {{max_wait_before_kill}}.
Are you also suggesting that we need these configs to be separate between user-limit-percent preemption and priority preemption within intra-queue preemption? I don't have a strong opinion either way, but if we can keep all configs the same between intra-queue preemption policies, I would like to do that, just to avoid confusion and complication.
bq. I will not consider preemption demand from a high-priority app if that app is already crossing the user-limit.
I just want to make sure we are talking about the same thing. In the case I am worried about, the high priority app is _*not*_ over any limit. There is an inversion happening because the lower priority app has containers and the high priority app wants them. But, if the low priority app is from a user that is at or below its {{minimum-user-limit-percent}}, the higher priority app must not continue to preempt from the lower priority app. This can only happen when the two apps are from different users.
{quote} I think normalization for inter-queue / intra-queue preemption is one of the top-priority goals for this feature. If you take a look at existing preemption code, it normalizes preempt-able resource for reserved-container-candidate-selector and fifo-candidate-selector. We can do similar normalization for inter/intra-queue preemption. {quote}
Trying to do this coordination seems to me to be quite complicated. Would it be sufficient to just avoid preempting during the intra-queue policies if there are already containers in the {{selectedContainers}} list?
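The priority-inversion rule described in the comment above can be written as a small guard. This is a minimal sketch, not code from any YARN patch: the class, method, and parameter names are hypothetical, resources are plain integers, and the AM container is ignored. It also assumes that when both apps belong to the same user the user-limit guard does not apply, since moving containers between one user's apps leaves that user's total unchanged.

```java
class UserLimitGuardSketch {
  /**
   * How much a higher priority app may take from a lower priority app.
   * All parameters are illustrative; none of these names come from the YARN code base.
   */
  static int preemptableForPriority(int lowAppUsed, int lowUserUsed,
      int lowUserGuaranteed, int highAppPending, boolean sameUser) {
    // Never take more than the demand or more than the low priority app holds.
    int limit = Math.min(lowAppUsed, Math.max(0, highAppPending));
    if (sameUser) {
      return limit;
    }
    // Different users: stop once the low priority app's user would drop
    // to its minimum-user-limit-percent guarantee.
    int aboveGuarantee = Math.max(0, lowUserUsed - lowUserGuaranteed);
    return Math.min(limit, aboveGuarantee);
  }

  public static void main(String[] args) {
    // User is 10 above its guarantee, so only 10 of the 30 requested can be taken.
    System.out.println(preemptableForPriority(60, 60, 50, 30, false)); // 10
    // User is exactly at its guarantee: nothing may be preempted.
    System.out.println(preemptableForPriority(50, 50, 50, 30, false)); // 0
    // Same user: the guard does not apply under the stated assumption.
    System.out.println(preemptableForPriority(50, 50, 50, 30, true));  // 30
  }
}
```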
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454965#comment-15454965 ] Sunil G commented on YARN-4945: ---
Thanks [~leftnoteasy].
bq. What is intraQueuePreemptionCost? And what is AppPriorityComparator?
Yes, these are not used. I will do a cleanup in the next patch to remove the unnecessary code.
bq. For the queue-level ideal allocation is it enough to trust result from previous policies?
If we are planning for normalization (which I want to introduce in the next patch, after doing some more tests with SLS and in real time), then we need to add or subtract the resources that were preempted in the previous run, and this needs to be done per queue. Hence I kept the code. I can look into this and see how much more I can reuse.
bq. if (tq.intraQueuePreemptionCalculationDone == true), is this always false?
intraQueuePreemptionCalculationDone is set to false during init. We iterate over partitions, and a queue may have multiple labels. Since I iterate over partitions and then look into the queues where that partition has some capacity, the same queue may show up under more than one partition. I was calculating resource demand at the partition -> queue level. To avoid duplicate calculation, I use this variable.
bq. Should we lock the queue inside for (LeafQueue leafQUeue : queues)?
Makes sense to me. I will handle this case.
bq. return value of getResourceDemandFromAppsPerQueue is not consumed by anyone.
I have cleaned up the code.
bq. TempAppPerQueue is not used by anyone
I am moving some common computation logic and some storage there. I will have this in the next patch.
Thanks for sharing the flow. I had planned an approach similar to what you suggested, but I am doing some of the tasks together. I feel I can rearrange the code to fit this flow.
However, I feel deductPreemptableResourcesBasedSelectedCandidates could also be done while we loop over each container of an app (we can deduct from resourceToObtain, but not add the same container again; I have handled this scenario already).
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453773#comment-15453773 ] Wangda Tan commented on YARN-4945: --
Hi Sunil,
Thanks for the update. I haven't dug into too many details of this patch yet; I mainly looked at the interactions between preemptable-resource-calculator and candidate-selector. Some comments so far:
1) What is intraQueuePreemptionCost? And what is AppPriorityComparator?
2) IntraQueuePreemptableResourceCalculator:
2.1) For the queue-level ideal allocation, is it enough to trust result from previous policies? PreemptableResourceCalculator saves the per-queue ideal allocation to PCPP#queueToPartitions. I think we don't need to add extra logic in IntraQueuePreemptableResourceCalculator to recursively calculate it. Correct?
2.2) computeIntraQueuePreemptionDemand:
- {{if (tq.intraQueuePreemptionCalculationDone == true)}}, is this always false?
- Should we lock the queue inside {{for (LeafQueue leafQUeue : queues)}}?
- return value of getResourceDemandFromAppsPerQueue is not consumed by anyone.
- TempAppPerQueue is not used by anyone
Since the logic still looks incomplete, here are some thoughts about the implementation and overall code structure from my side. I hope they help.
{code}
IntraQueuePreemptableResourceCalculator {
  1. Uses ideal resource calculated by previous policies.
  2. Compute per-app ideal/preemptable resource according to per-queue policies. (Stored in TempAppPerPartition)
}

IntraQueueCandidateSelector {
  1. Invoke IntraQueuePreemptableResourceCalculator to calculate ideal/preemptable of apps
  2. Use CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates to deduct preemptable resource for already selected containers.
  for (leafqueue from most underserved) {
    for (apps in reverse order) {
      if (app.preemptable > 0) {
        // Select container logic.
      }
    }
  }
}
{code}
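Step 2 of the selector outline above (deducting already selected containers before picking more) can be illustrated with a small self-contained sketch. It assumes integer resources and string app ids; {{SelectedContainer}} and {{deductAlreadySelected}} are hypothetical names and do not reproduce the signature of {{CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates}}, which is not shown in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DeductSelectedSketch {
  /** Hypothetical stand-in for a container: owning app and allocated size. */
  static final class SelectedContainer {
    final String appId;
    final int size;

    SelectedContainer(String appId, int size) {
      this.appId = appId;
      this.size = size;
    }
  }

  /**
   * Returns a copy of the per-app preemptable amounts with containers that an
   * earlier policy already selected subtracted out, never going below zero.
   */
  static Map<String, Integer> deductAlreadySelected(
      Map<String, Integer> preemptableByApp, List<SelectedContainer> alreadySelected) {
    Map<String, Integer> remaining = new HashMap<>(preemptableByApp);
    for (SelectedContainer c : alreadySelected) {
      Integer current = remaining.get(c.appId);
      if (current != null) {
        remaining.put(c.appId, Math.max(0, current - c.size));
      }
    }
    return remaining;
  }

  public static void main(String[] args) {
    Map<String, Integer> preemptable = new HashMap<>();
    preemptable.put("app1", 10);
    preemptable.put("app2", 3);
    List<SelectedContainer> selected = Arrays.asList(
        new SelectedContainer("app1", 4),
        new SelectedContainer("app2", 5),
        new SelectedContainer("app9", 2)); // app in another queue, ignored here
    Map<String, Integer> remaining = deductAlreadySelected(preemptable, selected);
    // Only apps that still have something preemptable need container selection.
    for (Map.Entry<String, Integer> e : new TreeMap<>(remaining).entrySet()) {
      if (e.getValue() > 0) {
        System.out.println(e.getKey() + " still preemptable: " + e.getValue());
      }
    }
  }
}
```

With these inputs only app1 remains a candidate, with 6 units still preemptable; app2's amount is fully covered by the earlier selection.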
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440197#comment-15440197 ] Wangda Tan commented on YARN-4945: --
Thanks for the discussion, [~eepayne], [~sunilg].
I think normalization for inter-queue / intra-queue preemption is one of the top-priority goals for this feature. If you take a look at existing preemption code, it normalizes preempt-able resource for reserved-container-candidate-selector and fifo-candidate-selector. We can do similar normalization for inter/intra-queue preemption.
Once [~sunilg] has the patch with more test cases ready, we can do some reviews / tests to make sure it works.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439387#comment-15439387 ] Sunil G commented on YARN-4945: ---
Thanks [~eepayne] for the comments about the interaction between inter-queue and intra-queue preemption. It's really a case to consider.
While working on the POC, [~leftnoteasy] mentioned the case of normalization. We have to take into account that inter-queue preemption has already been done when doing preemption calculation and selection for intra-queue. In the initial version of the POC, I tried to achieve the basic framework and a working prototype. But going beyond that, we need to normalize pre-selected containers before each round of intra-queue preemption. (I think we can consider user-limit and priority only for now.)
On the same note, we also need to consider the case where selected containers are not rejected in subsequent rounds. Also, if the same containers are selected again in later rounds, we can treat those containers as already selected and deduct them from resourceToObtain. These cases are handled as of today.
I think I can also add the implementation details of this scenario to the document so that it can come in the first cut. Thoughts?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439264#comment-15439264 ] Sunil G commented on YARN-4945: ---
Thanks [~eepayne] for the use cases. Overall, the use cases look fine to me. I have looked at the doc as well. A few comments from my end:
1. {noformat} When one (or more) user(s) are below their minimum-user-limit-percent and one (or more) user(s) are above their minimum-user-limit-percent, resources will be preempted after a configurable time period from the user(s) which are above their minimum-user-limit-percent. {noformat}
>> I think we might need to come with a limit on how much resource can be preempted from over-utilizing users' apps. We do have max-preemption-per-round. But sometimes it may be more as it may be configured for inter-queue. Since we are sharing this config, i think we can have a config to limit the preemption for user-limit. For priority, i have considered a certain limit to control this scenario. Thoughts?
2. {noformat} App0 in QUEUE0 is consuming 50 extra resources and is not releasing them. But, since QUEUE0 is not preemptable, inter-queue preemption will not free the 50 resources from QUEUE0. In QUEUE1, User2 is guaranteed 50 resources, but User1 is also, and User1 is not going over its minimum-user-limit-percent, so User1 is not violating any policy. Therefore, the user-limit-percent intra-queue-preemption policy does not preempt any resources from User1. {noformat}
>> Overall this point makes sense. This is strictly an intra-queue model, and we need not consider cross-queue demand and its shortages.
3. {noformat} 1.3.1.3.1. TBD: I could see this going the other way. That is, we may want the policy to preempt from App1 until both users have an equal number of resources. {noformat}
>> I mostly agree with you here. Once app1's containers are released, app2 can get resources. CS will ensure that; I don't think the preemption module needs to do it.
4. bq. Should there be some (possibly configurable) limit on the percent of containers preempted from App1?
Instead of limiting per app, I consider only a certain % of the high priority app's demand. This can help to avoid over-preemption. If we limit preempting containers from an app, a higher priority app may lose some containers. So for now, it's better to clear the lowest priority apps completely, except for their AMs. Along the same lines, apps could save some of their containers (critical, long running). For this, the app itself can mark such containers. This will come as part of first class services, and we could improve preemption after that.
5. bq. the priority preemption policy will not preempt containers from the lower priority app if it would cause the lower priority app to go below the user’s minimum-user-limit-percent guarantee.
I will not consider preemption demand from a high-priority app if that app is already crossing the user-limit. Such apps will be skipped while calculating demand. I have added this support in the POC patch itself.
Overall these cases make sense, and thank you very much for sharing them. It's very good to validate and consider all problem statements at this stage itself. :) I have not looked at your latest comment yet; I will comment on it in a short while.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439148#comment-15439148 ] Eric Payne commented on YARN-4945: --
[~sunilg] and [~leftnoteasy],
While I was writing the use cases, I thought about the interaction between the inter-queue and intra-queue preemption policies, and I want to get your thoughts. The following use case is not in the doc, because I'm not sure how to handle it.
Consider this use case: The cluster has 2 queues, both preemptable, both have configured max capacity that can use the whole cluster.
||Queue Name||Queue Guaranteed Resources||Queue Max Resources||Queue {{minimum-user-limit-percent}}||Queue Preemptable||
|root|200|200|N/A|N/A|
|{{QUEUE1}}|100|200|100%|Yes|
|{{QUEUE2}}|100|200|50%|Yes|
---
||Queue Name||User Name||App Name||Resources Used||Resources Guaranteed Per User by {{minimum-user-limit-percent}}||Pending Resources||
|{{QUEUE1}}|{{User1}}|{{App1}}|120|100|0|
|{{QUEUE2}}|{{User2}}|{{App2}}|80|50|0|
|{{QUEUE2}}|{{User3}}|{{App3}}|0|50|20|
# The inter-queue preemption policy sees that {{QUEUE2}} is underserved and is asking for 20 resources, and {{QUEUE1}} is over-served by 20 resources, so it preempts 20 resources from {{App1}}.
# The intra-queue preemption policy sees that {{User3}} is under its {{minimum-user-limit-percent}} and is asking for 20 resources, and {{User2}} is over its {{minimum-user-limit-percent}}, so the intra-queue-preemption policy preempts 20 resources from {{App2}}.
# The result of this scenario is that 20 resources are preempted when they should not be.
The scenario I have laid out above assumes that intra-queue preemption did not know about the 20 containers that are already preempted to fulfill the needs of {{App3}} in {{QUEUE2}}.
I think that the design doc tries to address this, and assumes that the intra-queue preemption policy will be able to handle this use case and will not preempt more containers when it is not necessary. However, I am not so sure about that. In a more complicated scenario with multiple over-served and multiple under-served queues, how will the intra-queue preemption policy know that the containers that are already in the {{selectedContainers}} list will be used to fulfill the needs of any specific queue? Please provide your thoughts.
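The arithmetic of the scenario above can be replayed in a few lines. This is a worked example under simplifying assumptions, not scheduler code: resources are plain integers, {{toPreempt}} is a hypothetical helper, and the "aware" case simply assumes the intra-queue policy is told that the 20 resources selected by the inter-queue policy will go to {{User3}}, which is exactly the knowledge the comment questions.

```java
class DoublePreemptionSketch {
  /** Amount an over-served party gives up to cover a demand (illustrative helper). */
  static int toPreempt(int used, int guaranteed, int demand) {
    int surplus = Math.max(0, used - guaranteed);
    return Math.min(surplus, Math.max(0, demand));
  }

  public static void main(String[] args) {
    int pendingUser3 = 20;

    // Inter-queue: QUEUE1 uses 120 of a guaranteed 100; QUEUE2 asks for 20.
    int inter = toPreempt(120, 100, pendingUser3);

    // Intra-queue, unaware of the inter-queue selection:
    // User2 uses 80 of a guaranteed 50; User3 asks for 20.
    int intraUnaware = toPreempt(80, 50, pendingUser3);

    // Intra-queue, assuming it knows the earlier selection covers User3's demand.
    int intraAware = toPreempt(80, 50, pendingUser3 - inter);

    System.out.println("inter-queue: " + inter);                        // 20
    System.out.println("uncoordinated total: " + (inter + intraUnaware)); // 40
    System.out.println("coordinated total: " + (inter + intraAware));     // 20
  }
}
```

Without coordination, 40 resources are preempted to satisfy a demand of 20; the extra 20 taken from {{App2}} are the unnecessary preemption the scenario describes.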
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438052#comment-15438052 ] Wangda Tan commented on YARN-4945: --
Thanks [~eepayne] for providing the detailed use case requirements. I haven't looked at the contents of the PDF yet, but the overall requirements make sense, and they look to me like the most important use cases for intra-queue preemption.
With the existing framework added by [~sunilg], we should be able to support different scheduling policies (like fair, fifo, priority, etc.) by adding different preemptable-resource-calculators (which decide the ideal/preemptable resource for apps) and different preemptable-candidate-selectors (which decide the containers to preempt).
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432961#comment-15432961 ] Sunil G commented on YARN-4945: ---
Thank you [~eepayne] for the detailed explanation.
bq. I want to make sure we are talking about the same thing, so I would like to expressly clarify what I mean by user-limit
Thanks for the details here. It makes much more sense now. With minimum-user-limit-percent, we are just trying to ensure a minimum percentage of resources for all users. I think we need not worry about user-level headroom here, then. It becomes clearer and simpler now.
bq. I think it would be helpful to define use cases so that everyone is clear about what problems we are trying to solve. I will make an attempt at that and post a doc here.
Perfect. I will soon upload a cleaner version of the patch with basic UT.
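To make the {{minimum-user-limit-percent}} guarantee discussed above concrete, here is a simplified sketch of the per-user share it implies, following the behavior described in the Capacity Scheduler documentation (with a value of 25, two active users get up to 50% each, three get about 33%, and four or more get 25%). The helper and its integer arithmetic are illustrative assumptions; {{user-limit-factor}}, node partitions, and resource rounding are deliberately left out.

```java
class UserLimitSketch {
  /**
   * Simplified per-user share of a queue: the even split among active users,
   * but never less than the minimum-user-limit-percent floor.
   * Names and integer arithmetic are illustrative, not YARN code.
   */
  static int guaranteedPerUser(int queueGuaranteed, int minimumUserLimitPercent,
      int activeUsers) {
    if (activeUsers <= 0) {
      throw new IllegalArgumentException("activeUsers must be positive");
    }
    int evenShare = queueGuaranteed / activeUsers;
    int floorShare = queueGuaranteed * minimumUserLimitPercent / 100;
    return Math.max(evenShare, floorShare);
  }

  public static void main(String[] args) {
    // Queue with 100 guaranteed resources and minimum-user-limit-percent = 25.
    for (int users = 1; users <= 5; users++) {
      System.out.println(users + " active user(s): "
          + guaranteedPerUser(100, 25, users) + " each");
    }
  }
}
```

With five active users the floor of 25 exceeds the even split of 20, so at most four users can be served at their guarantee at any one time.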
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431513#comment-15431513 ] Eric Payne commented on YARN-4945: -- [~sunilg], Thanks a lot for your reply! {quote} - user-limit: may be partial or full pending resource request will become resource to obtain for this app. This is depending on user-limit_headroom - current_used. This much can be considered as demand from this app. {quote} I want to make sure we are talking about the same thing, so I would like to expressly clarify what I mean by {{user-limit}} because I feel that it is ambiguous and may be causing confusion. In the statement above, I think you are referring to {{yarn.scheduler.capacity.root.QUEUE1.user-limit-factor}}, which plays a role in determining each user's headroom in a queue. {{user-limit-factor}} is important to consider when calculating how much of an app's pending resources should be preempted from other apps. Failure to consider this caused us problems and resulted in YARN-3769. However, in the context of intra-queue preemption, {{yarn.scheduler.capacity.root.QUEUE1.minimum-user-limit-percent}} is the property I want to focus on. My goal is to ensure that each queue is evenly divided between the appropriate number of users, as defined by {{minimum-user-limit-percent}}. bq. with this poc, i am coming with framework and priority preemption. Thank you very much for doing that! bq. However for doc, it will be good if we could have it common for priority and user-limit. Agreed. Also, I think it would be helpful to define use cases so that everyone is clear about what problems we are trying to solve. I will make an attempt at that and post a doc here.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429644#comment-15429644 ] Sunil G commented on YARN-4945: ---
bq. My assertion is that regardless of what containers are already in the selectedCandidates list, the intra-queue preemption policy would always need to select more.
Yes, I meant the same. There are chances that the intra-queue preemption logic may select a container which is already selected. In that case we deduct it from the intra-queue resourceToObtain and continue. I added this point to emphasize that the intra-queue logic will not do anything to an already selected container.
bq. we may want to consider intra-queue preemption configs for dead zone, natural completion,
Makes sense. I will add this point.
bq. Is this step calculating the total of preemptable resources for apps in this queue, per partition?
When we consider resource distribution in a queue, there can be resource over-subscription, given that there was no demand at the time when these resources were allocated to the queue/app. Later, a few more apps came in and caused the resource distribution to vary based on priority or user-limit. In such cases, we will consider priority and user-limit separately.
- priority: all pending resource requests for this app will become “resource to obtain for this app”
- user-limit: a partial or full pending resource request will become “resource to obtain for this app”. This depends on “user-limit_headroom - current_used”. This much can be considered as demand from this app.
I used "pending" because of the notion from the scheduler. But in the preemption world, that will be mapped to resourceToObtain. And yes, we consider this resourceToObtain at the per-partition level, and all calculations are done accordingly.
bq. Is this saying that, when marking containers for preemption, if an app is under its user limit percent, its containers will not be marked?
I can clarify this. Intra-queue preemption will first calculate resourceToObtain from those apps which are of high priority (user-limit: those apps which are over-subscribing resources and crossing their user-limit quota at the given instant). From these selected apps, we get how much is pending, and that contributes to resourceToObtain (user-limit: in this case, we find those apps which are starving and not getting their user-limit quota). In these cases, we will come across apps which have already met or exceeded their user-limit quota (for priority). These apps will be skipped, and this is reflected in resourceToObtain.
bq. Perhaps these should be totally separate policies.
My idea is to come up with an IntraQueue framework and apply policies like priority and user-limit on top of it. So with this poc, i am coming with framework and priority preemption. user-limit can be added as a new policy on top of this framework, and it will have the points you mentioned. However for doc, it will be good if we could have it common for priority and user-limit. And we can add the points you gave in your comment to the doc. This will give better insight into intra-queue preemption. Thoughts?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428904#comment-15428904 ]

Eric Payne commented on YARN-4945:
---

[~sunilg], thank you so much for providing this design doc and POC. I have not yet looked at the patch, but I have a few comments on the design doc.

- {quote}
Additional Requirement specs
...
- Over subscribed queue
...
-- Selected containers will completely serve resource need from starving apps.
...
-- Selected containers only partially serves the need
...
By scanning through each partition and its associated queues (TempQueuePerPartition), we can understand how much resources are offered from each queue for preemption and also the selected container list. This can be used as a reference to avoid double calculations in intraqueue preemption round.
{quote}
I'm pretty sure that the containers already in the {{selectedCandidates}} list will _not_ be re-assigned to anything in the current queue. The containers are in that list because some other queue is asking for them. Even if containers that are already in the inter-queue preemption list would also help resolve an intra-queue preemption problem, those containers will go to the more underserved queue before coming back to the current queue. My assertion is that regardless of what containers are already in the {{selectedCandidates}} list, the intra-queue preemption policy would always need to select more.

- {quote}
Configurations and considerations
- Provide a configuration to turn on/off intraqueue preemption along with the type of policy it is going to handle (priority, fairness, userlimit etc)
{quote}
Additionally, we may want to consider intra-queue preemption configs for dead zone, natural completion, etc. This may even need to be per queue.

- {quote}
Select ideal candidates for intraqueue preemption per priority.
...
3. ‘pending’ resource per partition will be calculated for all the apps and together store in a consolidated map (resourceToObtain) of pending resource to be collected per partition in one queue.
{quote}
The use of the word "pending" in conjunction with the reference to {{resourceToObtain}} is confusing to me. It sounds like "pending" is talking about "preemptable resources," but "pending" means "resources requested but not yet allocated." (See {{LeafQueue#getTotalPendingResourcesConsideringUserLimit}}). For instance, the {{resToObtainByPartition}} variable in {{FifoCandidatesSelector}} is used for holding the amount of extra (and therefore preemptable) resources being used by a queue. Is this step calculating the total of preemptable resources for apps in this queue, per partition?

- {quote}
4. While doing this, we will ensure that certains apps will be skipped if it is already equal or more that its userlimit quota. This map will be the entry point to select candidates from lower priority apps in next step.
{quote}
Is this saying that, when marking containers for preemption, if an app is under its user limit percent, its containers will not be marked? Or, is it saying that if an app is asking for more containers and it is already over its user limit percent, other apps' containers won't be preempted on its behalf?
Not only do we need to avoid preempting resources _for_ users that are over their user limit percent, we need to avoid preempting containers _from_ users that are under their user limit percent. Even today in the capacity scheduler, if I have a queue with a 50% user limit percent, and app1 from user1 is priority1 and app2 from user2 is priority2, and they are both asking for more resources, user2 will not get more containers until user1 has reached 50% of the queue. In other words, user limit percent trumps application priority.

- I am concerned that priority-based intra-queue preemption has a different set of goals than user limit percent-based intra-queue preemption. For instance,
-- requirements for user limit percent-based preemption are calculated at the user level, while priority-based preemption requirements go down to the app level.
-- User limit percent-based preemption only makes sense if multiple users are in a queue, and priority-based preemption only makes sense if a priority inversion can happen between apps of the same user in a queue.
Perhaps these should be totally separate policies. Anyway, for us, user limit percent-based preemption is much more important.
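The rule stated in the comment above (never preempt _for_ a user over its user limit, never preempt _from_ a user under its user limit) can be illustrated with a small sketch. This is a hypothetical Python helper, not CapacityScheduler code: `preemptable_for` and the `used`/`limit` fields are assumed names, and treating a user exactly at its limit as neither a donor nor a recipient is an assumption as well. Application priority is deliberately not an input, which is one way to express "user limit percent trumps application priority".

```python
def preemptable_for(requester, victim, ask):
    """Amount that may be taken from `victim`'s user for `requester`'s user.

    `requester` and `victim` are dicts with the user's current 'used' amount
    and its user 'limit' in the queue (hypothetical layout). `ask` is how
    much the requesting user still wants.
    """
    headroom = requester["limit"] - requester["used"]  # room under the limit
    surplus = victim["used"] - victim["limit"]          # usage above the limit
    if headroom <= 0 or surplus <= 0:
        # Requester is at/over its limit, or victim is at/under its limit.
        return 0
    # Never push the victim below its limit or the requester above its own.
    return min(ask, headroom, surplus)


# A queue of 100 with a 50% user limit, as in the example from the comment.
user1 = {"used": 80, "limit": 50}
user2 = {"used": 20, "limit": 50}
```

Here `preemptable_for(user2, user1, 40)` allows 30 units to move, while the reverse direction allows nothing, whatever the priorities of the apps involved.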
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395858#comment-15395858 ]

Sunil G commented on YARN-4945:
---

Extremely sorry for the delay.

bq. With the proposed design, is it possible for an app to be preempted that is below its user limit?

Yes, we should consider this case and should not allow preemption. However, there could be borderline scenarios where high priority apps have more demand and we might preempt a few containers from apps which are just over their user-limit. Such dead zones will also be considered to avoid over-kill. We can see how the existing tuning configs can be used for intra-queue scenarios too.
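One possible reading of the dead-zone idea in the comment above is that usage only slightly over the user limit is ignored. The Python sketch below is only an assumption about how such a threshold could behave: the helper name `preemptable_beyond_dead_zone`, the percentage parameter, and the choice to reclaim only down to the dead-zone threshold are hypothetical, and the thread does not specify the actual configuration or semantics.

```python
def preemptable_beyond_dead_zone(used, user_limit, dead_zone_pct):
    """Resources considered preemptable once usage leaves the dead zone.

    Usage up to user_limit * (1 + dead_zone_pct / 100) is ignored, so apps
    that are only marginally over their user limit are left alone.
    """
    threshold = user_limit + user_limit * dead_zone_pct / 100
    if used <= threshold:
        return 0
    # Assumption: reclaim only down to the threshold, not to the limit.
    return used - threshold
```

For example, with a user limit of 100 and a 10% dead zone, a user holding 105 is left untouched, while a user holding 130 has 20 units considered preemptable.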
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392303#comment-15392303 ]

Eric Payne commented on YARN-4945:
---

{quote}
This is considering priority of apps alone. Yes, we have to have user-limit etc. As we progress, we can add that to strengthen the intra-queue preemption for more accurate results.
{quote}
Thanks, [~sunilg]. With the proposed design, is it possible for an app to be preempted that is below its user limit? I think that should not happen even if a higher priority app is asking for resources.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392286#comment-15392286 ]

Sunil G commented on YARN-4945:
---

Thanks [~eepayne]

bq. I would like to separate out the sub-feature pieces of in-queue preemption as much as possible.
bq. I think that in-queue preemption based on user limit needs to be very dependent on app priority and vice-versa.

Yes. I have almost completed the POC code for *ideal resource allocation* within a queue. This considers the priority of apps alone. Yes, we have to handle user-limit etc. as well. As we progress, we can add that to strengthen intra-queue preemption for more accurate results.

I could share the draft POC patch, but I thought it's better if I share the doc first. I will share it in a day, and we can have more discussion to finalize it. The POC patch was done to see how well we can reuse, fit into, or make use of the changes done by [~leftnoteasy] for a better preemption framework.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392184#comment-15392184 ]

Eric Payne commented on YARN-4945:
---

Thanks [~sunilg]. I appreciate very much any design help and POC help that you can provide.

{quote}
queue preemption may need to consider multiple factors such
- Priority
- Fairness
- User limit
{quote}
In general, I would like to separate out the sub-feature pieces of in-queue preemption as much as possible. I am a fan of making small, simple improvements in increments. I believe this makes it easier to understand, test, and review.

bq. For initial POC, I was planning priority as I have done an independent POC for priority preemption alone.

One thing I don't understand is the use case for a priority policy that is separate from a user limit policy. For my users' use case, the most important of these is user limit. However, I think that in-queue preemption based on user limit needs to be very dependent on app priority and vice-versa. Can you please elaborate?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391671#comment-15391671 ]

Sunil G commented on YARN-4945:
---

Thanks [~eepayne] and [~leftnoteasy]

Yes, I was working on a POC patch for this and I could share an initial design doc for it. It will be really great if we can collaborate.

Since intra-queue preemption may need to consider multiple factors such as
- Priority
- Fairness
- User limit

my plan was to build a basic intra-queue preemption framework which can work alongside (or reuse) the existing PCPP mode. To this framework, we can add the policies mentioned above. For the initial POC, I was planning *priority*, as I have done an independent POC for priority preemption alone. It will be great if we could also add policies like *user limit*, *fairness* etc. I will share an initial document in a short while, so we can discuss more and improve it.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389859#comment-15389859 ]

Wangda Tan commented on YARN-4945:
---

[~eepayne],

I totally agree that intra-queue preemption should happen only if applications belonging to the queue cannot get resources from outside of the queue. The bounds that can cause this are: resources / partitions / user-limits. However, I think non-accountable (first-come-first-served) usage, for example locality, should not trigger preemption. Preemption for FCFS usage could lead to excessive preemption.

[~sunilg] is also working on a POC / design, so I think you can collaborate on this. Sunil, could you give a brief update on the POC/design efforts?
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389828#comment-15389828 ]

Eric Payne commented on YARN-4945:
---

Hi [~leftnoteasy]. Sorry for the long delay. I am now starting to have more time to focus on putting a design together and getting a POC working for in-queue preemption.

{quote}
- When intra-queue preemption can happen: in some cases, we need intra-queue preemption happen when queue is under its guaranteed resource,
{quote}
Just for clarification, I think that if resources are available in the cluster and a queue can get more resources by growing the queue's usage, then in-queue preemption shouldn't happen. However, if something in the queue's hierarchy has reached its absolute max capacity or if the cluster itself is full, then in-queue preemption should happen, even if the queue is under its guaranteed resource max. For {{queue X}}, this should happen when all of the following occur:
# some set of resources (memory, vcores, labelled, locality, etc) are all used, either by other queues or apps in {{queue X}}
# any user in {{queue X}} is over its minimum user limit percent
# another user in {{queue X}} is under its minimum user limit percent and asking for resources

Having said that, the question of whether a queue can grow its usage by allocating available resources is complicated by the same issues that plague cross-queue preemption such as labelled resources, locality, fragmented memory, and so forth.
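The three conditions listed in the comment above can be read as a single predicate. The Python sketch below is only illustrative: `UserUsage`, its fields, and the boolean `queue_can_grow` (standing in for the hard question of whether free resources are actually usable by the queue) are hypothetical names, not scheduler APIs.

```python
from dataclasses import dataclass


@dataclass
class UserUsage:
    used: int            # resources currently held by the user's apps
    min_user_limit: int  # the user's minimum user limit in this queue
    pending: int         # resources requested but not yet allocated


def in_queue_preemption_needed(queue_can_grow, users):
    """True only when all three conditions from the comment hold."""
    if queue_can_grow:
        # Condition 1 fails: the queue can still obtain free resources.
        return False
    # Condition 2: some user is over its minimum user limit.
    someone_over = any(u.used > u.min_user_limit for u in users)
    # Condition 3: another user is under its limit and asking for more.
    someone_starved = any(
        u.used < u.min_user_limit and u.pending > 0 for u in users
    )
    return someone_over and someone_starved


# Hypothetical queue of 100 with a 20-unit minimum user limit per user.
full_queue = [UserUsage(100, 20, 0), UserUsage(0, 20, 30)]
```

The predicate is true for `full_queue` only when the queue cannot grow; if free resources are reachable, or if no under-limit user has pending requests, it is false.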
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241922#comment-15241922 ]

Wangda Tan commented on YARN-4945:
---

[~eepayne],

The above is one of the cases that need intra-queue preemption to kick in. In my mind, intra-queue preemption could happen when the queue is not able to get new resources (1), and any of the apps in the queue is under its ideal allocation (2).

(1) The queue is not able to get new resources:
This is not only when the queue has 100% resource usage and the cluster is full. It is possible that, because of the queue's max capacity setting, the queue cannot get more resources even when the cluster has sufficient idle resources. Also, when some other queue doesn't allow preemption, the queue may not get more resources even when it is below its guaranteed capacity.

(2) Any of the apps is under its ideal allocation:
This is not only the user-limit example; we may need to consider a general solution for different queue policies. For example: for a Fifo+priority policy, the highest priority application can take all the capacity; for a fair policy, an app's ideal allocation is its computed fair share.

The only difference between the different intra-queue preemption goals (child tasks of this umbrella JIRA) is that we need to compute applications' ideal allocations in different ways. The remaining parts should be the same.

Thoughts?
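The point that only the ideal-allocation computation changes between policies can be illustrated with a sketch. Everything here is an assumption for illustration: the `(name, priority, demand)` tuple layout, the convention that a larger number means higher priority, and the use of max-min fairness as the "fair share". None of it is taken from the CapacityScheduler code.

```python
def ideal_fifo_priority(capacity, apps):
    """Higher-priority apps take as much as they demand, in priority order."""
    remaining = capacity
    ideal = {}
    for name, _priority, demand in sorted(apps, key=lambda a: -a[1]):
        grant = min(demand, remaining)
        ideal[name] = grant
        remaining -= grant
    return ideal


def ideal_fair(capacity, apps):
    """Max-min fair share: smallest demands are satisfied first, and the
    rest of the capacity is split evenly among the remaining apps."""
    remaining = capacity
    ideal = {}
    by_demand = sorted(apps, key=lambda a: a[2])
    for i, (name, _priority, demand) in enumerate(by_demand):
        equal_share = remaining / (len(by_demand) - i)
        grant = min(demand, equal_share)
        ideal[name] = grant
        remaining -= grant
    return ideal


def to_preempt(used, ideal):
    """The policy-independent part: anything held above the ideal."""
    return {name: max(0, used.get(name, 0) - ideal[name]) for name in ideal}


# (name, priority, demand); a larger priority number means higher priority.
apps = [("a", 1, 60), ("b", 3, 70), ("c", 2, 50)]
```

Both functions feed the same `to_preempt` step, which mirrors the comment's claim that the rest of the logic can be shared across policies.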
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241683#comment-15241683 ]

Eric Payne commented on YARN-4945:
---

Thanks [~leftnoteasy]. I have seen the following use case where intra-queue preemption would be necessary. [~sunilg] / [~leftnoteasy], please provide additional use cases as you think of them.
- {{QueueA}} has 100 resources with {{User Limit Factor = 1.0}} and {{User Limit % = 20%}}
- The whole cluster is full.
- User X's apps launch 100 containers that run for 24 hours
- User Y wants to launch an app, but even though User Y should expect at least 20% of {{QueueA}}, they can't get any resources because the cluster is full and User X won't give any up.
[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue
[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236305#comment-15236305 ]

Wangda Tan commented on YARN-4945:
---

Some rough ideas about design:

In general, after YARN-4822, we can implement and activate a new intra-queue preemption candidate selector according to the queue's policy.

There are a couple of things that need more thought:
- Compute ideal resource allocation of applications: Currently we only support calculating a queue's ideal resource allocation. We should calculate apps' ideal allocation as well.
- Configuration of intra-queue preemption: we should be able to turn intra-queue and inter-queue preemption on/off separately.
- When intra-queue preemption can happen: in some cases, we need intra-queue preemption to happen when the queue is under its guaranteed resource, and we also need to make sure no excessive preemption (like crossfire between apps) happens.
- Priorities of intra-queue preemption and inter-queue preemption: intra-queue preemption should happen after inter-queue preemption. In other words, in an under-satisfied queue, an app should prefer to get resources from an over-satisfied queue instead of from other apps in the same queue.

+ [~sunilg], [~eepayne].
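The ordering and on/off requirements from the design notes above can be pictured as a chain of candidate selectors that run in a fixed order and share the set of already-selected containers. The Python sketch below is a toy model under stated assumptions: `run_selectors`, the two selector functions, and the container fields are hypothetical and do not reflect the selector interface introduced by YARN-4822.

```python
def run_selectors(selectors, enabled, containers):
    """Run enabled selectors in order; each sees what was already selected."""
    selected = set()
    for name, select in selectors:
        if enabled.get(name, False):
            selected |= select(containers, frozenset(selected))
    return selected


def inter_queue(containers, already_selected):
    # Toy rule: take containers from queues that are over their capacity.
    return {c["id"] for c in containers if c["queue_over_capacity"]}


def intra_queue(containers, already_selected):
    # Toy rule: take containers of apps over their ideal allocation, but
    # do nothing to containers an earlier selector already picked.
    return {
        c["id"]
        for c in containers
        if c["app_over_ideal"] and c["id"] not in already_selected
    }


# Inter-queue runs first, so intra-queue only adds to its selection.
SELECTORS = [("inter", inter_queue), ("intra", intra_queue)]

containers = [
    {"id": "c1", "queue_over_capacity": True, "app_over_ideal": True},
    {"id": "c2", "queue_over_capacity": False, "app_over_ideal": True},
    {"id": "c3", "queue_over_capacity": False, "app_over_ideal": False},
]
```

Passing `{"inter": True, "intra": False}` or the reverse as `enabled` switches each kind of preemption on or off independently, while the list order keeps inter-queue selection ahead of intra-queue selection.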