[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217381#comment-16217381 ]

Carlo Curino commented on YARN-7117:

[~wangda] I agree with your comments. In particular:
# I am OK with renaming {{PlanQueue}}/{{ReservationQueue}} to something more generic.
# I don't have strong preferences regarding {{SchedulingEditPolicy}}. The only caveat is that those policies are used by {{PlanFollower}} and the {{MetricsInvariantChecker}}(s) as well, so make sure those flows are not broken or made more complicated. In particular, the configuration should be backward compatible.
# I don't follow all of the prototype design well, but {{QueueEntitlementDynamicEditPolicy}} looks very similar in spirit to the job done by the {{CapacitySchedulerPlanFollower}} part of the {{ReservationSystem}}. Check whether you can re-use/extend/refactor that logic.

> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
> --
>
> Key: YARN-7117
> URL: https://issues.apache.org/jira/browse/YARN-7117
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: capacity scheduler
> Reporter: Wangda Tan
> Assignee: Suma Shivaprasad
> Attachments: YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, YARN-7117.poc.1.patch, YARN-7117.poc.patch, YARN-7117_Workflow.pdf
>
> Currently Capacity Scheduler doesn't support auto creation of queues when doing queue mapping. We see more and more use cases which have complex queue mapping policies configured to handle application-to-queue mapping.
> The most common use case of CapacityScheduler queue mapping is to create one queue for each user/group. However, {{capacity-scheduler.xml}} has to be updated and {{RMAdmin:refreshQueues}} has to be run whenever a new user/group is onboarded. One option to solve the problem is to automatically create queues when a new user/group arrives.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215881#comment-16215881 ]

Wangda Tan commented on YARN-7117:

Thanks [~suma.shivaprasad] for promptly working on the prototype. I reviewed it and had several long offline discussions with Suma.

*Some high-level comments:*
1) AutoCreatedParentQueue/LeafQueue are too similar to PlanQueue and ReservationQueue; I suggest merging the two implementations.
2) It is a good idea to keep the capacity management logic in a separate module (one that implements SchedulingEditPolicy). However, SchedulingEditPolicy currently needs to be preconfigured and cannot be turned on/off at runtime. We need to improve that part before using SchedulingEditPolicy.
3) There is a lot of common logic between ProportionalCapacityPreemptionPolicy and the newly added SchedulingEditPolicy. It would be better to move at least the queue-cloning logic, data structures, and libraries into a common abstract class.

*Regarding development, I think we can break this JIRA down into the following sub-tasks:*
a. Rename ReservationQueue/PlanQueue to a different name (with as few logic changes as possible) so that they can be used to implement this feature. (no dependency)
b. Implement the queue mapping / queue creation logic, which covers: getting the application's queue mapping and auto-creating leaf queues, and rejecting the application if leaf queue creation fails. (depends on a.)
c. Change SchedulingEditPolicy to make it refreshable so that it can be turned on/off while the scheduler is running (tracked by YARN-7370). (no dependency)
d. Move the clone-queues method / data structures / libraries from ProportionalCapacityPreemptionPolicy to a common parent class. (no dependency)
e. Add the SchedulingEditPolicy framework and implementation to adjust capacities and states of sub-queues. (depends on a/d)

*Regarding development in a branch vs. development in trunk:* Since a/c/d are all cleanup/refactoring tasks not specific to this feature, I prefer to do them directly on trunk. And since the feature is already complete end-to-end after b, I prefer to do b/e on trunk as well, to avoid the overhead of a feature branch.

[~curino] / [~jlowe] / [~subru], could you add your suggestions on the attached prototype (see the outlined workflow: https://issues.apache.org/jira/secure/attachment/12893355/YARN-7117_Workflow.pdf) and plans?
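The sub-task dependencies listed in the comment above (b depends on a; e depends on a and d; a, c, and d have none) imply a work order. The following is a minimal Python sketch of that ordering, not part of the JIRA or of any patch. The single-letter task names come from the comment; the `waves` helper is a hypothetical name, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Sub-task -> set of sub-tasks it depends on, as listed in the comment.
deps = {
    "a": set(),        # rename ReservationQueue/PlanQueue
    "b": {"a"},        # queue mapping / queue creation logic
    "c": set(),        # refreshable SchedulingEditPolicy
    "d": set(),        # move clone-queues logic to a common parent class
    "e": {"a", "d"},   # SchedulingEditPolicy framework for sub-queues
}

def waves(dependencies):
    """Group tasks into waves; every task in a wave can start in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result

print(waves(deps))  # [['a', 'c', 'd'], ['b', 'e']]
```

Under these assumptions a, c, and d can proceed immediately, while b and e become available once their prerequisites land.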
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213448#comment-16213448 ]

Suma Shivaprasad commented on YARN-7117:

Attached a doc depicting the workflow and classes for auto queue creation and capacity management for these queues.
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188674#comment-16188674 ]

Jason Lowe commented on YARN-7117:

bq. The current CS code has a bug in that it allows "." in the queue names. Do you think we should fix this for 3.0?

That's unfortunate, but I do not think it has to be a show-stopper. A couple of ways to fix it:
1) Preclude the use of '.' in auto-queue names. Then we always know that the last word when we split on '.' is the child queue and the rest is the parent queue.
2) Make the parsing of the specified queue names smarter. Rather than blindly assuming it can split the name on '.' to get queue names, the code has to check whether the parent exists. If it doesn't, it adds the next chunk from the split and checks whether that is a valid parent queue, and so on. There may be some issues with ambiguity, but I suspect there would be other problems with getting queue configs parsed properly if there were truly ambiguous resolutions.

bq. This could be done though it might be a backward in-compatible change.

Which part has the compatibility concern? There aren't any existing semantics for auto-queues since they don't exist yet, so I'm confused about where the compatibility issue lies.
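The two parsing options in the comment above can be sketched in a few lines. This is an illustrative Python sketch only, not Capacity Scheduler code: the function names are hypothetical, and the set of existing parent queue names is assumed to be available as plain strings (which may themselves contain '.').

```python
def split_simple(path):
    """Option 1: no '.' allowed in auto-queue names, so the last chunk is the leaf."""
    parent, sep, leaf = path.rpartition(".")
    if not sep or not parent or not leaf:
        raise ValueError(f"expected <parent>.<leaf>, got {path!r}")
    return parent, leaf

def split_checking_parents(path, existing_parents):
    """Option 2: grow the candidate parent chunk by chunk until it names a known queue."""
    chunks = path.split(".")
    for i in range(1, len(chunks)):
        candidate = ".".join(chunks[:i])
        if candidate in existing_parents:
            # Everything after the matched parent is treated as the leaf name.
            return candidate, ".".join(chunks[i:])
    raise ValueError(f"no existing parent queue found in {path!r}")

print(split_simple("marketing.queue1"))
print(split_checking_parents("dept.a.queue1", {"dept.a"}))
# Ambiguity: with both "dept" and "dept.a" defined, the shortest match wins here.
print(split_checking_parents("dept.a.queue1", {"dept", "dept.a"}))
```

The last call shows the ambiguity the comment mentions: when both `dept` and `dept.a` exist, this sketch picks the shortest matching parent, which is only one possible tie-breaking choice.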
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186821#comment-16186821 ]

Suma Shivaprasad commented on YARN-7117:

Thanks for your feedback, [~jlowe].

{quote}
the syntax would be more concise and easier to read if the queue could be specified as a sub-path that can optionally include the parent queue. For example, rather than {{u:user1:queue1(parent-queue=marketing)}} the syntax could be simplified to: {{u:user1:marketing.queue1}}.
{quote}
Agree. The current CS code has a bug in that it allows "." in the queue names. Do you think we should fix this for 3.0?

{quote}
I'm not really familiar with queue mappings, but I'm assuming the order they are specified is significant to deterministically resolve cases where more than one specified rule would apply to a user. If so then the example is confusing since it looks like the {{u:user2:%primary_group(parent-queue=finance)}} rule will always be eclipsed by the preceding {{u:%user:%user(parent=engineering)}} rule.
{quote}
Good catch. The example needs to be fixed.

{quote}
IMHO if the admin wants to configure guarantees for auto-created queues then we should not assume that they're going to be OK with auto-created queues that do not meet those specifications. Otherwise I'd assume the admin would forgo guaranteed capacities on the auto queues and just have them carve up the parent queue proportionally.
{quote}
That's a good point. We could add a configuration that lets admins define parent queues which allow creation of auto queues with guaranteed resources and fail submissions when the parent doesn't have room. By default this could allow best-effort queues as in the current design, and admins could control it when they configure the parent queue.

{quote}
The document implies that there's SLAs with guaranteed capacity auto-queues, but that's clearly not the case. In the example, it's true that the applications submitted to q4 and q5 eventually ran with guaranteed capacities. However they waited an unbounded amount of time to start running which means we cannot always hit SLAs. Users in q1/q2/q3 can collectively deny apps in q4/q5 ever running, for example.
{quote}
This should be addressed by the configurable policy above. SLA guarantees are better when the admin is able to configure parent queues with guaranteed capacities for auto-created leaf queues.

{quote}
For the alternative approach where all of the queues are "best effort" we don't have to always have the max-am-resource at 0%. We could specify the max-am as a percent of the max cap for those queues or a separate config specific to them, or whatever. Or we could have the queues auto-distribute the capacities of the parent as new queues are added. In other words the auto-queue capacity is 1/(num auto queues) of the parent and the max-capacity is always 100%. Preemption can be used to keep the queues fair if one user tries to dominate over the others, but capacities of underutilized queues can be leveraged by others.
{quote}
This could be done, though it might be a backward-incompatible change.

{quote}
If a user has ACLs to the parent queue then I believe they have those ACLs to the entire hierarchy of that queue. That means if the parent queue says they can submit then they'll be able to submit to any auto-queue underneath that parent. We'll either need a new ACL for auto-queue creation separate from app submission or change the semantics of ACL inheritance for auto-queues. Probably the former makes more sense and would be more intuitive since admins will be used to the inheritance features of today's queue ACLs and allow admins to configure parent-queue-privileged users that can get admin-like access to all the auto-queues of a parent queue but aren't fully admin users across all queues.
{quote}
Good point. Having separate ACLs for auto-creation would be better.

{quote}
I don't know if it's critical to show auto-queues as a different color, but I think it would be important to be able to determine somehow via the UI that the queue was auto-created so the admin doesn't wonder why they can't find the queue in the static queue configs. This might be as simple as a "Auto-Queue: true/false" line in the queue details box in the UI.
{quote}
Agree.
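The rule-ordering problem acknowledged above ("Good catch") is easy to reproduce with a toy first-match evaluator. The sketch below is an assumption-laden illustration in Python, not the scheduler's mapping code: it assumes first-match-wins evaluation (as Jason Lowe assumed in his comment), and the tuple layout `(kind, source, queue, parent)` is a hypothetical stand-in for the proposed `u:<user>:<queue>(parent-queue=<parent>)` syntax.

```python
def resolve(rules, user, primary_group):
    """Return (parent, leaf) from the first rule that applies to the user, else None."""
    for kind, source, queue, parent in rules:
        if kind != "u":
            continue  # only user rules are modelled in this sketch
        if source not in ("%user", user):
            continue  # rule targets a different, specific user
        if queue == "%user":
            leaf = user
        elif queue == "%primary_group":
            leaf = primary_group
        else:
            leaf = queue
        return parent, leaf
    return None

# Order from the design-doc example: the wildcard rule comes first.
doc_order = [
    ("u", "%user", "%user", "engineering"),
    ("u", "user2", "%primary_group", "finance"),
]
print(resolve(doc_order, "user2", "analysts"))        # ('engineering', 'user2')

# Putting the specific rule first makes it reachable.
fixed_order = list(reversed(doc_order))
print(resolve(fixed_order, "user2", "analysts"))      # ('finance', 'analysts')
print(resolve(fixed_order, "user7", "analysts"))      # ('engineering', 'user7')
```

With the wildcard rule first, the `user2` rule can never fire; listing specific rules before wildcard rules avoids the eclipse.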
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186396#comment-16186396 ]

Jason Lowe commented on YARN-7117:

Thanks for providing the doc, Wangda!

I think the syntax would be more concise and easier to read if the queue could be specified as a sub-path that can optionally include the parent queue. For example, rather than {{u:user1:queue1(parent-queue=marketing)}} the syntax could be simplified to {{u:user1:marketing.queue1}}.

I'm not really familiar with queue mappings, but I'm assuming the order in which they are specified is significant, so that cases where more than one rule would apply to a user are resolved deterministically. If so, then the example is confusing, since it looks like the {{u:user2:%primary_group(parent-queue=finance)}} rule will always be eclipsed by the preceding {{u:%user:%user(parent=engineering)}} rule.

{quote}
If we don’t have guaranteed room in the parent queue, queues with 0 capacity (best effort queue) will be created. Applications running in these best effort queues could be starving if no capacity is available
{quote}
This conflicts with the proposal above to fail the submission because it cannot create the queue with guarantees. It seems weird to have user A get guaranteed capacity but user B get _zero_ guarantees because they submitted just a second later than user A. IMHO if the admin wants to configure guarantees for auto-created queues, then we should not assume that they're going to be OK with auto-created queues that do not meet those specifications. Otherwise I'd assume the admin would forgo guaranteed capacities on the auto queues and just have them carve up the parent queue proportionally.

The capacity management with guaranteed capacities refers to a "configured-threshold", but the interface to set that threshold is not documented above.

The document implies that there are SLAs with guaranteed-capacity auto-queues, but that's clearly not the case. In the example, it's true that the applications submitted to q4 and q5 eventually ran with guaranteed capacities. However, they waited an unbounded amount of time to start running, which means we cannot always hit SLAs. Users in q1/q2/q3 can collectively keep apps in q4/q5 from ever running, for example.

For the alternative approach where all of the queues are "best effort", we don't always have to have the max-am-resource at 0%. We could specify the max-am as a percent of the max cap for those queues, or a separate config specific to them, or whatever. Or we could have the queues auto-distribute the capacities of the parent as new queues are added. In other words, the auto-queue capacity is 1/(num auto queues) of the parent and the max-capacity is always 100%. Preemption can be used to keep the queues fair if one user tries to dominate the others, but capacities of underutilized queues can be leveraged by others.

If a user has ACLs to the parent queue, then I believe they have those ACLs to the entire hierarchy of that queue. That means if the parent queue says they can submit, then they'll be able to submit to any auto-queue underneath that parent. We'll either need a new ACL for auto-queue creation separate from app submission, or change the semantics of ACL inheritance for auto-queues. Probably the former makes more sense and would be more intuitive, since admins will be used to the inheritance features of today's queue ACLs. It would also allow admins to configure parent-queue-privileged users that get admin-like access to all the auto-queues of a parent queue but aren't full admin users across all queues.

Yes, if a user does not have the ability to create an auto-queue and/or submit, then the submit should fail.

I don't know if it's critical to show auto-queues as a different color, but I think it would be important to be able to determine _somehow_ via the UI that the queue was auto-created, so the admin doesn't wonder why they can't find the queue in the static queue configs. This might be as simple as an "Auto-Queue: true/false" line in the queue details box in the UI.
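The auto-distribution idea in the comment above (each auto-queue gets 1/(num auto queues) of the parent, with max-capacity fixed at 100%) amounts to a small calculation. The Python sketch below only illustrates that arithmetic; the function name and the dictionary layout are hypothetical and do not correspond to any scheduler API.

```python
import math

def auto_queue_shares(queue_names):
    """Give each auto-created queue an equal share of the parent, in percent.

    Every queue may still grow to 100% of the parent when siblings are idle.
    """
    count = len(queue_names)
    if count == 0:
        return {}
    share = 100.0 / count
    return {name: {"capacity": share, "max_capacity": 100.0} for name in queue_names}

shares = auto_queue_shares(["q1", "q2", "q3", "q4"])
print(shares["q1"])  # {'capacity': 25.0, 'max_capacity': 100.0}

# Adding a fifth queue shrinks every guarantee from 25% to 20%.
shares = auto_queue_shares(["q1", "q2", "q3", "q4", "q5"])
print(shares["q5"]["capacity"])  # 20.0
```

Note that every new queue lowers the guarantee of all existing siblings, which is why the comment pairs this scheme with preemption to keep the queues fair.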
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158854#comment-16158854 ]

Clay B. commented on YARN-7117:

As to my use case: I have an ad-hoc "users" queue, which is a best-effort queue with some minimum capacity granted to the queue overall. E.g., users can get their 4 vcpus and 4 GB of RAM (or more if available); if the users' best-effort queue is not over-subscribed, then they can preempt jobs running outside that queue to get their minimum.

However, on our dev systems we may have 1,000+ developers, and we certainly don't expect them all to run at once (typical telco-style over-subscription assumptions). We do want to ensure that other queues' guaranteed capacity is not impacted, and we want per-user metrics on ad-hoc jobs, which individual queues provide us via JMX, rather than having a single ad-hoc queue with YARN per-user metrics overall (where other queues may be prudently used by a user and analysed).

Should a user submit when the parent queue is saturated, it would be due to a massive influx of users since, for me, one user has a minimum of 1% of the overall queue capacity.
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148148#comment-16148148 ]

Wangda Tan commented on YARN-7117:

Thanks [~jlowe], makes sense.

[~curino], thanks for pointing me to the ReservationSystem-related implementation. I don't want to reinvent the wheel, and will definitely investigate how we can reuse the logic.
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148040#comment-16148040 ]

Carlo Curino commented on YARN-7117:

I have only skimmed the thread very briefly, but it appears to me that many of the problems you are looking at are similar to those solved by {{PlanQueue}}, {{ReservationQueue}}, and {{PlanFollower}}. There we dealt with dynamic creation of queues adding up to 100%, and we can also provide (thanks to the planning aspects) stronger guarantees for Oozie-like use cases (recurring aspects of this are in YARN-5326).

I would ask you to look at that stuff and figure out whether you can leverage some/most of it, instead of building a parallel solution. Overall, the {{PlanFollower}} design of "observe an external signal and publish it to the scheduler" seems clean and powerful, as it decouples the CS inner workings from outside (slower-evolving) phenomena.
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147241#comment-16147241 ]

Jason Lowe commented on YARN-7117:

bq. This is a valid concern, instead of deleting queue, how about stopping the queue?

Probably a bit better, depending upon the use case. It may still be surprising to a user who is submitting regularly (e.g., via Oozie) and whose submissions suddenly fail because the queue was stopped: some other users submitted jobs at an inopportune time and forced this user's queue to stop as a result. I guess that's not a valid use case for this. As long as it's clear how it behaves and that matches the expectations of users looking to use this feature, I guess it's OK.

It may be even better to just leave the queues as-is and require someone to clean up unused queues in order to allow more auto-queues; otherwise I'm not sure we could ever deploy this, because it would allow one user to remove capabilities previously given to another user, which is a denial of service. In that sense it seems better to be up front and either say "no auto queue for you until someone frees up space" or "hey, here's a queue with little-to-no guarantees, but it could have a lot of capacity depending upon other user activity." That latter model (as a single queue for us, but it could be a bunch of separate per-user queues if necessary) has worked pretty well for us for ad-hoc jobs.
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146216#comment-16146216 ]

Wangda Tan commented on YARN-7117:

Thanks [~jlowe].

Regarding the zero-capacity queue: apologies that I didn't make it clear. In one of the use cases we saw, assuming the parent's guaranteed resource is not overcommitted, we still want each auto-created leaf queue to have a minimum guaranteed resource so that it can run its jobs via preemption. Setting the guaranteed resource to zero means no SLA for any auto-created queue. I agree that setting capacity to 0 is a better solution if there's no SLA requirement.

bq. ... Ripping it out may be tricky depending upon the expectations of the user.

This is a valid concern. Instead of deleting the queue, how about stopping the queue? At least the queue is still there. In today's CS, the capacities of stopped queues are counted when we check resource sharing. We should probably exclude the shares of stopped queues and fail reactivation (stopped -> running) when the parent queue's guaranteed resource would be overcommitted.

bq. Does the job submission fail since it cannot create the child queue with that guarantee or ..?

Yes, this is my original proposal.

bq. I don't have all the details on the specific use cases, but this seems like we're going out of our way to essentially emulate what user limits and in-queue preemption can already accomplish when users share the same queue.

Actually, we thought about this option before. Basically they have different use cases: user limits and the related preemption, etc., are more appropriate for a mix of batch jobs running in the same queue and submitted by different users. That's why we do FIFO ordering for apps, etc., and why we allow overcommitting the user limit (there is no hard limit on #running-users in a queue). Running jobs submitted by different users in different queues can better support long-running apps. For example, each user is allowed to run at least 2 docker containers and do whatever they want inside the docker containers. And a queue is operated more individually and has better metrics, UI, etc. exposed to end users.

My plan is to modify as little scheduler logic as possible to support auto-queue creation. We only need to add logic to auto-create queues and change the queue mapping policy so that it can specify a create-new-queue-when-absent flag.
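The accounting proposed in the comment above (exclude stopped queues from the parent's committed capacity, fail creation when the guarantee does not fit, and fail reactivation when it would overcommit the parent) can be modelled as follows. This is a simplified Python sketch under stated assumptions: capacities are percentages of the parent, and the `Leaf` class, state strings, and function names are hypothetical rather than Capacity Scheduler types.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    name: str
    capacity: float          # guaranteed share of the parent, in percent
    state: str = "RUNNING"   # "RUNNING" or "STOPPED" in this sketch

def active_capacity(leaves):
    """Sum guarantees of running queues only; stopped queues are excluded."""
    return sum(q.capacity for q in leaves if q.state == "RUNNING")

def try_create(leaves, name, capacity):
    """Create a guaranteed leaf queue, or refuse if the parent would be overcommitted."""
    if active_capacity(leaves) + capacity > 100.0:
        return False  # the submission would be rejected
    leaves.append(Leaf(name, capacity))
    return True

def try_reactivate(leaves, name):
    """Move a stopped queue back to running only if its guarantee still fits."""
    queue = next(q for q in leaves if q.name == name)
    if queue.state == "RUNNING":
        return True
    if active_capacity(leaves) + queue.capacity > 100.0:
        return False
    queue.state = "RUNNING"
    return True

leaves = [Leaf("u1", 50.0), Leaf("u2", 50.0, "STOPPED")]
print(try_create(leaves, "u3", 50.0))   # True: u2 is stopped, so 50% is free
print(try_reactivate(leaves, "u2"))     # False: u1 + u3 already use 100%
print(try_create(leaves, "u4", 10.0))   # False: no guaranteed room left
```

The second call illustrates the concern raised earlier in the thread: once u3 takes the freed share, the owner of u2 can no longer get the queue back until capacity is released.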
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145990#comment-16145990 ]

Arun Suresh commented on YARN-7117:

cc [~subru] and [~curino]
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145976#comment-16145976 ] Jason Lowe commented on YARN-7117: --

bq. Adding queue with zero guaranteed resource is one possible solution, but it may not be good enough for resource sharing since many users don't frequently use cluster.

If we get the user limit and user limit factor settings correct for the queue (i.e., make sure the user limit is 100% and ULF is large enough to completely fill the parent queue) then I think it will work as intended without needing to worry about how many other, unused queues there are. The scheduler will automatically try to balance the load between all the active queues just like it does with the regular queues today. And with the ULF being large enough, the other queues should have no trouble stealing from any underutilized sibling queues.

bq. Instead, maybe admin can configure a "delete-policy" which delete queues which are unused for X secs. Thoughts?

If we have auto-create then I guess it could make sense to have auto-delete, although it may be confusing to users who were querying their auto-created queue status when suddenly that status request fails due to a non-existent queue error. Ripping it out may be tricky depending upon the expectations of the user.

If we choose to provide some non-zero capacity (separate from max-capacity) for auto-created queues, what happens when more users become active than we can provide guarantees for in the parent queue? Does the job submission fail since it cannot create the child queue with that guarantee or ..? I don't have all the details on the specific use cases, but this seems like we're going out of our way to essentially emulate what user limits and in-queue preemption can already accomplish when users share the same queue.
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145916#comment-16145916 ] Wangda Tan commented on YARN-7117: --

Thanks [~naganarasimha...@apache.org] for the suggestions, all good points.

bq. These dynamic queues are created under any queue ? how to map which queue this new dynamic queue is created?

This depends on the mapping policy and on whether the parent queue supports auto-creating sub-queues. (See my first comment.)

bq. i was actually thinking whether we can use reservation queues under default queues which is not time bounded but will be alive till the apps are running under it. (though not completely thought about approach)

That would introduce a special kind of queue to support this case, and I'm not sure we want to go in that direction. I would prefer to make auto-created queues the same as manually created queues.

bq. Queue should be created based on user name user group name on what basis will it be decided?

This depends on the {{yarn.scheduler.capacity.queue-mappings}} settings, and in addition, after YARN-3635/YARN-6689, a pluggable queue policy can be specified as well.

bq. For ACL would it be simpler to just have Submit-ACL based on user if the queue is created based on user and user group if created based on group and Admin ACLs inherited from the parent ?

This is definitely a good starting point. To better support YARN-3635/YARN-6689, the admin may want a more comprehensive syntax for specifying ACLs.

bq. based on existing configuration i don't think we can achieve it (correct me if i am wrong)

Yes, it will be a new config; actually I'm not sure it is really needed. Probably we can create a queue under a parent if the parent has enough un-assigned guaranteed resources.
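The interaction between {{yarn.scheduler.capacity.queue-mappings}} and a create-new-queue-when-absent flag can be sketched roughly as below. This is only an illustration in Python, not CapacityScheduler code: the function name {{resolve_queue}}, the {{create_when_absent}} parameter, and the choice to stop at the first matching mapping are all assumptions, and only the {{u:name:queue}}, {{g:group:queue}} and {{u:%user:%user}} mapping forms are handled.

```python
def resolve_queue(mappings, user, groups, existing, create_when_absent=False):
    """Return (queue_name, created) for the first matching mapping.

    Hypothetical sketch: `existing` is a set of known leaf queue names and is
    mutated to stand in for real queue creation. Returns (None, False) when no
    mapping matches, or when the mapped queue is absent and may not be created.
    """
    for entry in mappings.split(","):
        kind, source, target = entry.strip().split(":")
        if kind == "u" and source in ("%user", user):
            # "%user" as the target means "a queue named after the user".
            queue = user if target == "%user" else target
        elif kind == "g" and source in groups:
            queue = target
        else:
            continue  # this mapping does not apply to the submitter
        if queue in existing:
            return queue, False
        if create_when_absent:
            existing.add(queue)  # stand-in for auto-creating the leaf queue
            return queue, True
        return None, False  # assumption: stop at the first matching mapping
    return None, False


queues = {"default", "etl"}
rules = "u:bob:etl,u:%user:%user"
print(resolve_queue(rules, "bob", [], queues))
print(resolve_queue(rules, "alice", [], queues))
print(resolve_queue(rules, "alice", [], queues, create_when_absent=True))
```

With the flag off, a user without a pre-configured queue is left unmapped; with it on, the same mapping creates the queue on first use and reuses it afterwards.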
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145850#comment-16145850 ] Naganarasimha G R commented on YARN-7117: -

Thanks [~wangda] and [~clayb] for bringing up this requirement and [~jlowe] for sharing your views. I was thinking more or less along the lines of Jason's approach, ??queues are created with no guarantees (capacity = 0%, max capacity = 100%)??, which would be simple to visualize and could then be enhanced further based on YARN-5881. Further, I had the following queries:
# Are these dynamic queues created under any queue? How do we determine which queue a new dynamic queue is created under? I was actually thinking about whether we could use reservation queues under default queues which are not time bounded but stay alive as long as apps are running under them (though I have not completely thought through the approach).
# Should the queue be created based on the user name or the user group name, and on what basis will that be decided? I think we either need to support only one of them, or I am not sure how to specify whether the newly created queue should carry the name of the user or of the group; maybe a pattern?
# For ACLs, would it be simpler to just have the *Submit-ACL* based on the user if the queue is created based on the user, and on the user group if it is created based on the group, with *Admin ACLs* inherited from the parent?
# I was a little confused by {{Parent queue can be marked to allow auto creation of leaf queues. ... We allow no sub queue specified for such parent queue}}. Is it that we would support some approach from YARN-5881 to create a parent queue without children? Based on the existing configuration I don't think we can achieve it (correct me if I am wrong).

Also, which version are we targeting for this?
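The ACL suggestion above (submit ACL taken from the user or group the queue was created for, admin ACL inherited from the parent) could look something like the following. This is a hypothetical Python sketch only; {{derive_acls}} and the dictionary keys are made-up names, and it deliberately returns structured data rather than a YARN ACL string.

```python
def derive_acls(created_for, name, parent_admin_acl):
    """Derive ACLs for an auto-created queue (illustrative assumption only).

    created_for: "user" or "group", i.e. which kind of mapping created the queue.
    name: the user or group name the queue was created for.
    parent_admin_acl: the parent's admin ACL, passed through unchanged.
    """
    if created_for not in ("user", "group"):
        raise ValueError("created_for must be 'user' or 'group'")
    return {
        "submit_users": [name] if created_for == "user" else [],
        "submit_groups": [name] if created_for == "group" else [],
        "admin_acl": parent_admin_acl,  # inherited from the parent queue
    }


print(derive_acls("user", "alice", "yarn-admins"))
print(derive_acls("group", "analysts", "yarn-admins"))
```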
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145746#comment-16145746 ] Wangda Tan commented on YARN-7117: --

Thanks [~asuresh] / [~jlowe], both of you commented on unused queues.

IIRC, in the fair scheduler there is no guaranteed-resource concept. minResource affects how queues share resources, but it is not a guaranteed value, since the fair scheduler allows {{Σ(child.minResource) > parent.minResource}}.

I agree with what [~jlowe] mentioned:

bq. In short, if we're going to allow new, previously illegal configurations for auto-queues then arguably we should be consistent and allow them elsewhere as well.

To keep the behavior of automatically created queues consistent with manually created queues, it might be problematic to allow overcommit for auto-created queues but not for manual queues.

Adding queues with zero guaranteed resource is one possible solution, but it may not be good enough for resource sharing, since many users don't use the cluster frequently. Instead, maybe the admin can configure a "delete-policy" that deletes queues which have been unused for X secs. Thoughts?
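A "delete-policy" of the kind floated above might be sketched as follows. This is a rough Python illustration, not a proposed implementation: the function name, the inputs, and in particular the extra condition that a queue with running apps is never deleted are assumptions layered on top of the "unused for X secs" idea.

```python
def queues_to_delete(last_active, running_apps, now, max_idle_secs):
    """Pick auto-created queues that have been idle longer than max_idle_secs.

    last_active: queue name -> timestamp (secs) of the last app activity.
    running_apps: queue name -> number of apps not in a final state.
    A queue with running apps is never selected (assumption of this sketch).
    """
    return sorted(
        name
        for name, ts in last_active.items()
        if running_apps.get(name, 0) == 0 and now - ts > max_idle_secs
    )


last_active = {"alice": 0, "bob": 900, "carol": 100}
running = {"carol": 1}
print(queues_to_delete(last_active, running, now=1000, max_idle_secs=600))
```

Here only the queue that is both idle past the threshold and empty is selected; a queue that is old but still has a running app is kept.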
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145335#comment-16145335 ] Jason Lowe commented on YARN-7117: --

The main issue I see is that it could end up creating what previously was an invalid queue setup where the sum of child capacities != 100%. For example, suppose each user gets a queue, each queue is "guaranteed" 10GB, and the parent queue only has 100GB. It's weird when 100 users show up, because then we have child queues configured with capacities far beyond the parent queue's capabilities, which is something CapacityScheduler never allowed before. If auto-queues can do this, do we also allow it for manually configured ones? It works well when only 10 users are ever active at once, but what happens when 50 are? What does the admin tell them about their "guarantees" at that point?

IMHO if we're going to do this then either there need to be guarantees, and attempts to auto-create a queue beyond the parent's capability fail, or queues are created with no guarantees (capacity = 0%, max capacity = 100%) because that's essentially what we're giving them -- no guarantees but best-effort capacity based on what others are doing.

In short, if we're going to allow new, previously illegal configurations for auto-queues then arguably we should be consistent and allow them elsewhere as well.
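The arithmetic behind the 10GB-per-queue / 100GB-parent example can be worked through with a small Python sketch. The function and field names are hypothetical; it simply contrasts how much guarantee would be promised if every user got a queue with how many queues could be created if creation beyond the parent's guarantee were rejected.

```python
def creation_outcomes(parent_gb, per_queue_gb, users):
    """Summarize the effect of giving each user a queue with a fixed guarantee."""
    max_guaranteed_queues = parent_gb // per_queue_gb
    requested = per_queue_gb * users
    return {
        # Total guarantee promised if every user's queue is created.
        "requested_total_gb": requested,
        # How far that exceeds what the parent can actually back.
        "overcommit_gb": max(0, requested - parent_gb),
        # Outcome if creation beyond the parent's guarantee is rejected instead.
        "created_with_guarantee": min(users, max_guaranteed_queues),
        "rejected_with_guarantee": max(0, users - max_guaranteed_queues),
    }


for n in (10, 50, 100):
    print(n, creation_outcomes(parent_gb=100, per_queue_gb=10, users=n))
```

Under the other option discussed (capacity = 0%, max capacity = 100%), every queue can be created and the promised total stays at zero, so nothing is overcommitted.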
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145144#comment-16145144 ] Arun Suresh commented on YARN-7117: --- The proposal makes sense and it is something that I have heard folks asking for. The FairScheduler has allowed creation of dynamic queues for users for a while now. One doesn't really worry about unused queues there: since the default policy is always 'Fair', it doesn't really matter whether a queue is active or not, because its siblings will use up all the resources by default. https://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144581#comment-16144581 ] Wangda Tan commented on YARN-7117: -- + [~jlowe]/[~asuresh]/[~jhung]/[~Naganarasimha], could you share your thoughts when you get a chance?
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144580#comment-16144580 ] Wangda Tan commented on YARN-7117: --

Discussed with [~clayb]/[~sunilg]/[~vinodkv] offline (thanks Clay for sharing internal use cases). Here is our initial proposal, to gather more thoughts:
- A parent queue can be marked to allow auto creation of leaf queues (such as {{prefix..auto-queue-creation.enabled}}, default is off). Such a parent queue is allowed to have no sub-queues specified.
- A minimum resource can be specified for queues which are automatically created (such as {{prefix..auto-queue-creation.subqueue-minimum-resource}}). After YARN-5881, absolute resources can be specified for auto-created queues.
- CS treats automatically created queues no differently from normal queues, which means the scheduler will use its existing logic for preemption / fairness allocation / queue-ordering / user-limit, etc. for auto-created queues.
- The ACL of a created queue should be determined by policy. For example, if we expect to create a different queue for each user, the admin may set {{prefix..auto-queue-creation.admin-acl-policy=user-name-equals-to-queue-name}}.
- The auto-create queue flag can be specified in the queue-mapping policy; default is off.

A related issue (maybe better discussed in a separate JIRA): queues may be created but not actively used, so we could allow guaranteed resources to be overcommitted. (For example, a parent queue with 100G guaranteed memory has 200 sub-queues created under it, each with 1G guaranteed memory, but most of the sub-queues are not being used.) To solve the issue, the scheduler may need to maintain a list of {{#active-leaf-queues}} under each parent (an active leaf queue is a leaf queue with at least one app not in a final state). The parent queue's guaranteed resource will be checked and enforced when a leaf queue's state changes to active. Application submission will be rejected if {{Σ(leafQueue.guaranteed) (leafQueue ∈ \{active-leaf-queues\}) > parent.guaranteed}}.
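The active-leaf-queue check described above can be sketched in a few lines of Python. This is an illustrative model only, assuming a single memory dimension in GB; {{ParentQueue}}, {{try_activate}} and {{deactivate}} are hypothetical names and not CapacityScheduler classes or methods.

```python
from dataclasses import dataclass, field


@dataclass
class ParentQueue:
    """Hypothetical model of a parent queue that tracks active leaf queues."""
    guaranteed_gb: int
    leaf_guaranteed_gb: dict = field(default_factory=dict)  # leaf -> guarantee
    active_leaves: set = field(default_factory=set)  # leaves with a live app

    def active_guaranteed_gb(self):
        """Sum of guarantees over the currently active leaf queues."""
        return sum(self.leaf_guaranteed_gb[n] for n in self.active_leaves)

    def try_activate(self, leaf):
        """Mark a leaf active if the summed guarantees still fit the parent.

        Returns False when activation would push the active guarantees above
        the parent's guarantee, i.e. the submission would be rejected.
        """
        if leaf in self.active_leaves:
            return True
        needed = self.active_guaranteed_gb() + self.leaf_guaranteed_gb[leaf]
        if needed > self.guaranteed_gb:
            return False
        self.active_leaves.add(leaf)
        return True

    def deactivate(self, leaf):
        """Called when the leaf's last app reaches a final state."""
        self.active_leaves.discard(leaf)


# 200 leaf queues of 1G each under a 100G parent, as in the example above.
parent = ParentQueue(100, {f"user{i}": 1 for i in range(200)})
accepted = sum(parent.try_activate(f"user{i}") for i in range(200))
print(accepted, parent.active_guaranteed_gb())
```

All 200 queues can exist, but only as many as fit within the parent's guarantee can be active at once; once one becomes inactive, another leaf can take its place.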