[
https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186396#comment-16186396
]
Jason Lowe commented on YARN-7117:
----------------------------------
Thanks for providing the doc, Wangda!
I think the syntax would be more concise and easier to read if the queue could
be specified as a sub-path that can optionally include the parent queue. For
example, rather than {{u:user1:queue1(parent-queue=marketing)}} the syntax
could be simplified to: {{u:user1:marketing.queue1}}.
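
A minimal sketch of how such a sub-path could be split into an optional parent and a leaf queue. The names {{QueueMapping}} and {{parse_mapping}} are hypothetical and do not refer to existing CapacityScheduler code; the sketch assumes the last dot-separated component is always the leaf.

```python
from typing import NamedTuple, Optional


class QueueMapping(NamedTuple):
    kind: str              # "u" for user, "g" for group
    source: str            # user or group name (or a placeholder such as %user)
    parent: Optional[str]  # parent queue path, or None when not given
    leaf: str              # leaf queue name


def parse_mapping(rule: str) -> QueueMapping:
    """Parse 'kind:source:[parent.]leaf' (hypothetical helper)."""
    kind, source, queue_path = rule.split(":", 2)
    # Everything before the last '.' is treated as the parent path.
    parent, dot, leaf = queue_path.rpartition(".")
    return QueueMapping(kind, source, parent if dot else None, leaf)


print(parse_mapping("u:user1:marketing.queue1"))
print(parse_mapping("u:user1:queue1"))
```

With this shape, a rule without a parent still parses, so the parent stays optional.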
I'm not really familiar with queue mappings, but I'm assuming the order in which
they are specified is significant, so that cases where more than one rule would
apply to a user resolve deterministically. If so, then the example is confusing,
since it looks like the {{u:user2:%primary_group(parent-queue=finance)}} rule
will always be eclipsed by the preceding {{u:%user:%user(parent=engineering)}}
rule.
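
To illustrate the concern, here is a small sketch that assumes first-match-wins semantics (an assumption, as stated above, not confirmed behavior). The rule tuples and the {{resolve}} function are hypothetical stand-ins for the two rules in the example.

```python
from typing import List, Optional, Tuple

# (source pattern, queue pattern, parent queue), in configured order.
Rule = Tuple[str, str, str]

RULES: List[Rule] = [
    ("%user", "%user", "engineering"),       # u:%user:%user(parent=engineering)
    ("user2", "%primary_group", "finance"),  # u:user2:%primary_group(parent-queue=finance)
]


def resolve(user: str, primary_group: str, rules: List[Rule]) -> Optional[str]:
    """Return the queue path chosen by the first rule that applies."""
    for source, queue, parent in rules:
        if source not in ("%user", user):
            continue  # this rule names a different user
        leaf = queue.replace("%user", user).replace("%primary_group", primary_group)
        return f"{parent}.{leaf}"
    return None


# The wildcard rule comes first, so user2 never reaches the finance rule.
print(resolve("user2", "accounting", RULES))
# Listing the specific rule first lets it take effect.
print(resolve("user2", "accounting", RULES[::-1]))
```

Under that assumption, the specific {{user2}} rule only has an effect when it is listed before the {{%user}} wildcard rule.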
{quote}
If we don’t have guaranteed room in the parent
queue, queues with 0 capacity (best effort
queue ) will be created. Applications running in
these best effort queues could be starving if no
capacity is available
{quote}
This conflicts with the proposal above to fail the submission because it cannot
create the queue with guarantees. It seems weird to have user A get guaranteed
capacity but user B gets _zero_ guarantees because they were just a second
later to submit than user A. IMHO if the admin wants to configure guarantees
for auto-created queues then we should not assume that they're going to be OK
with auto-created queues that do not meet those specifications. Otherwise I'd
assume the admin would forgo guaranteed capacities on the auto queues and just
have them carve up the parent queue proportionally.
The capacity management with guaranteed capacities refers to a
"configured-threshold" but the interface to set that threshold is not
documented above.
The document implies that there are SLAs with guaranteed capacity auto-queues,
but that's clearly not the case. In the example, it's true that the
applications submitted to q4 and q5 eventually ran with guaranteed capacities.
However, they waited an unbounded amount of time to start running, which means
we cannot always hit SLAs. For example, users in q1/q2/q3 can collectively
prevent apps in q4/q5 from ever running.
For the alternative approach where all of the queues are "best effort" we don't
have to always have the max-am-resource at 0%. We could specify the max-am as
a percent of the max cap for those queues or a separate config specific to
them, or whatever. Or we could have the queues auto-distribute the capacities
of the parent as new queues are added. In other words the auto-queue capacity
is 1/(num auto queues) of the parent and the max-capacity is always 100%.
Preemption can be used to keep the queues fair if one user tries to dominate
over the others, but capacities of underutilized queues can be leveraged by
others.
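
A rough sketch of the auto-distribution idea, with capacities expressed as percentages of the parent queue. The {{distribute}} function and the dictionary keys are hypothetical; this only shows the arithmetic and is not tied to any existing scheduler configuration.

```python
from typing import Dict, List


def distribute(auto_queues: List[str]) -> Dict[str, Dict[str, float]]:
    """Give each auto-queue an equal share of the parent, in percent."""
    if not auto_queues:
        return {}
    share = 100.0 / len(auto_queues)  # 1/(num auto queues) of the parent
    # max-capacity stays at 100% so idle capacity can be borrowed by others.
    return {q: {"capacity": share, "max_capacity": 100.0} for q in auto_queues}


queues = ["user1", "user2", "user3", "user4"]
for name, caps in distribute(queues).items():
    print(name, caps)
```

Each time an auto-queue is added or removed, the shares would be recomputed, so no user ends up with zero guaranteed capacity merely because they submitted later.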
If a user has ACLs to the parent queue then I believe they have those ACLs to
the entire hierarchy of that queue. That means if the parent queue says they
can submit then they'll be able to submit to any auto-queue underneath that
parent. We'll either need a new ACL for auto-queue creation separate from app
submission or change the semantics of ACL inheritance for auto-queues.
The former probably makes more sense and would be more intuitive, since admins
are already used to the inheritance features of today's queue ACLs. Keeping
inheritance intact also lets admins configure parent-queue-privileged users who
get admin-like access to all the auto-queues of a parent queue but aren't full
admin users across all queues.
Yes, if a user does not have the ability to create an auto-queue and/or submit
then the submit should fail.
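
A minimal sketch of the first option: a creation ACL kept separate from the inherited submit ACL. All names here ({{ParentQueue}}, {{auto_create_acl}}, {{SubmissionRejected}}, {{submit}}) are hypothetical and only illustrate the proposed semantics.

```python
from dataclasses import dataclass, field
from typing import Set


class SubmissionRejected(Exception):
    """Raised when a user may not submit or may not create the auto-queue."""


@dataclass
class ParentQueue:
    name: str
    submit_acl: Set[str] = field(default_factory=set)       # inherited by all children
    auto_create_acl: Set[str] = field(default_factory=set)  # hypothetical new ACL
    auto_queues: Set[str] = field(default_factory=set)      # leaf queues created so far


def submit(parent: ParentQueue, user: str, leaf: str) -> str:
    """Return the target queue path, creating the auto-queue if permitted."""
    if user not in parent.submit_acl:
        raise SubmissionRejected(f"{user} cannot submit under {parent.name}")
    if leaf not in parent.auto_queues:
        # Creating a queue needs the separate ACL; submit rights are not enough.
        if user not in parent.auto_create_acl:
            raise SubmissionRejected(f"{user} cannot create {parent.name}.{leaf}")
        parent.auto_queues.add(leaf)
    return f"{parent.name}.{leaf}"


eng = ParentQueue("engineering", submit_acl={"alice", "bob"}, auto_create_acl={"alice"})
print(submit(eng, "alice", "alice"))  # alice may create her queue
print(submit(eng, "bob", "alice"))    # bob may submit to an existing queue
```

In this sketch a user with only submit rights can still use auto-queues that already exist, matching today's inheritance, while a submission that would require creating a queue fails.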
I don't know if it's critical to show auto-queues as a different color, but I
think it would be important to be able to determine _somehow_ via the UI that
the queue was auto-created so the admin doesn't wonder why they can't find the
queue in the static queue configs. This might be as simple as an "Auto-Queue:
true/false" line in the queue details box in the UI.
> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue
> Mapping
> ----------------------------------------------------------------------------------
>
> Key: YARN-7117
> URL: https://issues.apache.org/jira/browse/YARN-7117
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: capacity scheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments:
> YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf
>
>
> Currently Capacity Scheduler doesn't support auto creation of queues when
> doing queue mapping. We have seen more and more use cases that have complex
> queue mapping policies configured to handle application-to-queue mapping.
> The most common use case of CapacityScheduler queue mapping is to create one
> queue for each user/group. However, {{capacity-scheduler.xml}} must be updated
> and {{RMAdmin:refreshQueues}} run whenever a new user/group is onboarded. One
> option to solve the problem is to automatically create queues when a new
> user/group arrives.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)