[
https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146216#comment-16146216
]
Wangda Tan commented on YARN-7117:
----------------------------------
Thanks [~jlowe],
Regarding the zero-capacity queue: apologies that I didn't make it clear. In
one of the use cases we saw, assuming the parent's guaranteed resource is not
overcommitted, we still want each auto-created leaf queue to have a minimum
guaranteed resource so that it can run its jobs via preemption. Setting the
guaranteed resource to zero means no SLA for any auto-created queue. I agree
that setting capacity to 0 is a better solution if there's no SLA requirement.
bq. ... Ripping it out may be tricky depending upon the expectations of the
user.
This is a valid concern. Instead of deleting the queue, how about stopping it?
At least the queue is still there. In today's CS, the capacities of stopped
queues are counted when we check resource sharing. We should probably exclude
the shares of stopped queues and fail reactivation (stopped -> running) when
the parent queue's guaranteed resource is overcommitted.
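To make the stopped-queue idea concrete, here is a rough sketch of the
accounting I have in mind. It is a toy model, not CapacityScheduler code: the
class StoppedQueueSketch, its methods, and the use of plain MB integers are all
made up for illustration, and it assumes "overcommitted" means that the sum of
guarantees of running children would exceed the parent's guarantee.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not CapacityScheduler code) of stopped-queue accounting. */
final class StoppedQueueSketch {
    enum State { RUNNING, STOPPED }

    private final int parentGuaranteedMb;
    private final Map<String, Integer> guaranteedMb = new HashMap<>();
    private final Map<String, State> states = new HashMap<>();

    StoppedQueueSketch(int parentGuaranteedMb) {
        this.parentGuaranteedMb = parentGuaranteedMb;
    }

    /** Registers a child queue; no admission check here, to keep the sketch small. */
    void addQueue(String name, int mb, State state) {
        guaranteedMb.put(name, mb);
        states.put(name, state);
    }

    /** Sum of guarantees over RUNNING children only; STOPPED children are excluded. */
    int activeGuaranteedMb() {
        int sum = 0;
        for (Map.Entry<String, State> e : states.entrySet()) {
            if (e.getValue() == State.RUNNING) {
                sum += guaranteedMb.get(e.getKey());
            }
        }
        return sum;
    }

    void stop(String name) {
        requireKnown(name);
        states.put(name, State.STOPPED);
    }

    /** Tries STOPPED -> RUNNING; refuses if that would overcommit the parent. */
    boolean reactivate(String name) {
        requireKnown(name);
        if (states.get(name) == State.RUNNING) {
            return true; // already running, nothing to do
        }
        if (activeGuaranteedMb() + guaranteedMb.get(name) > parentGuaranteedMb) {
            return false; // parent's guarantee would be overcommitted
        }
        states.put(name, State.RUNNING);
        return true;
    }

    private void requireKnown(String name) {
        if (!states.containsKey(name)) {
            throw new IllegalArgumentException("unknown queue: " + name);
        }
    }
}
```

With a 4096 MB parent and three 2048 MB children, stopping one child frees its
share for another, and reactivating it is refused until a running child is
stopped again.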
bq. Does the job submission fail since it cannot create the child queue with
that guarantee or ..?
Yes, this is my original proposal.
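As a minimal sketch of that behavior (again a toy model, with the hypothetical
class AutoCreateParentSketch, a fixed per-leaf guarantee, and MB integers all
being my own assumptions rather than existing scheduler code): submission
creates the user's leaf queue on first use and is rejected when the parent
cannot cover one more guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (hypothetical) of a parent queue giving each auto-created leaf a fixed guarantee. */
final class AutoCreateParentSketch {
    private final int parentGuaranteedMb; // guaranteed resource of the parent queue
    private final int leafGuaranteedMb;   // minimum guarantee for every auto-created leaf
    private final Map<String, Integer> leaves = new LinkedHashMap<>();

    AutoCreateParentSketch(int parentGuaranteedMb, int leafGuaranteedMb) {
        this.parentGuaranteedMb = parentGuaranteedMb;
        this.leafGuaranteedMb = leafGuaranteedMb;
    }

    /** Total guarantee already handed out to leaf queues. */
    int committedMb() {
        int sum = 0;
        for (int mb : leaves.values()) {
            sum += mb;
        }
        return sum;
    }

    /**
     * Returns the leaf queue for the user, creating it if absent.
     * Throws if creating it would overcommit the parent's guarantee,
     * which models the job submission failing.
     */
    String submit(String user) {
        if (leaves.containsKey(user)) {
            return user; // queue already exists, reuse it
        }
        if (committedMb() + leafGuaranteedMb > parentGuaranteedMb) {
            throw new IllegalStateException(
                "cannot create queue '" + user + "' with guarantee " + leafGuaranteedMb + " MB");
        }
        leaves.put(user, leafGuaranteedMb);
        return user;
    }
}
```

For example, with a 4096 MB parent and a 2048 MB per-leaf guarantee, the first
two users get queues and a third user's submission is rejected, while existing
users can keep submitting.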
bq. I don't have all the details on the specific use cases, but this seems like
we're going out of our way to essentially emulate what user limits and in-queue
preemption can already accomplish when users share the same queue.
Actually, we thought about this option before. Basically, the two serve
different use cases: user limits and the related preemption are more
appropriate for a mix of batch jobs submitted by different users and running in
the same queue. That's why we use FIFO ordering for apps, etc., and why we
allow overcommitting the user limit (there is no hard limit on the number of
running users in a queue).
Running jobs submitted by different users in different queues can better
support long-running apps. For example, each user could be allowed to run at
least 2 docker containers and do whatever they like inside them. A queue is
also operated more independently and has better metrics, UI, etc. exposed to
end users.
My plan is to modify as little scheduler logic as possible to support
auto queue creation. We only need to add logic to auto-create queues and
change the queue mapping policy so that it can specify a
create-new-queue-when-absent flag.
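A rough illustration of how such a flag could change the mapping step is below.
The class QueueMappingSketch and the createWhenAbsent field are hypothetical
names, and the sketch assumes the simplest mapping, where each user maps to a
queue with the user's own name.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Toy model (hypothetical) of user-to-queue mapping with a create-when-absent flag. */
final class QueueMappingSketch {
    private final Set<String> existingQueues = new HashSet<>();
    private final boolean createWhenAbsent;

    QueueMappingSketch(boolean createWhenAbsent) {
        this.createWhenAbsent = createWhenAbsent;
    }

    boolean queueExists(String name) {
        return existingQueues.contains(name);
    }

    /**
     * Maps a user to a queue named after the user.
     * Returns empty when the queue is missing and auto creation is off.
     */
    Optional<String> mapUserToQueue(String user) {
        if (existingQueues.contains(user)) {
            return Optional.of(user);
        }
        if (!createWhenAbsent) {
            return Optional.empty(); // today's behavior: no queue, no mapping
        }
        existingQueues.add(user); // auto-create the leaf queue on first use
        return Optional.of(user);
    }
}
```

With the flag off, a new user gets no mapping until an admin adds the queue;
with the flag on, the first submission creates it.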
> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue
> Mapping
> ----------------------------------------------------------------------------------
>
> Key: YARN-7117
> URL: https://issues.apache.org/jira/browse/YARN-7117
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: capacity scheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
>
> Currently, Capacity Scheduler doesn't support auto creation of queues when
> doing queue mapping. We saw more and more use cases that have complex queue
> mapping policies configured to handle application-to-queue mapping.
> The most common use case of CapacityScheduler queue mapping is to create one
> queue for each user/group. However, {{capacity-scheduler.xml}} must be updated
> and {{RMAdmin:refreshQueues}} run whenever a new user/group is onboarded. One
> option to solve the problem is to automatically create queues when a new
> user/group arrives.