[ 
https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146216#comment-16146216
 ] 

Wangda Tan commented on YARN-7117:
----------------------------------

Thanks [~jlowe], 

Regarding the zero-capacity queue: apologies for not making this clear. In one 
of the use cases we saw, assuming the parent's guaranteed resource is not 
overcommitted, we still want each auto-created leaf queue to have a minimum 
guaranteed resource so that its jobs can run via preemption. Setting the 
guaranteed resource to zero means no SLA for any auto-created queue. I agree 
that setting capacity to 0 is the better solution if there's no SLA requirement.

bq. ... Ripping it out may be tricky depending upon the expectations of the 
user.
This is a valid concern. Instead of deleting the queue, how about stopping it? 
At least the queue is still there. In today's CS, the capacities of stopped 
queues are still counted when we check resource sharing. We should probably 
exclude the shares of stopped queues and fail reactivation (stopped -> running) 
when the parent queue's guaranteed resource would be overcommitted.
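
A minimal sketch of the stopped-queue idea, under stated assumptions: guarantees are modeled as percentages of the parent's guaranteed resource, and every name here ({{QueueReactivationSketch}}, {{ParentQueue}}, {{LeafQueue}}, {{tryReactivate}}) is hypothetical and not part of the CapacityScheduler code base. It only illustrates excluding stopped queues from the committed share and refusing a stopped -> running transition that would overcommit the parent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; these are not CapacityScheduler classes.
class QueueReactivationSketch {
    enum State { RUNNING, STOPPED }

    static final class LeafQueue {
        final String name;
        // Assumed unit: percent of the parent's guaranteed resource.
        final double guaranteedPercent;
        State state;

        LeafQueue(String name, double guaranteedPercent, State state) {
            this.name = name;
            this.guaranteedPercent = guaranteedPercent;
            this.state = state;
        }
    }

    static final class ParentQueue {
        private final List<LeafQueue> children = new ArrayList<>();

        void add(LeafQueue queue) {
            children.add(queue);
        }

        // Sum the guarantees of RUNNING children only; stopped queues are excluded.
        double committedPercent() {
            double sum = 0.0;
            for (LeafQueue queue : children) {
                if (queue.state == State.RUNNING) {
                    sum += queue.guaranteedPercent;
                }
            }
            return sum;
        }

        // Allow STOPPED -> RUNNING only if the parent would not be overcommitted.
        boolean tryReactivate(LeafQueue queue) {
            if (queue.state == State.RUNNING) {
                return true; // nothing to do
            }
            if (committedPercent() + queue.guaranteedPercent > 100.0) {
                return false; // reactivation fails; the queue stays stopped
            }
            queue.state = State.RUNNING;
            return true;
        }
    }
}
```

With a running queue holding 60% and a stopped queue holding 50%, the stopped queue does not count toward the committed share, but {{tryReactivate}} on it is refused because 60 + 50 exceeds 100.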

bq. Does the job submission fail since it cannot create the child queue with 
that guarantee or ..?
Yes, this is my original proposal.

bq. I don't have all the details on the specific use cases, but this seems like 
we're going out of our way to essentially emulate what user limits and in-queue 
preemption can already accomplish when users share the same queue.
We actually considered this option before. The two approaches serve different 
use cases: user limits and the related preemption are more appropriate for a 
mix of batch jobs submitted by different users to the same queue. That's why 
we use FIFO ordering for apps, etc., and why we allow user limits to be 
overcommitted (there is no hard limit on the number of running users in a 
queue).
Running jobs submitted by different users in different queues better supports 
long-running apps. For example, each user could be allowed to run at least 2 
Docker containers and do whatever they want inside them. A queue is also 
operated more independently and exposes better metrics, UI, etc. to end users.
My plan is to modify as little scheduler logic as possible to support 
auto queue creation. We only need to add logic to auto-create queues and 
change the queue mapping policy so that it can specify a 
create-new-queue-when-absent flag.
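
A minimal sketch of how such a mapping policy could behave, under stated assumptions: each auto-created leaf queue gets the same fixed guarantee, expressed as a percentage of the parent's guaranteed resource, and submission is rejected when a new queue cannot be given that guarantee (the original proposal mentioned above). The class {{AutoCreateMappingSketch}}, its fields, and {{mapUser}} are hypothetical names for illustration, not an existing CapacityScheduler API or configuration key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical user-to-queue mapping with a create-when-absent flag.
class AutoCreateMappingSketch {
    // Leaf queue name -> guaranteed percent of the parent's guaranteed resource.
    private final Map<String, Double> leafGuarantees = new LinkedHashMap<>();
    private final boolean createWhenAbsent;
    private final double perQueueGuaranteePercent;

    AutoCreateMappingSketch(boolean createWhenAbsent, double perQueueGuaranteePercent) {
        this.createWhenAbsent = createWhenAbsent;
        this.perQueueGuaranteePercent = perQueueGuaranteePercent;
    }

    // Total guarantee already handed out to existing leaf queues.
    double committedPercent() {
        double sum = 0.0;
        for (double guarantee : leafGuarantees.values()) {
            sum += guarantee;
        }
        return sum;
    }

    // Returns the queue for this user, creating it if allowed.
    // Throws IllegalStateException when the submission must be rejected.
    String mapUser(String user) {
        if (leafGuarantees.containsKey(user)) {
            return user; // queue already exists, plain mapping
        }
        if (!createWhenAbsent) {
            throw new IllegalStateException("Queue does not exist: " + user);
        }
        if (committedPercent() + perQueueGuaranteePercent > 100.0) {
            // The parent cannot back another guaranteed queue, so fail the submission.
            throw new IllegalStateException("Cannot guarantee resources for queue: " + user);
        }
        leafGuarantees.put(user, perQueueGuaranteePercent);
        return user;
    }
}
```

With a 40% guarantee per queue, the first two users each get a new queue, and a third user's submission is rejected because a third guarantee would overcommit the parent.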

> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue 
> Mapping
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-7117
>                 URL: https://issues.apache.org/jira/browse/YARN-7117
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>
> Currently Capacity Scheduler doesn't support auto creation of queues when 
> doing queue mapping. We see more and more use cases that have complex queue 
> mapping policies configured to handle application-to-queue mapping. 
> The most common use case of CapacityScheduler queue mapping is to create one 
> queue for each user/group. However, {{capacity-scheduler.xml}} must be updated 
> and {{RMAdmin:refreshQueues}} must be run whenever a new user/group is 
> onboarded. One option to solve the problem is to automatically create queues 
> when a new user/group arrives.


