[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077242#comment-14077242
 ] 

Subramaniam Venkatraman Krishnan commented on YARN-1707:
--------------------------------------------------------

[~wangda] Thanks for the very detailed comments. I agree that understanding the 
context is essential, and I am glad to help with that. Overall your 
understanding is spot on; please find answers to your questions below: 

1) Yes, it is possible to have multiple PlanQueues (e.g., if two organizations 
want to dynamically allocate their own resources, but not share them with each 
other). This is also a good way to "try" reservations on a small scale and 
slowly ramp up at each org's pace.
2) The extra confs are needed to automate the initialization of key parameters 
of the dynamic ReservationQueues (without requiring full specification of each 
of those).
3) Correct
4) Correct
5) First: the Plan guarantees that the sum of reservations never exceeds 
available resources (replanning if needed to maintain this invariant when 
handling failures). On the other hand, as with the normal scheduler, we can 
leverage "overcapacity" to guarantee high cluster utilization. More precisely, 
depending on the configuration (or dynamically, on whether reservations have 
gang semantics or not) we can allow resources allocated to PlanQueue and 
ReservationQueue to exceed their guaranteed capacity (i.e., set the dynamic 
max-capacity above the guaranteed one). In this case preemption might kick in 
if other apps with more rights to the resources have pending asks. Part of the 
changes in YARN-1957 were driven by this.
6) To limit the scope of changes, we agreed to have a follow-up JIRA to address 
HA. The intuition we have is that it is sufficient to persist the Plan alone. 
During recovery, the _Plan Follower_ will resync the Plan with the scheduler by 
creating the dynamic queues for currently active reservations. We will be happy 
to have your input when we work on the HA JIRA.
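
To make points 5 and 6 concrete, here is a rough sketch in Python (not the 
actual YARN code). The Reservation record, plan_is_feasible and 
active_reservations are hypothetical names made up for this illustration, and 
capacities are assumed to be percentages of the cluster. It shows the invariant 
the Plan maintains, and the set of reservations a Plan Follower would turn into 
dynamic queues when resyncing after recovery:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    # Hypothetical record for illustration; not the actual YARN type.
    name: str
    start: int       # first time step at which the reservation is active
    end: int         # first time step at which it is no longer active
    capacity: float  # share of the cluster reserved, as a percentage


def plan_is_feasible(reservations, total_capacity=100.0):
    """Return True if active reservations never exceed total_capacity."""
    # The load only increases at a reservation's start, so it is enough
    # to check the load at every start point.
    for point in {r.start for r in reservations}:
        load = sum(r.capacity for r in reservations if r.start <= point < r.end)
        if load > total_capacity:
            return False
    return True


def active_reservations(reservations, now):
    """Reservations that would be materialized as dynamic queues at `now`."""
    return [r for r in reservations if r.start <= now < r.end]
```

In this sketch only the list of reservations (the Plan) needs to be persisted; 
the set of dynamic queues is recomputed from it with active_reservations.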

[~curino] will answer your questions specific to this JIRA.

> Making the CapacityScheduler more dynamic
> -----------------------------------------
>
>                 Key: YARN-1707
>                 URL: https://issues.apache.org/jira/browse/YARN-1707
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>              Labels: capacity-scheduler
>         Attachments: YARN-1707.patch
>
>
> The CapacityScheduler is rather static at the moment, and refreshqueue 
> provides a rather heavy-handed way to reconfigure it. Moving towards 
> long-running services (tracked in YARN-896) and to enable more advanced 
> admission control and resource parcelling we need to make the 
> CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
> YARN-1051.
> Concretely this requires the following changes:
> * create queues dynamically
> * destroy queues dynamically
> * dynamically change queue parameters (e.g., capacity) 
> * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% 
> instead of ==100%
> We limit this to LeafQueues. 
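
The relaxed validation in the last bullet of the description can be sketched as 
follows. This is an illustrative Python sketch only, not the CapacityScheduler 
code: validate_child_capacities, allow_underallocation and tolerance are 
hypothetical names, and capacities are assumed to be percentages of the parent 
queue.

```python
def validate_child_capacities(child_capacities, allow_underallocation=True,
                              tolerance=1e-6):
    """Check the capacities of a parent queue's children.

    With allow_underallocation=True the children may sum to at most 100%,
    leaving room to create queues dynamically later. With False, the
    stricter rule applies and the children must sum to exactly 100%.
    """
    total = sum(child_capacities)
    if allow_underallocation:
        return total <= 100.0 + tolerance
    return abs(total - 100.0) <= tolerance
```

Under the relaxed rule, children at 30% and 40% pass validation and the 
remaining 30% stays available for queues created later.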



--
This message was sent by Atlassian JIRA
(v6.2#6252)
