[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077256#comment-14077256
 ] 

Carlo Curino commented on YARN-1707:
------------------------------------

Thanks again for the fast and insightful feedback. 

*Regarding how the patch matches the JIRA:*
Our initial implementation did make the changes (i.e., the dynamic 
behaviors) in ParentQueue and LeafQueue themselves. Previous feedback pushed us 
to introduce subclasses, so that the changes stay isolated in the dynamic 
subclasses. I think we can go back to the version that directly modifies 
ParentQueue and LeafQueue if there is consensus. #4 (relaxing the refreshqueue 
validation from ==100% to <=100%) is required because we cannot 
transactionally “add Q1, resize Q2” in a way that maintains the invariant 
“size of children is == 100%” at every step. As a consequence we must relax the 
constraints (either in ParentQueue if we remove the hierarchy, or as it is 
today in PlanQueue).
The good news is that the percentages from the configuration are not 
interpreted as actual percentages, but rather used as relative "weights" 
(ranking queues by used_resources / guaranteed_resources). This means that even 
a careless admin will not leave resources unused. For example, if we set two 
queues to 10 and 40 (i.e., something that doesn't add up to 100), the behavior 
is equivalent to setting them to 20 and 80 (as they are used only for relative 
ranking of siblings). I think this is also OK for hierarchies (this part is 
worth double checking).
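
To make the 10/40 vs. 20/80 example concrete, here is a minimal sketch of the 
relative-weight idea. It is not the patch code: {{RelativeWeightsSketch}}, 
{{SiblingQueue}}, {{normalizedShares}} and {{rankMostUnderServedFirst}} are 
hypothetical names, and the sketch assumes a single scalar resource.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not CapacityScheduler classes.
final class RelativeWeightsSketch {

    /** A sibling queue: configured capacity (treated as a weight) and current usage. */
    static final class SiblingQueue {
        final String name;
        final double configuredCapacity; // e.g. 10 or 40; need not sum to 100
        final double usedResources;      // same unit as the parent's resources

        SiblingQueue(String name, double configuredCapacity, double usedResources) {
            this.name = name;
            this.configuredCapacity = configuredCapacity;
            this.usedResources = usedResources;
        }
    }

    /** Turns configured capacities into fractions of the parent that sum to 1. */
    static double[] normalizedShares(double[] configured) {
        double total = 0.0;
        for (double c : configured) {
            total += c;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("capacities must sum to a positive value");
        }
        double[] shares = new double[configured.length];
        for (int i = 0; i < configured.length; i++) {
            shares[i] = configured[i] / total;
        }
        return shares;
    }

    /** Ranks siblings by used / guaranteed, most under-served queue first. */
    static List<String> rankMostUnderServedFirst(List<SiblingQueue> siblings,
                                                 double parentResources) {
        double sum = 0.0;
        for (SiblingQueue q : siblings) {
            sum += q.configuredCapacity;
        }
        final double total = sum;
        List<SiblingQueue> sorted = new ArrayList<>(siblings);
        // guaranteed = parentResources * (configuredCapacity / total)
        sorted.sort(Comparator.comparingDouble(
            (SiblingQueue q) -> q.usedResources
                / (parentResources * q.configuredCapacity / total)));
        List<String> names = new ArrayList<>();
        for (SiblingQueue q : sorted) {
            names.add(q.name);
        }
        return names;
    }
}
```

Under these assumptions, capacities of 10 and 40 normalize to the same shares 
as 20 and 80, so the sibling ranking is the same in both configurations.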

So all in all we can pull up to {{ParentQueue}} and {{LeafQueue}} all the 
dynamic behavior if there is consensus that this is the right path.

*Regarding move:*
1) Good catch... We will wait for feedback from Jian on this.
2) I think we had that at some point and it did not work correctly. We will try 
again.
3) There are a few invariants we do not check. {{MaxApplicationsPerUser}} is one 
of them, but also how many applications can be active in the target queue, 
etc... As I was mentioning in my previous comment, this is likely fine for the 
limited usage we will make of this from {{ReservationSystem}}, but it is worth 
expanding the checks we make (see 
{{FairScheduler.verifyMoveDoesNotViolateConstraints(..)}}) in order to expose 
move to users via CLI.
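
As a rough illustration of the kind of extra checks meant in 3), here is a 
hypothetical sketch. {{MoveCheckSketch}}, {{TargetQueueState}} and 
{{findViolation}} are made-up names, not CapacityScheduler or FairScheduler 
APIs, and the sketch assumes the moved application would count as active in the 
target queue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: types and limits are illustrative, not scheduler APIs.
final class MoveCheckSketch {

    /** Minimal view of the target queue's limits and current load. */
    static final class TargetQueueState {
        final int maxApplicationsPerUser;
        final int maxActiveApplications;
        int activeApplications;
        final Map<String, Integer> applicationsPerUser = new HashMap<>();

        TargetQueueState(int maxApplicationsPerUser, int maxActiveApplications) {
            this.maxApplicationsPerUser = maxApplicationsPerUser;
            this.maxActiveApplications = maxActiveApplications;
        }
    }

    /** Returns a description of the first violated invariant, or empty if the move is OK. */
    static Optional<String> findViolation(String user, TargetQueueState target) {
        int userApps = target.applicationsPerUser.getOrDefault(user, 0);
        if (userApps + 1 > target.maxApplicationsPerUser) {
            return Optional.of("user " + user + " would exceed max applications per user");
        }
        // Assumption: the moved application becomes active in the target queue.
        if (target.activeApplications + 1 > target.maxActiveApplications) {
            return Optional.of("target queue would exceed max active applications");
        }
        return Optional.empty();
    }
}
```

A caller would reject the move when {{findViolation}} returns a non-empty 
value, before touching any queue state.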


> Making the CapacityScheduler more dynamic
> -----------------------------------------
>
>                 Key: YARN-1707
>                 URL: https://issues.apache.org/jira/browse/YARN-1707
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>              Labels: capacity-scheduler
>         Attachments: YARN-1707.patch
>
>
> The CapacityScheduler is rather static at the moment, and refreshqueue 
> provides a rather heavy-handed way to reconfigure it. To move towards 
> long-running services (tracked in YARN-896) and to enable more advanced 
> admission control and resource parcelling we need to make the 
> CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
> YARN-1051.
> Concretely this requires the following changes:
> * create queues dynamically
> * destroy queues dynamically
> * dynamically change queue parameters (e.g., capacity) 
> * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% 
> instead of ==100%
> We limit this to LeafQueues. 



