[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077256#comment-14077256 ]
Carlo Curino commented on YARN-1707:
------------------------------------

Thanks again for the fast and insightful feedback.

*Regarding how the patch matches the JIRA:*

Our initial implementation did indeed make the changes (i.e., the dynamic behaviors) in ParentQueue and LeafQueue themselves. Previous feedback pushed us to introduce subclasses, so as to isolate the changes in dynamic subclasses. I think we can go back to the version that modifies ParentQueue and LeafQueue directly if there is consensus.

#4 is required because we cannot transactionally “add Q1, resize Q2” so that the invariant “size of children is == 100%” is maintained. As a consequence we must relax the constraint (either in ParentQueue if we remove the subclass hierarchy, or as it is today in PlanQueue). The good news is that the percentages from the configuration are not interpreted as actual percentages, but rather used as relative "weights" (ranking queues by used_resources / guaranteed_resources). This means that even a careless admin will not leave resources unused. For example, if we set two queues to 10,40 (i.e., something that doesn't add up to 100), the behavior is equivalent to setting them to 20,80 (as the values are used only for relative ranking of siblings). I think this is also ok for hierarchies (worth double checking this part).

So all in all we can pull all the dynamic behavior up into {{ParentQueue}} and {{LeafQueue}} if there is consensus that this is the right path.

*Regarding move:*

1) Good catch... We will wait for feedback from Jian on this.
2) I think we had that at some point and it did not work correctly. We will try again.
3) There are a few invariants we do not check. {{MaxApplicationsPerUser}} is one of them, but so is how many applications can be active in the target queue, etc.
As I mentioned in my previous comment, this is likely fine for the limited usage we will make of this from {{ReservationSystem}}, but it is worth expanding the checks we make (see {{FairScheduler.verifyMoveDoesNotViolateConstraints(..)}}) before exposing move to users via the CLI.

> Making the CapacityScheduler more dynamic
> -----------------------------------------
>
> Key: YARN-1707
> URL: https://issues.apache.org/jira/browse/YARN-1707
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Labels: capacity-scheduler
> Attachments: YARN-1707.patch
>
>
> The CapacityScheduler is rather static at the moment, and refreshqueue
> provides a rather heavy-handed way to reconfigure it. To move towards
> long-running services (tracked in YARN-896) and to enable more advanced
> admission control and resource parcelling, we need to make the
> CapacityScheduler more dynamic. This is instrumental to the umbrella jira
> YARN-1051.
> Concretely this requires the following changes:
> * create queues dynamically
> * destroy queues dynamically
> * dynamically change queue parameters (e.g., capacity)
> * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100%
> instead of == 100%
> We limit this to LeafQueues.
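
To illustrate the "relative weights" point from the comment (10,40 behaving like 20,80), here is a minimal standalone sketch of the arithmetic only. It is not CapacityScheduler code: the function names {{normalize_weights}} and {{rank_queues}} and the sample usage numbers are hypothetical, and the ranking rule is assumed to be a plain ascending sort on used_resources / guaranteed_resources as described in the comment.

```python
from fractions import Fraction


def normalize_weights(capacities):
    """Rescale sibling capacities so they sum to 100, preserving their ratios.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    total = sum(capacities)
    if total <= 0:
        raise ValueError("capacities must sum to a positive value")
    # Exact rational arithmetic avoids floating-point rounding noise.
    return [Fraction(c * 100, total) for c in capacities]


def rank_queues(used, guaranteed):
    """Return queue indices ordered by used / guaranteed, smallest ratio first.

    Assumption: siblings are ranked purely by this ratio.
    """
    return sorted(
        range(len(used)),
        key=lambda i: Fraction(used[i]) / Fraction(guaranteed[i]),
    )


# Configured capacities that do not add up to 100.
configured = [10, 40]
normalized = normalize_weights(configured)  # equal to [20, 80]

# Made-up usage numbers: the ranking is the same with either set of weights,
# because scaling every guarantee by the same factor scales every ratio equally.
used = [5, 10]
assert rank_queues(used, configured) == rank_queues(used, normalized)
```

Since only the ratios between siblings matter under this assumption, rescaling all capacities by a common factor cannot change the order in which queues are served, which is why a sum below 100% would not leave resources idle.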