Carlo Curino commented on YARN-5734:

[~mshen], I skimmed your doc but have not read it carefully yet. I am generally a 
fan of this. At MS we have similar mechanisms for other systems and users seem 
to like them. Also, at our scale the number of daily configuration changes is 
substantial, and constantly refreshing from XML (could be tens of times daily) 
sits between very annoying and impractical. Moreover, in Federation (YARN-2915) 
we would be happy to leverage this functionality, as we want to centralize the 
configuration of multiple RMs via our FederationPolicyStore. Our current practical 
workaround is to automate the download of the new conf, write to .xml file and 

A couple of important considerations:
 # The solution should play nicely with HA, so using the RMStateStore (instead of 
or alongside Derby) for storing the updated configuration (with the conf.xml as 
a backup, as you do) is, I think, key.
 # As you do this, please make the Store (e.g., DB) configurable. In our 
deployments, it would be very nice to use an external RDBMS. Generally I agree 
with [~cwsteinbach] that having configs stored in a DB is very convenient, as 
you can easily maintain a historical record of previous entries, and study how 
they evolve and relate to each other with simple OLAP queries. 
 # You should also take a look at the ReservationSystem code (YARN-1051, 
YARN-2572, YARN-2573), as the PlanQueue and ReservationQueue are used to change 
configurations very dynamically (currently focused on capacity/max-capacity 
only, but we could generalize it if useful). 
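
To make the second point above more concrete, here is a rough sketch of what a pluggable store that keeps a historical record could look like. Everything in it is hypothetical: QueueConfStore and InMemoryQueueConfStore are illustrative names, not existing YARN classes, and the in-memory list only stands in for a Derby, external RDBMS, or RMStateStore backed implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface (not an existing YARN API): the scheduler would
// depend only on this, so the backing store can be swapped by configuration.
interface QueueConfStore {
  /** Applies the updates on top of the latest snapshot; returns the new 1-based version. */
  long commit(Map<String, String> updates);

  /** Returns the most recent snapshot (empty if nothing was committed yet). */
  Map<String, String> latest();

  /** Returns the snapshot as it was at the given 1-based version. */
  Map<String, String> atVersion(long version);
}

// In-memory stand-in for a database-backed implementation. Every commit
// appends a full immutable snapshot, so earlier versions stay queryable.
class InMemoryQueueConfStore implements QueueConfStore {
  private final List<Map<String, String>> history = new ArrayList<>();

  @Override
  public synchronized long commit(Map<String, String> updates) {
    Map<String, String> next = new HashMap<>(latest());
    next.putAll(updates);
    history.add(Collections.unmodifiableMap(next));
    return history.size();
  }

  @Override
  public synchronized Map<String, String> latest() {
    if (history.isEmpty()) {
      return Collections.<String, String>emptyMap();
    }
    return history.get(history.size() - 1);
  }

  @Override
  public synchronized Map<String, String> atVersion(long version) {
    if (version < 1 || version > history.size()) {
      throw new IllegalArgumentException("unknown version: " + version);
    }
    return history.get((int) version - 1);
  }
}
```

With a store shaped like this, committing a new value for yarn.scheduler.capacity.root.a.capacity twice yields versions 1 and 2, and atVersion(1) still returns the earlier value. A real implementation would persist the versions in tables, which is what would make the history queryable.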
Bottom line, the specifics of the code might need to go through a few 
iterations/tweaks, but the general idea is very welcome IMHO. Also the fact that 
you have large scale and long experience deploying and operating this is very 

> OrgQueue for easy CapacityScheduler queue configuration management
> ------------------------------------------------------------------
>                 Key: YARN-5734
>                 URL: https://issues.apache.org/jira/browse/YARN-5734
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Min Shen
>            Assignee: Min Shen
>         Attachments: OrgQueue_Design_v0.pdf
> The current XML-based configuration mechanism in CapacityScheduler makes it 
> very inconvenient to apply any changes to the queue configurations. We saw 2 
> main drawbacks in the file-based configuration mechanism:
> # It makes it very inconvenient to automate queue configuration updates. 
> For example, in our cluster setup, we leverage the queue mapping feature from 
> YARN-2411 to route users to their dedicated organization queues. It could be 
> extremely cumbersome to keep updating the config file to manage the very 
> dynamic mapping between users and organizations.
> # Even if a user has admin permission on one specific queue, that user is 
> unable to make any queue configuration changes, such as resizing subqueues, 
> changing queue ACLs, or creating new queues. All these operations need to be 
> performed in a centralized manner by the cluster administrators.
> With these current limitations, we realized the need for a more flexible 
> configuration mechanism that allows queue configurations to be stored and 
> managed more dynamically. We developed the feature internally at LinkedIn 
> which introduces the concept of MutableConfigurationProvider. It essentially 
> provides a set of configuration mutation APIs that allow queue 
> configurations to be updated externally through a set of REST APIs. 
> When performing the queue configuration changes, the queue ACLs will be 
> honored, which means only queue administrators can make configuration changes 
> to a given queue. MutableConfigurationProvider is implemented as a pluggable 
> interface, and we have one implementation of this interface which is based on 
> Derby embedded database.
> This feature has been deployed on LinkedIn's Hadoop cluster for a year now, 
> and has gone through several iterations of gathering feedback from users 
> and improving accordingly. With this feature, cluster administrators are able 
> to automate many of the queue configuration management tasks, such as setting 
> the queue capacities to adjust cluster resources between queues based on 
> established resource consumption patterns, or updating the user-to-queue 
> mappings. We have attached our design documentation to this ticket 
> and would like to receive feedback from the community regarding how to best 
> integrate it with the latest version of YARN.
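
The ACL-checked mutation flow described in the issue text can be sketched roughly as below. This is only an illustration under stated assumptions: the issue names MutableConfigurationProvider but does not show its methods, so the interface MutableConfProviderSketch, the updateQueue signature, and the AclCheckingConfProvider class are all hypothetical. The sketch also assumes that admin rights on a parent queue extend to its subqueues.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical signature; the real interface's methods are not given in the issue.
interface MutableConfProviderSketch {
  void updateQueue(String user, String queuePath, Map<String, String> updates);
}

class AclCheckingConfProvider implements MutableConfProviderSketch {
  // queue path (e.g. "root.eng") -> users allowed to administer it
  private final Map<String, Set<String>> queueAdmins = new HashMap<>();
  // flat property map standing in for the persisted scheduler configuration
  private final Map<String, String> conf = new HashMap<>();

  void addAdmin(String queuePath, String user) {
    queueAdmins.computeIfAbsent(queuePath, k -> new HashSet<>()).add(user);
  }

  // Assumption: a user listed on a queue may also administer its descendants,
  // so walk from the target queue up towards the root.
  boolean isAdmin(String user, String queuePath) {
    String path = queuePath;
    while (true) {
      Set<String> admins = queueAdmins.get(path);
      if (admins != null && admins.contains(user)) {
        return true;
      }
      int dot = path.lastIndexOf('.');
      if (dot < 0) {
        return false;
      }
      path = path.substring(0, dot);
    }
  }

  @Override
  public void updateQueue(String user, String queuePath, Map<String, String> updates) {
    if (!isAdmin(user, queuePath)) {
      throw new SecurityException(user + " is not an administrator of " + queuePath);
    }
    for (Map.Entry<String, String> e : updates.entrySet()) {
      // CapacityScheduler-style key: yarn.scheduler.capacity.<queue-path>.<property>
      conf.put("yarn.scheduler.capacity." + queuePath + "." + e.getKey(), e.getValue());
    }
  }

  String get(String key) {
    return conf.get(key);
  }
}
```

In this sketch a REST endpoint would resolve the caller's identity and then call updateQueue, so a user who administers root.eng could resize root.eng.ml without involving a cluster administrator, while other users are rejected.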
