[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225631#comment-16225631 ]

Subru Krishnan commented on YARN-5734:
--------------------------------------

[~jhung] (cc: [~mshen], [~xgong], [~leftnoteasy], [~zhz]), can you update the fix versions and release note in anticipation of the 2.9.0 release? Thanks.

> OrgQueue for easy CapacityScheduler queue configuration management
> ------------------------------------------------------------------
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Min Shen
> Assignee: Min Shen
> Attachments: OrgQueueAPI-BasedSchedulerConfigurationManagement_v2.pdf, OrgQueue_API-Based_Config_Management_v1.pdf, OrgQueue_Design_v0.pdf, YARN-5734-YARN-5734.001.patch
>
>
> The current XML-based configuration mechanism in CapacityScheduler makes it very inconvenient to apply any changes to the queue configurations. We saw 2 main drawbacks in the file-based configuration mechanism:
> # This makes it very inconvenient to automate queue configuration updates. For example, in our cluster setup, we leverage the queue mapping feature from YARN-2411 to route users to their dedicated organization queues. It could be extremely cumbersome to keep updating the config file to manage the very dynamic mapping between users and organizations.
> # Even if a user has the admin permission on one specific queue, that user is unable to make any queue configuration changes such as resizing the subqueues, changing queue ACLs, or creating new queues. All these operations need to be performed in a centralized manner by the cluster administrators.
> With these current limitations, we realized the need for a more flexible configuration mechanism that allows queue configurations to be stored and managed more dynamically. We developed the feature internally at LinkedIn, which introduces the concept of MutableConfigurationProvider. It essentially provides a set of configuration mutation APIs that allow queue configurations to be updated externally through a set of REST APIs. When performing the queue configuration changes, the queue ACLs will be honored, which means only queue administrators can make configuration changes to a given queue. MutableConfigurationProvider is implemented as a pluggable interface, and we have one implementation of this interface which is based on the Derby embedded database.
> This feature has been deployed on LinkedIn's Hadoop cluster for a year now, and has gone through several iterations of gathering feedback from users and improving accordingly. With this feature, cluster administrators are able to automate many of the queue configuration management tasks, such as setting the queue capacities to adjust cluster resources between queues based on established resource consumption patterns, or updating the user-to-queue mappings. We have attached our design documentation to this ticket and would like to receive feedback from the community regarding how to best integrate it with the latest version of YARN.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
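To make the idea in the issue description above more concrete, here is a minimal sketch of a pluggable mutable configuration provider whose mutations honor queue ACLs. It is an illustration only, not the code from the patch: the Python class and function names, the in-memory store, and the flat `queue_admins` mapping are all assumptions made for this example, and real queue ACLs are inherited from parent queues, which the sketch ignores.

```python
from abc import ABC, abstractmethod


class MutableConfigurationProvider(ABC):
    """Hypothetical pluggable store for scheduler configuration."""

    @abstractmethod
    def get(self, key):
        """Return the stored value for key, or None if it is unset."""

    @abstractmethod
    def apply(self, updates):
        """Persist a dict of key/value updates."""


class InMemoryProvider(MutableConfigurationProvider):
    """Toy backend; a real one could sit on an embedded database."""

    def __init__(self, initial):
        self._conf = dict(initial)

    def get(self, key):
        return self._conf.get(key)

    def apply(self, updates):
        self._conf.update(updates)


def mutate_queue(provider, queue_admins, user, queue, params):
    """Apply params to queue only if user is one of its administrators.

    queue_admins maps a queue path to the set of users allowed to
    change it (an assumption of this sketch, see the lead-in).
    """
    if user not in queue_admins.get(queue, set()):
        raise PermissionError(user + " may not modify " + queue)
    prefix = "yarn.scheduler.capacity." + queue + "."
    provider.apply({prefix + key: value for key, value in params.items()})
```

A caller would construct one provider, then route every mutation request through `mutate_queue` so that the ACL check cannot be bypassed; swapping the backend only requires another subclass of the provider.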
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197792#comment-16197792 ]

Hudson commented on YARN-5734:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13057 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13057/])
YARN-7251. Misc changes to YARN-5734 (jhung: rev 09c5dfe937f0570cd9494b34d210df2d5f0737a7)
* (edit) hadoop-yarn-project/hadoop-yarn/bin/yarn
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestZKConfigurationStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesConfigurationMutation.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/SchedConfCLI.java
* (edit) hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestLeveldbConfigurationStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/SchedConfUpdateInfo.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestSchedConfCLI.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestMutableCSConfigurationProvider.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/QueueConfigInfo.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/MutableCSConfigurationProvider.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142386#comment-16142386 ]

Andrew Wang commented on YARN-5734:
-----------------------------------

Neato, sorry about the noise. If you think this is getting close to done, might be a good time for a new consolidated patch :)
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142380#comment-16142380 ]

Jonathan Hung commented on YARN-5734:
-------------------------------------

Hi [~andrew.wang], thanks for taking a look. Actually the consolidated patch was a POC, we have since changed the derby implementation to leveldb, so we should not need any dependency changes. The current YARN-5734 branch has the code we want to eventually merge (not including the still-outstanding sub tasks), but there are no dependency changes in any of these (here's the current diff --stat for everything committed so far)

{noformat}
jhung-mn3:hadoop jhung$ git diff 4249172e1419acdb2b69ae3db43dc59da2aa2e03 --stat
hadoop-yarn-project/hadoop-yarn/bin/yarn | 4 +
hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd | 5 +
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java | 30 +
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/SchedConfCLI.java | 238
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestSchedConfCLI.java | 160 ++
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/QueueConfigInfo.java | 57
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/SchedConfUpdateInfo.java | 85
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/package-info.java | 27
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/YarnWebServiceUtils.java | 14 ++
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml | 61 +
.../hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java | 31 -
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateVersionIncompatibleException.java | 2 +-
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ConfigurationMutationACLPolicy.java | 47 +++
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ConfigurationMutationACLPolicyFactory.java | 49 +++
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/DefaultConfigurationMutationACLPolicy.java | 45 +++
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/MutableConfScheduler.java | 72 ++
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/MutableConfigurationProvider.java | 50 +++
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java | 86 +---
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java | 12 ++
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/CSConfigurationProvider.java | 47 +++
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/FileBasedCSConfigurationProvider.java | 67 +
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/InMemoryConfigurationStore.java | 119
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/LeveldbConfigurationStore.java | 361 +
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/MutableCSConfigurationProvider.java | 301 +
.../main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/QueueAdminConfigurationMutationACLPolicy.java | 110 +++
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/YarnConfigurationStore.java | 204
.../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/YarnConfigurationStoreFactory.java | 46 +++
.../src/main/java/org/apache/hado
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142365#comment-16142365 ]

Andrew Wang commented on YARN-5734:
-----------------------------------

Hi Jonathan, thanks for working on this, I gave the consolidated patch from Jan 20th a quick look, a few comments:

Looks like we add a new Derby dependency. Derby has a NOTICE file which we need to fold into ours:

http://svn.apache.org/repos/asf/db/derby/code/trunk/NOTICE

This is a release blocker, so should be a blocker for merge. I didn't check the current branch for any other new dependencies, but their LICENSE and NOTICE also need to be checked for this.

One other little comment, we typically centralize dependency versions in hadoop-project/pom.xml for consistency. Recommend doing this for the Derby version as well.
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840432#comment-15840432 ]

Wangda Tan commented on YARN-5734:
----------------------------------

Hi [~jhung],

bq. With this in mind do you still think AdminService is the right place to put the change configuration functionality?

I would still prefer to use AdminService; we can add different logic to check ACLs inside AdminService. It is still better than adding them to ClientRMService.

bq. If we make MutableConfigurationManager part of CS only, the ClientRMService/AdminService still needs to access it somehow.

I think we can make AdminService call CS directly (for example by adding a method to CS like {{updateCSConfig}}), and inside CS we will validate the request and reject it if necessary. Changing the global provider-class looks riskier to me, since all YARN components depend on it. It's better to keep the logic inside CS.
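The delegation proposed in Wangda Tan's comment above (AdminService forwards the request, the scheduler itself validates and rejects it) can be sketched roughly as follows. This is a toy model under stated assumptions, not the design that was eventually committed: the class names are hypothetical, the method `update_cs_config` merely mirrors the {{updateCSConfig}} name floated in the comment, and the only validation shown is that the capacities of the queues directly under root add up to 100.

```python
class CapacitySchedulerSketch:
    """Toy scheduler that owns validation of its own configuration."""

    def __init__(self, capacities):
        # Maps a queue path directly under root to its capacity in percent.
        self._capacities = dict(capacities)

    def update_cs_config(self, changes):
        """Validate the merged configuration, then commit it atomically."""
        candidate = {**self._capacities, **changes}
        total = sum(candidate.values())
        if abs(total - 100.0) > 1e-6:
            # Reject without touching the live configuration.
            raise ValueError("queue capacities sum to %s, expected 100" % total)
        self._capacities = candidate

    def capacity(self, queue):
        return self._capacities[queue]


class AdminServiceSketch:
    """Thin entry point that forwards requests to the scheduler."""

    def __init__(self, scheduler):
        self._scheduler = scheduler

    def update_scheduler_config(self, changes):
        # No scheduler-specific checks here; the scheduler decides.
        self._scheduler.update_cs_config(changes)
```

Keeping the check inside the scheduler means every caller, whether an RPC service or a REST endpoint, gets the same validation, which is the motivation given in the comment for limiting the logic to CS.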
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837017#comment-15837017 ]

Jonathan Hung commented on YARN-5734:
-------------------------------------

[~leftnoteasy] thanks for the review.

Regarding 1 and 3, potentially there are queue admins (but not yarn admins) that will change scheduler configuration. In this case the AdminService will not check (yarn admin) acls, it should delegate it to ConfigurationMutationPolicy. With this in mind do you still think AdminService is the right place to put the change configuration functionality? For 3, I will add javadocs and a default implementation of the ConfigurationMutationPolicy (which will just check against queue admin acls). (YARN-5954)

Regarding 2, do you mean a separate configuration provider (MutableConfigurationManager) for CS, and {{yarn.resourcemanager.configuration.provider-class}} for everything else? As it is now, I made a mistake in the current patch, we can actually take -Provider out of RMContext, since {{yarn.resourcemanager.configuration.provider-class}} is MutableConfigurationManager, so we can just access it via rmContext.getConfigurationProvider(). If we make MutableConfigurationManager part of CS only, the ClientRMService/AdminService still needs to access it somehow. Also to avoid having to change it in other places, currently MutableConfigurationManager overrides LocalConfigurationProvider, so the getConfigurationInputStream behavior in all other non-CS places should be the same. As long as MutableConfigurationManager does not overwrite this functionality we can load stuff from {{yarn-site}}, etc in the same way. (Also in the future if we add store functionality to other non-CS configurations we can just do this through the configuration provider.)

Thoughts on this?
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832692#comment-15832692 ]

Jonathan Hung commented on YARN-5734:
-------------------------------------

Uploaded an initial patch containing some basic end-to-end functionality. Here are the yarn-site.xml configurations to get this working:
* {{yarn.scheduler.capacity.config.path}} should be set to a directory inside which the database will be stored. (The resource manager user should be able to create subdirectories in here.)
* {{yarn.scheduler.mutable-queue-config.enabled}} should be {{true}}
* {{yarn.resourcemanager.configuration.provider-class}} should be set to {{org.apache.hadoop.yarn.server.resourcemanager.conf.MutableConfigurationManager}}

Here are some working examples which can be run in series, assuming a starting configuration of two queues, {{root.default}} (with 100 capacity) and {{root.test}} (with 0 capacity):

{noformat}curl -X PUT -H 'Content-Type: application/xml' -d ' root.test state STOPPED maximum-applications 33 ' --negotiate -u : "http://:8088/ws/v1/cluster/conf/scheduler/mutate"{noformat}
Sets the {{root.test}} queue's state to STOPPED and its maximum-applications to 33.

{noformat}curl -X PUT -H 'Content-Type: application/xml' -d ' root.test ' --negotiate -u : "http://:8088/ws/v1/cluster/conf/scheduler/mutate"{noformat}
Removes the {{root.test}} queue (since it is STOPPED, leveraging YARN-5556).

{noformat}curl -X PUT -H 'Content-Type: application/xml' -d ' root.test2 maximum-applications 34 ' --negotiate -u : "http://:8088/ws/v1/cluster/conf/scheduler/mutate"{noformat}
Adds a {{root.test2}} queue and sets its maximum-applications to 34.

This is just a first version, so there are some details that are not yet implemented/tested (e.g. specifying a hierarchical conf update). [~xgong] and [~wangda], do you mind taking a look to make sure our ideas/interfaces are in alignment?
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749878#comment-15749878 ]

Jonathan Hung commented on YARN-5734:
-------------------------------------

Uploaded v2 design doc containing changes based on discussion.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736005#comment-15736005 ]

Wangda Tan commented on YARN-5734:
----------------------------------

bq. If the reinitialization fails (i.e. scheduler.reinitialize(X+1)), then we will need to call scheduler.reinitialize(X). In this case we need to call reinitialize twice. Is this acceptable?
If everything works as expected, a reinitialize failure will not change the queue hierarchy. If there are any cases in which the queue structure still gets updated when reinitialize fails, queue configs could be left in a limbo state; we need to fix such cases separately.

bq. I think we will still need some sort of PluggablePolicy,...
Makes sense.

bq. Not sure if this is what you meant ..
I'm not sure what the interface design is, but I think the logic you described should be roughly the same as what I have in mind. We can check the detailed logic during patch review.

bq. I am thinking we can add a scheduler specific ConfigurationProvider option in yarn-site.xml
Instead of specifying a ConfigurationProvider, I think it might be easier for the end user to specify a config like {{...scheduler.dynamic-queue-config.enabled}}. We can use a different ConfigurationProvider implementation depending on the value of dynamic-config.enabled.

bq. Not sure what you mean by loading configuration file from xml while setting the cluster, can you elaborate on that? Do you mean if store is enabled and the admin wants to wipe it and load a new conf from a file into the store? Do we plan on supporting that?
If we allow initializing the store-based config from capacity-scheduler.xml, this is not required.
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734123#comment-15734123 ]

Jonathan Hung commented on YARN-5734:
-------------------------------------

Thanks for the detailed points, [~leftnoteasy].

bq. How to handle bad configuration update?
The idea of calling scheduler#reinitialize mostly makes sense to me. A couple of questions/thoughts:
* If the reinitialization fails (i.e. scheduler.reinitialize(X+1)), then we will need to call scheduler.reinitialize(X). In this case we need to call reinitialize twice. Is this acceptable?
* I think we will still need some sort of PluggablePolicy, but in this case it is just an authorization policy, so we can leverage YarnAuthorizationProvider.

bq. By using ConfigurationProvider, it can either get a new CapacitySchedulerConfiguration (CS) or a new AllocationConfiguration (FS).
Not sure if this is what you meant, but could we have MutableConfigurationManager extend ConfigurationProvider? Then MutableConfigurationManager would expose the X+1 configuration while it is being validated, and either un-expose it (if reinitialization fails) or keep it exposed and persist it in the backing store (if reinitialization succeeds).

bq. If file-based solution is specified, no dynamic update queue operation will be allowed. If store-based solution is specified, no refreshQueue CLI will be allowed.
I agree.

bq. So I would prefer to add an option to yarn-site.xml to explicitly specify which config source the scheduler will use.
I am thinking we can add a scheduler specific ConfigurationProvider option in yarn-site.xml. Then we can infer the config source from there: if the scheduler specific ConfigurationProvider is MutableConfigurationManager, the scheduler will use the store; otherwise it will use the file.

bq. If admin want to load configuration file from xml while setting the cluster, or want to switch from xml-file based config to store-based config, we can provide a CLI to load a XML file and save it to store.
Not sure what you mean by loading configuration file from xml while setting the cluster, can you elaborate on that? Do you mean if store is enabled and the admin wants to wipe it and load a new conf from a file into the store? Do we plan on supporting that?
For switching from xml based to store based, I was thinking we could just manually change the scheduler's configuration provider in yarn-site.xml and then restart the RM. Otherwise, if we allow admins to do this via CLI, yarn-site.xml is no longer consistent with RM behavior (yarn-site.xml will still say the config is file based while the RM is store based).
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733723#comment-15733723 ]

Wangda Tan commented on YARN-5734:
----------------------------------

[~jhung],

I discussed my point #2 above with [~jianhe] again. We now think your original proposal is the better way to handle the case where an admin wants to switch from the XML-file-based solution to the API-based solution:
bq. Initialization will be done by xml even if API-based approach is enabled. Then on crash/restart the config store will be honored. Basically once store is initialized, it will be used as source of truth (and the xml is no longer useful).

But I think my points are still valid:
bq. In the other hand, store-based solution doesn't need refreshQueue CLI at all, because content in store and memory should be always synced.
bq. So I would prefer to add an option to yarn-site.xml to explicitly specify which config source the scheduler will use. If file-based solution is specified, no dynamic update queue operation will be allowed. If store-based solution is specified, no refreshQueue CLI will be allowed.

Please share your thoughts.

Thanks,
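The mutual exclusion argued for in this comment (file-based config forbids dynamic updates, store-based config forbids refreshQueue) could be sketched roughly as follows. This is only an illustration in Python; {{ConfigSource}}, {{SchedulerConfigGate}} and {{OperationNotAllowed}} are hypothetical names, not classes from YARN or from the patch.

```python
from enum import Enum


class ConfigSource(Enum):
    FILE = "file"    # capacity-scheduler.xml plus the refreshQueue CLI
    STORE = "store"  # backing store plus the mutation API


class OperationNotAllowed(Exception):
    """Raised when an operation conflicts with the configured source."""


class SchedulerConfigGate:
    """Hypothetical guard that rejects the operation the chosen source forbids."""

    def __init__(self, source):
        self.source = source

    def refresh_queues(self):
        # Only meaningful when the file on disk is the source of truth.
        if self.source is not ConfigSource.FILE:
            raise OperationNotAllowed("refreshQueue is disabled for store-based config")
        return "reloaded from file"

    def mutate(self, queue, updates):
        # Only meaningful when the store is the source of truth.
        if self.source is not ConfigSource.STORE:
            raise OperationNotAllowed("dynamic updates are disabled for file-based config")
        return f"updated {queue}: {sorted(updates)}"
```

Keeping the choice in a single setting means there is never a moment where both the file and the store can claim to be authoritative, which is the confusion the comment is trying to avoid.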
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733575#comment-15733575 ]

Wangda Tan commented on YARN-5734:
----------------------------------

Thanks [~jhung] / [~mshen] / [~zhouyejoe] / [~zhz] for pushing this forward. A couple of questions regarding the design:

*1) How to handle bad configuration update?*
The existing design updates the config first and then notifies the scheduler to apply the update. But how do we avoid update failures? IIUC, PluggablePolicy is added to validate the config, but does that mean we have to duplicate some validation logic from the scheduler in PluggablePolicy?

I have an idea that might simplify the overall process: MutableConfigurationManager always maintains the latest-in-use-config (version=X).
a. When a queue admin requests an update to some fields, it merges the latest-in-use-config and the new-updated-field into a new configuration proposal (version=X+1). By using ConfigurationProvider, it can either get a new CapacitySchedulerConfiguration (CS) or a new AllocationConfiguration (FS).
b. Then it calls the scheduler.reinitialize(...) API, and the scheduler uses exactly the same logic to validate the configuration (including CS#parseQueue, etc.).
c. If b succeeds, it writes the ver=X+1 config to the state store and responds to the client that the operation succeeded. The latest-in-use-config is updated to X+1.
d. If b fails, it reports the failure to the client and the new-updated-field is simply discarded.

This proposal should still fit the existing overall architecture. The good things are that it avoids the PluggablePolicy implementation (which may require duplicating queue config validation logic), and it avoids writing a bad config to the store.

*2) I think the existing design, which supports using two sources of configuration at the same time, is a little confusing. For example:*
- An admin sets up a cluster from scratch and the RM saves the xml file to the store, but the admin could continue to edit capacity-scheduler.xml on disk and call rmadmin -refreshQueue. What should happen?

To me this should not be allowed:
- The existing -refreshQueue was added because, under the configuration-file-based solution, the content in the file and in memory could differ; -refreshQueue is a way to sync the two.
- On the other hand, the store-based solution doesn't need the refreshQueue CLI at all, because the content in the store and in memory should always be in sync.

So I would prefer to add an option to yarn-site.xml to explicitly specify which config source the scheduler will use. If the file-based solution is specified, no dynamic queue update operation will be allowed. If the store-based solution is specified, no refreshQueue CLI will be allowed.

If an admin wants to load the configuration from an xml file while setting up the cluster, or wants to switch from xml-file-based config to store-based config, we can provide a CLI to load an XML file and save it to the store.

Thoughts?
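Steps a to d of the proposal in this comment can be sketched as a small propose, validate, commit loop. The Python below is a toy model under stated assumptions: {{ToyScheduler}}, {{MutableConfManager}} and the rule that capacities must sum to 100 are invented for illustration, and the "store" is just a list. It is not the YARN implementation.

```python
class InvalidConfig(Exception):
    """Raised by the toy scheduler when a proposed config fails validation."""


class ToyScheduler:
    """Stand-in for the scheduler; reinitialize() validates, then applies."""

    def __init__(self, conf):
        self.conf = dict(conf)

    def reinitialize(self, conf):
        # Toy validation rule (an assumption): capacities must sum to 100.
        total = sum(v for k, v in conf.items() if k.endswith(".capacity"))
        if total != 100:
            raise InvalidConfig(f"capacities sum to {total}, expected 100")
        self.conf = dict(conf)


class MutableConfManager:
    """Keeps the latest-in-use config (version X) and commits X+1 on success."""

    def __init__(self, scheduler, store):
        self.scheduler = scheduler
        self.store = store                 # a list used as an append-only store
        self.version = 0
        self.in_use = dict(scheduler.conf)

    def update(self, changes):
        proposal = {**self.in_use, **changes}      # step a: merge into X+1
        try:
            self.scheduler.reinitialize(proposal)  # step b: scheduler validates
        except InvalidConfig as err:
            return False, str(err)                 # step d: discard the proposal
        self.version += 1                          # step c: persist, then ack
        self.store.append((self.version, proposal))
        self.in_use = proposal
        return True, f"version {self.version}"
```

Because validation is delegated to the scheduler's own {{reinitialize}}, a rejected proposal never reaches the store, which is the property the proposal is after.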
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727137#comment-15727137 ]

Jonathan Hung commented on YARN-5734:
-------------------------------------

Hi [~jianhe], thanks for the feedback.

bq. Does add/remove also support a fully qualified queue name, not just a hierarchical structure ? I think supporting a single fully qualified queue name would be handy, especially for CLI add/remove
Sure, I think it makes sense to support both.

bq. User may need to provide a new queue structure for initialization, then, the xml file will conflict with what's in config store.
I don't think I understand this part. Can you explain why the user needs to provide a new queue structure? Initialization will be done by xml even if the API-based approach is enabled. Then on crash/restart the config store will be honored. Basically, once the store is initialized, it will be used as the source of truth (and the xml is no longer useful).

bq. Is the implementation that the caller will block until the update is completed - both in store and memory ?
Yes, the plan is to block until the update is completed for both. This prevents the scenario where the client sends a configuration change, an event is queued, the call returns, and then the RM crashes, at which point the configuration change is lost.

bq. IIUC, the EmbededDerbyDatabase is suitable for single RM only. Do you run RM HA in your cluster? Also, I guess Derby does not support fencing ? If so, we could potentially have two RMs writing together in a split-brain situation and cause data inconsistency. Therefore, I think ZKRMStateStore might be a better store option by default, especially because of RM HA.
Currently we are not running RM HA. We have Derby as the default because it is what we run in production (and we don't have a working implementation which supports RM HA), so for single-RM clusters we know it works well.

bq. Regarding PluggableConfigurationPolicy for authorization, has the implementation considered using YarnAuthorizationProvider ?
Took a look at this. I have a couple of comments about it; let me know if this is not what you had in mind.
* If I understand correctly, YarnAuthorizationProvider currently only supports authorization based on queue ACLs (submit/administer queue). We would need to extend the implementation to support things like fine-grained ACLs (e.g. ACLs by configuration key). In this case we would just extend YarnAuthorizationProvider with something like "SchedulerConfigurationAuthorizationProvider". If this is true, then each component using an authorization provider would need to configure its own implementation, since the SchedulerConfigurationAuthorizationProvider does not apply to all components (and it seems all components use the same provider, determined by yarn.authorization-provider).
* We will probably still need the new pluggable configuration policy, at least for configuration change validation, to make sure the proposed configuration changes make sense.
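To make "ACLs by configuration key" from this comment concrete, here is a minimal Python sketch of a per-key permission check. The ACL table, the user names and the function {{denied_keys}} are all hypothetical; the sketch says nothing about how YarnAuthorizationProvider is actually structured.

```python
# Hypothetical ACL table: configuration key -> users allowed to change it.
KEY_ACLS = {
    "capacity": {"clusteradmin"},
    "maximum-applications": {"clusteradmin", "queueadmin"},
    "state": {"clusteradmin", "queueadmin"},
}


def denied_keys(user, updates, acls=KEY_ACLS):
    """Return the sorted keys in `updates` that `user` may not modify.

    Keys without an ACL entry are treated as denied for everyone, a
    deny-by-default choice made for this sketch only.
    """
    return sorted(k for k in updates if user not in acls.get(k, set()))
```

A caller would reject the whole mutation request whenever {{denied_keys}} returns a non-empty list, before any validation or store write happens.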
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724356#comment-15724356 ]

Jian He commented on YARN-5734:
-------------------------------

[~mshen], [~jhung], [~zhz], very useful feature! Thanks for the contribution.

Some questions I had about the design:
- Does add/remove also support a fully qualified queue name, not just a hierarchical structure? I think supporting a single fully qualified queue name would be handy, especially for CLI add/remove.
- IIUC, the xml file will still be used for initialization on startup, even if the API-based approach is enabled? Then, if the RM gets restarted, will the RM honor the xml file or the config store for initialization? I feel both scenarios may be possible:
-- If it is a crash-and-restart, we should probably honor the config store.
-- If the RM is going through a rolling upgrade, the user may need to provide a new queue structure for initialization; the xml file will then conflict with what's in the config store.
- Is the implementation such that the caller will block until the update is completed, both in the store and in memory?
- IIUC, the EmbededDerbyDatabase is suitable for a single RM only. Do you run RM HA in your cluster? Also, I guess Derby does not support fencing? If so, we could potentially have two RMs writing at the same time in a split-brain situation and cause data inconsistency. Therefore, I think ZKRMStateStore might be a better default store option, especially because of RM HA.
- Regarding PluggableConfigurationPolicy for authorization, has the implementation considered using YarnAuthorizationProvider? YarnAuthorizationProvider is an interface which can be implemented by other authorization plugins (e.g. Apache Ranger). Ranger has a nice web portal where it can define arbitrary authorization policies, such as restricting certain users/groups from doing certain operations. It would be useful if it did, as the Ranger plugin would just need to implement the necessary interface to get config authorization for free.
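The first question in this comment contrasts fully qualified queue names with a hierarchical structure. A small Python sketch of how one form maps onto the other is given here for illustration; {{to_hierarchy}} is a hypothetical helper, and the nested-dict shape is an assumption rather than the request format used by the API.

```python
def to_hierarchy(queue_paths):
    """Convert fully qualified queue names into a nested dict of child queues."""
    tree = {}
    for path in queue_paths:
        node = tree
        # Walk or create one level per dot-separated component.
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree
```

For example, {{to_hierarchy(["root.default", "root.test.a"])}} yields a {{root}} node with children {{default}} and {{test}}, where {{test}} has the child {{a}}. Supporting both input forms would then be a matter of normalizing to one of them before applying the update.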
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713297#comment-15713297 ]

Jonathan Hung commented on YARN-5734:

Thanks [~lewuathe]. Right now we are working on initial patches, and we will have a better idea of how to split tasks once we have a skeleton of the implementation. Regarding the target branch, there will still be an option to use the flat configuration file as it is now, so this shouldn't be an incompatible change.

[~rkanter], thanks for the note. As you mentioned, configuration changes shouldn't be too frequent, so we don't anticipate this being an issue, but we'll definitely keep it in mind.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713186#comment-15713186 ]

Robert Kanter commented on YARN-5734:

Oozie has run into scalability problems with Derby, but I would imagine that Oozie does more frequent reads and writes to Derby than users will be doing with their Configurations, so it probably won't be a problem. Just something to keep in mind.
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712113#comment-15712113 ]

Kai Sasaki commented on YARN-5734:

I'm also interested in flexible queue configuration management, because XML-based configuration is often troublesome for us.

{quote}
We discussed an advanced feature of supporting multi-update transactions.
{quote}
I have sometimes run into an inconsistent queue state while updating the queue configuration with XML, because the update cannot be done transactionally. In that case the scheduler is left with an incomplete queue state.

{quote}
Target branch-2
{quote}
Making the XML file obsolete can be an incompatible change, so might it be better to target 3.x later? Or does it mean only adding a new backend storage implementation?

Anyway, I would like to work on this once the sub-tasks are arranged. Thanks!
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706950#comment-15706950 ]

Xuan Gong commented on YARN-5734:

[~jhung]
bq. Create feature branch
I have created a feature branch: YARN-5734
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688294#comment-15688294 ]

Jonathan Hung commented on YARN-5734:

Attached an updated doc containing the design for the scheduler configuration management API and backing store.
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612431#comment-15612431 ]

Jonathan Hung commented on YARN-5734:

Here are the notes from yesterday's meetup:

Objective: Aligning queue configuration requirements from YARN-5734 and YARN-5724
Attendees: Xuan, Wangda, Vinod, Subru, Zhe, Konstantin, Ye, Min, Jonathan, Erik
10/26/16 2-4pm

Meeting minutes
* Overall we are in agreement on adding a mutable API for queue configuration. We discussed many details around APIs and storage implementation.
* APIs
** For compatibility we can keep XML-file-based configuration as an option. Subru and Wangda both raised a concern that having 2 sources of truth is hard to maintain; therefore users should choose either the XML-file-based configuration approach or the new API-based one.
** Vinod raised a point that besides REST APIs, CLIs are also important.
** We also discussed a tricky case of adding new resources to the entire system.
** We discussed an advanced feature of supporting multi-update transactions, e.g. reducing the capacity of queue A and moving that capacity to queue B.
** We discussed how to support bulk updates.
** We discussed how to make the project applicable to both the Capacity and Fair schedulers. YARN-2986 should be revisited to provide a common data model for both schedulers.
** We discussed the case of hierarchical queues.
* Storage implementation
** The Derby embedded database can be used as the default underlying storage implementation.
** The storage implementation should be configurable, e.g. distributed storage is needed to support HA.
** Another option is to use the YARN RM state store. This potentially simplifies how update events are logged (audit logger) and recovered.
** Need to address other issues, such as scheduler-agnostic REST APIs and user-friendly concurrent updates.
** Target branch-2
* Action items
** Combine YARN-5724 and YARN-5734 under one umbrella
** Create one unified design doc covering
*** Backing store implementations
*** Queue state machine
*** List of supported APIs
** Create feature branch (and add Min Shen ms...@linkedin.com, Jonathan Hung jyhung2...@gmail.com, Ye Zhou zhouye...@gmail.com as branch committers)
** After the feature branch is created, create the sub-tasks needed for implementing the mutable API configuration provider
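To make the multi-update transaction point from the meeting notes concrete, here is a small Java sketch of an all-or-nothing bulk capacity update. It is only an illustration under stated assumptions: the class `BulkUpdateSketch` and its methods are hypothetical, it models a single set of sibling queues, and the only validation it performs is that sibling capacities, expressed as percentages, still sum to 100 after the whole batch is applied.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an all-or-nothing bulk capacity update for sibling queues. */
final class BulkUpdateSketch {
  private Map<String, Double> capacities = new HashMap<>(); // sibling queue -> percent

  BulkUpdateSketch(Map<String, Double> initial) {
    capacities.putAll(initial);
  }

  /** Applies every change to a copy and commits only if the siblings still sum to 100. */
  boolean applyAll(Map<String, Double> changes) {
    Map<String, Double> candidate = new HashMap<>(capacities);
    candidate.putAll(changes);
    double total = 0;
    for (double capacity : candidate.values()) {
      total += capacity;
    }
    if (Math.abs(total - 100.0) > 1e-6) {
      return false; // the live configuration is left untouched
    }
    capacities = candidate;
    return true;
  }

  /** Returns the committed capacity of a queue, or null if the queue is unknown. */
  Double capacityOf(String queue) {
    return capacities.get(queue);
  }
}
```

With queues A=60 and B=40, a batch that only sets A=50 is rejected as a whole, while a batch that sets A=50 and B=50 together is committed. Applied one at a time, the same two changes would pass through an invalid intermediate state.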
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606834#comment-15606834 ]

Zhe Zhang commented on YARN-5734:

Since there is some overlap between this JIRA's objectives and those of YARN-5724, we plan to have a meetup to better discuss these 2 projects. Thanks [~wangda] and [~xgong] for proposing this. Please join in-person or remotely if you are interested.

*When*: Wednesday 10/26 2~4pm
*Where*: LinkedIn HQ, 950 West Maude Avenue, Sunnyvale, CA. (If you do plan to attend in-person, please email z...@apache.org)
*Confcall*: https://bluejeans.com/654904000

We will post notes after the meetup.
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592596#comment-15592596 ]

Jonathan Hung commented on YARN-5734:

[~rémy], glad to hear this is useful for your company. With this enabled, {{refreshQueue}} will no longer use the configuration from {{capacity-scheduler.xml}} as the latest conf, since calling the capacity scheduler's reinitialize will load the capacity scheduler configuration from the backing store (e.g. the Derby database). The intent behind {{reset}} is to clear the configuration from the DB and load it from the XML file.
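The difference between the two reload paths described in this comment can be sketched in a few lines of Java. This is an assumption-laden toy model, not the actual implementation: `ReloadSketch`, `mutate`, `refresh` and `reset` are hypothetical names, a plain map stands in for the Derby backing store, a `Supplier` stands in for reading `capacity-scheduler.xml`, and seeding the store from the file on first start is my own assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of refresh (read from the store) versus reset (reload from the file). */
final class ReloadSketch {
  private final Supplier<Map<String, String>> xmlFile; // stands in for capacity-scheduler.xml
  private Map<String, String> backingStore;            // stands in for the Derby database

  ReloadSketch(Supplier<Map<String, String>> xmlFile) {
    this.xmlFile = xmlFile;
    // Assumption: the store is seeded from the file when it is first created.
    this.backingStore = new HashMap<>(xmlFile.get());
  }

  /** Models a change made through the mutation API; it only touches the store. */
  void mutate(String key, String value) {
    backingStore.put(key, value);
  }

  /** refreshQueue path: reinitialize from the store, ignoring later edits to the file. */
  Map<String, String> refresh() {
    return new HashMap<>(backingStore);
  }

  /** reset path: discard the stored configuration and reload it from the file. */
  Map<String, String> reset() {
    backingStore = new HashMap<>(xmlFile.get());
    return refresh();
  }
}
```

In this model, editing the file after start-up has no effect on `refresh()`; only `reset()` brings the file contents back and, in doing so, discards any changes made through the API.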
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590910#comment-15590910 ]

Rémy SAISSY commented on YARN-5734:

Hi, thanks for this feature, it answers a pain point we have at Criteo. Does it completely disable the refreshQueue CLI, which loads the LocalConfigurationProvider content, or will the command line basically perform a call to the /cluster/queue/reset REST endpoint?
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590001#comment-15590001 ]

Jonathan Hung commented on YARN-5734:
-------------------------------------

I see, that makes sense. The local param changes sound like something we could leverage.

[~zhz] it seems that there are a few things OrgQueue needs to integrate with, so I think a feature branch would be useful here.
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587034#comment-15587034 ]

Carlo Curino commented on YARN-5734:
------------------------------------

[~jhung] what I was saying is a bit different, but what you mention makes sense.

What I was pointing out was that we had a solution to tweak (for {{ReservationQueue}}) some of the key params in a very cheap / dynamic way. As part of YARN-4193 we had prototype support for node labels and did some further scalability work (lock tweaks in CS) to make it scale to many changes per second (300 queues with many node labels updated every second). The insight was to do more "surgical" local changes to specific params, instead of large, lock-heavy operations like refreshQueues.

That said, I agree that some of the work you guys are doing could be used (if low cost enough) to enforce the {{Plan}}, and to generalize what reservations can "set" in the queues.

Finally, during our conversation with [~mshen] I was pointing out that the {{ReservationSystem}} can be used to provide a time-varying notion of queues (think of a daily sine wave for the queue capacity), which in turn could be used to "multiply" the sellable capacity in the cluster. For example, we could promise highly guaranteed access to the "dev" queue during the day and exclusive access to the "reporting" queue at night (note that this provides much stronger guarantees than over-capacity fair sharing). Integrating this with what you guys have would be neat.
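To make the "daily sine wave" idea in the comment above concrete, here is a rough sketch of a time-varying capacity split between the "dev" and "reporting" queues. It is not ReservationSystem code: the function name `daily_capacities`, the 20%/80% bounds, and the noon peak are assumptions chosen purely for illustration, and it uses a smooth split rather than the exclusive night-time access mentioned in the comment.

```python
import math
from typing import Dict


def daily_capacities(hour: float, low: float = 20.0, high: float = 80.0) -> Dict[str, float]:
    """Return percentage capacities for 'dev' and 'reporting' at a given hour.

    'dev' follows a 24-hour sinusoid that peaks at noon and bottoms out at
    midnight; 'reporting' receives the complementary share, so the two
    always add up to low + high.
    """
    # phase is 0.0 at midnight and 1.0 at noon
    phase = (1.0 - math.cos(2.0 * math.pi * hour / 24.0)) / 2.0
    dev = low + (high - low) * phase
    return {"dev": dev, "reporting": low + high - dev}
```

A scheduler-side job could evaluate such a function periodically and push the result through whatever capacity-update mechanism is available, which is where cheap "surgical" parameter changes matter.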
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587003#comment-15587003 ]

Jonathan Hung commented on YARN-5734:
-------------------------------------

[~curino], thanks for the comments. For 1 and 2, this is in our plans (to do either internally or e.g. in a feature branch). The Derby-based storage is one implementation (and eventually we will implement an RMStateStore version).

I took a quick look at some of the ReservationSystem code. My understanding is that the {{PlanQueue}}'s capacity/max-capacity is currently mutable in the same way as {{ParentQueue}} (i.e. via {{refreshQueues}})? The dynamic part is in the {{ReservationQueue}}. So instead of having to call {{setEntitlement}} for each child of a {{PlanQueue}}, we can leverage the MutableConfigurationProvider API to change all child queue capacities of a {{PlanQueue}}. Is this what you had in mind? Also, changing queue configurations such as user-limit or user-limit-factor of a {{ReservationQueue}} can be done via this API (as can other configurations if they are added to {{ReservationQueue}} in the future).
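The point about changing all child queue capacities of a parent in one call, rather than one call per child, can be illustrated with a small sketch. The function `apply_child_capacities` is hypothetical and is not part of YARN; it only shows the kind of validation a batch update allows, namely checking that the children of a parent still add up to 100% before anything is applied.

```python
from typing import Dict


def apply_child_capacities(current: Dict[str, float],
                           updates: Dict[str, float]) -> Dict[str, float]:
    """Return the new child capacities, or raise without changing anything.

    `current` maps child queue names to capacity percentages; `updates`
    holds the capacities to change in this single batch.
    """
    unknown = set(updates) - set(current)
    if unknown:
        raise KeyError(f"unknown child queues: {sorted(unknown)}")
    merged = {**current, **updates}
    # Child capacities under one parent are expected to sum to 100%.
    total = sum(merged.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"child capacities sum to {total}, expected 100")
    return merged
```

Applying the changes one child at a time would pass through intermediate states where the sum is not 100, which is one reason a batch-style API is convenient here.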
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586290#comment-15586290 ]

Zhe Zhang commented on YARN-5734:
---------------------------------

Thanks [~mshen] [~zhouyejoe] [~jhung] for the proposal! Also thanks [~curino] for the very helpful feedback.

This is potentially a pretty large change, and I think we should use a feature branch for the development. Please share your opinions on this, thanks.
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586165#comment-15586165 ]

Carlo Curino commented on YARN-5734:
------------------------------------

[~mshen], I skimmed your doc, but have not read it carefully yet.

I am generally a fan of this. At MS we have similar mechanisms for other systems and users seem to like them. Also, at our scale the number of daily configuration changes is substantial, and constantly refreshing from XML (possibly tens of times a day) sits somewhere between very annoying and impractical. Moreover, in Federation (YARN-2915) we would be happy to leverage this functionality, as we want to centralize the configuration of multiple RMs via our centralized FederationPolicyStore. Our current practical workaround is to automate the download of the new conf, write it to an .xml file, and refreshqueue.

A few important considerations:
# The solution should play nice with HA, so using the RMStateStore (instead of or alongside Derby) for storing the updated configuration (in addition to the conf.xml you keep as a backup) is, I think, key.
# As you do this, please make the store (e.g., DB) configurable. In our deployments, it would be very nice to use an external RDBMS. Generally I agree with [~cwsteinbach] that having configs stored in a DB is very convenient, as you can easily maintain a historical record of previous entries, and study how they evolve and relate to each other with simple OLAP queries.
# You should also take a look at the ReservationSystem code (YARN-1051, YARN-2572, YARN-2573), as the PlanQueue and ReservationQueue are used to change configurations very dynamically (the focus is on capacity/max-capacity only, but we could generalize it if useful).

Bottom line: the specifics of the code might need to go through a few iterations/tweaks, but the general idea is very welcome IMHO. Also, the fact that you have large scale and long experience in deploying and operating this is very reassuring.
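The second consideration in the comment above (keeping configurations in a database so that previous entries remain queryable) might look roughly like the following append-only table. This sketch uses Python's built-in `sqlite3` purely as a stand-in for an external RDBMS; the table name `queue_conf_history`, its columns, and the helper functions are all made up for the example.

```python
import sqlite3
from typing import List, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the append-only history table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS queue_conf_history (
               version    INTEGER PRIMARY KEY AUTOINCREMENT,
               queue      TEXT NOT NULL,
               conf_key   TEXT NOT NULL,
               conf_value TEXT NOT NULL,
               changed_by TEXT NOT NULL
           )"""
    )
    return conn


def set_conf(conn: sqlite3.Connection, queue: str, key: str,
             value: str, user: str) -> None:
    # Every change is a new row, so earlier values are never overwritten.
    with conn:
        conn.execute(
            "INSERT INTO queue_conf_history (queue, conf_key, conf_value, changed_by) "
            "VALUES (?, ?, ?, ?)",
            (queue, key, value, user),
        )


def current_value(conn: sqlite3.Connection, queue: str, key: str) -> Optional[str]:
    row = conn.execute(
        "SELECT conf_value FROM queue_conf_history "
        "WHERE queue = ? AND conf_key = ? ORDER BY version DESC LIMIT 1",
        (queue, key),
    ).fetchone()
    return row[0] if row else None


def history(conn: sqlite3.Connection, queue: str, key: str) -> List[str]:
    rows = conn.execute(
        "SELECT conf_value FROM queue_conf_history "
        "WHERE queue = ? AND conf_key = ? ORDER BY version",
        (queue, key),
    )
    return [r[0] for r in rows]
```

Because nothing is updated in place, the live configuration is simply the newest row per queue and key, and older rows stay available for the kind of analytical queries mentioned in the comment.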
[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management
[ https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573674#comment-15573674 ]

Min Shen commented on YARN-5734:
--------------------------------

[~curino], [~subru],
As discussed offline, could you please provide feedback on the design docs we currently have?