[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2017-10-30 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225631#comment-16225631
 ] 

Subru Krishnan commented on YARN-5734:
--

[~jhung] (cc: [~mshen], [~xgong], [~leftnoteasy], [~zhz]), can you update the 
fix versions and release note in anticipation of the 2.9.0 release? Thanks.

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: 
> OrgQueueAPI-BasedSchedulerConfigurationManagement_v2.pdf, 
> OrgQueue_API-Based_Config_Management_v1.pdf, OrgQueue_Design_v0.pdf, 
> YARN-5734-YARN-5734.001.patch
>
>
> The current XML-based configuration mechanism in CapacityScheduler makes it 
> very inconvenient to apply any changes to the queue configurations. We saw two 
> main drawbacks in the file-based configuration mechanism:
> # It is very inconvenient to automate queue configuration updates. 
> For example, in our cluster setup, we leverage the queue mapping feature from 
> YARN-2411 to route users to their dedicated organization queues. It is 
> extremely cumbersome to keep updating the config file to manage the very 
> dynamic mapping between users and organizations.
> # Even if a user has admin permission on a specific queue, that user is 
> unable to make any queue configuration changes, such as resizing the subqueues, 
> changing queue ACLs, or creating new queues. All these operations need to be 
> performed centrally by the cluster administrators.
> Given these limitations, we saw the need for a more flexible 
> configuration mechanism that allows queue configurations to be stored and 
> managed more dynamically. We developed this feature internally at LinkedIn; 
> it introduces the concept of a MutableConfigurationProvider, which provides 
> a set of configuration mutation APIs that allow queue configurations to be 
> updated externally through REST APIs. 
> When queue configuration changes are performed, the queue ACLs are 
> honored, which means only queue administrators can make configuration changes 
> to a given queue. MutableConfigurationProvider is implemented as a pluggable 
> interface, and we have one implementation of this interface based on 
> the Derby embedded database.
> This feature has been deployed on LinkedIn's Hadoop cluster for a year now 
> and has gone through several iterations of gathering user feedback 
> and improving accordingly. With this feature, cluster administrators are able 
> to automate many queue configuration management tasks, such as setting 
> the queue capacities to adjust cluster resources between queues based on 
> established resource consumption patterns, or updating the user-to-queue 
> mappings. We have attached our design documentation to this ticket 
> and would like feedback from the community on how best to 
> integrate it with the latest version of YARN.
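A minimal sketch of the pluggable-provider idea described above, assuming a simple key/value store interface. The names `ConfStore`, `InMemoryConfStore` and `MutableProvider` are hypothetical and are not the classes in the patch; the provider depends only on the store interface, so a Derby-backed store (as in the LinkedIn version) or a LevelDB-backed one could be swapped in behind it. Keys follow the CapacityScheduler convention `yarn.scheduler.capacity.<queue-path>.<param>`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store abstraction: the provider only talks to this interface,
// so the backing store (database, LevelDB, memory) is pluggable.
interface ConfStore {
  Map<String, String> retrieve();            // current committed configuration
  void commit(Map<String, String> newConf);  // replace the committed configuration
}

// Simplest possible store: keeps the configuration in memory only.
class InMemoryConfStore implements ConfStore {
  private Map<String, String> conf = new HashMap<>();

  @Override
  public synchronized Map<String, String> retrieve() {
    return Collections.unmodifiableMap(new HashMap<>(conf));
  }

  @Override
  public synchronized void commit(Map<String, String> newConf) {
    conf = new HashMap<>(newConf);
  }
}

// Hypothetical mutable provider: applies a batch of key updates on top of the
// stored configuration. A null value means "remove this key".
class MutableProvider {
  private final ConfStore store;

  MutableProvider(ConfStore store) {
    this.store = store;
  }

  synchronized Map<String, String> mutate(Map<String, String> updates) {
    Map<String, String> next = new HashMap<>(store.retrieve());
    for (Map.Entry<String, String> e : updates.entrySet()) {
      if (e.getValue() == null) {
        next.remove(e.getKey());
      } else {
        next.put(e.getKey(), e.getValue());
      }
    }
    store.commit(next);  // the whole batch is committed in a single step
    return store.retrieve();
  }
}
```

ACL checks are deliberately left out here; in the design above they would run before `mutate` is allowed to commit.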



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2017-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197792#comment-16197792
 ] 

Hudson commented on YARN-5734:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13057/])
YARN-7251. Misc changes to YARN-5734 (jhung: rev 
09c5dfe937f0570cd9494b34d210df2d5f0737a7)
* (edit) hadoop-yarn-project/hadoop-yarn/bin/yarn
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestZKConfigurationStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesConfigurationMutation.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/SchedConfCLI.java
* (edit) hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestLeveldbConfigurationStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/SchedConfUpdateInfo.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestSchedConfCLI.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestMutableCSConfigurationProvider.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/QueueConfigInfo.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/MutableCSConfigurationProvider.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java



[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2017-08-25 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142386#comment-16142386
 ] 

Andrew Wang commented on YARN-5734:
---

Neato, sorry about the noise. If you think this is getting close to done, it 
might be a good time for a new consolidated patch :)




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2017-08-25 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142380#comment-16142380
 ] 

Jonathan Hung commented on YARN-5734:
-

Hi [~andrew.wang], thanks for taking a look. The consolidated patch was actually 
a POC; we have since changed the Derby implementation to LevelDB, so we 
should not need any dependency changes.

The current YARN-5734 branch has the code we eventually want to merge (not 
including the still-outstanding subtasks), and none of it changes dependencies. 
Here is the current diff --stat for everything committed so far:
{noformat}jhung-mn3:hadoop jhung$ git diff 4249172e1419acdb2b69ae3db43dc59da2aa2e03 --stat
 hadoop-yarn-project/hadoop-yarn/bin/yarn | 4 +
 hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd | 5 +
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java | 30 +
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/SchedConfCLI.java | 238
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestSchedConfCLI.java | 160 ++
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/QueueConfigInfo.java | 57
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/SchedConfUpdateInfo.java | 85
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/package-info.java | 27
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/YarnWebServiceUtils.java | 14 ++
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml | 61 +
 .../hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java | 31 -
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateVersionIncompatibleException.java | 2 +-
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ConfigurationMutationACLPolicy.java | 47 +++
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ConfigurationMutationACLPolicyFactory.java | 49 +++
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/DefaultConfigurationMutationACLPolicy.java | 45 +++
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/MutableConfScheduler.java | 72 ++
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/MutableConfigurationProvider.java | 50 +++
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java | 86 +---
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java | 12 ++
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/CSConfigurationProvider.java | 47 +++
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/FileBasedCSConfigurationProvider.java | 67 +
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/InMemoryConfigurationStore.java | 119
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/LeveldbConfigurationStore.java | 361 +
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/MutableCSConfigurationProvider.java | 301 +
 .../main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/QueueAdminConfigurationMutationACLPolicy.java | 110 +++
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/YarnConfigurationStore.java | 204
 .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/YarnConfigurationStoreFactory.java | 46 +++
{noformat}

[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2017-08-25 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142365#comment-16142365
 ] 

Andrew Wang commented on YARN-5734:
---

Hi Jonathan, thanks for working on this. I gave the consolidated patch from Jan 
20th a quick look; a few comments:

It looks like this adds a new Derby dependency. Derby has a NOTICE file which we 
need to fold into ours:

http://svn.apache.org/repos/asf/db/derby/code/trunk/NOTICE

This is a release blocker, so it should also block the merge. I didn't check the 
current branch for any other new dependencies, but their LICENSE and NOTICE 
files need to be checked as well.

One other small comment: we typically centralize dependency versions in 
hadoop-project/pom.xml for consistency. I recommend doing this for the Derby 
version as well.




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2017-01-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840432#comment-15840432
 ] 

Wangda Tan commented on YARN-5734:
--

Hi [~jhung], 

bq. With this in mind do you still think AdminService is the right place to put 
the change configuration functionality?
I would still prefer to use AdminService; we can add different ACL-checking 
logic inside AdminService. That is still better than adding it to 
ClientRMService.

bq.  If we make MutableConfigurationManager part of CS only, the 
ClientRMService/AdminService still needs to access it somehow. 
I think we can have AdminService call CS directly (e.g. by adding a method such 
as {{updateCSConfig}} to CS), and CS will check and, if necessary, reject the 
request. Changing the global provider-class looks riskier to me, since all 
YARN components depend on it. It's better to keep the logic inside CS. 
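To illustrate keeping the checks inside CS, here is a hypothetical `updateCSConfig`-style method: it builds a candidate configuration, validates it, and either commits it or rejects the whole request, leaving the previous configuration untouched. `SketchCapacityScheduler` is an illustrative name, not the real CapacityScheduler, and the only validation shown is the CapacityScheduler rule that the capacities of a parent queue's children must sum to 100; everything else about the error handling is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scheduler-side entry point: AdminService would call
// updateCSConfig(...) directly, and the scheduler decides whether to accept it.
class SketchCapacityScheduler {
  private static final String PREFIX = "yarn.scheduler.capacity.";
  private Map<String, String> conf;

  SketchCapacityScheduler(Map<String, String> initial) {
    this.conf = new HashMap<>(initial);
  }

  synchronized void updateCSConfig(Map<String, String> updates) {
    Map<String, String> candidate = new HashMap<>(conf);
    candidate.putAll(updates);
    validate(candidate, "root");  // throws IllegalArgumentException if invalid
    conf = candidate;             // commit only after validation succeeds
  }

  synchronized String get(String key) {
    return conf.get(key);
  }

  // Recursively checks that each parent's child capacities sum to 100.
  private static void validate(Map<String, String> c, String parent) {
    String children = c.get(PREFIX + parent + ".queues");
    if (children == null || children.isEmpty()) {
      return;  // leaf queue: nothing to check
    }
    double sum = 0;
    for (String child : children.split(",")) {
      String path = parent + "." + child.trim();
      String cap = c.get(PREFIX + path + ".capacity");
      if (cap == null) {
        throw new IllegalArgumentException("Missing capacity for " + path);
      }
      sum += Double.parseDouble(cap);
      validate(c, path);
    }
    if (Math.abs(sum - 100.0) > 1e-6) {
      throw new IllegalArgumentException(
          "Child capacities of " + parent + " sum to " + sum + ", not 100");
    }
  }
}
```

Because validation runs on a copy, a rejected request (for example, capacities summing to 110) never becomes visible to the rest of the scheduler.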




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2017-01-24 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837017#comment-15837017
 ] 

Jonathan Hung commented on YARN-5734:
-

[~leftnoteasy] thanks for the review. Regarding 1 and 3: there may be queue 
admins (who are not YARN admins) who will change the scheduler configuration. In 
that case AdminService will not check the (YARN admin) ACLs; it should delegate 
the check to ConfigurationMutationPolicy. With this in mind, do you still think 
AdminService is the right place to put the change-configuration functionality? 

For 3, I will add javadocs and a default implementation of 
ConfigurationMutationPolicy, which will just check against queue admin ACLs 
(YARN-5954).
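A sketch of what a default queue-admin mutation policy could look like. It assumes CapacityScheduler-style queue paths (`root.a.b`) and that administer rights on a parent queue carry over to its children; both assumptions are labeled in the code. `QueueAdminPolicy` and `isMutationAllowed` are hypothetical names, not the YARN-5954 code, and group membership is ignored for brevity.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;

// Hypothetical default mutation policy: a user may change a queue's configuration
// if they are listed as an administrator of that queue or of any ancestor queue
// (ancestor inheritance is an assumption of this sketch).
class QueueAdminPolicy {
  // queue path (e.g. "root.eng") -> users allowed to administer it; "*" means anyone
  private final Map<String, Set<String>> adminAcls;
  private final Set<String> yarnAdmins;  // cluster administrators may change anything

  QueueAdminPolicy(Map<String, Set<String>> adminAcls, Set<String> yarnAdmins) {
    this.adminAcls = adminAcls;
    this.yarnAdmins = yarnAdmins;
  }

  boolean isAdminOf(String user, String queuePath) {
    // Walk from the queue up to root: root.eng.etl -> root.eng -> root
    String path = queuePath;
    while (true) {
      Set<String> admins = adminAcls.get(path);
      if (admins != null && (admins.contains(user) || admins.contains("*"))) {
        return true;
      }
      int dot = path.lastIndexOf('.');
      if (dot < 0) {
        return false;  // reached root without finding the user
      }
      path = path.substring(0, dot);
    }
  }

  // A mutation is allowed only if the user may administer every queue it touches.
  boolean isMutationAllowed(String user, Collection<String> touchedQueues) {
    if (yarnAdmins.contains(user)) {
      return true;
    }
    for (String q : touchedQueues) {
      if (!isAdminOf(user, q)) {
        return false;
      }
    }
    return true;
  }
}
```

With this shape, a request that touches two queues is rejected as a whole if the user administers only one of them.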

Regarding 2, do you mean a separate configuration provider 
(MutableConfigurationManager) for CS, and 
{{yarn.resourcemanager.configuration.provider-class}} for everything else? As 
it stands, I made a mistake in the current patch: we can actually take 
-Provider out of RMContext, since 
{{yarn.resourcemanager.configuration.provider-class}} is 
MutableConfigurationManager, so we can just access it via 
rmContext.getConfigurationProvider(). If we make MutableConfigurationManager 
part of CS only, ClientRMService/AdminService still need to access it 
somehow. Also, to avoid having to change other places, 
MutableConfigurationManager currently extends LocalConfigurationProvider, so the 
getConfigurationInputStream behavior in all other non-CS places should be the 
same. As long as MutableConfigurationManager does not override this 
functionality, we can load {{yarn-site}} etc. in the same way. (Also, if we 
later add store functionality for other non-CS configurations, we can do it 
through the configuration provider.) Thoughts on this?


[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2017-01-20 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832692#comment-15832692
 ] 

Jonathan Hung commented on YARN-5734:
-

Uploaded an initial patch containing some basic end-to-end functionality.
Here are the yarn-site.xml configurations needed to get this working:
* {{yarn.scheduler.capacity.config.path}} should be set to a directory inside 
which the database will be stored (the ResourceManager user must be able to 
create subdirectories in it).
* {{yarn.scheduler.mutable-queue-config.enabled}} should be {{true}}
* {{yarn.resourcemanager.configuration.provider-class}} should be set to 
{{org.apache.hadoop.yarn.server.resourcemanager.conf.MutableConfigurationManager}}

Here are some working examples, which can be run in sequence, assuming a starting 
configuration of two queues, {{root.default}} (with capacity 100) and 
{{root.test}} (with capacity 0):
{noformat}curl -X PUT -H 'Content-Type: application/xml' -d '
  
root.test

  
state
STOPPED
  
  
maximum-applications
33
  

  
' --negotiate -u : 
"http://:8088/ws/v1/cluster/conf/scheduler/mutate"{noformat}
Sets the {{root.test}} queue's state to STOPPED and its maximum-applications to 
33.

{noformat}curl -X PUT -H 'Content-Type: application/xml' -d '
  
root.test
  
' --negotiate -u : 
"http://:8088/ws/v1/cluster/conf/scheduler/mutate"{noformat}
Removes the {{root.test}} queue (since it is STOPPED, leveraging YARN-5556)

{noformat}curl -X PUT -H 'Content-Type: application/xml' -d '
  
root.test2

  
maximum-applications
34
  

  
' --negotiate -u : 
"http://:8088/ws/v1/cluster/conf/scheduler/mutate"{noformat}
Adds a {{root.test2}} queue. Also sets its maximum-applications to 34.
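The three requests above correspond to three queue-level operations. The sketch below models their effect on CapacityScheduler-style keys, assuming that removal is allowed only for a STOPPED queue (as the YARN-5556 note suggests) and that adding or removing a queue also edits the parent's `queues` list. `QueueMutations` is a hypothetical class; it does not parse the XML request bodies and does not validate capacities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the update / remove / add operations on a flat
// CapacityScheduler-style key space ("yarn.scheduler.capacity.<queue>.<param>").
class QueueMutations {
  static final String PREFIX = "yarn.scheduler.capacity.";
  final Map<String, String> conf = new HashMap<>();

  void updateQueue(String queue, Map<String, String> params) {
    params.forEach((k, v) -> conf.put(PREFIX + queue + "." + k, v));
  }

  void removeQueue(String queue) {
    if (!"STOPPED".equals(conf.get(PREFIX + queue + ".state"))) {
      throw new IllegalStateException(queue + " must be STOPPED before removal");
    }
    // Drop every key that belongs to the queue (or its descendants).
    conf.keySet().removeIf(k -> k.startsWith(PREFIX + queue + "."));
    editParentQueues(queue, false);
  }

  void addQueue(String queue, Map<String, String> params) {
    editParentQueues(queue, true);
    updateQueue(queue, params);
  }

  // Adds or removes the leaf name in "<parent>.queues", e.g. root.queues=default,test
  private void editParentQueues(String queue, boolean add) {
    int dot = queue.lastIndexOf('.');
    if (dot < 0) {
      throw new IllegalArgumentException("Cannot add or remove the root queue");
    }
    String parentKey = PREFIX + queue.substring(0, dot) + ".queues";
    String leaf = queue.substring(dot + 1);
    List<String> children = new ArrayList<>();
    String current = conf.get(parentKey);
    if (current != null && !current.isEmpty()) {
      children.addAll(Arrays.asList(current.split(",")));
    }
    if (add) {
      children.add(leaf);
    } else {
      children.remove(leaf);
    }
    conf.put(parentKey, String.join(",", children));
  }
}
```

Replaying the examples in order (stop `root.test` and set its maximum-applications to 33, remove it, then add `root.test2` with 34) leaves `root.queues` as `default,test2`.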

This is just a first version, so some details are not yet 
implemented or tested (e.g. specifying a hierarchical conf update). [~xgong] and 
[~wangda], would you mind taking a look to make sure our ideas and interfaces are 
aligned?


[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-14 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749878#comment-15749878
 ] 

Jonathan Hung commented on YARN-5734:
-

Uploaded v2 design doc containing changes based on discussion.

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: 
> OrgQueueAPI-BasedSchedulerConfigurationManagement_v2.pdf, 
> OrgQueue_API-Based_Config_Management_v1.pdf, OrgQueue_Design_v0.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736005#comment-15736005
 ] 

Wangda Tan commented on YARN-5734:
--

bq. If the reinitialization fails (i.e. scheduler.reinitialize(X+1)), then we 
will need to call scheduler.reinitialize(X). In this case we need to call 
reinitialize twice. Is this acceptable? 
If everything works as expected, a reinitialize failure will not change the queue 
hierarchy. If there are any cases where the queue structure still gets updated 
when reinitialize fails, the queue configs could be left in a limbo state; we need 
to fix such cases separately. 

bq. I think we will still need some sort of PluggablePolicy,... 
Makes sense.

bq. Not sure if this is what you meant ..
I'm not sure what the interface design is, but I think the logic you described 
should be roughly the same as what I have in mind. We can check the detailed logic 
during patch review.

bq. I am thinking we can add a scheduler specific ConfigurationProvider option 
in yarn-site.xml
Instead of specifying a ConfigurationProvider, I think it might be easier for end 
users to specify a config like {{...scheduler.dynamic-queue-config.enabled}}. We 
can then use a different ConfigurationProvider implementation depending on the 
value of that option.

bq. Not sure what you mean by loading configuration file from xml while setting 
the cluster, can you elaborate on that? Do you mean if store is enabled and the 
admin wants to wipe it and load a new conf from a file into the store? Do we 
plan on supporting that?
If we allow initializing the store-based config from capacity-scheduler.xml, this 
is not required.




> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_API-Based_Config_Management_v1.pdf, 
> OrgQueue_Design_v0.pdf






[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-08 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734123#comment-15734123
 ] 

Jonathan Hung commented on YARN-5734:
-

Thanks for the detailed points, [~leftnoteasy]. 
bq. How to handle bad configuration update?
The idea of calling scheduler#reinitialize mostly makes sense to me. A couple of 
questions/thoughts:
* If the reinitialization fails (i.e. scheduler.reinitialize(X+1)), then we 
will need to call scheduler.reinitialize(X). In this case we need to call 
reinitialize twice. Is this acceptable?
* I think we will still need some sort of PluggablePolicy, but in this case it 
is just an authorization policy so we can leverage YarnAuthorizationProvider.
bq. By using ConfigurationProvider, it can either get a new 
CapacitySchedulerConfiguration (CS) or a new AllocationConfiguration (FS). 
Not sure if this is what you meant, but could we have MutableConfigurationManager 
extend ConfigurationProvider? Then MutableConfigurationManager would expose the X+1 
configuration while it is being validated, and either un-expose it (if 
reinitialization fails) or keep it exposed and persist it to the backing store (if 
reinitialization succeeds). 
bq. If file-based solution is specified, no dynamic update queue operation will 
be allowed. If store-based solution is specified, no refreshQueue CLI will be 
allowed.
I agree.
bq. So I would prefer to add an option to yarn-site.xml to explicitly specify 
which config source the scheduler will use.
I am thinking we can add a scheduler-specific ConfigurationProvider option in 
yarn-site.xml and infer the config source from it: if the scheduler-specific 
ConfigurationProvider is MutableConfigurationManager, the scheduler uses the store; 
otherwise, it uses the file.
bq. If admin want to load configuration file from xml while setting the 
cluster, or want to switch from xml-file based config to store-based config, we 
can provide a CLI to load a XML file and save it to store.
Not sure what you mean by loading configuration file from xml while setting the 
cluster, can you elaborate on that? Do you mean if store is enabled and the 
admin wants to wipe it and load a new conf from a file into the store? Do we 
plan on supporting that?
For switching from xml-based to store-based, I was thinking we could just 
manually change the scheduler's configuration provider in yarn-site.xml and then 
restart the RM. If we instead allow this via CLI, yarn-site.xml becomes 
inconsistent with RM behavior (yarn-site.xml would still say the config is 
file-based while the RM is store-based).

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_API-Based_Config_Management_v1.pdf, 
> OrgQueue_Design_v0.pdf

[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733723#comment-15733723
 ] 

Wangda Tan commented on YARN-5734:
--

[~jhung],

Discussed with [~jianhe] for my above point #2 again.

We now think your original proposal better handles the case where an admin wants 
to switch from the XML-file-based solution to the API-based solution:

bq. Initialization will be done by xml even if API-based approach is enabled. 
Then on crash/restart the config store will be honored. Basically once store is 
initialized, it will be used as source of truth (and the xml is no longer 
useful).

But I think my points are still valid:

bq. On the other hand, store-based solution doesn't need refreshQueue CLI at 
all, because content in store and memory should be always synced.
bq. So I would prefer to add an option to yarn-site.xml to explicitly specify 
which config source the scheduler will use. If file-based solution is 
specified, no dynamic update queue operation will be allowed. If store-based 
solution is specified, no refreshQueue CLI will be allowed.

Please share your thoughts.

Thanks,


> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_API-Based_Config_Management_v1.pdf, 
> OrgQueue_Design_v0.pdf






[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733575#comment-15733575
 ] 

Wangda Tan commented on YARN-5734:
--

Thanks [~jhung] / [~mshen] / [~zhouyejoe] / [~zhz] for pushing this forward.

A couple of questions regarding the design: 

*1) How to handle bad configuration update?*

The existing design updates the config first and then notifies the scheduler to 
apply the update. But how do we avoid update failures? IIUC, PluggablePolicy is 
added to validate the config, but does that mean we have to duplicate some 
validation logic from the scheduler in PluggablePolicy?

I have an idea that might simplify the overall process:

MutableConfigurationManager always maintains the latest-in-use-config 
(version=X).

a. When a queue admin requests to update some fields, it merges the 
latest-in-use-config and the newly updated fields into a new configuration 
proposal (version=X+1). By using ConfigurationProvider, it can get either a new 
CapacitySchedulerConfiguration (CS) or a new AllocationConfiguration (FS). 
b. Then it calls the scheduler.reinitialize(...) API, and the scheduler uses 
exactly the same logic to validate the configuration (including CS#parseQueue, etc.). 
c. If b succeeds, write the ver=X+1 config to the state store and respond to the 
client that the operation succeeded. The latest-in-use-config is updated to X+1.
d. If b fails, report the failure to the client; the newly updated fields are 
simply discarded.

This proposal should still fit the existing overall architecture. The benefits 
are that it avoids a PluggablePolicy implementation (which may require duplicating 
queue config validation logic), and it avoids writing a bad config to the store.
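Steps a-d can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual MutableConfigurationManager: {{ReinitializableScheduler}}, {{VersionedConfStore}} and {{MutableConfManagerSketch}} are hypothetical names, configs are modeled as flat key/value maps, and a failed reinitialize is assumed to leave the scheduler on version X. The sketch also covers the follow-up question of calling reinitialize twice: if the store write fails after a successful reinitialize, the scheduler is reinitialized back to X.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scheduler: reinitialize() throws if the proposal is invalid.
interface ReinitializableScheduler {
  void reinitialize(Map<String, String> conf) throws Exception;
}

// Hypothetical stand-in for the backing config store (Derby, ZK, ...).
interface VersionedConfStore {
  void persist(long version, Map<String, String> conf);
}

// Sketch of steps a-d: merge, validate via reinitialize, persist only on success.
class MutableConfManagerSketch {
  private final ReinitializableScheduler scheduler;
  private final VersionedConfStore store;
  private Map<String, String> inUse; // latest-in-use-config (version X)
  private long version;              // starts at 0

  MutableConfManagerSketch(ReinitializableScheduler scheduler, VersionedConfStore store,
      Map<String, String> initial) {
    this.scheduler = scheduler;
    this.store = store;
    this.inUse = new HashMap<>(initial);
  }

  // Returns only after the update is validated and persisted; false means it was discarded.
  synchronized boolean update(Map<String, String> updatedFields) {
    // a. Merge latest-in-use-config with the requested fields into proposal X+1.
    Map<String, String> proposal = new HashMap<>(inUse);
    proposal.putAll(updatedFields);
    try {
      // b. The scheduler validates the proposal with its normal parsing logic.
      scheduler.reinitialize(proposal);
    } catch (Exception e) {
      // d. Rejected: report failure to the caller; the proposal is simply discarded.
      return false;
    }
    try {
      // c. Accepted: persist X+1 before acknowledging the caller.
      store.persist(version + 1, proposal);
    } catch (RuntimeException e) {
      // Store write failed: reinitialize the scheduler back to version X.
      try {
        scheduler.reinitialize(inUse);
      } catch (Exception rollbackFailure) {
        e.addSuppressed(rollbackFailure);
      }
      throw e;
    }
    inUse = proposal;
    version++;
    return true;
  }

  synchronized long version() {
    return version;
  }

  synchronized Map<String, String> inUse() {
    return new HashMap<>(inUse);
  }
}
```

Because {{update}} is synchronized and only returns true after {{persist}} completes, an acknowledged change has already reached the store (assuming the store is durable), and a rejected change never touches it.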

*2) I think the existing design, which supports using two sources of configuration 
at the same time, is a little confusing. For example:*
- An admin sets up a cluster from scratch and the RM saves the xml file to the 
store, but the admin could continue to edit capacity-scheduler.xml on disk and call 
rmadmin -refreshQueue. What should happen? 

To me this should not be allowed: 
- The existing -refreshQueue was added because, under the configuration-file-based 
solution, the content in the file and in memory could differ; -refreshQueue is a 
way to sync the two.
- On the other hand, the store-based solution doesn't need the refreshQueue CLI at 
all, because the content in the store and in memory should always be in sync.

So I would prefer to add an option to yarn-site.xml to explicitly specify which 
config source the scheduler will use. If the file-based solution is specified, no 
dynamic queue update operations will be allowed. If the store-based solution is 
specified, no refreshQueue CLI will be allowed.
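One way to picture this switch is a guard that permits exactly one of the two update paths. The property name {{example.scheduler.conf-source}} and the classes below are hypothetical placeholders (no option had been named at this point in the discussion), and yarn-site.xml is modeled as a plain {{java.util.Properties}} for brevity.

```java
import java.util.Locale;
import java.util.Properties;

// The two mutually exclusive configuration sources.
enum ConfSource { FILE, STORE }

// Sketch of the "exactly one config source" rule.
class ConfSourceGuard {
  // Hypothetical property name, not a real YARN configuration key.
  static final String SOURCE_KEY = "example.scheduler.conf-source";

  private final ConfSource source;

  ConfSourceGuard(Properties yarnSite) {
    // Default to the existing file-based behavior when the option is absent.
    String value = yarnSite.getProperty(SOURCE_KEY, "file");
    this.source = ConfSource.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  // Dynamic (API-based) queue updates are only allowed with the store-based source.
  void checkMutationAllowed() {
    if (source != ConfSource.STORE) {
      throw new IllegalStateException("dynamic queue updates require the store-based source");
    }
  }

  // refreshQueue is only meaningful with the file-based source.
  void checkRefreshQueuesAllowed() {
    if (source != ConfSource.FILE) {
      throw new IllegalStateException("refreshQueue is disabled with the store-based source");
    }
  }

  ConfSource source() {
    return source;
  }
}
```

With a single switch, the RM never has to reconcile a hand-edited capacity-scheduler.xml against the store, which is the ambiguity raised in point 2.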

If an admin wants to load the configuration from an xml file while setting up the 
cluster, or wants to switch from xml-file-based config to store-based config, we can 
provide a CLI to load an XML file and save it to the store.

Thoughts?

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_API-Based_Config_Management_v1.pdf, 
> OrgQueue_Design_v0.pdf

[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-06 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727137#comment-15727137
 ] 

Jonathan Hung commented on YARN-5734:
-

Hi [~jianhe], thanks for the feedback.
bq. Does add/remove also support a fully qualified queue name, not just a 
hierarchical structure? I think supporting a single fully qualified queue name 
would be handy, especially for CLI add/remove
Sure, I think it makes sense to support both.
bq. User may need to provide a new queue structure for initialization, then, 
the xml file will conflict with what's in config store.
I don't think I understand this part; can you explain why the user needs to 
provide a new queue structure?
Initialization will be done by xml even if API-based approach is enabled. Then 
on crash/restart the config store will be honored. Basically once store is 
initialized, it will be used as source of truth (and the xml is no longer 
useful).
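The "seed once from xml, then trust the store" rule can be sketched like this. {{SeedableConfStore}}, {{InMemoryConfStore}} and {{StartupConfLoader}} are hypothetical names; the xml-derived config is passed as a {{Supplier}} so the file is only read while the store is still empty, and the in-memory store exists only to make the sketch runnable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal store interface for this sketch.
interface SeedableConfStore {
  boolean isInitialized();
  Map<String, String> load();
  void seed(Map<String, String> conf);
}

class StartupConfLoader {
  // First start: seed the store from the xml. Later starts (e.g. after a crash):
  // the store is the source of truth and the xml is ignored.
  static Map<String, String> loadOnStartup(SeedableConfStore store,
      Supplier<Map<String, String>> xmlConf) {
    if (!store.isInitialized()) {
      store.seed(xmlConf.get());
    }
    return store.load();
  }
}

// In-memory store; load() is only valid after seed().
class InMemoryConfStore implements SeedableConfStore {
  private Map<String, String> data;

  public boolean isInitialized() {
    return data != null;
  }

  public Map<String, String> load() {
    return new HashMap<>(data);
  }

  public void seed(Map<String, String> conf) {
    data = new HashMap<>(conf);
  }
}
```

Under this rule, edits to the xml after the first start have no effect, which is why the rolling-upgrade scenario raised earlier needs a separate answer.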
bq. Is the implementation that the caller will block until the update is 
completed - both in store and memory ?
Yes, the plan is to block until the update is completed for both. This is to 
prevent the scenario where the client sends a configuration change, an event is 
queued, the call returns, and then the RM crashes, at which point the 
configuration change is lost.
bq. IIUC, the EmbededDerbyDatabase is suitable for single RM only. Do you run 
RM HA in your cluster? Also, I guess Derby does not support fencing ? If so, we 
could potentially have two RMs writing together in a split-brain situation and 
cause data inconsistency. Therefore, I think ZKRMStateStore might be a better 
store option by default, especially because of RM HA.
Currently we are not running RM HA. The reason we have Derby as the default is 
that it is what we currently run in production (and we don't have a working 
implementation which supports RM HA), so we know it works well for single-RM 
clusters.
bq. Regarding PluggableConfigurationPolicy for authorization, has the 
implementation considered using YarnAuthorizationProvider ?
Took a look at this. I have a couple of comments about it; let me know if this is 
not what you had in mind.
* If I understand correctly, YarnAuthorizationProvider currently only supports 
authorization based on queue ACLs (submit/administer queue). We would need to 
extend the implementation to support things like fine-grained ACLs (e.g. ACLs by 
configuration key). In that case we would just extend YarnAuthorizationProvider 
with something like "SchedulerConfigurationAuthorizationProvider". If so, each 
component using an authorization provider would need to configure its own 
implementation, since SchedulerConfigurationAuthorizationProvider does not apply 
to all components (and it seems all components use the same provider, determined 
by yarn.authorization-provider).
* We will probably still need the new pluggable configuration policy, at least 
for validating configuration changes to make sure the proposed changes make 
sense.
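A rough sketch of what key-level checks could look like, assuming admin rights on a queue extend to its descendants and some keys are reserved for cluster admins. {{ConfKeyAuthorizer}} is hypothetical and deliberately does not use YarnAuthorizationProvider's real interface; it only illustrates the extra dimension (the configuration key) that a provider like the proposed SchedulerConfigurationAuthorizationProvider would need to consider.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-key authorization check, independent of YarnAuthorizationProvider.
class ConfKeyAuthorizer {
  private final Set<String> clusterAdmins;
  private final Map<String, Set<String>> queueAdmins = new HashMap<>();
  private final Set<String> adminOnlyKeys = new HashSet<>();

  ConfKeyAuthorizer(Set<String> clusterAdmins) {
    this.clusterAdmins = new HashSet<>(clusterAdmins);
  }

  void addQueueAdmin(String queuePath, String user) {
    queueAdmins.computeIfAbsent(queuePath, q -> new HashSet<>()).add(user);
  }

  // Marks a key that only cluster admins may change.
  void restrictToClusterAdmins(String key) {
    adminOnlyKeys.add(key);
  }

  // May `user` change `key` on `queuePath` (e.g. "root.a.b")?
  boolean canUpdate(String user, String queuePath, String key) {
    if (clusterAdmins.contains(user)) {
      return true;
    }
    if (adminOnlyKeys.contains(key)) {
      return false;
    }
    // An admin of the queue or of any ancestor queue may change it.
    for (String q = queuePath; q != null; q = parentOf(q)) {
      Set<String> admins = queueAdmins.get(q);
      if (admins != null && admins.contains(user)) {
        return true;
      }
    }
    return false;
  }

  private static String parentOf(String queuePath) {
    int i = queuePath.lastIndexOf('.');
    return i < 0 ? null : queuePath.substring(0, i);
  }
}
```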

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_API-Based_Config_Management_v1.pdf, 
> OrgQueue_Design_v0.pdf

[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724356#comment-15724356
 ] 

Jian He commented on YARN-5734:
---

[~mshen], [~jhung], [~zhz], very useful feature! Thanks for the contribution. 
Some questions I had about the design:
- Does add/remove also support a fully qualified queue name, not just a 
hierarchical structure? I think supporting a single fully qualified queue name 
would be handy, especially for CLI add/remove.
- IIUC, the xml file will still be used for initialization on startup, even if 
the API-based approach is enabled? Then, if the RM gets restarted, will the RM 
honor the xml file or the config store for initialization? I feel both 
scenarios may be possible:
-- If it is a crash-and-restart, we should probably honor the config 
store.
-- If the RM is going through a rolling upgrade, the user may need to provide a 
new queue structure for initialization; then the xml file will conflict with 
what's in the config store.
- Will the caller block until the update is completed, both in the store and in 
memory? 
- IIUC, the EmbededDerbyDatabase is suitable for a single RM only. Do you run RM 
HA in your cluster? Also, I guess Derby does not support fencing? If so, we 
could potentially have two RMs writing concurrently in a split-brain situation, 
causing data inconsistency. Therefore, I think ZKRMStateStore might be a better 
default store option, especially because of RM HA. 
- Regarding PluggableConfigurationPolicy for authorization, has the 
implementation considered using YarnAuthorizationProvider? 
YarnAuthorizationProvider is an interface which can be implemented by other 
authorization plugins (e.g. Apache Ranger). Ranger has a nice web portal where one 
can define arbitrary authorization policies, such as restricting certain 
users/groups from doing certain operations. It would be useful if it did, as the 
Ranger plugin would then just need to implement the necessary interface to get 
config authorization for free.
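On the first question: supporting a single fully qualified name is cheap because the name already encodes the hierarchy, so the parent to modify and the leaf to add or remove can be derived from it. A minimal sketch with a hypothetical {{QueueNames}} helper, assuming names are dot-separated and rooted at "root":

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for splitting fully qualified queue names such as "root.a.b".
final class QueueNames {
  private QueueNames() {}

  // Splits "root.a.b" into [root, a, b]; rejects names not under root or with empty segments.
  static List<String> segments(String fullName) {
    if (fullName == null) {
      throw new IllegalArgumentException("queue name is null");
    }
    // limit -1 keeps trailing empty strings, so "root.a." is rejected as well.
    List<String> parts = Arrays.asList(fullName.split("\\.", -1));
    if (!parts.get(0).equals("root") || parts.contains("")) {
      throw new IllegalArgumentException("malformed queue name: " + fullName);
    }
    return parts;
  }

  // Parent queue to modify, e.g. "root.a" for "root.a.b".
  static String parent(String fullName) {
    List<String> parts = segments(fullName);
    if (parts.size() == 1) {
      throw new IllegalArgumentException("root has no parent");
    }
    return String.join(".", parts.subList(0, parts.size() - 1));
  }

  // Leaf (short) name to add or remove, e.g. "b" for "root.a.b".
  static String leaf(String fullName) {
    List<String> parts = segments(fullName);
    return parts.get(parts.size() - 1);
  }
}
```

A CLI "add root.a.b" request could then be translated into "add child b under root.a" before reaching the same code path as a hierarchical request.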

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_API-Based_Config_Management_v1.pdf, 
> OrgQueue_Design_v0.pdf
>
>
> The current XML-based configuration mechanism in CapacityScheduler makes it 
> very inconvenient to apply any changes to the queue configurations. We saw 2 
> main drawbacks in the file-based configuration mechanism:
> # It is very inconvenient to automate queue configuration updates. 
> For example, in our cluster setup, we leverage the queue mapping feature from 
> YARN-2411 to route users to their dedicated organization queues. It could be 
> extremely cumbersome to keep updating the config file to manage the very 
> dynamic mapping between users and organizations.
> # Even if a user has admin permission on one specific queue, that user is 
> unable to make any queue configuration changes such as resizing the subqueues, 
> changing queue ACLs, or creating new queues. All these operations need to be 
> performed in a centralized manner by the cluster administrators.
> With these limitations, we realized the need for a more flexible 
> configuration mechanism that allows queue configurations to be stored and 
> managed more dynamically. We developed the feature internally at LinkedIn, 
> which introduces the concept of MutableConfigurationProvider. It 
> essentially provides a set of configuration mutation APIs that 
> allow queue configurations to be updated externally through REST APIs. 
> When queue configuration changes are performed, the queue ACLs are 
> honored, which means only queue administrators can make configuration changes 
> to a given queue. MutableConfigurationProvider is implemented as a pluggable 
> interface, and we have one implementation of this interface, which is based on 
> the Derby embedded database.
> This feature has been deployed on LinkedIn's Hadoop cluster for a year now, 
> and has gone through several iterations of gathering feedback from users 
> and improving accordingly. With this feature, cluster administrators are able 
> to automate many queue configuration management tasks, such as setting 
> the queue capacities to adjust cluster resources between queues based on 
> established resource consumption patterns, or updating the user-to-queue 
> mappings. We have attached our design documentation to this ticket 
> and would like to receive feedback from the community regarding how best to 
> integrate it with the latest version of YARN.
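The ACL rule described above (only queue administrators may change a given queue) could sit behind a pluggable policy interface. Below is a minimal sketch, assuming dot-separated queue paths like `root.ads.batch` and assuming that administering a parent queue also grants rights over its sub-queues, so a queue admin can resize its sub-queues. `ConfigurationMutationPolicy` and `QueueAdminPolicy` are hypothetical names, not the YARN API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pluggable policy: decides whether a user may change the
// configuration of the queue at `queuePath` (e.g. "root.ads.batch").
interface ConfigurationMutationPolicy {
  boolean isAllowed(String user, String queuePath);
}

// Sketch of an ACL-honoring default: an administrator of a queue may change
// that queue and every queue beneath it.
class QueueAdminPolicy implements ConfigurationMutationPolicy {
  private final Map<String, Set<String>> adminsByQueue = new HashMap<>();

  // Records `user` as an administrator of `queuePath`; returns this for chaining.
  QueueAdminPolicy grant(String queuePath, String user) {
    adminsByQueue.computeIfAbsent(queuePath, q -> new HashSet<>()).add(user);
    return this;
  }

  @Override
  public boolean isAllowed(String user, String queuePath) {
    // Walk from the target queue up to root, checking each level's admins.
    String path = queuePath;
    while (true) {
      Set<String> admins = adminsByQueue.get(path);
      if (admins != null && admins.contains(user)) {
        return true;
      }
      int dot = path.lastIndexOf('.');
      if (dot < 0) {
        return false; // reached root without a match
      }
      path = path.substring(0, dot);
    }
  }
}
```

Walking up by path segment (rather than by string prefix) keeps an admin of `root.ads` from accidentally gaining rights over a sibling such as `root.adsx`.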



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-01 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713297#comment-15713297
 ] 

Jonathan Hung commented on YARN-5734:
-

Thanks [~lewuathe]. Right now we are working on initial patches, and we will 
have a better idea of how to split tasks once we have a skeleton of the 
implementation. Regarding the target branch, we will keep an option to use the 
flat configuration file as it is now, so this shouldn't be an incompatible change.

[~rkanter], thanks for the note. As you mentioned, configuration changes 
shouldn't be too frequent, so we don't anticipate this being an issue, but we'll 
definitely keep it in mind.


-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-01 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713186#comment-15713186
 ] 

Robert Kanter commented on YARN-5734:
-

Oozie has run into scalability problems with Derby, but I would imagine that 
Oozie does more frequent reads and writes to Derby than users will be doing 
with their configurations, so it probably won't be a problem. Just something 
to keep in mind.




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-12-01 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712113#comment-15712113
 ] 

Kai Sasaki commented on YARN-5734:
--

I'm also interested in flexible queue configuration management, because 
xml-based configuration can often be troublesome for us.

{quote}
We discussed an advanced feature of supporting multi-update transactions.
{quote}

I have sometimes run into an inconsistent queue state while updating the queue 
configuration with xml, because the update cannot be applied transactionally. In 
this case the scheduler is left with an incomplete queue state.

{quote}
Target branch-2
{quote}

Making the xml file obsolete could be an incompatible change, so might it be better to target 
3.x later? Or does it only mean adding a new backend storage implementation?

Anyway, I would like to work on this after the sub-tasks are arranged. Thanks!




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-11-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706950#comment-15706950
 ] 

Xuan Gong commented on YARN-5734:
-

[~jhung]
bq. Create feature branch

I have created a feature branch: YARN-5734




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-27 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612431#comment-15612431
 ] 

Jonathan Hung commented on YARN-5734:
-

Here are the notes from yesterday's meetup: 

Objective: Aligning queue configuration requirements from YARN-5734 and 
YARN-5724
Attendees: Xuan, Wangda, Vinod, Subru, Zhe, Konstantin, Ye, Min, Jonathan, Erik
10/26/16 2-4pm

Meeting minutes
* Overall we are in agreement on adding a Mutable API for queue configuration. 
We discussed many details around APIs and storage implementation.
* APIs
** For compatibility we can keep xml-file-based configuration as an option. 
Subru and Wangda both raised a concern that having 2 sources of truth is hard 
to maintain; therefore users should choose either the xml-file-based 
configuration approach or the new API-based one.
** Vinod raised a point that besides REST APIs, CLIs are also important.
** We also discussed a tricky case of adding new resources to the entire system.
** We discussed an advanced feature of supporting multi-update transactions, 
e.g. reducing the capacity of queue A and moving that capacity to queue B.
** We discussed how to support bulk updates.
** We discussed how to make the project applicable to both the Capacity and Fair 
schedulers. YARN-2986 should be revisited to provide a common data model for 
both schedulers.
** We discussed the case of hierarchical queues.
* Storage implementation
** The Derby embedded database can be used as the default underlying storage 
implementation.
** The storage implementation should be configurable, e.g. distributed 
storage is needed to support HA.
** Another option is to use the YARN RM state store. This potentially 
simplifies how update events are logged (audit logger) and recovered.
** Need to address other issues, such as scheduler-agnostic REST APIs and 
user-friendly concurrent updates.
** Target branch-2
* Action items
** Combine YARN-5724 and YARN-5734 into one umbrella
** Create one unified design doc covering
*** Backing store implementations
*** Queue state machine
*** List of supported APIs
** Create feature branch (and add Min Shen ms...@linkedin.com, Jonathan Hung 
jyhung2...@gmail.com, Ye Zhou zhouye...@gmail.com as branch committers)
** After feature branch is created, create sub-tasks needed for implementing 
mutable API configuration provider
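The multi-update transaction discussed above (reducing queue A's capacity and giving it to queue B) matters because applying either change alone would leave sibling capacities that do not sum to 100%. Below is a minimal sketch, assuming a flat map of CapacityScheduler-style properties (`yarn.scheduler.capacity.<queue-path>.queues` and `yarn.scheduler.capacity.<queue-path>.capacity`): the whole batch is applied to a copy, validated, and swapped in only if it passes, so readers never observe a half-applied update. `TransactionalQueueConf` is a hypothetical class, and it only validates the direct children of one parent queue.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical transactional view of CapacityScheduler-style properties.
class TransactionalQueueConf {
  private static final String PREFIX = "yarn.scheduler.capacity.";

  // Immutable snapshot that readers see; replaced wholesale on commit.
  private volatile Map<String, String> live;

  TransactionalQueueConf(Map<String, String> initial) {
    this.live = Collections.unmodifiableMap(new HashMap<>(initial));
  }

  Map<String, String> snapshot() {
    return live;
  }

  // Applies every update or none: changes go to a copy, the copy is
  // validated, and only a valid copy replaces the live snapshot.
  synchronized boolean apply(Map<String, String> updates, String parentQueue) {
    Map<String, String> candidate = new HashMap<>(live);
    candidate.putAll(updates);
    if (!childrenSumTo100(candidate, parentQueue)) {
      return false; // rejected as a whole; live snapshot untouched
    }
    live = Collections.unmodifiableMap(candidate);
    return true;
  }

  // Checks that the direct children of `parent` have capacities summing to 100.
  static boolean childrenSumTo100(Map<String, String> conf, String parent) {
    String queues = conf.get(PREFIX + parent + ".queues");
    if (queues == null) {
      return true; // leaf queue: nothing to check
    }
    double sum = 0;
    for (String child : queues.split(",")) {
      String cap = conf.get(PREFIX + parent + "." + child.trim() + ".capacity");
      if (cap == null) {
        return false;
      }
      sum += Double.parseDouble(cap);
    }
    return Math.abs(sum - 100.0) < 1e-6;
  }
}
```

Bulk updates fit the same shape: a bulk update is simply a larger batch passed to `apply`.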



[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-25 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606834#comment-15606834
 ] 

Zhe Zhang commented on YARN-5734:
-

Since there is some overlap between this JIRA's objectives and those of 
YARN-5724, we plan to have a meetup to better discuss these 2 projects. Thanks 
[~wangda] and [~xgong] for proposing this. Please join in-person or remotely if 
you are interested.

*When*: Wednesday 10/26 2~4pm
*Where*: LinkedIn HQ, 950 West Maude Avenue, Sunnyvale, CA. (If you do plan to 
attend in-person, please email z...@apache.org)
*Confcall*: https://bluejeans.com/654904000 

We will post notes after the meetup.




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-20 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592596#comment-15592596
 ] 

Jonathan Hung commented on YARN-5734:
-

[~rémy], glad to hear this is useful for your company. With this enabled, 
{{refreshQueue}} will no longer use the configuration from 
{{capacity-scheduler.xml}} as the latest conf, since calling capacity 
scheduler's reinitialize will load the capacity scheduler configuration from 
the backing store (e.g. derby database). The intent behind {{reset}} is to 
clear the configuration from the DB and load it from the xml file.
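The behavior described above (reinitialization reads from the backing store, while `reset` discards the stored configuration and reloads the xml file) can be sketched as follows. This is a minimal illustration, not the LinkedIn implementation: `BackingStore` stands in for the Derby database, a `Supplier` stands in for re-reading `capacity-scheduler.xml`, and all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the backing store (e.g. the Derby database).
class BackingStore {
  private final Map<String, String> rows = new HashMap<>();

  Map<String, String> loadAll() {
    return new HashMap<>(rows);
  }

  void put(String key, String value) {
    rows.put(key, value);
  }

  void replaceAll(Map<String, String> conf) {
    rows.clear();
    rows.putAll(conf);
  }
}

// Hypothetical provider: the store is the source of truth once enabled.
class StoreBackedConfProvider {
  private final BackingStore store;
  private final Supplier<Map<String, String>> xmlLoader; // re-reads the xml file

  StoreBackedConfProvider(BackingStore store, Supplier<Map<String, String>> xmlLoader) {
    this.store = store;
    this.xmlLoader = xmlLoader;
  }

  // What a queue refresh / scheduler reinitialize would load: the store,
  // regardless of what the xml file currently contains.
  Map<String, String> loadForReinitialize() {
    return store.loadAll();
  }

  // A change made through the mutation API is persisted to the store only.
  void mutate(String key, String value) {
    store.put(key, value);
  }

  // reset: drop the stored configuration and reseed it from the xml file.
  void reset() {
    store.replaceAll(xmlLoader.get());
  }
}
```

In this sketch, editing the xml file has no effect until `reset` is called, which mirrors the point that the store, not the file, is the source of truth once the feature is enabled.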




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-20 Thread Rémy SAISSY (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590910#comment-15590910
 ] 

Rémy SAISSY commented on YARN-5734:
---

Hi,
thanks for this feature; it addresses a pain point we have at Criteo.

Does it completely disable the refreshQueue CLI, which loads the 
LocalConfigurationProvider content, or will the command line basically 
perform a call to the /cluster/queue/reset REST endpoint?





[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-19 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590001#comment-15590001
 ] 

Jonathan Hung commented on YARN-5734:
-

I see, that makes sense. The local param changes sound like something we could
leverage.

[~zhz] it seems that there are a few things OrgQueue needs to integrate with so 
I think a feature branch would be useful here.




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-18 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587034#comment-15587034
 ] 

Carlo Curino commented on YARN-5734:


[~jhung], what I was saying is a bit different, but what you mention makes
sense.

What I was pointing out is that we had a solution for tweaking some of the key
params (for {{ReservationQueue}}) in a very cheap, dynamic way. As part of
YARN-4193, we had prototype support for node labels and did some further
scalability work (lock tweaks in CS) to make it scale to many changes per
second (300 queues with many node labels, updated every second). The insight
was to make more "surgical", local changes to specific params instead of
heavy, lock-bound operations like {{refreshQueues}}.
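The contrast between a tree-wide refresh and a "surgical" per-parameter change is mainly about lock granularity. Below is a minimal, hypothetical sketch of that idea, not the YARN-4193 code: `QueueState` and `SchedulerConf` are illustrative names, and the two-level locking (a tree-wide read/write lock plus one lock per queue) is an assumed design.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical per-queue state guarded by its own lock.
class QueueState {
  final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float capacity;

  QueueState(float capacity) {
    this.capacity = capacity;
  }

  float getCapacity() {
    lock.readLock().lock();
    try {
      return capacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  void setCapacity(float capacity) {
    lock.writeLock().lock();
    try {
      this.capacity = capacity;
    } finally {
      lock.writeLock().unlock();
    }
  }
}

class SchedulerConf {
  private final Map<String, QueueState> queues = new ConcurrentHashMap<>();
  private final ReentrantReadWriteLock treeLock = new ReentrantReadWriteLock();

  void addQueue(String path, float capacity) {
    queues.put(path, new QueueState(capacity));
  }

  // Coarse, refreshQueues-style update: the tree-wide write lock excludes
  // every other reader and writer until all queues are updated.
  void refreshAll(Map<String, Float> capacities) {
    treeLock.writeLock().lock();
    try {
      capacities.forEach((path, cap) -> {
        QueueState q = queues.get(path);
        if (q != null) {
          q.setCapacity(cap);
        }
      });
    } finally {
      treeLock.writeLock().unlock();
    }
  }

  // "Surgical" update: a shared tree lock plus one queue's write lock, so
  // updates to different queues can proceed concurrently.
  void setCapacity(String path, float cap) {
    treeLock.readLock().lock();
    try {
      queues.get(path).setCapacity(cap);
    } finally {
      treeLock.readLock().unlock();
    }
  }

  float getCapacity(String path) {
    treeLock.readLock().lock();
    try {
      return queues.get(path).getCapacity();
    } finally {
      treeLock.readLock().unlock();
    }
  }
}
```

With this layout, hundreds of per-queue updates per second only contend with each other when they target the same queue. The coarse path is still available when the whole hierarchy has to change at once.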

That said, I agree that some of the work you guys are doing could be used (if
it is cheap enough) to enforce the {{Plan}} and to generalize what reservations
can "set" in the queues.

Finally, during our convo with [~mshen], I pointed out that the
{{ReservationSystem}} can be used to provide a time-varying notion of queues
(think of a daily sine wave for the queue capacity), which in turn could be used
to "multiply" the sellable capacity in the cluster. For example, we could promise
highly guaranteed access to the "dev" queue during the day and exclusive access
to the "reporting" queue at night (note that this provides much stronger
guarantees than over-capacity fair sharing). Integrating this with what you
guys have would be neat.
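A "daily sine wave" for queue capacity can be written as $c_{dev}(h) = b + A\cos(2\pi(h - h_{peak})/24)$, where $h$ is the hour of the day, with the sibling queue receiving $100 - c_{dev}(h)$. The sketch below is a hypothetical illustration of that formula for the "dev"/"reporting" example. The class name and parameters are assumptions, and how a real deployment would push these values (through the ReservationSystem or a mutation API) is not shown.

```java
// Hypothetical time-varying capacity schedule for two sibling queues.
final class DailyCapacitySchedule {
  private final double base;      // mean capacity of "dev", in percent
  private final double amplitude; // swing around the mean, in percent
  private final double peakHour;  // hour of day (0-24) at which "dev" peaks

  DailyCapacitySchedule(double base, double amplitude, double peakHour) {
    if (amplitude < 0 || base - amplitude < 0 || base + amplitude > 100) {
      throw new IllegalArgumentException("capacity must stay within [0, 100]");
    }
    this.base = base;
    this.amplitude = amplitude;
    this.peakHour = peakHour;
  }

  // Sinusoid with a 24-hour period; cosine puts the maximum at peakHour.
  double devCapacity(double hour) {
    return base + amplitude * Math.cos(2 * Math.PI * (hour - peakHour) / 24.0);
  }

  // The sibling queue receives whatever "dev" does not, so the two always sum to 100.
  double reportingCapacity(double hour) {
    return 100.0 - devCapacity(hour);
  }
}
```

With `new DailyCapacitySchedule(50, 50, 12)`, "dev" holds 100% at noon and "reporting" holds 100% at midnight. A sinusoid reaches those extremes only at the peak hours. A step function (for example, 08:00 to 20:00 versus the rest of the day) would match the "exclusive at night" promise more literally.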




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-18 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587003#comment-15587003
 ] 

Jonathan Hung commented on YARN-5734:
-

[~curino], thanks for the comments.

For 1 and 2, this is in our plans (to do either internally or e.g. in a feature 
branch). The Derby based storage is one implementation (and eventually we will 
implement an RMStateStore version). 

I took a quick look at some of the ReservationSystem code - my understanding is 
that the {{PlanQueue}}'s capacity/max-capacity is currently mutable in the same 
way as {{ParentQueue}} (i.e. via {{refreshQueues}})? The dynamic part is in the 
{{ReservationQueue}}. So instead of having to {{setEntitlement}} for each child 
of a {{PlanQueue}}, we can leverage the MutableConfigurationProvider API to 
change all child queue capacities of a {{PlanQueue}}. Is this what you had in 
mind? Also changing queue configurations such as user-limit or 
user-limit-factor of a {{ReservationQueue}} can be done via this API (as can 
other configurations if they are added to {{ReservationQueue}} in the future). 
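Changing all child capacities of a {{PlanQueue}} in one mutation, instead of one {{setEntitlement}} call per child, means the batch can be validated as a whole before anything is applied. The sketch below is hypothetical: `ChildCapacityUpdate` and the queue names are illustrative. It assumes percentage-based capacities whose children must sum to 100, and it reuses the `yarn.scheduler.capacity.<queue-path>.capacity` key convention.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: validate all child capacities of one parent together, then
// emit the corresponding configuration keys as a single batch of updates.
final class ChildCapacityUpdate {
  private static final String PREFIX = "yarn.scheduler.capacity.";

  static Map<String, String> toConfUpdates(String parentPath, Map<String, Double> childCapacities) {
    double sum = 0;
    for (double c : childCapacities.values()) {
      if (c < 0 || c > 100) {
        throw new IllegalArgumentException("capacity out of range: " + c);
      }
      sum += c;
    }
    // Reject the whole batch if the children would not add up to 100%.
    if (Math.abs(sum - 100.0) > 1e-6) {
      throw new IllegalArgumentException("child capacities must sum to 100, got " + sum);
    }
    Map<String, String> updates = new HashMap<>();
    for (Map.Entry<String, Double> e : childCapacities.entrySet()) {
      updates.put(PREFIX + parentPath + "." + e.getKey() + ".capacity",
          Double.toString(e.getValue()));
    }
    return updates;
  }
}
```

Validating the batch up front avoids the intermediate states that per-child updates can produce, where the children temporarily sum to more or less than 100%.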




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-18 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586290#comment-15586290
 ] 

Zhe Zhang commented on YARN-5734:
-

Thanks [~mshen] [~zhouyejoe] [~jhung] for the proposal! Also thanks [~curino] 
for the very helpful feedback.

This is potentially a pretty large change, and I think we should use a feature 
branch for the development. Please share your opinions on this, thanks.




[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-18 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586165#comment-15586165
 ] 

Carlo Curino commented on YARN-5734:


[~mshen], I skimmed your doc but have not read it carefully yet. I am generally a
fan of this. At MS we have similar mechanisms for other systems and users seem
to like them; also, at our scale the number of daily configuration changes is
substantial, and constantly refreshing from XML (could be tens of times daily)
sits somewhere between very annoying and impractical. Moreover, in Federation
(YARN-2915) we would be happy to leverage this functionality, as we want to
centralize the configuration of multiple RMs via our centralized
FederationPolicyStore. Our current practical workaround is to automate
downloading the new conf, writing it to an .xml file, and calling refreshQueues.

A couple of important considerations:
 # The solution should play nicely with HA, so I think using the RMStateStore
(instead of or alongside Derby) to store the updated configuration (in addition
to the conf.xml you keep as a backup) is key.
 # As you do this, please make the Store (e.g., DB) configurable. In our
deployments, it would be very nice to use an external RDBMS. Generally I agree
with [~cwsteinbach] that having configs stored in a DB is very convenient, as
you can easily maintain a historical record of previous entries and study how
they evolve and relate to each other with simple OLAP queries.
 # You should also take a look at the ReservationSystem code (YARN-1051,
YARN-2572, YARN-2573), as the PlanQueue and ReservationQueue are used to change
configurations very dynamically (they focus on capacity/max-capacity only, but
we could generalize this if useful).
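Points 1 and 2 (a configurable store, and a historical record of previous entries) can both be met by putting a versioned, append-only log behind a pluggable interface. The sketch below is hypothetical: `VersionedConfStore` and `InMemoryVersionedConfStore` are illustrative names, and the in-memory log stands in for whichever backend (RMStateStore, an external RDBMS, or embedded Derby) a deployment would configure. Key deletion is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pluggable store; the concrete backend would be chosen by configuration.
interface VersionedConfStore {
  long apply(String user, Map<String, String> updates); // returns the new version
  Map<String, String> snapshot(long version);           // config as of a version
  long latestVersion();
}

// In-memory stand-in that keeps every mutation, so history can be audited
// or an earlier version reconstructed. Key deletion is not modeled.
final class InMemoryVersionedConfStore implements VersionedConfStore {
  private static final class Mutation {
    final long version;
    final String user;
    final Map<String, String> updates;

    Mutation(long version, String user, Map<String, String> updates) {
      this.version = version;
      this.user = user;
      this.updates = Collections.unmodifiableMap(new HashMap<>(updates));
    }
  }

  private final List<Mutation> log = new ArrayList<>();

  @Override
  public synchronized long apply(String user, Map<String, String> updates) {
    long version = log.size() + 1;
    log.add(new Mutation(version, user, updates));
    return version;
  }

  // Replays the log up to and including the requested version.
  @Override
  public synchronized Map<String, String> snapshot(long version) {
    Map<String, String> conf = new HashMap<>();
    for (Mutation m : log) {
      if (m.version > version) {
        break;
      }
      conf.putAll(m.updates);
    }
    return conf;
  }

  @Override
  public synchronized long latestVersion() {
    return log.size();
  }

  // Audit trail of who changed what, in order.
  public synchronized List<String> history() {
    List<String> out = new ArrayList<>();
    for (Mutation m : log) {
      out.add(m.version + " " + m.user + " " + m.updates);
    }
    return out;
  }
}
```

In a relational backend, the same log would map naturally onto a table of (version, user, key, value) rows. That layout is what makes the simple OLAP-style queries over configuration history possible.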
 
Bottom line: the specifics of the code might need to go through a few
iterations/tweaks, but the general idea is very welcome IMHO. Also, the fact that
you have deployed and operated this at large scale for a long time is very
reassuring.



[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-13 Thread Min Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573674#comment-15573674
 ] 

Min Shen commented on YARN-5734:


[~curino], [~subru],

As discussed offline, could you please provide feedback on the design docs we
currently have?
