[ https://issues.apache.org/jira/browse/YARN-7004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257773#comment-16257773 ]

Jonathan Hung commented on YARN-7004:
-------------------------------------

Hi [~Tao Yang], regarding YARN-5734: as Wangda mentioned, there is support for 
storing the Capacity Scheduler configuration in a non-file-based store and for 
using an API to make incremental changes to specific key-value pairs. 
Currently LevelDB and ZooKeeper backends are supported. The LevelDB 
implementation should solve this problem: if you only want to change one 
key-value pair, the configuration mutation operation only needs to persist that 
single change, and the mutation is also applied in memory. The ZooKeeper-based 
approach, by contrast, reads and deserializes the entire configuration, applies 
the change, and then serializes and stores the whole configuration again. Even 
so, it may still be faster, depending on where the bottleneck lies in the 
original file-based approach. We haven't tried it on a queue hierarchy this 
large, though.
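
A minimal Java sketch of the difference between the two persistence strategies 
described above, assuming a plain in-memory map stands in for each backing store. 
The names ConfStore, IncrementalStore and SnapshotStore are hypothetical and are 
not the actual YARN configuration-store classes; the sketch only counts how many 
key-value pairs each strategy writes per mutation.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfStoreSketch {

    /** Hypothetical store abstraction (not the real YARN store interface). */
    interface ConfStore {
        void update(Map<String, String> changes);
        String get(String key);
        int pairsWrittenLastUpdate();
    }

    /** LevelDB-style (assumed): persist only the delta and apply the same delta in memory. */
    static final class IncrementalStore implements ConfStore {
        private final Map<String, String> persisted = new HashMap<>(); // stands in for the on-disk store
        private final Map<String, String> inMemory = new HashMap<>();  // live scheduler view
        private int lastWritten;

        IncrementalStore(Map<String, String> initial) {
            persisted.putAll(initial);
            inMemory.putAll(initial);
        }

        @Override
        public void update(Map<String, String> changes) {
            persisted.putAll(changes);  // write only the changed pairs
            inMemory.putAll(changes);   // patch the in-memory configuration
            lastWritten = changes.size();
        }

        @Override
        public String get(String key) { return inMemory.get(key); }

        @Override
        public int pairsWrittenLastUpdate() { return lastWritten; }
    }

    /** ZooKeeper-style (assumed): read everything, apply the change, store everything again. */
    static final class SnapshotStore implements ConfStore {
        private Map<String, String> stored; // stands in for the single serialized blob
        private int lastWritten;

        SnapshotStore(Map<String, String> initial) {
            stored = new HashMap<>(initial);
        }

        @Override
        public void update(Map<String, String> changes) {
            Map<String, String> full = new HashMap<>(stored); // "deserialize" the whole configuration
            full.putAll(changes);                             // apply the mutation
            stored = full;                                    // "serialize and store" all of it
            lastWritten = full.size();
        }

        @Override
        public String get(String key) { return stored.get(key); }

        @Override
        public int pairsWrittenLastUpdate() { return lastWritten; }
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        for (int i = 0; i < 5000; i++) {
            initial.put("yarn.scheduler.capacity.root.q" + i + ".maximum-capacity", "100");
        }
        Map<String, String> change =
                Map.of("yarn.scheduler.capacity.root.q42.maximum-capacity", "50");

        ConfStore inc = new IncrementalStore(initial);
        ConfStore snap = new SnapshotStore(initial);
        inc.update(change);
        snap.update(change);

        System.out.println("incremental: " + inc.pairsWrittenLastUpdate()); // incremental: 1
        System.out.println("snapshot: " + snap.pairsWrittenLastUpdate());   // snapshot: 5000
    }
}
```

In this sketch the write cost of the incremental store tracks the size of the 
change, while the snapshot store's cost tracks the size of the whole configuration.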

Anyway, depending on which backing store suits your environment, I'd recommend 
checking whether this feature fixes the refreshQueues issue. There's some 
documentation on enabling it in the markdown files in YARN-7241.

> Add configs cache to optimize refreshQueues performance for large scale of 
> queues
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-7004
>                 URL: https://issues.apache.org/jira/browse/YARN-7004
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.9.0, 3.0.0-alpha4
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>         Attachments: YARN-7004.001.patch
>
>
> We need a large number of queues in our production environment to serve 
> many projects, so we tested with more than 5000 queues and found that the 
> refreshQueues process took more than 1 minute. Most of that time is spent 
> iterating over all configurations to get the accessible-node-labels and 
> ordering-policy configs for every queue.
> Loading queue configs from a cache should reduce this cost when 
> initializing/reinitializing queues (in our test it dropped from 1 minute to 
> 3 seconds for 5000 queues). So I propose loading queue configs into a cache 
> in CapacityScheduler#initializeQueues and 
> CapacityScheduler#reinitializeQueues. If the cache has not been loaded in 
> other scenarios, such as test cases, queue configs can still be obtained by 
> iterating over all configurations.
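
The proposal amounts to replacing a per-queue scan of every configuration entry 
(roughly queues x entries work) with one grouping pass over the entries plus 
map lookups, while keeping the scan as a fallback when the cache is not loaded. 
A minimal Java sketch, assuming the configuration is exposed as a plain 
Map<String, String> and that the relevant keys look like 
yarn.scheduler.capacity.<queue-path>.accessible-node-labels... and 
yarn.scheduler.capacity.<queue-path>.ordering-policy... (and that queue names 
never contain those markers). QueueConfCacheSketch, loadCache, queueConfigs and 
queueOf are hypothetical names, not the code in YARN-7004.001.patch.

```java
import java.util.HashMap;
import java.util.Map;

public class QueueConfCacheSketch {
    static final String PREFIX = "yarn.scheduler.capacity.";
    // Per-queue sub-keys the refresh needs; assumed never to appear inside queue names.
    static final String[] MARKERS = {".accessible-node-labels", ".ordering-policy"};

    private final Map<String, String> allConf;
    private Map<String, Map<String, String>> cache; // queue path -> matching entries; null until loaded

    QueueConfCacheSketch(Map<String, String> allConf) {
        this.allConf = allConf;
    }

    /** Single pass over every entry, grouping the interesting keys by queue path. */
    void loadCache() {
        Map<String, Map<String, String>> grouped = new HashMap<>();
        for (Map.Entry<String, String> e : allConf.entrySet()) {
            String queue = queueOf(e.getKey());
            if (queue != null) {
                grouped.computeIfAbsent(queue, q -> new HashMap<>()).put(e.getKey(), e.getValue());
            }
        }
        cache = grouped;
    }

    /** Served from the cache when loaded; otherwise falls back to scanning all entries. */
    Map<String, String> queueConfigs(String queuePath) {
        if (cache != null) {
            return cache.getOrDefault(queuePath, Map.of());
        }
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : allConf.entrySet()) {
            if (queuePath.equals(queueOf(e.getKey()))) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    /** Queue path of a node-label or ordering-policy key, or null for any other key. */
    static String queueOf(String key) {
        if (!key.startsWith(PREFIX)) {
            return null;
        }
        String rest = key.substring(PREFIX.length());
        for (String marker : MARKERS) {
            int i = rest.indexOf(marker);
            if (i > 0) {
                return rest.substring(0, i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PREFIX + "root.queues", "a,b");
        conf.put(PREFIX + "root.a.accessible-node-labels", "gpu");
        conf.put(PREFIX + "root.a.accessible-node-labels.gpu.capacity", "100");
        conf.put(PREFIX + "root.b.ordering-policy", "fair");

        QueueConfCacheSketch s = new QueueConfCacheSketch(conf);
        System.out.println(s.queueConfigs("root.a").size()); // 2 (fallback scan)
        s.loadCache();
        System.out.println(s.queueConfigs("root.a").size()); // 2 (cache hit)
        System.out.println(s.queueConfigs("root.b"));        // {yarn.scheduler.capacity.root.b.ordering-policy=fair}
    }
}
```

Calling queueConfigs before loadCache takes the fallback scan, mirroring the 
issue's note that code paths which never load the cache, such as test cases, 
still work.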



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
