[
https://issues.apache.org/jira/browse/YARN-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tao Yang updated YARN-7005:
---------------------------
Attachment: YARN-7005.003.patch
Attaching v3 patch. [~leftnoteasy], [~sunilg], please review when you have
time. Thanks.
Updates:
* Maintain demand queues for every parent queue to improve scheduling
performance: (1) Add to the parent queues' demand queues whenever an app
request is updated in CapacityScheduler#allocate. (2) In
PriorityUtilizationQueueOrderingPolicy#getAssignmentIterator, refresh the
scheduling queues cache and remove non-pending demand queues whenever the
demand queues have been updated (that is, when the size of the scheduling
queues cache differs from the size of the demand queues).
* Use getAllPending to filter scheduling queues, because nodes in a
non-exclusive partition can allocate resources for requests of the default
partition.
* Fix the failing test cases.
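The demand-queue bookkeeping described in the first item can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: DemandQueueSketch, ChildQueue, onRequestUpdated and the single-number pending and usedCapacity fields are all hypothetical stand-ins, and ordering by used capacity alone is a simplification of the real ordering policy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; all names are hypothetical, not the YARN classes.
class DemandQueueSketch {

    // Stand-in for a child queue: a name, its pending resource (collapsed to
    // one number), and the used capacity that drives the ordering here.
    static final class ChildQueue {
        final String name;
        long pending;
        final double usedCapacity;

        ChildQueue(String name, long pending, double usedCapacity) {
            this.name = name;
            this.pending = pending;
            this.usedCapacity = usedCapacity;
        }
    }

    private final Set<ChildQueue> demandQueues = new LinkedHashSet<>();
    private List<ChildQueue> schedulingCache = new ArrayList<>();

    // Step (1): a request update only ever adds the queue to the demand set.
    void onRequestUpdated(ChildQueue queue) {
        demandQueues.add(queue);
    }

    // Step (2): a size mismatch means queues were added since the last
    // refresh, so prune non-pending queues and rebuild the cache.
    Iterator<ChildQueue> getAssignmentIterator() {
        if (schedulingCache.size() != demandQueues.size()) {
            demandQueues.removeIf(q -> q.pending <= 0);
            schedulingCache = new ArrayList<>(demandQueues);
        }
        // Only the cached queues are sorted, not every child queue.
        List<ChildQueue> sorted = new ArrayList<>(schedulingCache);
        sorted.sort(Comparator.comparingDouble((ChildQueue q) -> q.usedCapacity));
        return sorted.iterator();
    }

    public static void main(String[] args) {
        DemandQueueSketch parent = new DemandQueueSketch();
        parent.onRequestUpdated(new ChildQueue("a", 10, 0.5));
        parent.onRequestUpdated(new ChildQueue("b", 0, 0.1)); // nothing pending
        parent.onRequestUpdated(new ChildQueue("c", 5, 0.2));
        Iterator<ChildQueue> it = parent.getAssignmentIterator();
        while (it.hasNext()) {
            System.out.println(it.next().name); // prints c, then a
        }
    }
}
```

Note that in this sketch a queue whose pending resource drops to zero stays in the cache until the next addition triggers a refresh, which matches the size-comparison trigger described above.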
With this improvement, scheduling cost no longer grows linearly with the
number of queues: performance improves by about 110% for 500 queues, 230% for
1000 queues, and over 1000% for 5000 queues.
Test results:
{noformat}
Before:
#QueueSize = 5000, testing times : 1000, total cost : 7353788602 ns, average cost : 7353788.5 ns.
#QueueSize = 5000, testing times : 1000, total cost : 7677551118 ns, average cost : 7677551.0 ns.
#QueueSize = 1000, testing times : 1000, total cost : 1873387351 ns, average cost : 1873387.4 ns.
#QueueSize = 1000, testing times : 1000, total cost : 1858447758 ns, average cost : 1858447.8 ns.
#QueueSize = 500, testing times : 1000, total cost : 1165215528 ns, average cost : 1165215.5 ns.
#QueueSize = 500, testing times : 1000, total cost : 1188830091 ns, average cost : 1188830.1 ns.
#QueueSize = 100, testing times : 1000, total cost : 591136755 ns, average cost : 591136.75 ns.
#QueueSize = 100, testing times : 1000, total cost : 582527533 ns, average cost : 582527.56 ns.
After:
#QueueSize = 5000, testing times : 1000, total cost time : 631647431 ns, average cost time : 631647.44 ns.
#QueueSize = 1000, testing times : 1000, total cost time : 548629986 ns, average cost time : 548630.0 ns.
#QueueSize = 500, testing times : 1000, total cost time : 565621632 ns, average cost time : 565621.6 ns.
#QueueSize = 100, testing times : 1000, total cost time : 497367467 ns, average cost time : 497367.47 ns.
{noformat}
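As a rough cross-check of the percentages, the average costs reported in the test output can be turned into improvement figures. The sketch below assumes "improvement" means (before / after - 1) * 100 and averages the two "Before" runs for each queue size; both choices are assumptions, and SpeedupSketch and its methods are hypothetical helper names.

```java
// Illustrative arithmetic only; the improvement formula is an assumption.
class SpeedupSketch {

    // Improvement in percent, read as (before / after - 1) * 100.
    static double improvementPercent(double beforeNs, double afterNs) {
        return (beforeNs / afterNs - 1.0) * 100.0;
    }

    // Mean of the two "Before" runs reported for one queue size.
    static double mean(double first, double second) {
        return (first + second) / 2.0;
    }

    public static void main(String[] args) {
        int[] queueSizes = {100, 500, 1000, 5000};
        // Average cost per scheduling pass in ns, copied from the test output.
        double[] before = {
            mean(591136.75, 582527.56),
            mean(1165215.5, 1188830.1),
            mean(1873387.4, 1858447.8),
            mean(7353788.5, 7677551.0)
        };
        double[] after = {497367.47, 565621.6, 548630.0, 631647.44};

        for (int i = 0; i < queueSizes.length; i++) {
            System.out.printf("%d queues: %.0f%% improvement%n",
                queueSizes[i], improvementPercent(before[i], after[i]));
        }
    }
}
```

Under these assumptions the results come out at about 108% for 500 queues, 240% for 1000 queues, and 1090% for 5000 queues, in the same range as the rounded figures quoted in the comment.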
> Skip unnecessary sorting and iterating process for child queues without
> pending resource to optimize schedule performance
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-7005
> URL: https://issues.apache.org/jira/browse/YARN-7005
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.9.0, 3.0.0-alpha4
> Reporter: Tao Yang
> Attachments: YARN-7005.001.patch, YARN-7005.002.patch,
> YARN-7005.003.patch
>
>
> Currently, even if there is only one pending app in a queue, the scheduling
> process goes through all queues anyway and spends most of its time sorting
> and iterating child queues in ParentQueue#assignContainersToChildQueues.
> IIUC, queues that have no pending resource can be skipped during sorting and
> iteration to reduce time cost, noticeably so for a cluster with many
> queues. Please feel free to correct me if I have missed something. Thanks.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)