[ https://issues.apache.org/jira/browse/YARN-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904483#comment-17904483 ]
ASF GitHub Bot commented on YARN-7327:
--------------------------------------

shameersss1 commented on PR #7138:
URL: https://github.com/apache/hadoop/pull/7138#issuecomment-2531512162

@brumi1024 - Thanks for looking into this.

> what is the reason behind changing the default of this setting?

1. The current default scheduling mechanism is synchronous (node-heartbeat driven), which is not efficient when a large number of containers need to be allocated.
2. It also has additional issues; for example, scheduling won't happen if node heartbeats are lost due to a network issue.
3. @wangdatan did an amazing job of making async scheduling production ready. Refer to https://issues.apache.org/jira/browse/YARN-7327?focusedCommentId=16205259&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16205259 for benchmark details.
4. The above benchmark shows that async scheduling throughput is better than sync scheduling throughput.

Hence the proposal here is to change the default scheduling strategy for the capacity scheduler from synchronous to asynchronous. Companies like Alibaba Cloud already use this in production: https://www.alibabacloud.com/help/en/emr/emr-on-ecs/user-guide/yarn-schedulers

@brumi1024 - Do you think there is any blocker/issue in enabling it by default?
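For anyone who wants to try this ahead of any default change, it comes down to a capacity-scheduler.xml edit along these lines. This is a minimal sketch: the enable flag is the property this issue is about, while `maximum-threads` is assumed here to be the related thread-count knob, and its value is illustrative rather than a recommended setting.

    <!-- capacity-scheduler.xml: enable asynchronous container allocation
         (the property discussed in this issue). -->
    <property>
      <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
      <value>true</value>
    </property>
    <!-- Assumed related tuning knob: number of async scheduling threads.
         The value shown is illustrative only. -->
    <property>
      <name>yarn.scheduler.capacity.schedule-asynchronously.maximum-threads</name>
      <value>1</value>
    </property>

After editing, restart the ResourceManager for the change to take effect.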
> CapacityScheduler: Allocate containers asynchronously by default
> ----------------------------------------------------------------
>
>                  Key: YARN-7327
>                  URL: https://issues.apache.org/jira/browse/YARN-7327
>              Project: Hadoop YARN
>           Issue Type: Improvement
>             Reporter: Craig Ingram
>             Assignee: Syed Shameerur Rahman
>             Priority: Trivial
>          Attachments: async-scheduling-results.md, schedule-async.png,
>                       spark-on-yarn-schedule-async.ipynb, yarn-async-scheduling.png
>
> I was recently doing some research into Spark on YARN's startup time and observed slow, synchronous allocation of containers/executors. I am testing on a 4-node bare-metal cluster w/48 cores and 128 GB memory per node. YARN was only allocating about 3 containers per second. Moreover, when starting 3 Spark applications at the same time with each requesting 44 containers, the first application would get all 44 requested containers, and only then would the next application start getting containers, and so on.
>
> From looking at the code, it appears this is by design. There is an undocumented configuration variable that will enable asynchronous allocation of containers. I'm sure I'm missing something, but why is this not the default? Is there a bug or race condition in this code path? I've done some testing with it and it's been working and is significantly faster.
>
> Here's the config:
> `yarn.scheduler.capacity.schedule-asynchronously.enable`
>
> Any help understanding this would be appreciated.
>
> Thanks,
> Craig
>
> If you're curious about the performance difference with this setting, here are the results:
>
> The following tool was used for the benchmarks: https://github.com/SparkTC/spark-bench
>
> h2. async scheduler research
> The goal of this test is to determine if running Spark on YARN with async scheduling of containers reduces the amount of time required for an application to receive all of its requested resources. This setting should also reduce the overall runtime of short-lived applications/stages or notebook paragraphs. This setting could prove crucial to achieving optimal performance when sharing resources on a cluster with dynalloc enabled.
>
> h3. Test Setup
> Must update /etc/hadoop/conf/capacity-scheduler.xml (or through Ambari) between runs:
> `yarn.scheduler.capacity.schedule-asynchronously.enable=true|false`
> conf files request executor counts of:
> * 2
> * 20
> * 50
> * 100
>
> The apps are being submitted to the default queue on each cluster, which caps at 48 cores on dynalloc and 72 cores on baremetal. The default queue was expanded for the last two tests on baremetal so it could potentially take advantage of all 144 cores.
>
> h3. Test Environments
> h4. dynalloc
> 4 VMs in Fyre (1 master, 3 workers)
> 8 CPUs/16 GB per node
> model name : QEMU Virtual CPU version 2.5+
> h4. baremetal
> 4 baremetal instances in Fyre (1 master, 3 workers)
> 48 CPUs/128 GB per node
> model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>
> h3. Using spark-bench with timedsleep workload sync
> h4. dynalloc
> || requested containers || avg || stdev ||
> | 2 | 23.814900 | 1.110725 |
> | 20 | 29.770250 | 0.830528 |
> | 50 | 44.486600 | 0.593516 |
> | 100 | 44.337700 | 0.490139 |
> h4. baremetal - 2 queues splitting cluster 72 cores each
> || requested containers || avg || stdev ||
> | 2 | 14.827000 | 0.292290 |
> | 20 | 19.613150 | 0.155421 |
> | 50 | 30.768400 | 0.083400 |
> | 100 | 40.931850 | 0.092160 |
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers || avg || stdev ||
> | 2 | 14.833050 | 0.334061 |
> | 20 | 19.575000 | 0.212836 |
> | 50 | 30.765350 | 0.111035 |
> | 100 | 41.763300 | 0.182700 |
>
> h3. Using spark-bench with timedsleep workload async
> h4. dynalloc
> || requested containers || avg || stdev ||
> | 2 | 22.575150 | 0.574296 |
> | 20 | 26.904150 | 1.244602 |
> | 50 | 44.721800 | 0.655388 |
> | 100 | 44.570000 | 0.514540 |
> h5. 2nd run
> || requested containers || avg || stdev ||
> | 2 | 22.441200 | 0.715875 |
> | 20 | 26.683400 | 0.583762 |
> | 50 | 44.227250 | 0.512568 |
> | 100 | 44.238750 | 0.329712 |
> h4. baremetal - 2 queues splitting cluster 72 cores each
> || requested containers || avg || stdev ||
> | 2 | 12.902350 | 0.125505 |
> | 20 | 13.830600 | 0.169598 |
> | 50 | 16.738050 | 0.265091 |
> | 100 | 40.654500 | 0.111417 |
> h4. baremetal - 1 queue to rule them all - 144 cores
> || requested containers || avg || stdev ||
> | 2 | 12.987150 | 0.118169 |
> | 20 | 13.837150 | 0.145871 |
> | 50 | 16.816300 | 0.253437 |
> | 100 | 23.113450 | 0.320744 |

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org