I created a JIRA ticket for this, just to keep the discussion in one place. Here's the link: https://issues.apache.org/jira/browse/YARN-7327
Thanks!

On Fri, Oct 13, 2017 at 5:14 PM Arun Suresh <[email protected]> wrote:

> Hello Craig,
>
> Thanks for trying this. Asynchronous scheduling (in the up-until-now
> released 2.x branches of YARN) is fairly experimental, and it does lead to
> some unnecessary locking and race conditions. Wangda has refactored most
> of the asynchronous scheduling code paths; the rework should be available
> in 2.9.0, and you can give it a shot in 3.0.0-beta1 as well.
>
> The default scheduling mode (what you refer to as synchronous scheduling)
> is actually node-heartbeat-triggered scheduling. There are certain cases
> where the default scheduling might still be more apt - for example, if
> most of your requests have strict data locality requirements. Also, in a
> slightly pegged cluster, I suspect you might see higher latencies - I have
> yet to test this though.
>
> But in general, it is a direction we are actively looking at. BTW, for
> extremely short-duration tasks, there is also an option to use
> OPPORTUNISTIC containers (https://issues.apache.org/jira/browse/YARN-2877
> and https://issues.apache.org/jira/browse/YARN-5220), but you need to have
> support for that in the AM.
>
> Cheers,
> -Arun
>
> On Fri, Oct 13, 2017 at 11:30 AM, Craig Ingram <[email protected]> wrote:
>
> > I was recently doing some research into Spark on YARN's startup time and
> > observed slow, synchronous allocation of containers/executors. I am
> > testing on a 4-node bare metal cluster with 48 cores and 128 GB of
> > memory per node. YARN was only allocating about 3 containers per second.
> > Moreover, when starting 3 Spark applications at the same time, each
> > requesting 44 containers, the first application would get all 44
> > requested containers, then the next application would start getting
> > containers, and so on.
> >
> > From looking at the code, it appears this is by design. There is an
> > undocumented configuration variable that enables asynchronous allocation
> > of containers. I'm sure I'm missing something, but why is this not the
> > default? Is there a bug or race condition in this code path? I've done
> > some testing with it, and it has been working and is significantly
> > faster.
> >
> > Here's the config:
> > `yarn.scheduler.capacity.schedule-asynchronously.enable`
> >
> > Any help understanding this would be appreciated.
> >
> > Thanks,
> > Craig
> >
> > If you're curious about the performance difference with this setting,
> > here are the results.
> >
> > The following tool was used for the benchmarks:
> > https://github.com/SparkTC/spark-bench
> >
> > # async scheduler research
> > The goal of this test is to determine whether running Spark on YARN with
> > async scheduling of containers reduces the amount of time required for
> > an application to receive all of its requested resources. This setting
> > should also reduce the overall runtime of short-lived applications/stages
> > or notebook paragraphs. This setting could prove crucial to achieving
> > optimal performance when sharing resources on a cluster with dynalloc
> > enabled.
> >
> > ## Test Setup
> > Update /etc/hadoop/conf/capacity-scheduler.xml (or use Ambari) between
> > runs, toggling:
> > `yarn.scheduler.capacity.schedule-asynchronously.enable=true|false`
> >
> > The conf files request executor counts of:
> > * 2
> > * 20
> > * 50
> > * 100
> >
> > The apps are submitted to the default queue on each cluster, which caps
> > at 48 cores on dynalloc and 72 cores on baremetal. The default queue was
> > expanded for the last two tests on baremetal so it could potentially
> > take advantage of all 144 cores.
> >
> > ## Test Environments
> > ### dynalloc
> > 4 VMs in Fyre (1 master, 3 workers)
> > 8 CPUs / 16 GB per node
> > model name: QEMU Virtual CPU version 2.5+
> >
> > ### baremetal
> > 4 baremetal instances in Fyre (1 master, 3 workers)
> > 48 CPUs / 128 GB per node
> > model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
> >
> > ## Using spark-bench with timedsleep workload - sync
> > ### dynalloc
> > conf | avg | stddev
> > --- | --- | ---
> > spark-on-yarn-schedule-async0.time | 23.814900 | 1.110725
> > spark-on-yarn-schedule-async1.time | 29.770250 | 0.830528
> > spark-on-yarn-schedule-async2.time | 44.486600 | 0.593516
> > spark-on-yarn-schedule-async3.time | 44.337700 | 0.490139
> >
> > ### baremetal - 2 queues splitting the cluster, 72 cores each
> > conf | avg | stddev
> > --- | --- | ---
> > spark-on-yarn-schedule-async0.time | 14.827000 | 0.292290
> > spark-on-yarn-schedule-async1.time | 19.613150 | 0.155421
> > spark-on-yarn-schedule-async2.time | 30.768400 | 0.083400
> > spark-on-yarn-schedule-async3.time | 40.931850 | 0.092160
> >
> > ### baremetal - 1 queue to rule them all - 144 cores
> > conf | avg | stddev
> > --- | --- | ---
> > spark-on-yarn-schedule-async0.time | 14.833050 | 0.334061
> > spark-on-yarn-schedule-async1.time | 19.575000 | 0.212836
> > spark-on-yarn-schedule-async2.time | 30.765350 | 0.111035
> > spark-on-yarn-schedule-async3.time | 41.763300 | 0.182700
> >
> > ## Using spark-bench with timedsleep workload - async
> > ### dynalloc
> > conf | avg | stddev
> > --- | --- | ---
> > spark-on-yarn-schedule-async0.time | 22.575150 | 0.574296
> > spark-on-yarn-schedule-async1.time | 26.904150 | 1.244602
> > spark-on-yarn-schedule-async2.time | 44.721800 | 0.655388
> > spark-on-yarn-schedule-async3.time | 44.570000 | 0.514540
> >
> > #### 2nd run
> > conf | avg | stddev
> > --- | --- | ---
> > spark-on-yarn-schedule-async0.time | 22.441200 | 0.715875
> > spark-on-yarn-schedule-async1.time | 26.683400 | 0.583762
> > spark-on-yarn-schedule-async2.time | 44.227250 | 0.512568
> > spark-on-yarn-schedule-async3.time | 44.238750 | 0.329712
> >
> > ### baremetal - 2 queues splitting the cluster, 72 cores each
> > conf | avg | stddev
> > --- | --- | ---
> > spark-on-yarn-schedule-async0.time | 12.902350 | 0.125505
> > spark-on-yarn-schedule-async1.time | 13.830600 | 0.169598
> > spark-on-yarn-schedule-async2.time | 16.738050 | 0.265091
> > spark-on-yarn-schedule-async3.time | 40.654500 | 0.111417
> >
> > ### baremetal - 1 queue to rule them all - 144 cores
> > conf | avg | stddev
> > --- | --- | ---
> > spark-on-yarn-schedule-async0.time | 12.987150 | 0.118169
> > spark-on-yarn-schedule-async1.time | 13.837150 | 0.145871
> > spark-on-yarn-schedule-async2.time | 16.816300 | 0.253437
> > spark-on-yarn-schedule-async3.time | 23.113450 | 0.320744
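
For anyone who wants to reproduce the test setup above, a minimal sketch of the capacity-scheduler.xml change looks like this. Only the property name comes from the thread; the surrounding layout is just the standard Hadoop configuration file format, and the value is flipped between true and false between runs.

```xml
<!-- Sketch of /etc/hadoop/conf/capacity-scheduler.xml (or the equivalent
     Ambari setting). Only this one property is taken from the thread;
     everything else in the file is left as shipped. -->
<configuration>
  <property>
    <!-- false = default node-heartbeat-triggered scheduling;
         true  = asynchronous container allocation under test -->
    <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
    <value>true</value>
  </property>
</configuration>
```

The thread does not say whether a queue refresh is enough for this flag to take effect, so restarting the ResourceManager between runs is the safe assumption.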

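Similarly, the OPPORTUNISTIC containers Arun mentions (YARN-2877, YARN-5220) have a cluster-side switch in yarn-site.xml. The property names below are an assumption based on the Hadoop 2.9/3.0 opportunistic-containers documentation rather than anything stated in the thread, so verify them against your release; the application master must also explicitly request OPPORTUNISTIC execution for them to matter.

```xml
<!-- yarn-site.xml sketch for centralized OPPORTUNISTIC container
     allocation. Property names are assumed from the Hadoop 2.9/3.0 docs,
     not from the thread; the AM still has to mark its requests as
     OPPORTUNISTIC for this to have any effect. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- How many opportunistic containers a NodeManager may queue locally. -->
    <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
    <value>10</value>
  </property>
</configuration>
```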