> On May 24, 2017, 7:45 p.m., Santhosh Kumar Shanmugham wrote:
> > The only downside to this approach is that the set of offers (and hence
> > agents) that a task might get assigned to is effectively reduced, and so
> > the probability of a task landing on the same broken host increases.
> > Assuming a healthy cluster (with a good distribution of slots of different
> > sizes), this problem might never surface.
> > 
> > Ship it!
This is a great point. We don't plan to use this directly for other reasons,
but I'll need to clean up the wording here.

To be clear to anyone who is interested in this feature: I actually wouldn't
recommend using any kind of stable sort of offers in production with the
FirstFitTaskAssigner yet, as all scheduling work will be biased towards that
bad host while its tasks repeatedly fail and Mesos re-offers its resources.
We'd need some sort of failure accrual detection mechanism in Aurora (or
Mesos) to blacklist bad agents before using this confidently.

Our plan at Twitter was to use this approach to make score-based scheduling
more scalable.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59480/#review175977
-----------------------------------------------------------


On May 23, 2017, 7:41 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59480/
> -----------------------------------------------------------
> 
> (Updated May 23, 2017, 7:41 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This patch enables scalable, high-performance Scheduler bin-packing using
> the existing first-fit task assigner, controlled by a simple command-line
> argument.
> 
> The bin-packing is only an approximation, but it can lead to pretty
> significant improvements in resource utilization per agent. For example, on
> a CPU-bound cluster with 30k+ hosts and 135k tasks (across 1k+ jobs), we
> were able to reduce the percentage of hosts with tasks scheduled on them to
> just 90%, down from 99.7% (as one would expect from randomization). So if
> you are running Aurora on elastic compute and paying for machines by the
> minute or hour, this patch _could_ allow you to reduce your server footprint
> by as much as 10%.
> 
> The approximation is based on the simple idea that you have the best chance
> of achieving perfect bin-packing if you put each task in the smallest slot
> available. So if you have a task needing 8 cores, and an 8-core and a
> 12-core offer are both available, you'd always want to put the task in the
> 8-core offer*. If offers are kept sorted in OfferManager during iteration, a
> first-fit algorithm is guaranteed to match the smallest possible offer for
> your task, which achieves this (see the first sketch at the end of this
> message).
> 
> * - The correct decision of course depends on the other pending tasks and
> the other resources available, and more satisfactory results may also
> require preemption, etc.
> 
> 
> Diffs
> -----
> 
> RELEASE-NOTES.md 77376e438bd7af74c364dcd5d1b3e3f1ece2adbf 
> src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java f2296a9d7a88be7e43124370edecfe64415df00f 
> src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 78255e6dfa31c4920afc0221ee60ec4f8c2a12c4 
> src/main/java/org/apache/aurora/scheduler/offers/OfferOrder.java PRE-CREATION 
> src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java adf7f33e4a72d87c3624f84dfe4998e20dc75fdc 
> src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 317a2d26d8bfa27988c60a7706b9fb3aa9b4e2a2 
> src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java d7addc0effb60c196cf339081ad81de541d05385 
> src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 676d305d257585e53f0a05b359ba7eb11f1b23be 
> 
> 
> Diff: https://reviews.apache.org/r/59480/diff/1/
> 
> 
> Testing
> -------
> 
> This has been scale-tested with production-like workloads and performs well,
> adding only a few extra seconds in total to TaskAssigner when applied to
> thousands of tasks per minute.
> 
> There is an overhead when scheduling tasks that have large resource
> requirements, as the task assigner first has to skip over all the offers
> with low resources. In a packed cluster, this is where the extra seconds are
> spent. The cost could be reduced by jumping directly over all the offers we
> know to be too small, but that decision has to map to the OfferOrder (which
> adds complexity). That can be addressed in a follow-up review if needed (see
> the second sketch at the end of this message).
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>
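
Two sketches follow for readers who want the mechanics spelled out. Both are
minimal, hypothetical Java illustrations written for this thread; the Offer
class, its cpus field, and the firstFit methods are simplified stand-ins, not
the actual Aurora OfferManager/OfferOrder/TaskAssigner code.

The first sketch shows the core idea from the Description: if offers are
iterated in ascending size order, a plain first-fit scan returns the smallest
offer that can hold the task, which approximates best-fit bin-packing.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    final class FirstFitSketch {
      // Simplified stand-in for a Mesos offer; real offers carry several
      // resource types (CPU, RAM, disk, ports), not just CPU.
      static final class Offer {
        final String agentId;
        final double cpus;

        Offer(String agentId, double cpus) {
          this.agentId = agentId;
          this.cpus = cpus;
        }
      }

      // Sorting ascending by size turns first-fit into an approximation of
      // best-fit: the first offer that satisfies the task is also the
      // smallest one that can hold it.
      static Optional<Offer> firstFit(List<Offer> offers, double requiredCpus) {
        return offers.stream()
            .sorted(Comparator.comparingDouble((Offer o) -> o.cpus))
            .filter(o -> o.cpus >= requiredCpus)
            .findFirst();
      }
    }

With a 12-core and an 8-core offer available and a task needing 8 cores, the
sorted scan matches the 8-core offer and leaves the 12-core offer free for a
larger task - exactly the behavior described in the Description above.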
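The second sketch is one hypothetical shape for the jump-over optimization
mentioned under Testing, reusing the Offer stand-in from the sketch above
(assume both classes share a package). Keeping offers in a TreeSet ordered by
the same comparator lets tailSet() seek straight to the first offer that is
large enough, instead of scanning past every offer that is too small.

    import java.util.Comparator;
    import java.util.NavigableSet;
    import java.util.Optional;
    import java.util.TreeSet;

    final class JumpOverSketch {
      // Order primarily by CPU; tie-break on agent id so distinct offers
      // with equal CPU are not collapsed by the set.
      static final Comparator<FirstFitSketch.Offer> BY_CPU =
          Comparator.comparingDouble((FirstFitSketch.Offer o) -> o.cpus)
              .thenComparing(o -> o.agentId);

      // byCpu must be a set built with BY_CPU, e.g. new TreeSet<>(BY_CPU),
      // kept in sync as offers arrive and are rescinded.
      static Optional<FirstFitSketch.Offer> firstFit(
          NavigableSet<FirstFitSketch.Offer> byCpu, double requiredCpus) {
        // A probe element carrying the required CPU lets tailSet() skip all
        // offers known to be too small in O(log n) rather than visiting them.
        // Every offer in the tail set has cpus >= requiredCpus; a real
        // assigner would also verify the remaining resources (RAM, ports,
        // ...) before returning.
        FirstFitSketch.Offer probe = new FirstFitSketch.Offer("", requiredCpus);
        NavigableSet<FirstFitSketch.Offer> fits = byCpu.tailSet(probe, true);
        return fits.isEmpty() ? Optional.empty() : Optional.of(fits.first());
      }
    }

Note that the seek only works because the probe is built against the same
ordering the set uses - this is the coupling to the OfferOrder that the
Testing section flags as added complexity, and it grows with each resource
dimension included in the sort.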
