> On April 24, 2016, 3:48 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java, line 51
> > <https://reviews.apache.org/r/46603/diff/2/?file=1358596#file1358596line51>
> >
> >     Does this default value effect the same behavior as before the patch?
>
> Stephan Erb wrote:
>     Using a default of `0` is indeed a behaviour change. I am happy to
>     discuss if we want this change or not.
>
>     With a timeout of `5` secs (this was the former hardcoded default):
>
>     * When launching a task, Mesos will only re-offer the unused resources in
>       the offer after 5 seconds.
>     * When declining offers in order to merge two offers into one, Mesos will
>       only re-offer resources of this slave after 5 seconds.
>
>     With a timeout of `0` secs:
>
>     * The resources can be returned instantly within the next offer-cycle of
>       the Mesos allocator.
>
>     We tend to have the problem that a timeout of 5 breaks the maintenance
>     feature for us. We regularly schedule jobs with #instances > #nodes in the
>     cluster. In this case, all available offers are quickly depleted and Aurora
>     begins to schedule onto nodes which were supposed to be put into maintenance
>     mode. Only after the timeout of 5 seconds has passed will Mesos re-offer
>     resources to Aurora. I believe we might not be the only ones with this problem
>     and therefore think 0 is a good default.

It would be great to reach out to the Mesos folks to better understand the
reasons behind choosing a 5 second default timeout. Last I checked, lower
values _may_ result in an increased load on the Mesos master. If that proves
to be true, I'd prefer holding on to the current behavior as a safer bet.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130306
-----------------------------------------------------------


On April 23, 2016, 4:35 p.m., Stephan Erb wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46603/
> -----------------------------------------------------------
>
> (Updated April 23, 2016, 4:35 p.m.)
>
>
> Review request for Aurora, Maxim Khutornenko and Bill Farner.
>
>
> Bugs: AURORA-1658
>     https://issues.apache.org/jira/browse/AURORA-1658
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Aurora is declining Mesos offers implicitly when launching a task and
> explicitly when compacting multiple offers of a slave into a single one.
> The filter duration instructs Mesos to return the declined resources to us
> only after a timeout of X seconds, even if there is no other framework that
> wants them. If no filter is supplied, the hardcoded default of 5 seconds
> would be used.
>
> By making this value configurable, Aurora can be tuned for either single or
> multi-framework deployment.
>
>
> Diffs
> -----
>
>   RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f
>   src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7
>   src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57
>   src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1
>
> Diff: https://reviews.apache.org/r/46603/diff/
>
>
> Testing
> -------
>
> * ./gradlew -Pq build
> * ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>
> I have also conducted an (unscientific) benchmark in Vagrant and started a
> job with 5 instances and recorded the time from `PENDING` to `RUNNING` for
> the slowest ones:
>
> * 7s startup time for a filter duration of 0 seconds
> * 29s startup time for the hardcoded former default of 5 seconds
>
>
> Thanks,
>
> Stephan Erb
>
>
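
To make the timing argument in this thread easier to follow, here is a toy model of how the filter duration delays a re-offer. It is only a sketch and is not Mesos or Aurora code: the class `RefuseFilterSketch`, the method `earliestReoffer`, the assumption that the allocator runs at fixed multiples of an allocation interval, and the 1-second interval used in `main` are all hypothetical and chosen for illustration.

```java
public class RefuseFilterSketch {

    /**
     * Toy model (an assumption, not the Mesos allocator): allocation cycles
     * happen at t = 0, interval, 2 * interval, ... Declined resources become
     * eligible again at declinedAt + refuseSeconds and are re-offered in the
     * first cycle at or after that point.
     */
    public static double earliestReoffer(
            double declinedAt, double refuseSeconds, double allocationInterval) {
        if (declinedAt < 0 || refuseSeconds < 0 || allocationInterval <= 0) {
            throw new IllegalArgumentException("times must be non-negative, interval positive");
        }
        double eligibleAt = declinedAt + refuseSeconds;
        // Round up to the next allocation cycle boundary.
        double cycle = Math.ceil(eligibleAt / allocationInterval);
        return cycle * allocationInterval;
    }

    public static void main(String[] args) {
        double declinedAt = 0.5;        // hypothetical: offer declined half a second in
        double allocationInterval = 1.0; // hypothetical allocator cycle length

        // Compare the proposed default (0) with the former hardcoded value (5).
        for (double refuseSeconds : new double[] {0.0, 5.0}) {
            double reofferAt = earliestReoffer(declinedAt, refuseSeconds, allocationInterval);
            System.out.println("filter duration " + refuseSeconds
                    + "s: resources re-offered at t=" + reofferAt + "s");
        }
    }
}
```

In this model a filter duration of 0 returns the resources in the next allocator cycle, while a duration of 5 keeps them away from the framework for several cycles, which matches the behaviour described in the review. The model says nothing about the extra load on the Mesos master that shorter durations might cause.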