> On April 24, 2016, 3:48 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java, line 51
> > <https://reviews.apache.org/r/46603/diff/2/?file=1358596#file1358596line51>
> >
> >     Does this default value effect the same behavior as before the patch?
> 
> Stephan Erb wrote:
>     Using a default of `0` is indeed a behaviour change. I am happy to 
> discuss if we want this change or not. 
>     
>     With a timeout of `5` secs (this was the former hardcoded default):
>     
>     * When launching a task, Mesos will only re-offer the unused resources in 
> the offer after 5 seconds. 
>     * When declining offers in order to merge two offers into one, Mesos will 
> only re-offer resources of this slave after 5s.
>     
>     With a timeout of `0` secs:
>     
>     * The resources can be returned instantly within the next offer-cycle of 
> the Mesos allocator.
>     
>     We tend to have the problem that a timeout of 5 breaks the maintenance 
> feature for us. We regularly schedule jobs with #instances > #nodes in the 
> cluster. In this case, all available offers are quickly depleted and Aurora 
> begins to schedule onto nodes which were supposed to be put into maintenance 
> mode. Only after the timeout of 5 seconds has passed will Mesos re-offer 
> resources to Aurora. I believe we might not be the only ones with this problem 
> and therefore think 0 is a good default.

It would be great to reach out to the Mesos folks to better understand the 
reasons behind choosing a 5 second default timeout. Last I checked, lower 
values _may_ result in an increased load on the Mesos master. If that proves 
to be true, I'd prefer holding on to the current behavior as a safer bet.
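
To make the trade-off concrete, here is a minimal sketch of how I understand 
the filter duration to behave from the framework's point of view. It is a 
simplified model, not the Mesos allocator and not code from this patch: the 
class `RefuseFilterModel` and its methods are hypothetical names, and it 
assumes declined resources become eligible for re-offer to the same framework 
exactly when the filter duration has elapsed.

```java
import java.time.Duration;
import java.time.Instant;

/** Simplified model of a decline filter. Hypothetical, for illustration only. */
final class RefuseFilterModel {
    private RefuseFilterModel() {}

    /** Earliest instant at which declined resources may be re-offered to the same framework. */
    static Instant earliestReoffer(Instant declinedAt, Duration filterDuration) {
        return declinedAt.plus(filterDuration);
    }

    /** True if the filter has expired at {@code now} (assumption: expiry is inclusive). */
    static boolean mayReoffer(Instant declinedAt, Duration filterDuration, Instant now) {
        return !now.isBefore(earliestReoffer(declinedAt, filterDuration));
    }

    public static void main(String[] args) {
        Instant declinedAt = Instant.parse("2016-04-24T15:48:00Z");
        Instant twoSecondsLater = declinedAt.plusSeconds(2);

        // Former hardcoded default of 5 seconds: still filtered after 2 seconds.
        System.out.println(mayReoffer(declinedAt, Duration.ofSeconds(5), twoSecondsLater));
        // Proposed default of 0 seconds: eligible in the next allocation cycle.
        System.out.println(mayReoffer(declinedAt, Duration.ZERO, twoSecondsLater));
    }
}
```

With a duration of 0 every decline is immediately eligible again, which is 
where I would expect any extra work on the master to come from.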


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130306
-----------------------------------------------------------


On April 23, 2016, 4:35 p.m., Stephan Erb wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46603/
> -----------------------------------------------------------
> 
> (Updated April 23, 2016, 4:35 p.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Bill Farner.
> 
> 
> Bugs: AURORA-1658
>     https://issues.apache.org/jira/browse/AURORA-1658
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Aurora is declining Mesos offers implicitly when launching a task and 
> explicitly when compacting multiple offers of a slave into a single one.
> The filter duration instructs Mesos to return the declined resources to us 
> only after a timeout of X seconds, even if there is no other framework that 
> wants them. If no filter is supplied, the hardcoded default of 5 seconds 
> would be used.
> 
> By making this value configurable, Aurora can be tuned for either single or 
> multi-framework deployment.
> 
> 
> Diffs
> -----
> 
>   RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f 
>   src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7 
>   src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c 
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57 
>   src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1 
> 
> Diff: https://reviews.apache.org/r/46603/diff/
> 
> 
> Testing
> -------
> 
> * ./gradlew -Pq build 
> * ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>  
> I have also conducted an (unscientific) benchmark in Vagrant and started a 
> job with 5 instances and recorded the time from `PENDING` to `RUNNING` for 
> the slowest ones:
> 
> * 7s startup time for a filter duration of 0 seconds
> * 29s startup time for the hardcoded former default of 5 seconds
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>
