> On Oct. 13, 2017, 1:12 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 67-68 (patched)
> > <https://reviews.apache.org/r/62956/diff/2/?file=1854107#file1854107line67>
> >
> >     As far as I know this will filter this agent entirely for 30 days. This comes pretty close to leaking agents.
> >     https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L1207-L1209
> >
> >     This implies the timeout would need to be significantly smaller (e.g. ~3 minutes) and configurable for operators. At that point, I am no longer sure the optimization would help at Twitter-scale clusters.
> this will filter this agent entirely for 30 days

Unfortunately that log statement lies! The agent is not filtered; the _resources_ are filtered from future consideration unless they increase.


> On Oct. 13, 2017, 1:12 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 220-224 (patched)
> > <https://reviews.apache.org/r/62956/diff/2/?file=1854107#file1854107line220>
> >
> >     This won't work for us.
> >
> >     We are using both non-revocable and revocable (CPU & RAM) resources. It is crucial for us that we can still use revocable resources on an agent even if the non-revocable resources are maxed out. The same applies vice versa.
> >
> >     This pseudocode should solve it:
> >     ```
> >     bool lacksUsefulResources(offer):
> >       no_revocable = revocable_mem <= mem_threshold || revocable_cpu <= cpu_threshold
> >       no_non_revocable = mem <= mem_threshold || cpu <= cpu_threshold
> >
> >       return no_revocable and no_non_revocable
> >     ```
> >
> >     Would that still work for you?
> >
> >     (As a minor improvement of the heuristic, we could use the minimal executor resources as thresholds rather than 0.)

I believe `ResourceManager.bagFromMesosResources()` does what you want: the resources are aggregated without regard for the revocable flag. I explicitly test for this in `OfferManagerImplTest`; grep for `mixed` to find the test cases. If you disagree, can you give me a test case that points out the issue?


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62956/#review187939
-----------------------------------------------------------


On Oct. 12, 2017, 4:18 p.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62956/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2017, 4:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> There's no reason for us to evaluate offers with no CPUs or memory, so reject them early in the offer lifecycle.
> 
> This is an incremental performance optimization, but it may yield significant improvements based on observations in some very large clusters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/http/Utilization.java 3c77e2983ce00f897f3d5ed106b779cd7f7f0940 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java e8334310a2a46a0ccb09ee6e4122c515892d3996 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java 1b1239753f40d7d46d91724def6c25037eb79f1c 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java d5db81b88a0369d0b26c8fbf70efab3886ad7695 
>   src/main/java/org/apache/aurora/scheduler/stats/TaskStatCalculator.java b98aaaf48ae60afef19a368ee96abc897300f8fa 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 2cfdc090ff75a63111ae146c9fe7b3542e7ac83f 
>   src/test/java/org/apache/aurora/scheduler/offers/Offers.java 129b4437315c6ad4ea47ca75d4ae6e28cadd7911 
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 765a527acb96997989c920be8b69dfa1113dc302 
> 
> 
> Diff: https://reviews.apache.org/r/62956/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Bill Farner
> 
>
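To make the revocable vs. non-revocable question in this thread concrete, here is a minimal, self-contained Java sketch comparing the two heuristics. It is not Aurora code: the `Resource` class and the `lacksUsefulResources*` methods are hypothetical stand-ins, not the real `ResourceBag`/`ResourceManager` API. The aggregated variant assumes, per the reply in the thread, that CPU and memory are summed regardless of the revocable flag and that the patch rejects only offers with zero CPUs or zero memory; the per-class variant transcribes the pseudocode from the review comment.

```java
import java.util.List;

public class OfferFilterSketch {
  // Hypothetical, simplified stand-in for a scalar Mesos resource (not Aurora's type).
  static final class Resource {
    final String name;        // e.g. "cpus" or "mem"
    final double amount;      // scalar amount offered
    final boolean revocable;  // whether the resource is revocable

    Resource(String name, double amount, boolean revocable) {
      this.name = name;
      this.amount = amount;
      this.revocable = revocable;
    }
  }

  // Sums the named scalar; a null filter counts both revocable and non-revocable amounts.
  static double sum(List<Resource> offer, String name, Boolean onlyRevocable) {
    double total = 0;
    for (Resource r : offer) {
      if (r.name.equals(name) && (onlyRevocable == null || r.revocable == onlyRevocable)) {
        total += r.amount;
      }
    }
    return total;
  }

  // Aggregated check (assumed patch behavior): ignore the revocable flag, reject on zero CPU or memory.
  static boolean lacksUsefulResourcesAggregated(List<Resource> offer) {
    return sum(offer, "cpus", null) <= 0 || sum(offer, "mem", null) <= 0;
  }

  // Per-class check from the review comment: reject only if neither class has both CPU and memory.
  static boolean lacksUsefulResourcesPerClass(
      List<Resource> offer, double cpuThreshold, double memThreshold) {
    boolean noRevocable = sum(offer, "mem", true) <= memThreshold
        || sum(offer, "cpus", true) <= cpuThreshold;
    boolean noNonRevocable = sum(offer, "mem", false) <= memThreshold
        || sum(offer, "cpus", false) <= cpuThreshold;
    return noRevocable && noNonRevocable;
  }

  public static void main(String[] args) {
    // Non-revocable resources maxed out; revocable CPU and memory still free.
    List<Resource> revocableOnly = List.of(
        new Resource("cpus", 0, false), new Resource("mem", 0, false),
        new Resource("cpus", 2, true), new Resource("mem", 1024, true));
    System.out.println(lacksUsefulResourcesAggregated(revocableOnly));      // false
    System.out.println(lacksUsefulResourcesPerClass(revocableOnly, 0, 0));  // false

    // CPU available only as revocable, memory only as non-revocable.
    List<Resource> split = List.of(
        new Resource("cpus", 2, true), new Resource("mem", 1024, false));
    System.out.println(lacksUsefulResourcesAggregated(split));              // false
    System.out.println(lacksUsefulResourcesPerClass(split, 0, 0));          // true
  }
}
```

With a threshold of 0 and non-negative amounts, any offer the aggregated check rejects is also rejected by the per-class check, so aggregation never discards an offer whose revocable resources alone are usable. The two differ only in the other direction: an offer whose CPU is entirely revocable and whose memory is entirely non-revocable is kept by the aggregated check but rejected by the per-class one.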