-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62956/#review187939
-----------------------------------------------------------

src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
Lines 67-68 (patched)
<https://reviews.apache.org/r/62956/#comment265006>

As far as I know, this will filter out this agent entirely for 30 days. That comes pretty close to leaking agents.

https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L1207-L1209

This implies the timeout would need to be significantly smaller (e.g., ~3 minutes) and configurable for operators. At that point, I am no longer sure the optimization would help at Twitter-scale clusters.


src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
Lines 220-224 (patched)
<https://reviews.apache.org/r/62956/#comment265005>

This won't work for us. We are using both non-revocable and revocable (CPU & RAM) resources. It is crucial for us that we can still use the revocable resources on an agent even if its non-revocable resources are maxed out, and vice versa.

This pseudocode should solve it:

```
bool lacksUsefulResources(offer):
  no_revocable = revocable_mem <= mem_threshold || revocable_cpu <= cpu_threshold
  no_non_revocable = mem <= mem_threshold || cpu <= cpu_threshold
  return no_revocable && no_non_revocable
```

Would that still work for you?

(As a minor improvement of the heuristic, we could use the minimal executor resources as thresholds rather than 0.)


- Stephan Erb


On Oct. 13, 2017, 1:18 a.m., Bill Farner wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62956/
> -----------------------------------------------------------
>
> (Updated Oct. 13, 2017, 1:18 a.m.)
>
>
> Review request for Aurora, David McLaughlin and Jordan Ly.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> There's no reason for us to evaluate offers with no CPUs or memory, so reject
> them early in the offer lifecycle.
>
> This is an incremental performance optimization, but it may net significant
> improvements based on observations in some very large clusters.
>
>
> Diffs
> -----
>
>   src/main/java/org/apache/aurora/scheduler/http/Utilization.java 3c77e2983ce00f897f3d5ed106b779cd7f7f0940
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java e8334310a2a46a0ccb09ee6e4122c515892d3996
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java 1b1239753f40d7d46d91724def6c25037eb79f1c
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java d5db81b88a0369d0b26c8fbf70efab3886ad7695
>   src/main/java/org/apache/aurora/scheduler/stats/TaskStatCalculator.java b98aaaf48ae60afef19a368ee96abc897300f8fa
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 2cfdc090ff75a63111ae146c9fe7b3542e7ac83f
>   src/test/java/org/apache/aurora/scheduler/offers/Offers.java 129b4437315c6ad4ea47ca75d4ae6e28cadd7911
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 765a527acb96997989c920be8b69dfa1113dc302
>
>
> Diff: https://reviews.apache.org/r/62956/diff/2/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Bill Farner
>
>
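
For reference, the `lacksUsefulResources` pseudocode from the second review comment could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `OfferFilterSketch`, `Resources`, and `isExhausted` are hypothetical names and not part of Aurora's `OfferManager` or `ResourceBag`, and the thresholds are passed in explicitly rather than read from any real configuration.

```java
/**
 * Illustrative sketch of the heuristic from the review comment.
 * All names are hypothetical; none of them exist in Aurora.
 */
final class OfferFilterSketch {

  /** One resource pool of an offer (either revocable or non-revocable). */
  static final class Resources {
    final double cpus;
    final double memMb;

    Resources(double cpus, double memMb) {
      this.cpus = cpus;
      this.memMb = memMb;
    }
  }

  /** A pool is unusable if CPU or memory is at or below its threshold. */
  static boolean isExhausted(Resources pool, Resources threshold) {
    return pool.cpus <= threshold.cpus || pool.memMb <= threshold.memMb;
  }

  /** An offer lacks useful resources only if BOTH pools are unusable. */
  static boolean lacksUsefulResources(
      Resources nonRevocable, Resources revocable, Resources threshold) {
    return isExhausted(nonRevocable, threshold) && isExhausted(revocable, threshold);
  }

  public static void main(String[] args) {
    Resources threshold = new Resources(0.0, 0.0); // assumption: threshold of 0
    Resources none = new Resources(0.0, 0.0);
    Resources some = new Resources(2.0, 1024.0);

    // Non-revocable pool is maxed out, but revocable resources remain: keep the offer.
    System.out.println(lacksUsefulResources(none, some, threshold));
    // Both pools are empty: the offer can be rejected early.
    System.out.println(lacksUsefulResources(none, none, threshold));
  }
}
```

Passing the minimal executor resources as `threshold` instead of zeros would correspond to the minor improvement suggested at the end of the comment.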
