Re: Review Request 62956: Immediately reject offers lacking necessary resources

Bill Farner Wed, 18 Oct 2017 12:49:59 -0700


> On Oct. 13, 2017, 1:12 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 67-68 (patched)
> > <https://reviews.apache.org/r/62956/diff/2/?file=1854107#file1854107line67>
> >
> >     As far as I know this will filter this agent entirely for 30 days. This 
> > comes pretty close to leaking agents. 
> > https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L1207-L1209
> >     
> >     This implies the timeout would need to be significantly smaller (e.g. ~3 
> > minutes) and configurable for operators. At that point, I am no longer sure 
> > the optimization would help at Twitter-scale clusters.


> this will filter this agent entirely for 30 days

Unfortunately that log statement lies!  The agent is not filtered, but the 
_resources_ are filtered for future consideration unless they increase.


> On Oct. 13, 2017, 1:12 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 220-224 (patched)
> > <https://reviews.apache.org/r/62956/diff/2/?file=1854107#file1854107line220>
> >
> >     This won't work for us.
> >     
> >     We are using both non-revocable and revocable (CPU & RAM) resources. It 
> > is crucial for us that we can still use revocable resources on an agent 
> > even if the non-revocable resources are maxed out. The same applies vice 
> > versa. 
> >     
> >     This pseudo code should solve it:
> >     ```
> >     bool lacksUsefulResources(offer):
> >         no_revocable = revocable_mem <= mem_threshold || revocable_cpu <= cpu_threshold
> >         no_non_revocable = mem <= mem_threshold || cpu <= cpu_threshold
> >         
> >         return no_revocable && no_non_revocable
> >     ```
> >     
> >     Would that still work for you? 
> >     
> >     
> >     (As a minor improvement of the heuristic we could use the minimal 
> > executor resources as thresholds rather than 0)

I believe `ResourceManager.bagFromMesosResources()` does what you want - the 
resources are aggregated without regard for the revocable flag.  I explicitly 
test for this in `OfferManagerImplTest`; grep for `mixed` to find the test 
cases.  If you disagree, can you give me a test case that points out the issue?
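
To make the comparison concrete, here is a minimal sketch of the two checks 
side by side. It assumes thresholds of zero, and `OfferFilterSketch`, `Res`, 
and both method names are hypothetical stand-ins for illustration, not 
Aurora's or Mesos's actual types. The aggregated check models the behavior 
described above (sums ignore the revocable flag); the per-kind check models 
the pseudocode from the review comment.

```java
import java.util.Arrays;
import java.util.List;

public class OfferFilterSketch {
    /** Hypothetical stand-in for one resource entry in an offer; not an Aurora or Mesos type. */
    static final class Res {
        final String name;
        final double amount;
        final boolean revocable;

        Res(String name, double amount, boolean revocable) {
            this.name = name;
            this.amount = amount;
            this.revocable = revocable;
        }
    }

    /** Sums a named resource across the offer, ignoring the revocable flag. */
    static double total(List<Res> offer, String name) {
        double sum = 0;
        for (Res r : offer) {
            if (r.name.equals(name)) {
                sum += r.amount;
            }
        }
        return sum;
    }

    /** Sums a named resource, counting only entries with the given revocable flag. */
    static double total(List<Res> offer, String name, boolean revocable) {
        double sum = 0;
        for (Res r : offer) {
            if (r.name.equals(name) && r.revocable == revocable) {
                sum += r.amount;
            }
        }
        return sum;
    }

    /** Aggregated check: the offer is useless only if total CPU or total memory is zero. */
    static boolean lacksResourcesAggregated(List<Res> offer) {
        return total(offer, "cpus") <= 0 || total(offer, "mem") <= 0;
    }

    /** Per-kind check: useless only if neither the revocable nor the non-revocable part is usable. */
    static boolean lacksResourcesPerKind(List<Res> offer) {
        boolean noRevocable =
            total(offer, "cpus", true) <= 0 || total(offer, "mem", true) <= 0;
        boolean noNonRevocable =
            total(offer, "cpus", false) <= 0 || total(offer, "mem", false) <= 0;
        return noRevocable && noNonRevocable;
    }

    public static void main(String[] args) {
        // Non-revocable resources are maxed out; only revocable CPU and memory remain.
        List<Res> revocableOnly =
            Arrays.asList(new Res("cpus", 2, true), new Res("mem", 1024, true));
        // Revocable CPU paired with non-revocable memory.
        List<Res> mixed =
            Arrays.asList(new Res("cpus", 2, true), new Res("mem", 1024, false));

        System.out.println(lacksResourcesAggregated(revocableOnly)); // false
        System.out.println(lacksResourcesPerKind(revocableOnly));    // false
        System.out.println(lacksResourcesAggregated(mixed));         // false
        System.out.println(lacksResourcesPerKind(mixed));            // true
    }
}
```

Under these assumptions, neither check rejects an offer that carries only 
revocable CPU and memory, which is the case raised in the comment. The two 
differ only for an offer that splits CPU and memory across the two kinds: 
the aggregated check keeps it, the per-kind check rejects it.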


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62956/#review187939
-----------------------------------------------------------


On Oct. 12, 2017, 4:18 p.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62956/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2017, 4:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> There's no reason for us to evaluate offers with no CPUs or memory, so reject 
> them early in the offer lifecycle.
> 
> This is an incremental performance optimization, but it may net significant 
> improvements based on observations in some very large clusters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/http/Utilization.java 
> 3c77e2983ce00f897f3d5ed106b779cd7f7f0940 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> e8334310a2a46a0ccb09ee6e4122c515892d3996 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java 
> 1b1239753f40d7d46d91724def6c25037eb79f1c 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java 
> d5db81b88a0369d0b26c8fbf70efab3886ad7695 
>   src/main/java/org/apache/aurora/scheduler/stats/TaskStatCalculator.java 
> b98aaaf48ae60afef19a368ee96abc897300f8fa 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 2cfdc090ff75a63111ae146c9fe7b3542e7ac83f 
>   src/test/java/org/apache/aurora/scheduler/offers/Offers.java 
> 129b4437315c6ad4ea47ca75d4ae6e28cadd7911 
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 
> 765a527acb96997989c920be8b69dfa1113dc302 
> 
> 
> Diff: https://reviews.apache.org/r/62956/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Bill Farner
> 
>
