-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/
-----------------------------------------------------------

Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan 
Erb, and Bill Farner.


Repository: aurora


Description
-------

Using the new `hold_offers_forever` option, it is possible for 
`staticallyBannedOffers` to grow very large, since we never release offers.

As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
`staticallyBannedOffers` into an LRU cache that expires entries after 
`min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
`maxOfferHoldTime`), and adding an option that caps the maximum size of the 
cache. I believe this approach has a few benefits:

1. The current behavior of `staticallyBannedOffers` is (roughly) preserved. 
Entries will no longer be removed when the offer is used, but they will be 
removed within `maxOfferHoldTime`. This means cluster operators who are not 
affected by the memory leak today do not have to think about the new 
`offer_static_ban_cache_max_size` option.
2. Cluster operators that use Aurora as a single framework and hold offers 
indefinitely can cap the size of the cache to avoid the memory leak.
3. Using an LRU cache greatly benefits quickly recurring crons and job updates, 
since their bans stay in the cache instead of being cleared whenever an offer 
is used.
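To make the proposed semantics concrete, here is a minimal, stdlib-only Java sketch of a cache that expires entries after `maxOfferHoldTime` and evicts the least recently used entry once a size cap is reached. It is an illustration under stated assumptions, not the code in this diff: `ExpiringLruBanCache`, `ban`, `isBanned`, `liveSize` and string keys such as `"offer-1/job-a"` are hypothetical stand-ins (the real cache is keyed however `OfferManager` keys `staticallyBannedOffers`), and expiry is assumed to be measured from when a ban was last written.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the code in this diff: a size-capped LRU "ban set"
// whose entries expire a fixed time after they were (re)written.
final class ExpiringLruBanCache<K> {
  private final long ttlNanos;
  private final int maxSize;
  private final LongSupplier clock; // injectable nanosecond clock, handy for tests
  private final LinkedHashMap<K, Long> writtenAt;

  ExpiringLruBanCache(Duration minOfferHoldTime, Duration offerHoldJitterWindow,
                      int maxSize, LongSupplier clock) {
    // maxOfferHoldTime = min_offer_hold_time + offer_hold_jitter_window
    this.ttlNanos = minOfferHoldTime.plus(offerHoldJitterWindow).toNanos();
    this.maxSize = maxSize;
    this.clock = clock;
    // accessOrder = true: iteration runs from least to most recently accessed,
    // so the "eldest" entry is the least recently used one.
    this.writtenAt = new LinkedHashMap<K, Long>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Long> eldest) {
        return size() > ExpiringLruBanCache.this.maxSize; // enforce the size cap
      }
    };
  }

  void ban(K key) {
    writtenAt.put(key, clock.getAsLong()); // (re)writing resets the expiry timer
  }

  boolean isBanned(K key) {
    Long t = writtenAt.get(key); // get() also marks the entry as recently used
    if (t == null) {
      return false;
    }
    if (clock.getAsLong() - t >= ttlNanos) {
      writtenAt.remove(key); // lazily drop an expired ban
      return false;
    }
    return true;
  }

  int liveSize() {
    long now = clock.getAsLong();
    writtenAt.values().removeIf(t -> now - t >= ttlNanos); // purge expired bans
    return writtenAt.size();
  }
}

public class Main {
  public static void main(String[] args) {
    long[] now = {0};
    ExpiringLruBanCache<String> bans = new ExpiringLruBanCache<>(
        Duration.ofMinutes(5), Duration.ofMinutes(1), 2, () -> now[0]);
    bans.ban("offer-1/job-a");
    bans.ban("offer-2/job-a");
    bans.isBanned("offer-1/job-a"); // offer-1 is now the most recently used
    bans.ban("offer-3/job-a");      // cap of 2 exceeded: evicts offer-2 (LRU)
    System.out.println(bans.isBanned("offer-1/job-a")); // true
    System.out.println(bans.isBanned("offer-2/job-a")); // false
    now[0] += Duration.ofMinutes(7).toNanos();          // past maxOfferHoldTime
    System.out.println(bans.isBanned("offer-1/job-a")); // false
  }
}
```

Passing `Integer.MAX_VALUE` as the cap gives a cache bounded only by time, which roughly corresponds to point 1 for operators who leave the size option alone. A production version would more likely build on an existing cache library (for example Guava's `CacheBuilder` with `maximumSize` and `expireAfterWrite`) than on a hand-rolled `LinkedHashMap`.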


Diffs
-----

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 5a9099bf9dd292249d72bc3a7604fbb3394f30ea 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 7011a4cc9eea827cdd54698aaed1a653774bce7f 
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java e6b2c55e4f33f9a603157236766425edcaff10e7 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 5b502442163581daa4d7954b09c00bdc3680a726 
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 6c8434e9cfe46ef63ff10c6f059ecb99981f29a2 


Diff: https://reviews.apache.org/r/63199/diff/1/


Testing
-------

Unit tests pass.
Deployed on a scale test cluster and saw that a) the `staticallyBannedOffers` 
memory leak was fixed with the correct options and b) assignment time was 
lower for quickly recurring crons and rescheduled jobs.


Thanks,

Jordan Ly
