-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/
-----------------------------------------------------------
Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb, and Bill Farner.


Repository: aurora


Description
-------

With the new `hold_offers_forever` option, `staticallyBannedOffers` can grow very large because offers are never released.

As an alternative to https://reviews.apache.org/r/63121/, I propose turning `staticallyBannedOffers` into an LRU cache that expires entries after `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as `maxOfferHoldTime`) and that also accepts an option capping the cache's maximum size.

I believe this approach has a few benefits:

1. The current behavior of `staticallyBannedOffers` is largely preserved. Entries are no longer removed when the offer is used, but they are removed within `maxOfferHoldTime`. Cluster operators who are not affected by the memory leak today therefore do not need to think about the new `offer_static_ban_cache_max_size` option.
2. Cluster operators who run Aurora as the only framework and hold offers indefinitely can cap the size of the cache to avoid the memory leak.
3. An LRU cache greatly benefits quickly recurring crons and job updates.


Diffs
-----

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 5a9099bf9dd292249d72bc3a7604fbb3394f30ea
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 7011a4cc9eea827cdd54698aaed1a653774bce7f
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java e060f2073dce4d2486d1ee2bfd873fe75167c6d0
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java e6b2c55e4f33f9a603157236766425edcaff10e7
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 5b502442163581daa4d7954b09c00bdc3680a726
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 6c8434e9cfe46ef63ff10c6f059ecb99981f29a2

Diff: https://reviews.apache.org/r/63199/diff/1/


Testing
-------

Unit tests pass.
Deployed on a scale test cluster and observed that:
a) the `staticallyBannedOffers` memory leak is fixed when the options are set correctly, and
b) assignment time is lower for quickly recurring crons and rescheduled jobs.


Thanks,

Jordan Ly
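For reviewers, the cache semantics proposed in the Description can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in the diff: the class name `ExpiringLruCache` is hypothetical, it uses the JDK's `LinkedHashMap` in access order rather than whatever cache implementation the patch uses, and it assumes each ban expires a fixed time after it is written. `ttlMillis` stands in for `maxOfferHoldTime` and `maxSize` for `offer_static_ban_cache_max_size`; the key is left generic (assumed to identify an offer/task-group pair in the scheduler), and the clock is injected so expiry can be tested deterministically.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not the patch): a size-capped LRU set of bans whose
 * entries also expire a fixed time (ttlMillis) after they are written.
 */
final class ExpiringLruCache<K> {
  private final long ttlMillis;
  private final LongSupplier clock;
  // accessOrder=true: iteration order runs from least to most recently accessed.
  private final LinkedHashMap<K, Long> entries;

  ExpiringLruCache(int maxSize, long ttlMillis, LongSupplier clock) {
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize must be positive");
    }
    this.ttlMillis = ttlMillis;
    this.clock = clock;
    this.entries = new LinkedHashMap<K, Long>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Long> eldest) {
        // Evict the least recently used ban once the size cap is exceeded.
        return size() > maxSize;
      }
    };
  }

  /** Records a ban that expires ttlMillis from now (expire-after-write assumed). */
  synchronized void put(K key) {
    entries.put(key, clock.getAsLong() + ttlMillis);
  }

  /** Returns true if the key is banned and the ban has not yet expired. */
  synchronized boolean contains(K key) {
    Long expiresAt = entries.get(key); // get() also refreshes the LRU position
    if (expiresAt == null) {
      return false;
    }
    if (clock.getAsLong() >= expiresAt) {
      entries.remove(key); // lazily drop the expired ban
      return false;
    }
    return true;
  }

  /** Number of stored bans, including expired ones not yet looked up. */
  synchronized int size() {
    return entries.size();
  }
}
```

Because expired bans are only dropped on lookup, `size()` can briefly include stale entries; the size cap still bounds memory either way, which is the property that matters when offers are held forever.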
