Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-30 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review189674
---


Ship it!




Master (87eb891) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Oct. 30, 2017, 10:50 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 30, 2017, 10:50 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which expires entries after 
> `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
> `maxOfferHoldTime`), while also taking an option for a maximum size for the 
> cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
> Entries will no longer be removed when the offer is used, but they will be 
> removed within `maxOfferHoldTime`. This means cluster operators will not have 
> to think about the new `offer_static_ban_cache_max_size` if they aren't 
> affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers 
> indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job 
> updates.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> e0ec793cad05674fb4b65246a6d153521b28b914 
>   
> src/main/java/org/apache/aurora/scheduler/config/validators/NotNegativeNumber.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 7011a4cc9eea827cdd54698aaed1a653774bce7f 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e6b2c55e4f33f9a603157236766425edcaff10e7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 244422c6b3ac6a2f7b4690cdc0f3440170b2567f 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 3d38a5929a0255a980db30eeca0f059a2029f321 
> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/6/
> 
> 
> Testing
> ---
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
> memory leak fixed with correct options and b) lowered assignment time for 
> quickly recurring crons and rescheduled jobs.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-30 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/
---

(Updated Oct. 30, 2017, 10:50 p.m.)


Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan 
Erb, and Bill Farner.


Changes
---

Create `CacheBuilder` in `OfferSettings` instead, pass to `OfferManager`


Repository: aurora


Description
---

Using the new `hold_offers_forever` option, it is possible for the 
`staticallyBannedOffers` to grow very large in size as we never release offers.

As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
`staticallyBannedOffers` into a LRU cache which expires entries after 
`min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
`maxOfferHoldTime`), while also taking an option for a maximum size for the 
cache. I believe that this approach has a couple of benefits:

1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
Entries will no longer be removed when the offer is used, but they will be 
removed within `maxOfferHoldTime`. This means cluster operators will not have 
to think about the new `offer_static_ban_cache_max_size` if they aren't 
affected by the memory leak now.
2. Cluster operators that use Aurora as a single framework and hold offers 
indefinitely can cap the size of the cache to avoid the memory leak.
3. Using an LRU cache greatly benefits quickly recurring crons and job updates.


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
e0ec793cad05674fb4b65246a6d153521b28b914 
  
src/main/java/org/apache/aurora/scheduler/config/validators/NotNegativeNumber.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
7011a4cc9eea827cdd54698aaed1a653774bce7f 
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
e6b2c55e4f33f9a603157236766425edcaff10e7 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
244422c6b3ac6a2f7b4690cdc0f3440170b2567f 
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
3d38a5929a0255a980db30eeca0f059a2029f321 


Diff: https://reviews.apache.org/r/63199/diff/6/

Changes: https://reviews.apache.org/r/63199/diff/5-6/


Testing
---

Unit tests pass.
Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
memory leak fixed with correct options and b) lowered assignment time for 
quickly recurring crons and rescheduled jobs.


Thanks,

Jordan Ly



Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-30 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review189648
---


Ship it!




LGTM, with the switch to `CacheBuilder` in `OfferSettings` as the finishing 
touch.  Nice patch!

- Bill Farner


On Oct. 30, 2017, 11:14 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 30, 2017, 11:14 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which expires entries after 
> `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
> `maxOfferHoldTime`), while also taking an option for a maximum size for the 
> cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
> Entries will no longer be removed when the offer is used, but they will be 
> removed within `maxOfferHoldTime`. This means cluster operators will not have 
> to think about the new `offer_static_ban_cache_max_size` if they aren't 
> affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers 
> indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job 
> updates.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> e0ec793cad05674fb4b65246a6d153521b28b914 
>   
> src/main/java/org/apache/aurora/scheduler/config/validators/NotNegativeNumber.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 7011a4cc9eea827cdd54698aaed1a653774bce7f 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e6b2c55e4f33f9a603157236766425edcaff10e7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 244422c6b3ac6a2f7b4690cdc0f3440170b2567f 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 3d38a5929a0255a980db30eeca0f059a2029f321 
> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/5/
> 
> 
> Testing
> ---
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
> memory leak fixed with correct options and b) lowered assignment time for 
> quickly recurring crons and rescheduled jobs.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-30 Thread Bill Farner


> On Oct. 27, 2017, 3:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java
> > Lines 42-43 (patched)
> > 
> >
> > How about a `CacheBuilderSpec` to bundle these?
> 
> Jordan Ly wrote:
> I think that using a `CacheBuilderSpec` would cause some odd string 
> manipulation from our argument names to the ones `CacheBuilderSpec` uses.
> 
> I could just build the `CacheBuilder` in `OfferModule`/`OfferSettings` 
> and pass that into `OfferManager` instead. Which do you think would be 
> cleaner?

> I think that using a CacheBuilderSpec would cause some odd string 
> manipulation from our argument names to the ones CacheBuilderSpec uses

Please forgive me, i neglected to look closely enough at CacheBuilderSpec.  
Didn't realize it could only be created from a String.

> I could just build the CacheBuilder in OfferModule/OfferSettings and pass 
> that into OfferManager instead. Which do you think would be cleaner?

Let's go that route, that's what i should have suggested :-)


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review189488
---


On Oct. 30, 2017, 11:14 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 30, 2017, 11:14 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which expires entries after 
> `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
> `maxOfferHoldTime`), while also taking an option for a maximum size for the 
> cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
> Entries will no longer be removed when the offer is used, but they will be 
> removed within `maxOfferHoldTime`. This means cluster operators will not have 
> to think about the new `offer_static_ban_cache_max_size` if they aren't 
> affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers 
> indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job 
> updates.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> e0ec793cad05674fb4b65246a6d153521b28b914 
>   
> src/main/java/org/apache/aurora/scheduler/config/validators/NotNegativeNumber.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 7011a4cc9eea827cdd54698aaed1a653774bce7f 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e6b2c55e4f33f9a603157236766425edcaff10e7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 244422c6b3ac6a2f7b4690cdc0f3440170b2567f 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 3d38a5929a0255a980db30eeca0f059a2029f321 
> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/5/
> 
> 
> Testing
> ---
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
> memory leak fixed with correct options and b) lowered assignment time for 
> quickly recurring crons and rescheduled jobs.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-30 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review189642
---


Ship it!




Master (448e6d4) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Oct. 30, 2017, 6:14 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 30, 2017, 6:14 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which expires entries after 
> `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
> `maxOfferHoldTime`), while also taking an option for a maximum size for the 
> cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
> Entries will no longer be removed when the offer is used, but they will be 
> removed within `maxOfferHoldTime`. This means cluster operators will not have 
> to think about the new `offer_static_ban_cache_max_size` if they aren't 
> affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers 
> indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job 
> updates.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> e0ec793cad05674fb4b65246a6d153521b28b914 
>   
> src/main/java/org/apache/aurora/scheduler/config/validators/NotNegativeNumber.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 7011a4cc9eea827cdd54698aaed1a653774bce7f 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e6b2c55e4f33f9a603157236766425edcaff10e7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 244422c6b3ac6a2f7b4690cdc0f3440170b2567f 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 3d38a5929a0255a980db30eeca0f059a2029f321 
> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/5/
> 
> 
> Testing
> ---
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
> memory leak fixed with correct options and b) lowered assignment time for 
> quickly recurring crons and rescheduled jobs.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-30 Thread Jordan Ly


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 321 (patched)
> > 
> >
> > `expireAfterWrite` doesn't result in LRU.  I think you mean 
> > `expireAfterAccess`.

LRU is maintained if the operator specifies a maximum size 
(https://github.com/google/guava/wiki/CachesExplained#size-based-eviction).

I use `expireAfterWrite` because offers will not be valid after 
`minOfferHoldTime`. I believe both strategies can exist simultaneously.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java
> > Lines 42-43 (patched)
> > 
> >
> > How about a `CacheBuilderSpec` to bundle these?

I think that using a `CacheBuilderSpec` would cause some odd string 
manipulation from our argument names to the ones `CacheBuilderSpec` uses.

I could just build the `CacheBuilder` in `OfferModule`/`OfferSettings` and pass 
that into `OfferManager` instead. Which do you think would be cleaner?


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java
> > Lines 46-50 (patched)
> > 
> >
> > We've developed a 
> > [stutter](https://michaelwhatcott.com/go-code-that-stutters/) over time in 
> > this class.  Can you do some cleanup and s/offer// on parameters, fields, 
> > and methods?

Done.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
> > Lines 104 (patched)
> > 
> >
> > `0` is a valid size for guava's `CacheBuilder#maximumSize`:
> > > When size is zero, elements will be evicted immediately after being 
> > loaded into the cache. This can be useful in testing, or to disable caching 
> > temporarily without a code change.
> > 
> > Let's allow it, and introduce a `NotNegativeNumber` counterpart to 
> > `PositiveNumber`.

Added.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
> > Lines 109 (patched)
> > 
> >
> > s/Long/long/

Done.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
> > Line 178 (original), 188-189 (patched)
> > 
> >
> > Is this needed only because of the default `offerStaticBanCacheMaxSize 
> > = Long.MAX_VALUE`?  The double eviction strategy intuitively seems 
> > unnecessary.  I suggest a finite default for `offerStaticBanCacheMaxSize` 
> > (say, 1k) and no time expiry.

Spoke offline, but added a comment to explain the reasoning behind the dual 
eviction strategy.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java
> > Lines 247-249 (original), 253-255 (patched)
> > 
> >
> > IMHO this test case is no longer interesting now that we have to punch 
> > through with `cleanupStaticBans()`.  The test now reads as "test that 
> > clearing a cache clears the cache".  I suggest removing the test case.

`cleanupWithStaticBans()` only ensures that the `size()` method returns the 
correct value by not counting evicted entries (i.e. the entry we just expired 
in the test will not be counted). I think this test is still useful as it 
ensures the eviction on expiration strategy works as intended. I added a 
comment to `cleanupWithStaticBans()` for clarity.


- Jordan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review189488
---


On Oct. 30, 2017, 6:14 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 30, 2017, 6:14 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which 

Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-30 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/
---

(Updated Oct. 30, 2017, 6:14 p.m.)


Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan 
Erb, and Bill Farner.


Changes
---

Address feedback.


Repository: aurora


Description
---

Using the new `hold_offers_forever` option, it is possible for the 
`staticallyBannedOffers` to grow very large in size as we never release offers.

As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
`staticallyBannedOffers` into a LRU cache which expires entries after 
`min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
`maxOfferHoldTime`), while also taking an option for a maximum size for the 
cache. I believe that this approach has a couple of benefits:

1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
Entries will no longer be removed when the offer is used, but they will be 
removed within `maxOfferHoldTime`. This means cluster operators will not have 
to think about the new `offer_static_ban_cache_max_size` if they aren't 
affected by the memory leak now.
2. Cluster operators that use Aurora as a single framework and hold offers 
indefinitely can cap the size of the cache to avoid the memory leak.
3. Using an LRU cache greatly benefits quickly recurring crons and job updates.


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
e0ec793cad05674fb4b65246a6d153521b28b914 
  
src/main/java/org/apache/aurora/scheduler/config/validators/NotNegativeNumber.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
7011a4cc9eea827cdd54698aaed1a653774bce7f 
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
e6b2c55e4f33f9a603157236766425edcaff10e7 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
244422c6b3ac6a2f7b4690cdc0f3440170b2567f 
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
3d38a5929a0255a980db30eeca0f059a2029f321 


Diff: https://reviews.apache.org/r/63199/diff/5/

Changes: https://reviews.apache.org/r/63199/diff/4-5/


Testing
---

Unit tests pass.
Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
memory leak fixed with correct options and b) lowered assignment time for 
quickly recurring crons and rescheduled jobs.


Thanks,

Jordan Ly



Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-27 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review189488
---




src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
Lines 321 (patched)


`expireAfterWrite` doesn't result in LRU.  I think you mean 
`expireAfterAccess`.



src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java
Lines 42-43 (patched)


How about a `CacheBuilderSpec` to bundle these?



src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java
Lines 46-50 (patched)


We've developed a 
[stutter](https://michaelwhatcott.com/go-code-that-stutters/) over time in this 
class.  Can you do some cleanup and s/offer// on parameters, fields, and 
methods?



src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
Lines 104 (patched)


`0` is a valid size for guava's `CacheBuilder#maximumSize`:
> When size is zero, elements will be evicted immediately after being 
loaded into the cache. This can be useful in testing, or to disable caching 
temporarily without a code change.

Let's allow it, and introduce a `NotNegativeNumber` counterpart to 
`PositiveNumber`.



src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
Lines 109 (patched)


s/Long/long/



src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
Line 178 (original), 188-189 (patched)


Is this needed only because of the default `offerStaticBanCacheMaxSize = 
Long.MAX_VALUE`?  The double eviction strategy intuitively seems unnecessary.  
I suggest a finite default for `offerStaticBanCacheMaxSize` (say, 1k) and no 
time expiry.



src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java
Lines 247-249 (original), 253-255 (patched)


IMHO this test case is no longer interesting now that we have to punch 
through with `cleanupStaticBans()`.  The test now reads as "test that clearing 
a cache clears the cache".  I suggest removing the test case.


- Bill Farner


On Oct. 20, 2017, 10:53 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 20, 2017, 10:53 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which expires entries after 
> `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
> `maxOfferHoldTime`), while also taking an option for a maximum size for the 
> cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
> Entries will no longer be removed when the offer is used, but they will be 
> removed within `maxOfferHoldTime`. This means cluster operators will not have 
> to think about the new `offer_static_ban_cache_max_size` if they aren't 
> affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers 
> indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job 
> updates.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 5a9099bf9dd292249d72bc3a7604fbb3394f30ea 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 7011a4cc9eea827cdd54698aaed1a653774bce7f 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e6b2c55e4f33f9a603157236766425edcaff10e7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 5b502442163581daa4d7954b09c00bdc3680a726 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 6c8434e9cfe46ef63ff10c6f059ecb99981f29a2 
> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/4/
> 
> 
> Testing
> ---
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
> memory leak fixed with 

Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-21 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review188890
---


Ship it!




Master (9825e05) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Oct. 21, 2017, 5:53 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 21, 2017, 5:53 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which expires entries after 
> `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
> `maxOfferHoldTime`), while also taking an option for a maximum size for the 
> cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
> Entries will no longer be removed when the offer is used, but they will be 
> removed within `maxOfferHoldTime`. This means cluster operators will not have 
> to think about the new `offer_static_ban_cache_max_size` if they aren't 
> affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers 
> indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job 
> updates.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 5a9099bf9dd292249d72bc3a7604fbb3394f30ea 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 7011a4cc9eea827cdd54698aaed1a653774bce7f 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e6b2c55e4f33f9a603157236766425edcaff10e7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 5b502442163581daa4d7954b09c00bdc3680a726 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 6c8434e9cfe46ef63ff10c6f059ecb99981f29a2 
> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/4/
> 
> 
> Testing
> ---
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
> memory leak fixed with correct options and b) lowered assignment time for 
> quickly recurring crons and rescheduled jobs.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-21 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review19
---



Master (9825e05) is red with this patch.
  ./build-support/jenkins/build.sh

 [183] ./src/main/sass/app.scss 1.2 kB {0} [built]
 [184] ./src/main/resources/source-sans-pro.css 1.05 kB {0} [built]
 [214] ./src/main/js/index.js 3.18 kB {0} [built]
 [249] ./~/react-router-dom/es/Redirect.js 137 bytes {0} [built]
+ 258 hidden modules
:processResources
:classes
:jar
:startScripts
:distTar
:distZip
:assemble
:compileTestJavaNote: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processTestResources
:testClasses
:compileJmhJavaNote: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources NO-SOURCE
:jmhClasses
:checkstyleJmh
:checkstyleMain[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java:106:96:
 '+' should be on a new line. [OperatorWrap]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java:107:90:
 '+' should be on a new line. [OperatorWrap]
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':checkstyleMain'.
> Checkstyle rule violations were found. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/main.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

* Get more help at https://help.gradle.org

BUILD FAILED in 2m 36s
35 actionable tasks: 29 executed, 6 up-to-date


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Oct. 20, 2017, 10:53 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 20, 2017, 10:53 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which expires entries after 
> `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
> `maxOfferHoldTime`), while also taking an option for a maximum size for the 
> cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
> Entries will no longer be removed when the offer is used, but they will be 
> removed within `maxOfferHoldTime`. This means cluster operators will not have 
> to think about the new `offer_static_ban_cache_max_size` if they aren't 
> affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers 
> indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job 
> updates.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 5a9099bf9dd292249d72bc3a7604fbb3394f30ea 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 7011a4cc9eea827cdd54698aaed1a653774bce7f 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e6b2c55e4f33f9a603157236766425edcaff10e7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 5b502442163581daa4d7954b09c00bdc3680a726 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 6c8434e9cfe46ef63ff10c6f059ecb99981f29a2 
> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/2/
> 
> 
> Testing
> ---
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
> memory leak fixed with correct options and b) lowered assignment time for 
> quickly recurring crons and rescheduled jobs.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-21 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review17
---



Master (9825e05) is red with this patch.
  ./build-support/jenkins/build.sh

 [182] ./~/bootstrap/dist/css/bootstrap.css 984 bytes {0} [built]
 [183] ./src/main/sass/app.scss 1.2 kB {0} [built]
 [184] ./src/main/resources/source-sans-pro.css 1.05 kB {0} [built]
 [214] ./src/main/js/index.js 3.18 kB {0} [built]
 [249] ./~/react-router-dom/es/Redirect.js 137 bytes {0} [built]
+ 258 hidden modules
:processResources
:classes
:jar
:startScripts
:distTar
:distZip
:assemble
:compileTestJavaNote: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processTestResources
:testClasses
:compileJmhJavaNote: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources NO-SOURCE
:jmhClasses
:checkstyleJmh[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java:22:8:
 Unused import - com.google.common.base.Optional. [UnusedImports]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java:23:8:
 Unused import - com.google.common.base.Ticker. [UnusedImports]
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':checkstyleJmh'.
> Checkstyle rule violations were found. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/jmh.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

* Get more help at https://help.gradle.org

BUILD FAILED in 2m 35s
34 actionable tasks: 28 executed, 6 up-to-date


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Oct. 21, 2017, 5:53 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> ---
> 
> (Updated Oct. 21, 2017, 5:53 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Using the new `hold_offers_forever` option, it is possible for the 
> `staticallyBannedOffers` to grow very large in size as we never release 
> offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
> `staticallyBannedOffers` into a LRU cache which expires entries after 
> `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
> `maxOfferHoldTime`), while also taking an option for a maximum size for the 
> cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
> Entries will no longer be removed when the offer is used, but they will be 
> removed within `maxOfferHoldTime`. This means cluster operators will not have 
> to think about the new `offer_static_ban_cache_max_size` if they aren't 
> affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers 
> indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job 
> updates.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 5a9099bf9dd292249d72bc3a7604fbb3394f30ea 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 7011a4cc9eea827cdd54698aaed1a653774bce7f 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e6b2c55e4f33f9a603157236766425edcaff10e7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 5b502442163581daa4d7954b09c00bdc3680a726 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 6c8434e9cfe46ef63ff10c6f059ecb99981f29a2 
> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/1/
> 
> 
> Testing
> ---
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
> memory leak fixed with correct options and b) lowered assignment time for 
> quickly recurring crons and rescheduled jobs.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Review Request 63199: Refactor staticallyBannedOffers into a LRU cache

2017-10-20 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/
---

Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan 
Erb, and Bill Farner.


Repository: aurora


Description
---

Using the new `hold_offers_forever` option, it is possible for the 
`staticallyBannedOffers` to grow very large in size as we never release offers.

As an alternative to https://reviews.apache.org/r/63121/, I propose changing 
`staticallyBannedOffers` into a LRU cache which expires entries after 
`min_offer_hold_time` + `offer_hold_jitter_window` (referred to as 
`maxOfferHoldTime`), while also taking an option for a maximum size for the 
cache. I believe that this approach has a couple of benefits:

1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. 
Entries will no longer be removed when the offer is used, but they will be 
removed within `maxOfferHoldTime`. This means cluster operators will not have 
to think about the new `offer_static_ban_cache_max_size` if they aren't 
affected by the memory leak now.
2. Cluster operators that use Aurora as a single framework and hold offers 
indefinitely can cap the size of the cache to avoid the memory leak.
3. Using an LRU cache greatly benefits quickly recurring crons and job updates.


Diffs
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
5a9099bf9dd292249d72bc3a7604fbb3394f30ea 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
7011a4cc9eea827cdd54698aaed1a653774bce7f 
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
e060f2073dce4d2486d1ee2bfd873fe75167c6d0 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
e6b2c55e4f33f9a603157236766425edcaff10e7 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
5b502442163581daa4d7954b09c00bdc3680a726 
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
6c8434e9cfe46ef63ff10c6f059ecb99981f29a2 


Diff: https://reviews.apache.org/r/63199/diff/1/


Testing
---

Unit tests pass.
Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` 
memory leak fixed with correct options and b) lowered assignment time for 
quickly recurring crons and rescheduled jobs.


Thanks,

Jordan Ly