Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-05 Thread Maxim Khutornenko


 On Sept. 4, 2014, 11:41 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java, line 175
  https://reviews.apache.org/r/24915/diff/1/?file=665740#file665740line175
 
  Shouldn't this be == EXECUTOR_GC_INTERVAL to minimize the side-effects 
  of a failover?  As-is, you'll have a 15 minute period with 4x elevated GC 
  runs.
  
  If so, this arg could collapse, and the new GcExecutorSettings method 
  can go away.

I was split originally on reusing the same interval but decided in favor of a 
shorter interval to catch up a bit faster. Fine either way.


 On Sept. 4, 2014, 11:41 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java, 
  line 145
  https://reviews.apache.org/r/24915/diff/1/?file=665741#file665741line145
 
  Please wrap this with Collections.synchronizedMap to preserve thread 
  safety.

Done.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52376
---


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-05 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/
---

(Updated Sept. 6, 2014, 1:18 a.m.)


Review request for Aurora and Bill Farner.


Changes
---

CR comments.


Bugs: AURORA-608
https://issues.apache.org/jira/browse/AURORA-608


Repository: aurora


Description
---

Adding initial GC task delay on scheduler restart.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
65f404915bc60ffe11a7a57d9861ac5b37fa646a 
  src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
04137072891b2a1f0ad663182629dd469b09324f 

Diff: https://reviews.apache.org/r/24915/diff/


Testing
---

gradle -Pq build


Thanks,

Maxim Khutornenko



Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-04 Thread Bill Farner


 On Sept. 2, 2014, 4:43 p.m., Bill Farner wrote:
  I'm concerned that the problem we're solving is underspecified.  An 
  immediate issue i have with this patch is that it introduces a memory leak 
  (in practice, this is hidden with default settings due to failover failover 
  hides this).
  
  I'll reply on the ticket to try to reach consensus on what the problem is 
  and what it means to fix it.
 
 Maxim Khutornenko wrote:
 I don't think we are trying to fix a Mesos problem here. Regardless of 
 the underlying Mesos resolution (MESOS-1646), I do think Aurora should be a 
 good Mesos citizen here and avoid abnormal offer activity spikes driven by 
 failovers. 
 
 In order for that memory leak to expose itself in any reasonable manner, 
 there must be an intensive and constant churn of hosts in the cluster. I 
 don't think it's a likely scenario but if you are concerned it should be easy 
 to converge on a hybrid solution with expiring cache where the 
 EXECUTOR_GC_RESTART_INTERVAL kicks in any time an item is not found in the 
 cache.

 I do think Aurora should be a good Mesos citizen here and avoid abnormal 
 offer activity spikes driven by failovers

Can you offer more detail here - why is this bad behavior?


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52037
---


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-04 Thread Maxim Khutornenko


 On Sept. 2, 2014, 4:43 p.m., Bill Farner wrote:
  I'm concerned that the problem we're solving is underspecified.  An 
  immediate issue i have with this patch is that it introduces a memory leak 
  (in practice, this is hidden with default settings due to failover failover 
  hides this).
  
  I'll reply on the ticket to try to reach consensus on what the problem is 
  and what it means to fix it.
 
 Maxim Khutornenko wrote:
 I don't think we are trying to fix a Mesos problem here. Regardless of 
 the underlying Mesos resolution (MESOS-1646), I do think Aurora should be a 
 good Mesos citizen here and avoid abnormal offer activity spikes driven by 
 failovers. 
 
 In order for that memory leak to expose itself in any reasonable manner, 
 there must be an intensive and constant churn of hosts in the cluster. I 
 don't think it's a likely scenario but if you are concerned it should be easy 
 to converge on a hybrid solution with expiring cache where the 
 EXECUTOR_GC_RESTART_INTERVAL kicks in any time an item is not found in the 
 cache.
 
 Bill Farner wrote:
  I do think Aurora should be a good Mesos citizen here and avoid 
 abnormal offer activity spikes driven by failovers
 
 Can you offer more detail here - why is this bad behavior?

We spread GC activity randomly to avoid hourly spikes 
(https://reviews.apache.org/r/16247/). Failovers, however, break that pattern 
and manifest in unusual GC activity spikes. All Mesos matters aside, I don't 
think scheduler interruptions should result in what appears externally as a 
catch-up behavior.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52037
---


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-04 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52328
---


Per offline discussion, i've realized my memory leak concern is ~moot since we 
already have the unbounded HostAttributes store.  I liked your suggestion of 
augmenting the way this soft state is used:

- use a Map, no expiry
- when receiving an offer, don't kick off a GC task if a map entry is not 
present, store a future timestamp after which we should launch a GC task

- Bill Farner


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-04 Thread Maxim Khutornenko


 On Sept. 4, 2014, 5:44 p.m., Bill Farner wrote:
  Per offline discussion, i've realized my memory leak concern is ~moot since 
  we already have the unbounded HostAttributes store.  I liked your 
  suggestion of augmenting the way this soft state is used:
  
  - use a Map, no expiry
  - when receiving an offer, don't kick off a GC task if a map entry is not 
  present, store a future timestamp after which we should launch a GC task

Thanks. Just to make sure we are on the same page, the bullets above describe 
exactly how it's implemented by the current RB.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52328
---


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-04 Thread Bill Farner


 On Sept. 4, 2014, 5:44 p.m., Bill Farner wrote:
  Per offline discussion, i've realized my memory leak concern is ~moot since 
  we already have the unbounded HostAttributes store.  I liked your 
  suggestion of augmenting the way this soft state is used:
  
  - use a Map, no expiry
  - when receiving an offer, don't kick off a GC task if a map entry is not 
  present, store a future timestamp after which we should launch a GC task
 
 Maxim Khutornenko wrote:
 Thanks. Just to make sure we are on the same page, the bullets above 
 describe exactly how it's implemented by the current RB.

Hah, i was indeed on a different page - taking another look.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52328
---


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-04 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52376
---

Ship it!


LGTM overall.


src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java
https://reviews.apache.org/r/24915/#comment91149

Shouldn't this be == EXECUTOR_GC_INTERVAL to minimize the side-effects of a 
failover?  As-is, you'll have a 15 minute period with 4x elevated GC runs.

If so, this arg could collapse, and the new GcExecutorSettings method can 
go away.



src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
https://reviews.apache.org/r/24915/#comment91148

Please wrap this with Collections.synchronizedMap to preserve thread safety.


- Bill Farner


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-03 Thread Maxim Khutornenko


 On Sept. 2, 2014, 4:43 p.m., Bill Farner wrote:
  I'm concerned that the problem we're solving is underspecified.  An 
  immediate issue i have with this patch is that it introduces a memory leak 
  (in practice, this is hidden with default settings due to failover failover 
  hides this).
  
  I'll reply on the ticket to try to reach consensus on what the problem is 
  and what it means to fix it.

I don't think we are trying to fix a Mesos problem here. Regardless of the 
underlying Mesos resolution (MESOS-1646), I do think Aurora should be a good 
Mesos citizen here and avoid abnormal offer activity spikes driven by 
failovers. 

In order for that memory leak to expose itself in any reasonable manner, there 
must be an intensive and constant churn of hosts in the cluster. I don't think 
it's a likely scenario but if you are concerned it should be easy to converge 
on a hybrid solution with expiring cache where the EXECUTOR_GC_RESTART_INTERVAL 
kicks in any time an item is not found in the cache.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52037
---


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-09-02 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/#review52037
---


I'm concerned that the problem we're solving is underspecified.  An immediate 
issue i have with this patch is that it introduces a memory leak (in practice, 
this is hidden with default settings due to failover failover hides this).

I'll reply on the ticket to try to reach consensus on what the problem is and 
what it means to fix it.

- Bill Farner


On Aug. 20, 2014, 11:35 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24915/
 ---
 
 (Updated Aug. 20, 2014, 11:35 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-608
 https://issues.apache.org/jira/browse/AURORA-608
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding initial GC task delay on scheduler restart.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
 5a38a1f69ac5dbe68af3bfe175899ddee392880b 
   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
 65f404915bc60ffe11a7a57d9861ac5b37fa646a 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 04137072891b2a1f0ad663182629dd469b09324f 
 
 Diff: https://reviews.apache.org/r/24915/diff/
 
 
 Testing
 ---
 
 gradle -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Review Request 24915: Adding initial GC task delay on scheduler restart.

2014-08-20 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24915/
---

Review request for Aurora and Bill Farner.


Bugs: AURORA-608
https://issues.apache.org/jira/browse/AURORA-608


Repository: aurora


Description
---

Adding initial GC task delay on scheduler restart.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java 
5a38a1f69ac5dbe68af3bfe175899ddee392880b 
  src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 
65f404915bc60ffe11a7a57d9861ac5b37fa646a 
  src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
04137072891b2a1f0ad663182629dd469b09324f 

Diff: https://reviews.apache.org/r/24915/diff/


Testing
---

gradle -Pq build


Thanks,

Maxim Khutornenko