Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149279
---


Ship it!




Master (f1e09a9) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 9:04 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 16, 2016, 9:04 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/
---

(Updated Sept. 16, 2016, 9:04 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Rebased.


Repository: aurora


Description
---

This is the second part of the `BatchWorker` conversion work that moves cron 
jobs to use non-blocking kill followups and reduces the number of trigger 
threads. See https://reviews.apache.org/r/51759 for more background on the 
`BatchWorker`.

#Problem
The current implementation of the cron scheduling relies on a large number of 
threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
killing existing tasks according to `KILL_EXISTING` collision policy. This 
creates large spikes of activities at synchronized intervals as users tend to 
schedule their cron runs around similar schedules. Moreover, the current 
implementation re-acquires write locks multiple times to deliver on 
`KILL_EXISTING` policy. 

#Remediation
Trigger level batching is still done in a blocking way but multiple cron 
triggers may be bundled together to share the same write transaction. Any 
followups, however, are performed in a non-blocking way by relying on a 
`BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In 
order to still ensure non-concurrent execution of a given job key trigger, a 
token (job key) is saved within the trigger itself. A concurrent trigger will 
bail if a kill followup is still in progress (token is set AND no entry in 
`killFollowups` set exists yet).

#Results
The above approach allowed reducing the number of cron threads to 10 and likely 
can be reduced even further. See https://reviews.apache.org/r/51759 for the 
lock contention results.


Diffs (updated)
-

  commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
  commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
bc30990d57f444f7d64805ed85c363f1302736d0 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
c07551e94f9221b5b21c5dc9715e82caa290c2e8 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
155d702d68367b247dd066f773c662407f0e3b5b 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
5c64ff2994e200b3453603ac5470e8e152cebc55 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 

Diff: https://reviews.apache.org/r/51763/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149275
---


Ship it!




Ship It!

- Joshua Cohen


On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 11:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread John Sirois


> On Sept. 15, 2016, 12:25 p.m., Zameer Manji wrote:
> > config/findbugs/excludeFilter.xml, line 123
> > 
> >
> > If I'm reading the code correctly we need this because we have a 
> > `CompletableFuture` and to give it a value we use `null`.
> > 
> > Have you considered using `new Object()` instead so we don't have to 
> > add this exception to our rules?
> 
> Maxim Khutornenko wrote:
> I am not comfortable substituting a `Void` with a random `Object` just to 
> avoid analyzer error. This opens up a potential for much more confusing cases.
> 
> John Sirois wrote:
> Generally a `SideEffect` type with 1 instance would be handy for these 
> things.  Something to consider outside this review.
> 
> IE:
> ```java
> package org.apache.aurora.lang;
> 
> public enum SideEffect {
>   COMPLETE;
> }
> ```
> 
> I'm not sure if others would find `CompletableFuture` and the 
> corresponding `return SideEffect.COMPLETE` an improvement in these sorts of 
> situations.  It is explicit.
> 
> Maxim Khutornenko wrote:
> I don't think enum is useful here as it does not deliver any additional 
> signal. How about using a `NoResult` interface instance instead? Updated the 
> parent RB.

I used enum as the simplest way in java to ensure 1 instance, no extension, 
serializability etc.
The idea could be implemented differently, but the idea of naming side-effects 
is the primary thing... with the side-effect! bonus of no null rule exception 
needed.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


On Sept. 14, 2016, 5:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 5:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote:
> > config/findbugs/excludeFilter.xml, line 123
> > 
> >
> > If I'm reading the code correctly we need this because we have a 
> > `CompletableFuture` and to give it a value we use `null`.
> > 
> > Have you considered using `new Object()` instead so we don't have to 
> > add this exception to our rules?
> 
> Maxim Khutornenko wrote:
> I am not comfortable substituting a `Void` with a random `Object` just to 
> avoid analyzer error. This opens up a potential for much more confusing cases.
> 
> John Sirois wrote:
> Generally a `SideEffect` type with 1 instance would be handy for these 
> things.  Something to consider outside this review.
> 
> IE:
> ```java
> package org.apache.aurora.lang;
> 
> public enum SideEffect {
>   COMPLETE;
> }
> ```
> 
> I'm not sure if others would find `CompletableFuture` and the 
> corresponding `return SideEffect.COMPLETE` an improvement in these sorts of 
> situations.  It is explicit.

I don't think enum is useful here as it does not deliver any additional signal. 
How about using a `NoResult` interface instance instead? Updated the parent RB.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 11:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread John Sirois


> On Sept. 15, 2016, 12:25 p.m., Zameer Manji wrote:
> > config/findbugs/excludeFilter.xml, line 123
> > 
> >
> > If I'm reading the code correctly we need this because we have a 
> > `CompletableFuture` and to give it a value we use `null`.
> > 
> > Have you considered using `new Object()` instead so we don't have to 
> > add this exception to our rules?
> 
> Maxim Khutornenko wrote:
> I am not comfortable substituting a `Void` with a random `Object` just to 
> avoid analyzer error. This opens up a potential for much more confusing cases.

Generally a `SideEffect` type with 1 instance would be handy for these things.  
Something to consider outside this review.

IE:
```java
package org.apache.aurora.lang;

public enum SideEffect {
  COMPLETE;
}
```

I'm not sure if others would find `CompletableFuture` and the 
corresponding `return SideEffect.COMPLETE` an improvement in these sorts of 
situations.  It is explicit.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


On Sept. 14, 2016, 5:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 5:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote:
> > config/findbugs/excludeFilter.xml, line 123
> > 
> >
> > If I'm reading the code correctly we need this because we have a 
> > `CompletableFuture` and to give it a value we use `null`.
> > 
> > Have you considered using `new Object()` instead so we don't have to 
> > add this exception to our rules?

I am not comfortable substituting a `Void` with a random `Object` just to avoid 
analyzer error. This opens up a potential for much more confusing cases.


> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, 
> > line 78
> > 
> >
> > Just to be clear, this annotation is needed to ensure the data in 
> > `JobDataMap` is persisted properly?

Correct.


> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote:
> > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java, line 100
> > 
> >
> > For integration tests I was under the impression we used 
> > `org.apache.aurora.scheduler.testing.FakeStatsProvider`

The `FakeStatsProvider` is only useful when we want to get a reading on updated 
metrics. Since everything else that can post stats is mocked, there is no 
reason to use a real object here.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 11:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149182
---


Ship it!




Ship It!

- Stephan Erb


On Sept. 15, 2016, 1:12 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 15, 2016, 1:12 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-15 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


Fix it, then Ship it!




LGTM, the changes are fairly straight forward. I dislike the (slight) 
complexity that comes from using the quartz `JobDataMap` but I'm unsure how 
else we can store the data required in a safe and concurrent manner.


config/findbugs/excludeFilter.xml (line 123)


If I'm reading the code correctly we need this because we have a 
`CompletableFuture` and to give it a value we use `null`.

Have you considered using `new Object()` instead so we don't have to add 
this exception to our rules?



src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java (line 
76)


Just to be clear, this annotation is needed to ensure the data in 
`JobDataMap` is persisted properly?



src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java (line 100)


For integration tests I was under the impression we used 
`org.apache.aurora.scheduler.testing.FakeStatsProvider`


- Zameer Manji


On Sept. 14, 2016, 4:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 4:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-14 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review148996
---


Ship it!




Master (5069f93) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 11:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-14 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/
---

(Updated Sept. 14, 2016, 11:12 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Rebasing and comments.


Repository: aurora


Description
---

This is the second part of the `BatchWorker` conversion work that moves cron 
jobs to use non-blocking kill followups and reduces the number of trigger 
threads. See https://reviews.apache.org/r/51759 for more background on the 
`BatchWorker`.

#Problem
The current implementation of the cron scheduling relies on a large number of 
threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
killing existing tasks according to `KILL_EXISTING` collision policy. This 
creates large spikes of activities at synchronized intervals as users tend to 
schedule their cron runs around similar schedules. Moreover, the current 
implementation re-acquires write locks multiple times to deliver on 
`KILL_EXISTING` policy. 

#Remediation
Trigger level batching is still done in a blocking way but multiple cron 
triggers may be bundled together to share the same write transaction. Any 
followups, however, are performed in a non-blocking way by relying on a 
`BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In 
order to still ensure non-concurrent execution of a given job key trigger, a 
token (job key) is saved within the trigger itself. A concurrent trigger will 
bail if a kill followup is still in progress (token is set AND no entry in 
`killFollowups` set exists yet).

#Results
The above approach allowed reducing the number of cron threads to 10 and likely 
can be reduced even further. See https://reviews.apache.org/r/51759 for the 
lock contention results.


Diffs (updated)
-

  commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
  commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
bc30990d57f444f7d64805ed85c363f1302736d0 
  config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
c07551e94f9221b5b21c5dc9715e82caa290c2e8 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
155d702d68367b247dd066f773c662407f0e3b5b 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
5c64ff2994e200b3453603ac5470e8e152cebc55 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 

Diff: https://reviews.apache.org/r/51763/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-14 Thread Maxim Khutornenko


> On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, 
> > lines 164-172
> > 
> >
> > I probably don't completely understand the logic, so stupid question 
> > ahead:
> > 
> > Couldn't we do that cleanup of JobDataMap in the handler where you 
> > currently do the `killFollowups.add(key)`? Having this spread out over the 
> > large `doExecute` function doesn't help understandability very much.

Unfortunately, we can't. That was my first thought but the context and its 
detail map are reconstructed for every trigger and are gone the moment a 
trigger is released.


> On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, 
> > line 174
> > 
> >
> > I will probably find this message confusing if I see it in the logs, in 
> > particular if I have configured the job with KILL_EXISTING.

Any suggestions? There are lots of moving parts addressed in this message, so I 
am not sure how to better explain it without describing how quartz works :)


> On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, 
> > line 274
> > 
> >
> > The term "followup" is rather generic and difficuelt to understand if 
> > seen in isolation in the log. You should at least mention it is about 
> > killing a cron job.

Good point. Rephrased.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review148933
---


On Sept. 14, 2016, 12:23 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 12:23 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> 

Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-14 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review148933
---




src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java (lines 
148 - 156)


I probably don't completely understand the logic, so stupid question ahead:

Couldn't we do that cleanup of JobDataMap in the handler where you 
currently do the `killFollowups.add(key)`? Having this spread out over the 
large `doExecute` function doesn't help understandability very much.



src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java (line 
158)


I will probably find this message confusing if I see it in the logs, in 
particular if I have configured the job with KILL_EXISTING.



src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java (line 
225)


The term "followup" is rather generic and difficuelt to understand if seen 
in isolation in the log. You should at least mention it is about killing a cron 
job.


- Stephan Erb


On Sept. 14, 2016, 2:23 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 2:23 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-13 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review148836
---


Ship it!




Master (5069f93) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 14, 2016, 12:23 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 12:23 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-13 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/
---

(Updated Sept. 14, 2016, 12:23 a.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Rebasing over 51759.


Repository: aurora


Description
---

This is the second part of the `BatchWorker` conversion work that moves cron 
jobs to use non-blocking kill followups and reduces the number of trigger 
threads. See https://reviews.apache.org/r/51759 for more background on the 
`BatchWorker`.

#Problem
The current implementation of the cron scheduling relies on a large number of 
threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
killing existing tasks according to `KILL_EXISTING` collision policy. This 
creates large spikes of activities at synchronized intervals as users tend to 
schedule their cron runs around similar schedules. Moreover, the current 
implementation re-acquires write locks multiple times to deliver on 
`KILL_EXISTING` policy. 

#Remediation
Trigger level batching is still done in a blocking way but multiple cron 
triggers may be bundled together to share the same write transaction. Any 
followups, however, are performed in a non-blocking way by relying on a 
`BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In 
order to still ensure non-concurrent execution of a given job key trigger, a 
token (job key) is saved within the trigger itself. A concurrent trigger will 
bail if a kill followup is still in progress (token is set AND no entry in 
`killFollowups` set exists yet).

#Results
The above approach allowed reducing the number of cron threads to 10 and likely 
can be reduced even further. See https://reviews.apache.org/r/51759 for the 
lock contention results.


Diffs (updated)
-

  commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
  commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
bc30990d57f444f7d64805ed85c363f1302736d0 
  config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
c07551e94f9221b5b21c5dc9715e82caa290c2e8 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
155d702d68367b247dd066f773c662407f0e3b5b 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
5c64ff2994e200b3453603ac5470e8e152cebc55 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 

Diff: https://reviews.apache.org/r/51763/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-13 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review148809
---



Master (633948a) is red with this patch.
  ./build-support/jenkins/build.sh


:processTestResources
:testClasses
:checkstyleTest
:findbugsJmh
:findbugsMain
:findbugsTest
:licenseJmh UP-TO-DATE
:licenseMain UP-TO-DATE
:licenseTest UP-TO-DATE
:license UP-TO-DATE
:pmdJmh
:pmdMain
:pmdTest
:test

org.apache.aurora.scheduler.BatchWorkerTest > testExecute FAILED
java.lang.AssertionError at BatchWorkerTest.java:70
I0913 23:42:25.162 [ShutdownHook, SchedulerMain:101] Stopping scheduler 
services. 

1040 tests completed, 1 failed, 2 skipped
:test FAILED
:jacocoTestReport
Coverage report generated: 
file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/jacoco/test/html/index.html
:analyzeReport
Instruction coverage is 0.8872287650631824, but must be greater than 0.89
Branch coverage is 0.7978208232445521, but must be greater than 0.835

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':test'.
> There were failing tests. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/tests/index.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 4 mins 15.963 secs


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 13, 2016, 11:14 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 13, 2016, 11:14 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-13 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/
---

(Updated Sept. 13, 2016, 11:14 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Rebased over 51759.


Repository: aurora


Description
---

This is the second part of the `BatchWorker` conversion work that moves cron 
jobs to use non-blocking kill followups and reduces the number of trigger 
threads. See https://reviews.apache.org/r/51759 for more background on the 
`BatchWorker`.

#Problem
The current implementation of the cron scheduling relies on a large number of 
threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
killing existing tasks according to `KILL_EXISTING` collision policy. This 
creates large spikes of activities at synchronized intervals as users tend to 
schedule their cron runs around similar schedules. Moreover, the current 
implementation re-acquires write locks multiple times to deliver on 
`KILL_EXISTING` policy. 

#Remediation
Trigger level batching is still done in a blocking way but multiple cron 
triggers may be bundled together to share the same write transaction. Any 
followups, however, are performed in a non-blocking way by relying on a 
`BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In 
order to still ensure non-concurrent execution of a given job key trigger, a 
token (job key) is saved within the trigger itself. A concurrent trigger will 
bail if a kill followup is still in progress (token is set AND no entry in 
`killFollowups` set exists yet).

#Results
The above approach allowed reducing the number of cron threads to 10 and likely 
can be reduced even further. See https://reviews.apache.org/r/51759 for the 
lock contention results.


Diffs (updated)
-

  commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
  commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
bc30990d57f444f7d64805ed85c363f1302736d0 
  config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
c07551e94f9221b5b21c5dc9715e82caa290c2e8 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
155d702d68367b247dd066f773c662407f0e3b5b 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
5c64ff2994e200b3453603ac5470e8e152cebc55 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 

Diff: https://reviews.apache.org/r/51763/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-09 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review148356
---


Ship it!




Master (c7f710a) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 9, 2016, 6:13 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 9, 2016, 6:13 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-09 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/
---

Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Repository: aurora


Description
---

This is the second part of the `BatchWorker` conversion work that moves cron 
jobs to use non-blocking kill followups and reduces the number of trigger 
threads. See https://reviews.apache.org/r/51759 for more background on the 
`BatchWorker`.

#Problem
The current implementation of the cron scheduling relies on a large number of 
threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
killing existing tasks according to `KILL_EXISTING` collision policy. This 
creates large spikes of activities at synchronized intervals as users tend to 
schedule their cron runs around similar schedules. Moreover, the current 
implementation re-acquires write locks multiple times to deliver on 
`KILL_EXISTING` policy. 

#Remediation
Trigger level batching is still done in a blocking way but multiple cron 
triggers may be bundled together to share the same write transaction. Any 
followups, however, are performed in a non-blocking way by relying on a 
`BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In 
order to still ensure non-concurrent execution of a given job key trigger, a 
token (job key) is saved within the trigger itself. A concurrent trigger will 
bail if a kill followup is still in progress (token is set AND no entry in 
`killFollowups` set exists yet).

#Results
The above approach allowed reducing the number of cron threads to 10 and likely 
can be reduced even further. See https://reviews.apache.org/r/51759 for the 
lock contention results.


Diffs
-

  commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
  commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
bc30990d57f444f7d64805ed85c363f1302736d0 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
c07551e94f9221b5b21c5dc9715e82caa290c2e8 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
155d702d68367b247dd066f773c662407f0e3b5b 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
5c64ff2994e200b3453603ac5470e8e152cebc55 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 

Diff: https://reviews.apache.org/r/51763/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko