> On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, 
> > lines 164-172
> > <https://reviews.apache.org/r/51763/diff/3/?file=1498135#file1498135line164>
> >
> >     I probably don't completely understand the logic, so stupid question 
> > ahead:
> >     
> >     Couldn't we do that cleanup of JobDataMap in the handler where you 
> > currently do the `killFollowups.add(key)`? Having this spread out over the 
> > large `doExecute` function doesn't help understandability very much.

Unfortunately, we can't. That was my first thought but the context and its 
detail map are reconstructed for every trigger and are gone the moment a 
trigger is released.


> On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, 
> > line 174
> > <https://reviews.apache.org/r/51763/diff/3/?file=1498135#file1498135line174>
> >
> >     I will probably find this message confusing if I see it in the logs, in 
> > particular if I have configured the job with KILL_EXISTING.

Any suggestions? There are lots of moving parts addressed in this message, so I 
am not sure how to better explain it without describing how quartz works :)


> On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, 
> > line 274
> > <https://reviews.apache.org/r/51763/diff/3/?file=1498135#file1498135line274>
> >
> >     The term "followup" is rather generic and difficuelt to understand if 
> > seen in isolation in the log. You should at least mention it is about 
> > killing a cron job.

Good point. Rephrased.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review148933
-----------------------------------------------------------


On Sept. 14, 2016, 12:23 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> -----------------------------------------------------------
> 
> (Updated Sept. 14, 2016, 12:23 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #####Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #####Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #####Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -----
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> -------
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>

Reply via email to