Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149182
---


Ship it!




Ship It!

- Stephan Erb


On Sept. 15, 2016, 1:12 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 15, 2016, 1:12 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149179
---




src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java (line 196)


Side show: Isn't that `if` unnecessary here and we can adjust the penality 
in any case? We will remove the group if `hasMore()` returns false, so any 
penality should be fine.



src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java (lines 
132 - 133)






src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java (lines 
197 - 202)


I would have expected that we only request preemption if we failed to 
schedule all tasks in the current group.  If I remember correctly, preemption 
slot search only happens on a per-group basis anyway.

You have probably thought about this, so I would like to understand your 
reasoning.



src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java (line 173)


Shouldn't that only happen if `launchTask` has succeeded?



src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java (lines 211 - 
217)


There is additional complexity here and in other places because `schedule`, 
`scheduleTask` and `maybeAssign` return a `Map` encoding if 
scheduling was successful for all given tasks.

I'd propose to change the signature so that all methods mentioned above 
return `Set` containing only the successfully launched tasks. 

If a caller is really interested in tasks failed to schedule, he can 
compute the set difference between passed and returned task ids.


- Stephan Erb


On Sept. 16, 2016, 2:51 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 2:51 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> NOTE: this will not apply cleanly as it branched off of 
> https://reviews.apache.org/r/51765, which itself depends on 
> https://reviews.apache.org/r/51759/.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 9d0d40b82653fb923bed16d06546288a1576c21d 
>   

Re: Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs

2016-09-16 Thread Renan DelValle

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51973/
---

(Updated Sept. 16, 2016, 1:49 p.m.)


Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Changes
---

Adding a line on the change to the release notes.


Repository: aurora


Description
---

Change in the thrift API to make thee cronSchedule string in JobConfiguration 
an optional.


Diffs (updated)
-

  RELEASE-NOTES.md 411872b0244698b4ca74228bc21da608dcb98ae0 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
a045a21585fe40e63e2094fa103f205e7883eb35 

Diff: https://reviews.apache.org/r/51973/diff/


Testing
---

./build-support/jenkins/build.sh

./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Renan DelValle



Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-16 Thread Joshua Cohen


> On Sept. 15, 2016, 10:58 a.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 166-167
> > 
> >
> > super nitpicky: mind swapping the order of these args to keep inline 
> > with `execute(Work work)`?
> 
> Maxim Khutornenko wrote:
> This ordering is deliberate to improve reading at the call site. The 
> verbose lambda is coming last in the `executeWithReplay` subjectively making 
> it easier to read (https://reviews.apache.org/r/51763/). If you disagree I 
> can reorder though.

Ahh, that makes sense. I'm fine to leave it as is.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review149044
---


On Sept. 16, 2016, 8:33 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 16, 2016, 8:33 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number of other approaches tried but 
> discarded as not performing well or even performing much worse than the 
> current master:
> - Clock-driven batch execution - every batch is dispatched on a time schedule
> - Max batch with a deadline - a batch is dispatched when max size is reached 
> OR a timeout expires
> - Various combinations of the above - some `BatchWorkers` are using 
> clock-driven execution while others are using max batch with a deadline
> - Completely non-blocking (event-based) completion notification - all call 
> sites are notified of item completion via a `BatchWorkCompleted` event
> 
> Happy to provide more details on the above if interested.
> 
> #Upcoming
> The introduction of the `BatchWorker` by itself was not enough to 
> substantially improve the MTTA. It, however, paves the way for the next phase 
> of scheduling perf improvement - taking more than 1 task from a given 
> `TaskGroup` in a single scheduling round (coming soon). That improvement 
> wouldn't deliver without decreasing the lock contention first. 
> 
> Note: it wasn't easy to have a clean diff split, so some functionality in 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149290
---



@ReviewBot retry

- Maxim Khutornenko


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  a4e87d2216401f344dca64d69b945de7bcf8159a 
>   

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149294
---


Ship it!




Master (a87ad41) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>  

Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote:
> > config/findbugs/excludeFilter.xml, line 123
> > 
> >
> > If I'm reading the code correctly we need this because we have a 
> > `CompletableFuture` and to give it a value we use `null`.
> > 
> > Have you considered using `new Object()` instead so we don't have to 
> > add this exception to our rules?

I am not comfortable substituting a `Void` with a random `Object` just to avoid 
analyzer error. This opens up a potential for much more confusing cases.


> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, 
> > line 78
> > 
> >
> > Just to be clear, this annotation is needed to ensure the data in 
> > `JobDataMap` is persisted properly?

Correct.


> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote:
> > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java, line 100
> > 
> >
> > For integration tests I was under the impression we used 
> > `org.apache.aurora.scheduler.testing.FakeStatsProvider`

The `FakeStatsProvider` is only useful when we want to get a reading on updated 
metrics. Since everything else that can post stats is mocked, there is no 
reason to use a real object here.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 11:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149279
---


Ship it!




Master (f1e09a9) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 9:04 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 16, 2016, 9:04 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread John Sirois


> On Sept. 15, 2016, 12:25 p.m., Zameer Manji wrote:
> > config/findbugs/excludeFilter.xml, line 123
> > 
> >
> > If I'm reading the code correctly we need this because we have a 
> > `CompletableFuture` and to give it a value we use `null`.
> > 
> > Have you considered using `new Object()` instead so we don't have to 
> > add this exception to our rules?
> 
> Maxim Khutornenko wrote:
> I am not comfortable substituting a `Void` with a random `Object` just to 
> avoid analyzer error. This opens up a potential for much more confusing cases.
> 
> John Sirois wrote:
> Generally a `SideEffect` type with 1 instance would be handy for these 
> things.  Something to consider outside this review.
> 
> IE:
> ```java
> package org.apache.aurora.lang;
> 
> public enum SideEffect {
>   COMPLETE;
> }
> ```
> 
> I'm not sure if others would find `CompletableFuture` and the 
> corresponding `return SideEffect.COMPLETE` an improvement in these sorts of 
> situations.  It is explicit.
> 
> Maxim Khutornenko wrote:
> I don't think enum is useful here as it does not deliver any additional 
> signal. How about using a `NoResult` interface instance instead? Updated the 
> parent RB.

I used enum as the simplest way in java to ensure 1 instance, no extension, 
serializability etc.
The idea could be implemented differently, but the idea of naming side-effects 
is the primary thing... with the side-effect! bonus of no null rule exception 
needed.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


On Sept. 14, 2016, 5:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 5:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/
---

(Updated Sept. 16, 2016, 9:04 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Rebased.


Repository: aurora


Description
---

This is the second part of the `BatchWorker` conversion work that moves cron 
jobs to use non-blocking kill followups and reduces the number of trigger 
threads. See https://reviews.apache.org/r/51759 for more background on the 
`BatchWorker`.

#Problem
The current implementation of the cron scheduling relies on a large number of 
threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
killing existing tasks according to `KILL_EXISTING` collision policy. This 
creates large spikes of activities at synchronized intervals as users tend to 
schedule their cron runs around similar schedules. Moreover, the current 
implementation re-acquires write locks multiple times to deliver on 
`KILL_EXISTING` policy. 

#Remediation
Trigger level batching is still done in a blocking way but multiple cron 
triggers may be bundled together to share the same write transaction. Any 
followups, however, are performed in a non-blocking way by relying on a 
`BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In 
order to still ensure non-concurrent execution of a given job key trigger, a 
token (job key) is saved within the trigger itself. A concurrent trigger will 
bail if a kill followup is still in progress (token is set AND no entry in 
`killFollowups` set exists yet).

#Results
The above approach allowed reducing the number of cron threads to 10 and likely 
can be reduced even further. See https://reviews.apache.org/r/51759 for the 
lock contention results.


Diffs (updated)
-

  commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
  commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
bc30990d57f444f7d64805ed85c363f1302736d0 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
c07551e94f9221b5b21c5dc9715e82caa290c2e8 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
155d702d68367b247dd066f773c662407f0e3b5b 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
5c64ff2994e200b3453603ac5470e8e152cebc55 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 

Diff: https://reviews.apache.org/r/51763/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

(Updated Sept. 16, 2016, 7:53 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Stephan's comments. The `TaskAssigner` now returns only assigned task IDs. The 
`TaskScheduler` returns both assigned AND those that should be removed from the 
`TaskGroup` (e.g.: not found or not PENDING).


Repository: aurora


Description
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```

NOTE: this will not apply cleanly as it branched off of 
https://reviews.apache.org/r/51765, which itself depends on 
https://reviews.apache.org/r/51759/.


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
9d0d40b82653fb923bed16d06546288a1576c21d 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
11e8033438ad0808e446e41bb26b3fa4c04136c7 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
c044ebe6f72183a67462bbd8e5be983eb592c3e9 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
d266f6a25ae2360db2977c43768a19b1f1efe8ff 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c2ceb4e7685a9301f8014a9183e02fbad65bca26 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
72562e6bd9a9860c834e6a9faa094c28600a8fed 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing

Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 16, 2016, 6:18 p.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 213
> > 
> >
> > I think this is a misuse of `CompletalbleFuture` I discovered this as I 
> > was reviewing https://reviews.apache.org/r/51765/.
> > 
> > I think here we need to use 
> > https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html#completeExceptionally-java.lang.Throwable-
> >  here and set the exception into the future.
> > 
> > Then the exception will propagate to the caller (like 
> > `TaskGroupBatchWorker`).
> > 
> > Then the user of the batch worker can handle the exception.

Good catch. This is a leftover after refactoring `AwaitableResult` into 
`CompletableFuture` and it makes perfect sense to channel that error upstream.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review149227
---


On Sept. 14, 2016, 10:41 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 14, 2016, 10:41 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number of other approaches tried but 
> discarded as not performing well or even performing much worse than the 
> current master:
> - Clock-driven batch execution - every batch is dispatched on a time schedule
> - Max batch with a deadline - a batch is dispatched when max size is reached 
> OR a timeout expires
> - Various combinations of the above - some `BatchWorkers` are using 
> clock-driven execution while others are using max batch with a deadline
> - Completely non-blocking (event-based) completion notification - all call 
> sites are notified of item completion via a `BatchWorkCompleted` event
> 
> Happy to provide more details on the above if interested.
> 
> #Upcoming
> The introduction of the `BatchWorker` by itself was not enough to 
> substantially improve the MTTA. It, however, paves the way for the next phase 
> of scheduling perf improvement - taking more than 

Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 15, 2016, 10:58 a.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 166-167
> > 
> >
> > super nitpicky: mind swapping the order of these args to keep inline 
> > with `execute(Work work)`?

This ordering is deliberate to improve reading at the call site. The verbose 
lambda is coming last in the `executeWithReplay` subjectively making it easier 
to read (https://reviews.apache.org/r/51763/). If you disagree I can reorder 
though.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review149044
---


On Sept. 16, 2016, 8:03 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 16, 2016, 8:03 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number of other approaches tried but 
> discarded as not performing well or even performing much worse than the 
> current master:
> - Clock-driven batch execution - every batch is dispatched on a time schedule
> - Max batch with a deadline - a batch is dispatched when max size is reached 
> OR a timeout expires
> - Various combinations of the above - some `BatchWorkers` are using 
> clock-driven execution while others are using max batch with a deadline
> - Completely non-blocking (event-based) completion notification - all call 
> sites are notified of item completion via a `BatchWorkCompleted` event
> 
> Happy to provide more details on the above if interested.
> 
> #Upcoming
> The introduction of the `BatchWorker` by itself was not enough to 
> substantially improve the MTTA. It, however, paves the way for the next phase 
> of scheduling perf improvement - taking more than 1 task from a given 
> `TaskGroup` in a single scheduling round (coming soon). That improvement 
> wouldn't deliver without decreasing the lock contention first. 
> 
> Note: it wasn't easy to have a clean diff split, so some functionality in 
> `BatchWorker` (e.g.: `executeWithReplay`) appears to be unused in the current 
> patch but 

Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-16 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review149256
---


Ship it!




Ship It!

- Zameer Manji


On Sept. 16, 2016, 1:03 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 16, 2016, 1:03 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number of other approaches tried but 
> discarded as not performing well or even performing much worse than the 
> current master:
> - Clock-driven batch execution - every batch is dispatched on a time schedule
> - Max batch with a deadline - a batch is dispatched when max size is reached 
> OR a timeout expires
> - Various combinations of the above - some `BatchWorkers` are using 
> clock-driven execution while others are using max batch with a deadline
> - Completely non-blocking (event-based) completion notification - all call 
> sites are notified of item completion via a `BatchWorkCompleted` event
> 
> Happy to provide more details on the above if interested.
> 
> #Upcoming
> The introduction of the `BatchWorker` by itself was not enough to 
> substantially improve the MTTA. It, however, paves the way for the next phase 
> of scheduling perf improvement - taking more than 1 task from a given 
> `TaskGroup` in a single scheduling round (coming soon). That improvement 
> wouldn't deliver without decreasing the lock contention first. 
> 
> Note: it wasn't easy to have a clean diff split, so some functionality in 
> `BatchWorker` (e.g.: `executeWithReplay`) appears to be unused in the current 
> patch but will become obvious in the part 2 (coming out shortly).  
> 
> [1] - 
> https://github.com/apache/aurora/blob/f6ac13b169aaad5aad73ef3cc72873781e30a705/src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java#L540-L555
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/BatchWorker.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/SchedulerModule.java 
> 4a7ef0b1b90607f68d89fe8e207f42c42a8c56a0 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 

Re: Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51973/#review149266
---



Not that it changes much but since it's technically a schema change a release 
note would be great.

- Maxim Khutornenko


On Sept. 16, 2016, 8:36 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51973/
> ---
> 
> (Updated Sept. 16, 2016, 8:36 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Change in the thrift API to make thee cronSchedule string in JobConfiguration 
> an optional.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> a045a21585fe40e63e2094fa103f205e7883eb35 
> 
> Diff: https://reviews.apache.org/r/51973/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149288
---



Master (496397a) is red with this patch.
  ./build-support/jenkins/build.sh


:processTestResources
:testClasses
:checkstyleTest
:findbugsJmhThe message received from the daemon indicates that the daemon has 
disappeared.
Build request sent: Build{id=04b1c215-9a35-4563-8eac-3fc8439bfb48.1, 
currentDir=/home/jenkins/jenkins-slave/workspace/AuroraBot}
Attempting to read last messages from the daemon log...
Daemon pid: 23279
  log file: /home/jenkins/.gradle/daemon/2.14/daemon-23279.out.log
- Last  20 lines from daemon log file - daemon-23279.out.log -
:distZip
:assemble
:compileJmhJavaNote: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources UP-TO-DATE
:jmhClasses
:checkstyleJmh
:jsHint
:checkstyleMain
:compileTestJava/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java:38:
 Note: Wrote forwarder 
org.apache.aurora.scheduler.thrift.aop.MockDecoratedThriftForwarder
@Forward(AnnotatedAuroraAdmin.class)
^
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processTestResources
:testClasses
:checkstyleTest
:findbugsJmhDaemon vm is shutting down... The daemon has exited normally or was 
terminated in response to a user interrupt.
- End of the daemon log -


FAILURE: Build failed with an exception.

* What went wrong:
Gradle build daemon disappeared unexpectedly (it may have been killed or may 
have crashed)

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 
> > 197
> > 
> >
> > Side show: Isn't that `if` unnecessary here and we can adjust the 
> > penality in any case? We will remove the group if `hasMore()` returns 
> > false, so any penality should be fine.

Not sure I follow. This is the place that applies penalty accrued inside the 
`startGroup()` or removes the group if it's empty.


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, 
> > lines 208-213
> > 
> >
> > I would have expected that we only request preemption if we failed to 
> > schedule all tasks in the current group.  If I remember correctly, 
> > preemption slot search only happens on a per-group basis anyway.
> > 
> > You have probably thought about this, so I would like to understand 
> > your reasoning.

We still want to attempt a preemption if _some_ but not _all_ tasks are 
scheduled within a given round, right? Otherwise, preemption becomes an 
all-or-nothing feature and we have to wait for another scheduling cycle to 
request a reservation.


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, line 174
> > 
> >
> > Shouldn't that only happen if `launchTask` has succeeded?

I've debated that as well and decided it's more logical to finish accessing 
offer details before it's being launched with and removed from the 
`OfferManager`.

BTW, I just realized I was missing a `break` statement to bail out from the 
scheduling round if a `LaunchException` is thrown. While theoretically we 
_could_ continue matching even after the `LaunchException`, it's hard to reason 
about the state of the storage and the last assigned item and as such it's 
safer to terminate than continue. Fixed and added a test case. This should now 
be resolved. Thanks for asking :)


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, lines 
> > 214-220
> > 
> >
> > There is additional complexity here and in other places because 
> > `schedule`, `scheduleTask` and `maybeAssign` return a `Map > Boolean>` encoding if scheduling was successful for all given tasks.
> > 
> > I'd propose to change the signature so that all methods mentioned above 
> > return `Set` containing only the successfully launched tasks. 
> > 
> > If a caller is really interested in tasks failed to schedule, he can 
> > compute the set difference between passed and returned task ids.

I was fighting with myself over this initially and decided to keep the original 
contract. Now that the aproach fully shaped up, I can see how it can be 
simplified by not returning the exhaustive map of things. Refactored to return 
only what's needed. Thanks for bringing this up!


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149179
---


On Sept. 16, 2016, 12:51 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 12:51 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers 

Re: Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51973/#review149272
---


Ship it!




Master (783baae) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 8:49 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51973/
> ---
> 
> (Updated Sept. 16, 2016, 8:49 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Change in the thrift API to make thee cronSchedule string in JobConfiguration 
> an optional.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 411872b0244698b4ca74228bc21da608dcb98ae0 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> a045a21585fe40e63e2094fa103f205e7883eb35 
> 
> Diff: https://reviews.apache.org/r/51973/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 51874: Change framework_name default value from 'TwitterScheduler' to 'Aurora'

2016-09-16 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51874/
---

(Updated Sept. 16, 2016, 2:50 p.m.)


Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Changes
---

Rebasing.


Bugs: AURORA-1688
https://issues.apache.org/jira/browse/AURORA-1688


Repository: aurora


Description
---

Change framework_name default value from 'TwitterScheduler' to 'Aurora'


Diffs (updated)
-

  RELEASE-NOTES.md 83f1ca2aee5b87b58f88c5f272e6d56ab9f6cbd0 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 8a386bd208956eb0c8c2f48874b0c6fb3af58872 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
97677f24a50963178a123b420d7ac136e4fde3fe 

Diff: https://reviews.apache.org/r/51874/diff/


Testing
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh

Testing to make sure backward compatibility:

# HEAD of master:

# Case 1: Rolling forward does not impact running tasks:
Renaming framework from 'TwitterScheduler' to 'Aurora':

The framework re-registers after restart (treated by master as failover) and 
gets the same framework-id. Running task remain unaffected.

Master log:
I0914 16:48:28.408182  9815 master.cpp:1297] Giving framework 
071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at 
scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 3weeks to 
failover
I0914 16:48:28.408226  9815 hierarchical.cpp:382] Deactivated framework 
071c44a1-b4d4-4339-a727-03a79f725851-
E0914 16:48:28.408617  9819 process.cpp:2105] Failed to shutdown socket with fd 
28: Transport endpoint is not connected
I0914 16:48:43.722126  9813 master.cpp:2424] Received SUBSCRIBE call for 
framework 'Aurora' at 
scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
I0914 16:48:43.722190  9813 master.cpp:2500] Subscribing framework Aurora with 
checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
I0914 16:48:43.75  9813 master.cpp:2564] Updating info for framework 
071c44a1-b4d4-4339-a727-03a79f725851-
I0914 16:48:43.722256  9813 master.cpp:2577] Framework 
071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over
I0914 16:48:43.722429  9813 hierarchical.cpp:348] Activated framework 
071c44a1-b4d4-4339-a727-03a79f725851-
I0914 16:48:43.722595  9813 master.cpp:5709] Sending 1 offers to framework 
071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083

Scheduler log:
I0914 16:48:44.157 [Thread-10, MesosSchedulerImpl:151] Registered with ID 
value: "071c44a1-b4d4-4339-a727-03a79f725851-"
, master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae"
ip: 119646400
port: 5050
pid: "master@192.168.33.7:5050"
hostname: "aurora.local"
version: "1.0.0"
address {
  hostname: "aurora.local"
  ip: "192.168.33.7"
  port: 5050
}

# Case 2: Rolling backward does not impact running tasks:
Rolling back framework name from 'Aurora' to 'TwitterScheduler':

The framework re-registers after restart (treated by master as failover) and 
gets the same framework-id. Running task remain unaffected.

Master log:
I0914 16:51:33.203495  9812 master.cpp:1297] Giving framework 
071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 3weeks to 
failover
I0914 16:51:33.203526  9812 hierarchical.cpp:382] Deactivated framework 
071c44a1-b4d4-4339-a727-03a79f725851-
I0914 16:51:49.614074  9813 master.cpp:2424] Received SUBSCRIBE call for 
framework 'TwitterScheduler' at 
scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
I0914 16:51:49.614215  9813 master.cpp:2500] Subscribing framework 
TwitterScheduler with checkpointing enabled and capabilities [ 
REVOCABLE_RESOURCES, GPU_RESOURCES ]
I0914 16:51:49.614312  9813 master.cpp:2564] Updating info for framework 
071c44a1-b4d4-4339-a727-03a79f725851-
I0914 16:51:49.614359  9813 master.cpp:2577] Framework 
071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at 
scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 failed over
I0914 16:51:49.614977  9813 hierarchical.cpp:348] Activated framework 
071c44a1-b4d4-4339-a727-03a79f725851-
I0914 16:51:49.615170  9813 master.cpp:5709] Sending 1 offers to framework 
071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at 
scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083

Scheduler log:
I0914 16:51:50.249 [Thread-10, MesosSchedulerImpl:151] Registered with ID 
value: "071c44a1-b4d4-4339-a727-03a79f725851-"
, master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae"
ip: 119646400
port: 5050
pid: "master@192.168.33.7:5050"
hostname: "aurora.local"
version: "1.0.0"
address {
  hostname: "aurora.local"
  ip: 

Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 16, 2016, 6:28 p.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java, 
> > line 91
> > 
> >
> > Curious, arbitrary or derrived?

Arbitrary derived :) There is no perfect default here though. Just trying to be 
conservative especially given the upcoming https://reviews.apache.org/r/51929/. 
After all is done, we will have 15x improvement in scheduling rate derived from 
a single `write()` transaction.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51765/#review149229
---


On Sept. 14, 2016, 11:18 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51765/
> ---
> 
> (Updated Sept. 14, 2016, 11:18 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the final part of the `BatchWorker` conversion work that converts 
> `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background 
> on the `BatchWorker`.
> 
> #Problem
> See https://reviews.apache.org/r/51759
> 
> #Remediation
> Task scheduling is one of the most dominant users of the write lock. It's 
> also one of the heaviest and the most latency-sensitive. As such, the default 
> max batch size is chosen conservatively low (3) and batch items are executed 
> in a blocking way. 
> 
> BTW, attempting to make task scheduling non-blocking resulted in a much worse 
> scheduling performance. The way our `DBTaskStore` is wired, all async 
> activities, including `EventBus` are bound to use a single async `Executor`, 
> which is currently limited at 8 threads [1]. Relying on the same `EventBus` 
> to deliver scheduling completion events resulted in slower scheduling perf as 
> those events were backed up behind all other activities, including tasks 
> status events, reconciliation and etc. Increasing the executor thread pool 
> size to a larger number on the other side, also increased the lock contention 
> defeating the whole purpose of this work.
> 
> #Results
> See https://reviews.apache.org/r/51759 for the lock contention results.
> 
> https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 9d0d40b82653fb923bed16d06546288a1576c21d 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 11e8033438ad0808e446e41bb26b3fa4c04136c7 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> c044ebe6f72183a67462bbd8e5be983eb592c3e9 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> d266f6a25ae2360db2977c43768a19b1f1efe8ff 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c2ceb4e7685a9301f8014a9183e02fbad65bca26 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  72562e6bd9a9860c834e6a9faa094c28600a8fed 
> 
> Diff: https://reviews.apache.org/r/51765/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/
---

(Updated Sept. 16, 2016, 8:33 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Adding `NoResult` to substitute `Void` results.


Repository: aurora


Description
---

This is the first (out of 3) patches intending to reduce storage write lock 
contention and as such improve overall system write throughput. It introduces 
the `BatchWorker` and migrates the majority of storage writes due to task 
status change events to use `TaskEventBatchWorker`.

#Problem
Our current storage system writes effectively behave as `SERIALIZABLE` 
transaction isolation level in SQL terms. This means all writes require 
exclusive access to the storage and no two transactions can happen in parallel 
[1]. While it certainly simplifies our implementation, it creates a single 
hotspot where multiple threads are competing for the storage write access. This 
type of contention only worsens as the cluster size grows, more tasks are 
scheduled, more status updates are processed, more subscribers are listening to 
status updates and etc. Eventually, the scheduler throughput (and especially 
task scheduling) becomes degraded to the extent that certain operations wait 
much longer (4x and more) for the lock acquisition than it takes to process 
their payload when inside the transaction. Some ops (like event processing) are 
generally tolerant of these types of delays. Others - not as much. The task 
scheduling suffers the most as backing up the scheduling queue directly affects 
t
 he Median Time To Assigned (MTTA).

#Remediation
Given the above, it's natural to assume that reducing the number of write 
transactions should help reducing the lock contention. This patch introduces a 
generic `BatchWorker` service that delivers a "best effort" batching approach 
by redirecting multiple individual write requests into a single FIFO queue 
served non-stop by a single dedicated thread. Every batch shares a single write 
transaction thus reducing the number of potential write lock requests. To 
minimize wait-in-queue time, items are dispatched immediately and the max 
number of items is bounded. There are a few `BatchWorker` instances specialized 
on particular workload types: task even processing, cron scheduling and task 
scheduling. Every instance can be tuned independently (max batch size) and 
provides specialized metrics helping to monitor each workload type perf.

#Results
The proposed approach has been heavily tested in production and delivered the 
best results. The lock contention latencies got down between 2x and 5x 
depending on the cluster load. A number of other approaches tried but discarded 
as not performing well or even performing much worse than the current master:
- Clock-driven batch execution - every batch is dispatched on a time schedule
- Max batch with a deadline - a batch is dispatched when max size is reached OR 
a timeout expires
- Various combinations of the above - some `BatchWorkers` are using 
clock-driven execution while others are using max batch with a deadline
- Completely non-blocking (event-based) completion notification - all call 
sites are notified of item completion via a `BatchWorkCompleted` event

Happy to provide more details on the above if interested.

#Upcoming
The introduction of the `BatchWorker` by itself was not enough to substantially 
improve the MTTA. It, however, paves the way for the next phase of scheduling 
perf improvement - taking more than 1 task from a given `TaskGroup` in a single 
scheduling round (coming soon). That improvement wouldn't deliver without 
decreasing the lock contention first. 

Note: it wasn't easy to have a clean diff split, so some functionality in 
`BatchWorker` (e.g.: `executeWithReplay`) appears to be unused in the current 
patch but will become obvious in the part 2 (coming out shortly).  

[1] - 
https://github.com/apache/aurora/blob/f6ac13b169aaad5aad73ef3cc72873781e30a705/src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java#L540-L555


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/BatchWorker.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/SchedulerModule.java 
4a7ef0b1b90607f68d89fe8e207f42c42a8c56a0 
  src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
f07746c2b990c1c2235e99f9e4775fc84f9c27b1 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskThrottler.java 
bbd971a2aa8a96cf79edd879ad60e1bebd933d79 
  src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
3c7cda09ab292d696070ca4d9dfedb1f6f71b0fe 
  
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java 
594bb6219294dcc77d48dcad14e2a6f9caa0c534 
  

Re: Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs

2016-09-16 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51973/#review149273
---


Ship it!




Ship It!

- Joshua Cohen


On Sept. 16, 2016, 8:49 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51973/
> ---
> 
> (Updated Sept. 16, 2016, 8:49 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Change in the thrift API to make thee cronSchedule string in JobConfiguration 
> an optional.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 411872b0244698b4ca74228bc21da608dcb98ae0 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> a045a21585fe40e63e2094fa103f205e7883eb35 
> 
> Diff: https://reviews.apache.org/r/51973/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149275
---


Ship it!




Ship It!

- Joshua Cohen


On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 11:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51874: Change framework_name default value from 'TwitterScheduler' to 'Aurora'

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51874/#review149286
---


Ship it!




Master (496397a) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 9:50 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51874/
> ---
> 
> (Updated Sept. 16, 2016, 9:50 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1688
> https://issues.apache.org/jira/browse/AURORA-1688
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Change framework_name default value from 'TwitterScheduler' to 'Aurora'
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 83f1ca2aee5b87b58f88c5f272e6d56ab9f6cbd0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  8a386bd208956eb0c8c2f48874b0c6fb3af58872 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> 97677f24a50963178a123b420d7ac136e4fde3fe 
> 
> Diff: https://reviews.apache.org/r/51874/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> Testing to make sure backward compatibility:
> 
> # HEAD of master:
> 
> # Case 1: Rolling forward does not impact running tasks:
> Renaming framework from 'TwitterScheduler' to 'Aurora':
> 
> The framework re-registers after restart (treated by master as failover) and 
> gets the same framework-id. Running task remain unaffected.
> 
> Master log:
> I0914 16:48:28.408182  9815 master.cpp:1297] Giving framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at 
> scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 3weeks to 
> failover
> I0914 16:48:28.408226  9815 hierarchical.cpp:382] Deactivated framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> E0914 16:48:28.408617  9819 process.cpp:2105] Failed to shutdown socket with 
> fd 28: Transport endpoint is not connected
> I0914 16:48:43.722126  9813 master.cpp:2424] Received SUBSCRIBE call for 
> framework 'Aurora' at 
> scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
> I0914 16:48:43.722190  9813 master.cpp:2500] Subscribing framework Aurora 
> with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, 
> GPU_RESOURCES ]
> I0914 16:48:43.75  9813 master.cpp:2564] Updating info for framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> I0914 16:48:43.722256  9813 master.cpp:2577] Framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
> scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over
> I0914 16:48:43.722429  9813 hierarchical.cpp:348] Activated framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> I0914 16:48:43.722595  9813 master.cpp:5709] Sending 1 offers to framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
> scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
> 
> Scheduler log:
> I0914 16:48:44.157 [Thread-10, MesosSchedulerImpl:151] Registered with ID 
> value: "071c44a1-b4d4-4339-a727-03a79f725851-"
> , master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae"
> ip: 119646400
> port: 5050
> pid: "master@192.168.33.7:5050"
> hostname: "aurora.local"
> version: "1.0.0"
> address {
>   hostname: "aurora.local"
>   ip: "192.168.33.7"
>   port: 5050
> }
> 
> # Case 2: Rolling backward does not impact running tasks:
> Rolling back framework name from 'Aurora' to 'TwitterScheduler':
> 
> The framework re-registers after restart (treated by master as failover) and 
> gets the same framework-id. Running task remain unaffected.
> 
> Master log:
> I0914 16:51:33.203495  9812 master.cpp:1297] Giving framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
> scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 3weeks to 
> failover
> I0914 16:51:33.203526  9812 hierarchical.cpp:382] Deactivated framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> I0914 16:51:49.614074  9813 master.cpp:2424] Received SUBSCRIBE call for 
> framework 'TwitterScheduler' at 
> scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
> I0914 16:51:49.614215  9813 master.cpp:2500] Subscribing framework 
> TwitterScheduler with checkpointing enabled and capabilities [ 
> REVOCABLE_RESOURCES, GPU_RESOURCES ]
> I0914 16:51:49.614312  9813 master.cpp:2564] Updating info for framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> I0914 16:51:49.614359  9813 master.cpp:2577] Framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at 
> 

Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread John Sirois


> On Sept. 15, 2016, 12:25 p.m., Zameer Manji wrote:
> > config/findbugs/excludeFilter.xml, line 123
> > 
> >
> > If I'm reading the code correctly we need this because we have a 
> > `CompletableFuture` and to give it a value we use `null`.
> > 
> > Have you considered using `new Object()` instead so we don't have to 
> > add this exception to our rules?
> 
> Maxim Khutornenko wrote:
> I am not comfortable substituting a `Void` with a random `Object` just to 
> avoid analyzer error. This opens up a potential for much more confusing cases.

Generally a `SideEffect` type with 1 instance would be handy for these things.  
Something to consider outside this review.

IE:
```java
package org.apache.aurora.lang;

public enum SideEffect {
  COMPLETE;
}
```

I'm not sure if others would find `CompletableFuture` and the 
corresponding `return SideEffect.COMPLETE` an improvement in these sorts of 
situations.  It is explicit.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


On Sept. 14, 2016, 5:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 5:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs

2016-09-16 Thread Renan DelValle

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51973/
---

Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Repository: aurora


Description
---

Change in the thrift API to make thee cronSchedule string in JobConfiguration 
an optional.


Diffs
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
a045a21585fe40e63e2094fa103f205e7883eb35 

Diff: https://reviews.apache.org/r/51973/diff/


Testing
---

./build-support/jenkins/build.sh

./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Renan DelValle



Re: Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs

2016-09-16 Thread Renan DelValle


> On Sept. 16, 2016, 1:42 p.m., Maxim Khutornenko wrote:
> > Not that it changes much but since it's technically a schema change a 
> > release note would be great.

Doh! Can't believe I forgot. Coming right up.


- Renan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51973/#review149266
---


On Sept. 16, 2016, 1:36 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51973/
> ---
> 
> (Updated Sept. 16, 2016, 1:36 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Change in the thrift API to make thee cronSchedule string in JobConfiguration 
> an optional.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> a045a21585fe40e63e2094fa103f205e7883eb35 
> 
> Diff: https://reviews.apache.org/r/51973/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 51980: Refactor of Webhook and no longer posting entire task state via webhook on scheduler restart

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51980/#review149311
---


Ship it!




Master (a87ad41) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 17, 2016, 12:48 a.m., Dmitriy Shirchenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51980/
> ---
> 
> (Updated Sept. 17, 2016, 12:48 a.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Zameer Manji.
> 
> 
> Bugs: AURORA-1772
> https://issues.apache.org/jira/browse/AURORA-1772
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is a refactor with addition of HttpClient to use more lower level 
> HtttpURLConnection objects to inject the client into Webhook class. 
> Additionally due to peformance issues, it is unncessary to POST entire task 
> state to webhook endpoint on every scheduler restart.
> 
> 
> Diffs
> -
> 
>   build.gradle d5a3a7a3bdb8349de6fc01d4a6271b32d942e531 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> e54aa19d67278fcb5586f07c399f09062f845a18 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookInfo.java 
> e4193f7518e66d210a6d5aae22a9f04e2d4984b3 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> eaa70533a64740c74cebf15739b7142e2d3a4d11 
>   src/main/resources/org/apache/aurora/scheduler/webhook.json 
> 00787985510d7d415b8074bef06d28b635c78b09 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 488eefd14c3e67a41a75c809397c8d19f83cc08a 
> 
> Diff: https://reviews.apache.org/r/51980/diff/
> 
> 
> Testing
> ---
> 
> Part of reason for refactor is to allow easier testing so cleaner unit tests 
> were added with more code coverage.
> 
> 
> Thanks,
> 
> Dmitriy Shirchenko
> 
>



Re: Review Request 51980: Refactor of Webhook and no longer posting entire task state via webhook on scheduler restart

2016-09-16 Thread Dmitriy Shirchenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51980/
---

(Updated Sept. 17, 2016, 1:40 a.m.)


Review request for Aurora, Maxim Khutornenko and Zameer Manji.


Bugs: AURORA-1772
https://issues.apache.org/jira/browse/AURORA-1772


Repository: aurora


Description (updated)
---

This is a refactor with addition of HttpClient injected into Webhook class as 
opposed to previous usage of lower level HtttpURLConnection objects. 
Additionally due to peformance issues, it is unncessary to POST entire task 
state to webhook endpoint on every scheduler restart so that is removed in this 
patch.


Diffs
-

  build.gradle d5a3a7a3bdb8349de6fc01d4a6271b32d942e531 
  src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
e54aa19d67278fcb5586f07c399f09062f845a18 
  src/main/java/org/apache/aurora/scheduler/events/WebhookInfo.java 
e4193f7518e66d210a6d5aae22a9f04e2d4984b3 
  src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
eaa70533a64740c74cebf15739b7142e2d3a4d11 
  src/main/resources/org/apache/aurora/scheduler/webhook.json 
00787985510d7d415b8074bef06d28b635c78b09 
  src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
488eefd14c3e67a41a75c809397c8d19f83cc08a 

Diff: https://reviews.apache.org/r/51980/diff/


Testing
---

Part of reason for refactor is to allow easier testing so cleaner unit tests 
were added with more code coverage.


Thanks,

Dmitriy Shirchenko



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149260
---



Master (783baae) is red with this patch.
  ./build-support/jenkins/build.sh

 # Create file stdout for capturing output. We 
can't use StringIO mock
 # because TestProcess is running fork.
 with open(os.path.join(td, 'sys_stdout'), 
'w+') as stdout:
   with open(os.path.join(td, 'sys_stderr'), 
'w+') as stderr:
 with mutable_sys():
   sys.stdout, sys.stderr = stdout, 
stderr
 
   p = TestProcess('process', 'echo hello 
world; echo >&2 hello stderr', 0,
   taskpath, sandbox, 
logger_destination=LoggerDestination.BOTH)
   p.start()
   rc = 
wait_for_rc(taskpath.getpath('process_checkpoint'))
 
   assert rc == 0
   # Check log files were created in std 
path with correct content
 > assert_log_content(taskpath, 'stdout', 
'hello world\n')
 
 src/test/python/apache/thermos/core/test_process.py:487: 
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 
 taskpath = 
 log_name = 'stdout'
 expected_content = 'hello world\n'
 
 def assert_log_content(taskpath, log_name, 
expected_content):
   log = 
taskpath.with_filename(log_name).getpath('process_logdir')
   assert os.path.exists(log)
   with open(log, 'r') as fp:
 >   assert fp.read() == expected_content
 E   assert '' == 'hello world\n'
 E + hello world
 
 src/test/python/apache/thermos/core/test_process.py:313: 
AssertionError
  generated xml file: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml
 
  1 failed, 710 passed, 6 skipped, 1 warnings in 
381.34 seconds 
 
FAILURE


20:19:36 07:12   [complete]
   FAILURE


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 7:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 7:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only 

Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote:
> > config/findbugs/excludeFilter.xml, line 123
> > 
> >
> > If I'm reading the code correctly we need this because we have a 
> > `CompletableFuture` and to give it a value we use `null`.
> > 
> > Have you considered using `new Object()` instead so we don't have to 
> > add this exception to our rules?
> 
> Maxim Khutornenko wrote:
> I am not comfortable substituting a `Void` with a random `Object` just to 
> avoid analyzer error. This opens up a potential for much more confusing cases.
> 
> John Sirois wrote:
> Generally a `SideEffect` type with 1 instance would be handy for these 
> things.  Something to consider outside this review.
> 
> IE:
> ```java
> package org.apache.aurora.lang;
> 
> public enum SideEffect {
>   COMPLETE;
> }
> ```
> 
> I'm not sure if others would find `CompletableFuture` and the 
> corresponding `return SideEffect.COMPLETE` an improvement in these sorts of 
> situations.  It is explicit.

I don't think enum is useful here as it does not deliver any additional signal. 
How about using a `NoResult` interface instance instead? Updated the parent RB.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/#review149093
---


On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51763/
> ---
> 
> (Updated Sept. 14, 2016, 11:12 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the second part of the `BatchWorker` conversion work that moves cron 
> jobs to use non-blocking kill followups and reduces the number of trigger 
> threads. See https://reviews.apache.org/r/51759 for more background on the 
> `BatchWorker`.
> 
> #Problem
> The current implementation of the cron scheduling relies on a large number of 
> threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
> killing existing tasks according to `KILL_EXISTING` collision policy. This 
> creates large spikes of activities at synchronized intervals as users tend to 
> schedule their cron runs around similar schedules. Moreover, the current 
> implementation re-acquires write locks multiple times to deliver on 
> `KILL_EXISTING` policy. 
> 
> #Remediation
> Trigger level batching is still done in a blocking way but multiple cron 
> triggers may be bundled together to share the same write transaction. Any 
> followups, however, are performed in a non-blocking way by relying on a 
> `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. 
> In order to still ensure non-concurrent execution of a given job key trigger, 
> a token (job key) is saved within the trigger itself. A concurrent trigger 
> will bail if a kill followup is still in progress (token is set AND no entry 
> in `killFollowups` set exists yet).
> 
> #Results
> The above approach allowed reducing the number of cron threads to 10 and 
> likely can be reduced even further. See https://reviews.apache.org/r/51759 
> for the lock contention results.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
> 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
>   commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
> bc30990d57f444f7d64805ed85c363f1302736d0 
>   config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
> c07551e94f9221b5b21c5dc9715e82caa290c2e8 
>   src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
> 155d702d68367b247dd066f773c662407f0e3b5b 
>   
> src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
> 5c64ff2994e200b3453603ac5470e8e152cebc55 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 
> 
> Diff: https://reviews.apache.org/r/51763/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

(Updated Sept. 16, 2016, 9:53 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Rebased over master.


Repository: aurora


Description
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```

NOTE: this will not apply cleanly as it branched off of 
https://reviews.apache.org/r/51765, which itself depends on 
https://reviews.apache.org/r/51759/.


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
6f1cbfbc4510a037cffc95fee54f62f463d2b534 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
664bc6cf964ede2473a4463e58bcdbcb65bc7413 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
d390c07522d22e43d79ce4370985f3643ef021ca 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
207d38d1ddfd373892602218a98c1daaf4a1325f 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
88729626de5fa87b45472792c59cc0ff1ade3e93 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
a4e87d2216401f344dca64d69b945de7bcf8159a 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

(Updated Sept. 16, 2016, 9:53 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Repository: aurora


Description (updated)
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```


Diffs
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
6f1cbfbc4510a037cffc95fee54f62f463d2b534 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
664bc6cf964ede2473a4463e58bcdbcb65bc7413 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
d390c07522d22e43d79ce4370985f3643ef021ca 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
207d38d1ddfd373892602218a98c1daaf4a1325f 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
88729626de5fa87b45472792c59cc0ff1ade3e93 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
a4e87d2216401f344dca64d69b945de7bcf8159a 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51973/#review149271
---


Ship it!




Ship It!

- Maxim Khutornenko


On Sept. 16, 2016, 8:49 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51973/
> ---
> 
> (Updated Sept. 16, 2016, 8:49 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Change in the thrift API to make thee cronSchedule string in JobConfiguration 
> an optional.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 411872b0244698b4ca74228bc21da608dcb98ae0 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> a045a21585fe40e63e2094fa103f205e7883eb35 
> 
> Diff: https://reviews.apache.org/r/51973/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.

2016-09-16 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51765/#review149277
---


Ship it!




Ship It!

- Joshua Cohen


On Sept. 14, 2016, 11:18 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51765/
> ---
> 
> (Updated Sept. 14, 2016, 11:18 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the final part of the `BatchWorker` conversion work that converts 
> `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background 
> on the `BatchWorker`.
> 
> #Problem
> See https://reviews.apache.org/r/51759
> 
> #Remediation
> Task scheduling is one of the most dominant users of the write lock. It's 
> also one of the heaviest and the most latency-sensitive. As such, the default 
> max batch size is chosen conservatively low (3) and batch items are executed 
> in a blocking way. 
> 
> BTW, attempting to make task scheduling non-blocking resulted in a much worse 
> scheduling performance. The way our `DBTaskStore` is wired, all async 
> activities, including `EventBus` are bound to use a single async `Executor`, 
> which is currently limited at 8 threads [1]. Relying on the same `EventBus` 
> to deliver scheduling completion events resulted in slower scheduling perf as 
> those events were backed up behind all other activities, including tasks 
> status events, reconciliation and etc. Increasing the executor thread pool 
> size to a larger number on the other side, also increased the lock contention 
> defeating the whole purpose of this work.
> 
> #Results
> See https://reviews.apache.org/r/51759 for the lock contention results.
> 
> https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 9d0d40b82653fb923bed16d06546288a1576c21d 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 11e8033438ad0808e446e41bb26b3fa4c04136c7 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> c044ebe6f72183a67462bbd8e5be983eb592c3e9 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> d266f6a25ae2360db2977c43768a19b1f1efe8ff 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c2ceb4e7685a9301f8014a9183e02fbad65bca26 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  72562e6bd9a9860c834e6a9faa094c28600a8fed 
> 
> Diff: https://reviews.apache.org/r/51765/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51874: Change framework_name default value from 'TwitterScheduler' to 'Aurora'

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51874/#review149216
---


Ship it!




Ship It!

- Maxim Khutornenko


On Sept. 15, 2016, 7:02 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51874/
> ---
> 
> (Updated Sept. 15, 2016, 7:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1688
> https://issues.apache.org/jira/browse/AURORA-1688
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Change framework_name default value from 'TwitterScheduler' to 'Aurora'
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md ad2c68a6defe07c94480d7dee5b1496b50dc34e5 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  8a386bd208956eb0c8c2f48874b0c6fb3af58872 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> 97677f24a50963178a123b420d7ac136e4fde3fe 
> 
> Diff: https://reviews.apache.org/r/51874/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> Testing to make sure backward compatibility:
> 
> # HEAD of master:
> 
> # Case 1: Rolling forward does not impact running tasks:
> Renaming framework from 'TwitterScheduler' to 'Aurora':
> 
> The framework re-registers after restart (treated by master as failover) and 
> gets the same framework-id. Running task remain unaffected.
> 
> Master log:
> I0914 16:48:28.408182  9815 master.cpp:1297] Giving framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at 
> scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 3weeks to 
> failover
> I0914 16:48:28.408226  9815 hierarchical.cpp:382] Deactivated framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> E0914 16:48:28.408617  9819 process.cpp:2105] Failed to shutdown socket with 
> fd 28: Transport endpoint is not connected
> I0914 16:48:43.722126  9813 master.cpp:2424] Received SUBSCRIBE call for 
> framework 'Aurora' at 
> scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
> I0914 16:48:43.722190  9813 master.cpp:2500] Subscribing framework Aurora 
> with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, 
> GPU_RESOURCES ]
> I0914 16:48:43.75  9813 master.cpp:2564] Updating info for framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> I0914 16:48:43.722256  9813 master.cpp:2577] Framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
> scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over
> I0914 16:48:43.722429  9813 hierarchical.cpp:348] Activated framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> I0914 16:48:43.722595  9813 master.cpp:5709] Sending 1 offers to framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
> scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
> 
> Scheduler log:
> I0914 16:48:44.157 [Thread-10, MesosSchedulerImpl:151] Registered with ID 
> value: "071c44a1-b4d4-4339-a727-03a79f725851-"
> , master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae"
> ip: 119646400
> port: 5050
> pid: "master@192.168.33.7:5050"
> hostname: "aurora.local"
> version: "1.0.0"
> address {
>   hostname: "aurora.local"
>   ip: "192.168.33.7"
>   port: 5050
> }
> 
> # Case 2: Rolling backward does not impact running tasks:
> Rolling back framework name from 'Aurora' to 'TwitterScheduler':
> 
> The framework re-registers after restart (treated by master as failover) and 
> gets the same framework-id. Running task remain unaffected.
> 
> Master log:
> I0914 16:51:33.203495  9812 master.cpp:1297] Giving framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at 
> scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 3weeks to 
> failover
> I0914 16:51:33.203526  9812 hierarchical.cpp:382] Deactivated framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> I0914 16:51:49.614074  9813 master.cpp:2424] Received SUBSCRIBE call for 
> framework 'TwitterScheduler' at 
> scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
> I0914 16:51:49.614215  9813 master.cpp:2500] Subscribing framework 
> TwitterScheduler with checkpointing enabled and capabilities [ 
> REVOCABLE_RESOURCES, GPU_RESOURCES ]
> I0914 16:51:49.614312  9813 master.cpp:2564] Updating info for framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> I0914 16:51:49.614359  9813 master.cpp:2577] Framework 
> 071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at 
> scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 failed over
> I0914 16:51:49.614977  9813 hierarchical.cpp:348] Activated framework 
> 071c44a1-b4d4-4339-a727-03a79f725851-
> 

Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-16 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review149227
---




src/main/java/org/apache/aurora/scheduler/BatchWorker.java (line 213)


I think this is a misuse of `CompletalbleFuture` I discovered this as I was 
reviewing https://reviews.apache.org/r/51765/.

I think here we need to use 
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html#completeExceptionally-java.lang.Throwable-
 here and set the exception into the future.

Then the exception will propagate to the caller (like 
`TaskGroupBatchWorker`).

Then the user of the batch worker can handle the exception.


- Zameer Manji


On Sept. 14, 2016, 3:41 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 14, 2016, 3:41 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number of other approaches tried but 
> discarded as not performing well or even performing much worse than the 
> current master:
> - Clock-driven batch execution - every batch is dispatched on a time schedule
> - Max batch with a deadline - a batch is dispatched when max size is reached 
> OR a timeout expires
> - Various combinations of the above - some `BatchWorkers` are using 
> clock-driven execution while others are using max batch with a deadline
> - Completely non-blocking (event-based) completion notification - all call 
> sites are notified of item completion via a `BatchWorkCompleted` event
> 
> Happy to provide more details on the above if interested.
> 
> #Upcoming
> The introduction of the `BatchWorker` by itself was not enough to 
> substantially improve the MTTA. It, however, paves the way for the next phase 
> of scheduling perf improvement - taking more than 1 task from a given 
> `TaskGroup` in a single scheduling round (coming soon). That improvement 
> wouldn't deliver without decreasing the lock contention first. 
> 
> Note: it wasn't easy to have a clean diff split, so some functionality in 
> `BatchWorker` (e.g.: 

Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.

2016-09-16 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51765/#review149229
---


Ship it!




LGTM.


src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
(line 91)


Curious, arbitrary or derrived?


- Zameer Manji


On Sept. 14, 2016, 4:18 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51765/
> ---
> 
> (Updated Sept. 14, 2016, 4:18 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the final part of the `BatchWorker` conversion work that converts 
> `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background 
> on the `BatchWorker`.
> 
> #Problem
> See https://reviews.apache.org/r/51759
> 
> #Remediation
> Task scheduling is one of the most dominant users of the write lock. It's 
> also one of the heaviest and the most latency-sensitive. As such, the default 
> max batch size is chosen conservatively low (3) and batch items are executed 
> in a blocking way. 
> 
> BTW, attempting to make task scheduling non-blocking resulted in a much worse 
> scheduling performance. The way our `DBTaskStore` is wired, all async 
> activities, including `EventBus` are bound to use a single async `Executor`, 
> which is currently limited at 8 threads [1]. Relying on the same `EventBus` 
> to deliver scheduling completion events resulted in slower scheduling perf as 
> those events were backed up behind all other activities, including tasks 
> status events, reconciliation and etc. Increasing the executor thread pool 
> size to a larger number on the other side, also increased the lock contention 
> defeating the whole purpose of this work.
> 
> #Results
> See https://reviews.apache.org/r/51759 for the lock contention results.
> 
> https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 9d0d40b82653fb923bed16d06546288a1576c21d 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 11e8033438ad0808e446e41bb26b3fa4c04136c7 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> c044ebe6f72183a67462bbd8e5be983eb592c3e9 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> d266f6a25ae2360db2977c43768a19b1f1efe8ff 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c2ceb4e7685a9301f8014a9183e02fbad65bca26 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  72562e6bd9a9860c834e6a9faa094c28600a8fed 
> 
> Diff: https://reviews.apache.org/r/51765/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>