Re: Review Request 51929: Scheduling multiple tasks per round.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/ --- (Updated Sept. 27, 2016, 6:06 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Joshua's comments. Repository: aurora Description --- This is phase 2 of scheduling perf improvement effort started in https://reviews.apache.org/r/51759/. We can now take multiple (configurable) number of task IDs from a given `TaskGroup` per scheduling. The idea is to go deeper through the offer queue and assign more than one task if possible. This approach delivers substantially better MTTA and still ensures fairness across multiple `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be set even higher if the majority of cluster jobs have large number of instances and/or update batch sizes. As far as a single round perf goes, we can consider the following 2 worst-case scenarios: - master: single task scheduling fails after trying all offers in the queue - this patch: N tasks launched with the very last N offers in the queue + `(N x single_task_launch_latency)` Assuming that matching N tasks against M offers takes exactly the same time as 1 task against M offers (as they all share the same `TaskGroup`), the only measurable difference comes from the additional `N x single_task_launch_latency` overhead. Based on real cluster observations, the `single_task_launch_latency` is less than 1% of a single task scheduling attempt, which is << than the savings from avoided additional scheduling rounds. As far as jmh results go, the new approach (batching + multiple tasks per round) is only slightly more demanding (~8%). Both results though are MUCH higher than the real cluster perf, which just confirms we are not bound by CPU time here: Master: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 17126.183 ± 488.425 ops/s ``` This patch: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 15838.051 ± 187.890 ops/s ``` Diffs (updated) - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 6f1cbfbc4510a037cffc95fee54f62f463d2b534 src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 87b9e1928ab2d44668df1123f32ffdc4197c0c70 src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 664bc6cf964ede2473a4463e58bcdbcb65bc7413 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 5d319557057e27fd5fc6d3e553e9ca9139399c50 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java d390c07522d22e43d79ce4370985f3643ef021ca src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 207d38d1ddfd373892602218a98c1daaf4a1325f src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java ece476b918e6f2c128039e561eea23a94d8ed396 src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 209f9298a1d55207b9b41159f2ab366f92c1eb70 src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java ee5c6528af89cc62a35fdb314358c489556d8131 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 98048fabc00f233925b6cca015c2525980556e2b src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 88729626de5fa87b45472792c59cc0ff1ade3e93 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java a4e87d2216401f344dca64d69b945de7bcf8159a src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java b4d27f69ad5d4cce03da9f04424dc35d30e8af29 Diff: https://reviews.apache.org/r/51929/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51929: Scheduling multiple tasks per round.
> On Sept. 23, 2016, 7:52 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java, line 55 > > <https://reviews.apache.org/r/51929/diff/4/?file=1506607#file1506607line55> > > > > javadoc might be useful here to make it clear that the collection is a > > list of task ids to remove? (or maybe just rename param to `taskIds`?) Renamed the param. > On Sept. 23, 2016, 7:52 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line > > 127 > > <https://reviews.apache.org/r/51929/diff/4/?file=1506608#file1506608line127> > > > > I know we have the `@Positive` annotation on the arg, but for the sake > > of safety (it's possible to initialize these settings from any source) it's > > probably worth ensuring `maxTasksPerScheduler` is > 0 here? > > > > If we do that, it might also make sense to move the `checkArgument` on > > `firstScheduleDelay` from the constructor below into this constructor? Good point. Done and done. > On Sept. 23, 2016, 7:52 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line > > 143 > > <https://reviews.apache.org/r/51929/diff/4/?file=1506608#file1506608line143> > > > > If we're going to store settings as an instance variable, we should > > reference the other settings from it as well (and kill off those > > variables), alternately just store `maxTasksPerScheduler` rather than > > settings itself)? Agree, there are a few refactoring waves colliding here and it's definitely time to simplify/remove duplication. Done. > On Sept. 23, 2016, 7:52 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, line 172 > > <https://reviews.apache.org/r/51929/diff/4/?file=1506610#file1506610line172> > > > > If we're just assuming `taskId` is present, why wrap it in an optional? > > (Presumably on the initial pass through the loop we know `taskIds` was not > > empty, and on each subsequent pass we check `remainingTasks.hasNext` before > > re-assigning `taskId`). Am I missing something? Good catch. There was a legitimate use of the `Optional` in the original diff but it's no longer needed after we changed to return `Set`. - Maxim ------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/#review150225 --- On Sept. 20, 2016, 10:02 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51929/ > --- > > (Updated Sept. 20, 2016, 10:02 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is phase 2 of scheduling perf improvement effort started in > https://reviews.apache.org/r/51759/. > > We can now take multiple (configurable) number of task IDs from a given > `TaskGroup` per scheduling. The idea is to go deeper through the offer queue > and assign more than one task if possible. This approach delivers > substantially better MTTA and still ensures fairness across multiple > `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 > tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be > set even higher if the majority of cluster jobs have large number of > instances and/or update batch sizes. > > As far as a single round perf goes, we can consider the following 2 > worst-case scenarios: > - master: single task scheduling fails after trying all offers in the queue > - this patch: N tasks launched with the very last N offers in the queue + `(N > x single_task_launch_latency)` > > Assuming that matching N tasks against M offers takes exactly the same time > as 1 task against M offers (as they all share the same `TaskGroup`), the only > measurable difference comes from the additional `N x > single_task_launch_latency` overhead. Based on real cluster observations, the > `single_task_launch_latency` is less than 1% of a single task scheduling > attempt, which is << than the savings from avoided additional scheduling > rounds. > > As far as jmh results go, the new approach (batching + multiple tasks per > round) is only slightly more demanding (~8%). Both results though are MUCH &g
Re: Review Request 51929: Scheduling multiple tasks per round.
> On Sept. 22, 2016, 10:34 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java, > > line 114 > > <https://reviews.apache.org/r/51929/diff/4/?file=1506605#file1506605line114> > > > > Is there a way of doing this without making `AttributeAggregate` > > explicitly mutable? > > > > I don't like how we have to update this from the offer out of band > > instead of just refreshing from storage. > > > > Why can't we just have a `refresh` method which re-runs the original > > query after we consume the offer? > > > > I'm weary of bugs where we forget to call this method after scheduling > > with an offer. The whole purpose of having the `AttributeAggregate` is to avoid reconstructing the job state every time a view into attributes is required. The current approach saves an expensive (especially in the `DBTaskStore case` and a large job combo) fetch across task and attribute stores. The underlying data structure of the `AttributeAggregate` is just a Multiset of host attributes and their values. Throwing all that data away and reconstructing it again just to have a new host representation added is very wasteful, especially given that we already have all new attributes handy. As for not forgetting to call it, I don't see how reconstructing would be any better in that regard as we would still have to remember to reconstruct it out of band. The `TaskAssignerImplTest.testAssignPartialNoVetoes` covers `AttributeAggregate` update case by the way. > On Sept. 22, 2016, 10:34 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line > > 179 > > <https://reviews.apache.org/r/51929/diff/4/?file=1506608#file1506608line179> > > > > We don't check if this is null later, so why not just make it an empty > > `ImmutableSet` to start? I don't see any benefits in that. This is the classic assign-or-throw pattern. The `result.get()` returns a Set and we don't pass nulls for collections. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/#review150085 --- On Sept. 20, 2016, 10:02 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51929/ > --- > > (Updated Sept. 20, 2016, 10:02 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is phase 2 of scheduling perf improvement effort started in > https://reviews.apache.org/r/51759/. > > We can now take multiple (configurable) number of task IDs from a given > `TaskGroup` per scheduling. The idea is to go deeper through the offer queue > and assign more than one task if possible. This approach delivers > substantially better MTTA and still ensures fairness across multiple > `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 > tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be > set even higher if the majority of cluster jobs have large number of > instances and/or update batch sizes. > > As far as a single round perf goes, we can consider the following 2 > worst-case scenarios: > - master: single task scheduling fails after trying all offers in the queue > - this patch: N tasks launched with the very last N offers in the queue + `(N > x single_task_launch_latency)` > > Assuming that matching N tasks against M offers takes exactly the same time > as 1 task against M offers (as they all share the same `TaskGroup`), the only > measurable difference comes from the additional `N x > single_task_launch_latency` overhead. Based on real cluster observations, the > `single_task_launch_latency` is less than 1% of a single task scheduling > attempt, which is << than the savings from avoided additional scheduling > rounds. > > As far as jmh results go, the new approach (batching + multiple tasks per > round) is only slightly more demanding (~8%). Both results though are MUCH > higher than the real cluster perf, which just confirms we are not bound by > CPU time here: > > Master: > ``` > Benchmark > Mode Cnt Score Error Units > SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark
Re: Review Request 52167: Prepare release notes for release, all update committers guide with details on this step and on reverting auto-generated commits for subsequent release candidates.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52167/#review150053 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 22, 2016, 6:06 p.m., Joshua Cohen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52167/ > --- > > (Updated Sept. 22, 2016, 6:06 p.m.) > > > Review request for Aurora and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > Prepare release notes for release, all update committers guide with details > on this step and on reverting auto-generated commits for subsequent release > candidates. > > > Diffs > - > > RELEASE-NOTES.md 854bcbdf4e0dc40d0e18532642903e67232d2c9a > docs/development/committers-guide.md > ebf464bf887772862bf9d420a8f65720038275bc > > Diff: https://reviews.apache.org/r/52167/diff/ > > > Testing > --- > > > Thanks, > > Joshua Cohen > >
Re: Review Request 52141: Shutting down scheduler on unhandled BatchWorker error.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52141/ --- (Updated Sept. 22, 2016, 3:20 p.m.) Review request for Aurora, Stephan Erb and Zameer Manji. Changes --- Removing redundant shutdown hook. Bugs: AURORA-1779 https://issues.apache.org/jira/browse/AURORA-1779 Repository: aurora Description --- ##Problem The current `BatchWorker` error handling assumes graceful recovery if any of the individual batch items fail. This may not hold true if the storage is modified _before_ the error is thrown and the `LogStorage` transaction is not rolled back. Consider the following fragment from the `StateManagerImpl`: ``` storeProvider.getUnsafeTaskStore().saveTasks(scheduledTasks); for (IScheduledTask scheduledTask : scheduledTasks) { updateTaskAndExternalState( storeProvider.getUnsafeTaskStore(), Tasks.id(scheduledTask), Optional.of(scheduledTask), Optional.of(PENDING), Optional.absent()); } ``` If a task is saved but the subsequent `updateTaskAndExternalState()` throws AND `BatchWorker` catches and logs that exception, the native store transaction will be committed and as a result the storage will be left in a logically inconsistent state. ##Solution I can see at least 3 ways to solve this problem: 1. Fail hard and shutdown scheduler when any of the batch items throw; 2. Fail only the last batch (drop all its items) and let storage transaction roll back; 3. Implement `BatchWorker` transaction where items _before_ and _after_ the failed item would be retried when the storage transaction rolls back; The #3 is the most involved and would be very hard to get right (assuming batch items are idempotent, which may not be the case). The #2 may result in a very hard to troubleshoot scenario where _some_ items would be dropped on the floor and never completed. I suggest we go with #1 as the most straightforward and transparent approach. It's also the only one that ensures storage state consistency, which is the most vital invariant to preserve (as AURORA-1779 proves). The only downside of this approach is that scheduler will go down hard any time an unhandled error is thrown but arguably this is the easiest way to improve our codebase and certainly better than leaving the scheduler in a crippled state. Diffs (updated) - src/main/java/org/apache/aurora/scheduler/BatchWorker.java e05d4b48749b0691902a505bea1b4490fdd08f29 src/test/java/org/apache/aurora/scheduler/BatchWorkerTest.java a86dc82cafa7f5436d2b8d00c6db575ff8311eea Diff: https://reviews.apache.org/r/52141/diff/ Testing --- unit and e2e tests Thanks, Maxim Khutornenko
Re: Review Request 52141: Shutting down scheduler on unhandled BatchWorker error.
> On Sept. 22, 2016, 2:24 p.m., Joshua Cohen wrote: > > What's the behavior pre-BatchWorker in this case? Would we fail hard, or is > > there something inherent to the batch worker that makes this necessary? > > > > Also, this fixes the error handling that led to corruption in the event of > > a failure, but should we also determine the root cause that led to the > > failure to begin with (as part of another review no doubt)? The pre-BatchWorker behavior depended on where/how the error was thrown and could have resulted in a hard shutdown or a mysterious silent async failure killing one of the executor threads. With the BatchWorker, the scheduler will always fail on unhandled exception. This is needed to preserve the transaction boundaries and let the error rollback the changes. > ...but should we also determine the root cause that led to the failure to > begin with I thought it should have been clear from the RB description but then I realized I explained it in Slack and not here. We purposefully induce scheduler failovers many times during the e2e run to verify things like job updates still work. If the shutdown happens during the transaction, the DB gets closed underneath and the subsequent queries fail any way they like. This is exactly what happened in this case. A task is saved, driver exits, StateManager attempts to save task events but fails due to "DB closed" error. That error is caught by the BatchWorker and the native log transaction successfully completes before the scheduler fully terminates. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52141/#review14 ------- On Sept. 22, 2016, 5:38 a.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52141/ > --- > > (Updated Sept. 22, 2016, 5:38 a.m.) > > > Review request for Aurora, Stephan Erb and Zameer Manji. > > > Bugs: AURORA-1779 > https://issues.apache.org/jira/browse/AURORA-1779 > > > Repository: aurora > > > Description > --- > > ##Problem > > The current `BatchWorker` error handling assumes graceful recovery if any of > the individual batch items fail. This may not hold true if the storage is > modified _before_ the error is thrown and the `LogStorage` transaction is not > rolled back. Consider the following fragment from the `StateManagerImpl`: > ``` > storeProvider.getUnsafeTaskStore().saveTasks(scheduledTasks); > > for (IScheduledTask scheduledTask : scheduledTasks) { > updateTaskAndExternalState( > storeProvider.getUnsafeTaskStore(), > Tasks.id(scheduledTask), > Optional.of(scheduledTask), > Optional.of(PENDING), > Optional.absent()); > } > ``` > If a task is saved but the subsequent `updateTaskAndExternalState()` throws > AND `BatchWorker` catches and logs that exception, the native store > transaction will be committed and as a result the storage will be left in a > logically inconsistent state. > > ##Solution > > I can see at least 3 ways to solve this problem: > 1. Fail hard and shutdown scheduler when any of the batch items throw; > 2. Fail only the last batch (drop all its items) and let storage transaction > roll back; > 3. Implement `BatchWorker` transaction where items _before_ and _after_ the > failed item would be retried when the storage transaction rolls back; > > The #3 is the most involved and would be very hard to get right (assuming > batch items are idempotent, which may not be the case). The #2 may result in > a very hard to troubleshoot scenario where _some_ items would be dropped on > the floor and never completed. > > I suggest we go with #1 as the most straightforward and transparent approach. > It's also the only one that ensures storage state consistency, which is the > most vital invariant to preserve (as AURORA-1779 proves). The only downside > of this approach is that scheduler will go down hard any time an unhandled > error is thrown but arguably this is the easiest way to improve our codebase > and certainly better than leaving the scheduler in a crippled state. > > > Diffs > - > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java > e05d4b48749b0691902a505bea1b4490fdd08f29 > src/main/java/org/apache/aurora/scheduler/SchedulerModule.java > 2ec3967ddb1d470cf681de062a6400f647978185 > src/main/jav
Re: Review Request 52141: Shutting down scheduler on unhandled BatchWorker error.
> On Sept. 22, 2016, 2:25 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 152 > > <https://reviews.apache.org/r/52141/diff/1/?file=1507901#file1507901line152> > > > > Something is wrong here. > > > > The listener from `GuavaUtils.LifecycleShutdownListener` should have > > already been applied to this service in `SchedulerServicesModule`. We used > > `addSchedulerActiveServiceBinding` to add the BatchWorker services so I'm > > not sure why this helps at all. Ah, that's right, that should be enough. So, the fix is even simpler, just let the error bubble up and the existing shutdown hook will take care of the rest. Verified in vagrant that's exactly what happens. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52141/#review15 ------- On Sept. 22, 2016, 5:38 a.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52141/ > --- > > (Updated Sept. 22, 2016, 5:38 a.m.) > > > Review request for Aurora, Stephan Erb and Zameer Manji. > > > Bugs: AURORA-1779 > https://issues.apache.org/jira/browse/AURORA-1779 > > > Repository: aurora > > > Description > --- > > ##Problem > > The current `BatchWorker` error handling assumes graceful recovery if any of > the individual batch items fail. This may not hold true if the storage is > modified _before_ the error is thrown and the `LogStorage` transaction is not > rolled back. Consider the following fragment from the `StateManagerImpl`: > ``` > storeProvider.getUnsafeTaskStore().saveTasks(scheduledTasks); > > for (IScheduledTask scheduledTask : scheduledTasks) { > updateTaskAndExternalState( > storeProvider.getUnsafeTaskStore(), > Tasks.id(scheduledTask), > Optional.of(scheduledTask), > Optional.of(PENDING), > Optional.absent()); > } > ``` > If a task is saved but the subsequent `updateTaskAndExternalState()` throws > AND `BatchWorker` catches and logs that exception, the native store > transaction will be committed and as a result the storage will be left in a > logically inconsistent state. > > ##Solution > > I can see at least 3 ways to solve this problem: > 1. Fail hard and shutdown scheduler when any of the batch items throw; > 2. Fail only the last batch (drop all its items) and let storage transaction > roll back; > 3. Implement `BatchWorker` transaction where items _before_ and _after_ the > failed item would be retried when the storage transaction rolls back; > > The #3 is the most involved and would be very hard to get right (assuming > batch items are idempotent, which may not be the case). The #2 may result in > a very hard to troubleshoot scenario where _some_ items would be dropped on > the floor and never completed. > > I suggest we go with #1 as the most straightforward and transparent approach. > It's also the only one that ensures storage state consistency, which is the > most vital invariant to preserve (as AURORA-1779 proves). The only downside > of this approach is that scheduler will go down hard any time an unhandled > error is thrown but arguably this is the easiest way to improve our codebase > and certainly better than leaving the scheduler in a crippled state. > > > Diffs > - > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java > e05d4b48749b0691902a505bea1b4490fdd08f29 > src/main/java/org/apache/aurora/scheduler/SchedulerModule.java > 2ec3967ddb1d470cf681de062a6400f647978185 > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java > 7c8047a7235a937c29fe96517242923ff84a080c > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java > d390c07522d22e43d79ce4370985f3643ef021ca > src/test/java/org/apache/aurora/scheduler/BatchWorkerTest.java > a86dc82cafa7f5436d2b8d00c6db575ff8311eea > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java > 8556253fc11f6027316871eb9dc66d8627a77fe6 > > Diff: https://reviews.apache.org/r/52141/diff/ > > > Testing > --- > > unit and e2e tests > > > Thanks, > > Maxim Khutornenko > >
Review Request 52141: Shutting down scheduler on unhandled BatchWorker error.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52141/ --- Review request for Aurora, Stephan Erb and Zameer Manji. Bugs: AURORA-1779 https://issues.apache.org/jira/browse/AURORA-1779 Repository: aurora Description --- ##Problem The current `BatchWorker` error handling assumes graceful recovery if any of the individual batch items fail. This may not hold true if the storage is modified _before_ the error is thrown and the `LogStorage` transaction is not rolled back. Consider the following fragment from the `StateManagerImpl`: ``` storeProvider.getUnsafeTaskStore().saveTasks(scheduledTasks); for (IScheduledTask scheduledTask : scheduledTasks) { updateTaskAndExternalState( storeProvider.getUnsafeTaskStore(), Tasks.id(scheduledTask), Optional.of(scheduledTask), Optional.of(PENDING), Optional.absent()); } ``` If a task is saved but the subsequent `updateTaskAndExternalState()` throws AND `BatchWorker` catches and logs that exception, the native store transaction will be committed and as a result the storage will be left in a logically inconsistent state. ##Solution I can see at least 3 ways to solve this problem: 1. Fail hard and shutdown scheduler when any of the batch items throw; 2. Fail only the last batch (drop all its items) and let storage transaction roll back; 3. Implement `BatchWorker` transaction where items _before_ and _after_ the failed item would be retried when the storage transaction rolls back; The #3 is the most involved and would be very hard to get right (assuming batch items are idempotent, which may not be the case). The #2 may result in a very hard to troubleshoot scenario where _some_ items would be dropped on the floor and never completed. I suggest we go with #1 as the most straightforward and transparent approach. It's also the only one that ensures storage state consistency, which is the most vital invariant to preserve (as AURORA-1779 proves). The only downside of this approach is that scheduler will go down hard any time an unhandled error is thrown but arguably this is the easiest way to improve our codebase and certainly better than leaving the scheduler in a crippled state. Diffs - src/main/java/org/apache/aurora/scheduler/BatchWorker.java e05d4b48749b0691902a505bea1b4490fdd08f29 src/main/java/org/apache/aurora/scheduler/SchedulerModule.java 2ec3967ddb1d470cf681de062a6400f647978185 src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 7c8047a7235a937c29fe96517242923ff84a080c src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java d390c07522d22e43d79ce4370985f3643ef021ca src/test/java/org/apache/aurora/scheduler/BatchWorkerTest.java a86dc82cafa7f5436d2b8d00c6db575ff8311eea src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 8556253fc11f6027316871eb9dc66d8627a77fe6 Diff: https://reviews.apache.org/r/52141/diff/ Testing --- unit and e2e tests Thanks, Maxim Khutornenko
Re: Review Request 52074: switching from launchTask to acceptOffers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52074/#review149862 --- Ship it! src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java (line 124) <https://reviews.apache.org/r/52074/#comment217589> nit: add newline above src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java (line 372) <https://reviews.apache.org/r/52074/#comment217592> Can be inlined below. - Maxim Khutornenko On Sept. 20, 2016, 7:14 p.m., Dmitriy Shirchenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52074/ > --- > > (Updated Sept. 20, 2016, 7:14 p.m.) > > > Review request for Aurora, Maxim Khutornenko, Mehrdad Nurolahzade, and Zameer > Manji. > > > Bugs: AURORA-1776 > https://issues.apache.org/jira/browse/AURORA-1776 > > > Repository: aurora > > > Description > --- > > Dynamic reservation requires us to use acceptOffers so this is in preparation > of that work. acceptOffers is the call to dynamically reserve resources. > > > Diffs > - > > src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java > 1bfd8dbf3aec314aaeb89b756ea3a2049df8777a > src/main/java/org/apache/aurora/scheduler/mesos/Driver.java > 14481468926b82a9bad3ee78c85f176e8ba893e2 > src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java > 41b9aab55aa439b13eba3545703ec97b44690444 > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java > 925c025d21bf1d44e0c1d319f6653ecfa8899481 > src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java > a739bce1226d9435fa7d0b18e411064a4e78e49e > > Diff: https://reviews.apache.org/r/52074/diff/ > > > Testing > --- > > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > > > Thanks, > > Dmitriy Shirchenko > >
Re: Review Request 51876: Modify executor state transition logic to rely on health checks (if enabled)
> On Sept. 20, 2016, 12:28 a.m., Kai Huang wrote: > > src/main/python/apache/aurora/executor/common/status_checker.py, line 104 > > <https://reviews.apache.org/r/51876/diff/3/?file=1505371#file1505371line104> > > > > Use TaskState.Value('TASK_RUNNING') here instead of > > mesos_pb2.TASK_RUNNING, because this file also used TaskState in multiple > > places. Not sure I get this. What prevents you from using `mesos_pb2.TASK_RUNNING` here and keep `TaskState` in other places? - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51876/#review149581 --- On Sept. 20, 2016, 12:25 a.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51876/ > --- > > (Updated Sept. 20, 2016, 12:25 a.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Bugs: AURORA-1225 > https://issues.apache.org/jira/browse/AURORA-1225 > > > Repository: aurora > > > Description > --- > > Modify executor state transition logic to rely on health checks (if enabled). > > [Summary] > Executor needs to start executing user content in STARTING and transition to > RUNNING when a successful required number of health checks is reached. > > This review contains a series of executor changes that implement the health > check driven updates. It gives more context of the design of this feature. > > [Background] > Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# > for more details and background. > > [Description] > If health check is enabled on vCurrent executor, the health checker will send > a "TASK_RUNNING" message when a successful required number of health checks > is reached within the initial_interval_secs. On the other hand, a > "TASK_FAILED" message was sent if the health checker fails to reach the > required number of health checks within that period, or a maximum number of > failed health check limit is reached after the initital_interval_secs. > > If health check is disabled on the vCurrent executor, it will sends > "TASK_RUNNING" message to scheduler after the thermos runner was started. In > this scenario, the behavior of vCurrent executor will be the same as the > vPrev executor. > > [Change List] > The current change set includes: > 1. Removed the status memoization in ChainedStatusChecker. > 2. Modified the StatusManager to be edge triggered. > 3. Changed the Aurora Executor callback function. > 4. Modified the Health Checker and redefined the meaning > initial_interval_secs. > > > Diffs > - > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/python/apache/aurora/executor/aurora_executor.py > ce5ef680f01831cd89fced8969ae3246c7f60cfd > src/main/python/apache/aurora/executor/common/health_checker.py > 5fc845eceac6f0c048d7489fdc4c672b0c609ea0 > src/main/python/apache/aurora/executor/common/status_checker.py > 795dae2d6b661fc528d952c2315196d94127961f > src/main/python/apache/aurora/executor/common/task_info.py > 4ef49e30eeb942886d02c1fb448055264f9aa874 > src/main/python/apache/aurora/executor/status_manager.py > 228a99a05f339e21cd7e769a42b9b2276e7bc3fc > src/test/python/apache/aurora/executor/common/test_health_checker.py > bb6ea69dd94298c5b8cf4d5f06d06eea7790d66e > src/test/python/apache/aurora/executor/common/test_status_checker.py > 5be1981c8c8e88258456adb21aa3ca7c0aa472a7 > src/test/python/apache/aurora/executor/common/test_task_info.py > 0d9cc5cb340d697a887d8e001f23c948f4fa70c7 > src/test/python/apache/aurora/executor/test_status_manager.py > ce4679ba1aa7b42cf0115c943d84663030182d23 > src/test/python/apache/aurora/executor/test_thermos_executor.py > 0bfe9e931f873c9f804f2ba4012e050e1f9fd24e > > Diff: https://reviews.apache.org/r/51876/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > > ./pants test.pytest src/test/python/apache/aurora/executor:: > > > Thanks, > > Kai Huang > >
Re: Review Request 51876: Modify executor state transition logic to rely on health checks (if enabled)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51876/#review149846 --- api/src/main/thrift/org/apache/aurora/gen/api.thrift (lines 1218 - 1220) <https://reviews.apache.org/r/51876/#comment217562> What's the motivation behind moving these constants into the thrift layer? FWICT, these are only used inside the `health_checker.py` and should stay there. src/main/python/apache/aurora/executor/aurora_executor.py (line 80) <https://reviews.apache.org/r/51876/#comment217565> This does not have to be a global field, does it? It looks like `_register()` only adds callbacks but nothing ever reads from it. src/main/python/apache/aurora/executor/aurora_executor.py (lines 184 - 185) <https://reviews.apache.org/r/51876/#comment217566> Are there any guarantees the status_manager will always deal with `ThermosTaskRunner.EXIT_STATE_MAP` values? The way I see it any unmapped state will end up not calling the `_shutdown()` at all. Perhaps a safer way could be having an explicit pair of `_running()` to be called only for RUNNING state and `_shutdown()` act as a default callback for anything else? Curious what others think here. src/main/python/apache/aurora/executor/common/health_checker.py (line 140) <https://reviews.apache.org/r/51876/#comment217570> leftover? src/main/python/apache/aurora/executor/common/health_checker.py (lines 153 - 155) <https://reviews.apache.org/r/51876/#comment217578> This should be moved into the `_maybe_update_failure_count()` to have all decision making logic in a single place. src/main/python/apache/aurora/executor/common/health_checker.py (line 225) <https://reviews.apache.org/r/51876/#comment217582> This should be `mesos_pb2.TASK_RUNNING`, right? src/main/python/apache/aurora/executor/common/health_checker.py (lines 270 - 274) <https://reviews.apache.org/r/51876/#comment217563> This logic needs to be refactored to reduce duplication and reuse what's now in `is_health_check_enabled` as much as possible. Ideally, we should have the only place we extract `health_checker` and the like. src/main/python/apache/aurora/executor/status_manager.py (line 68) <https://reviews.apache.org/r/51876/#comment217567> This is exactly the problem I alluded to above. What if there is no entry for a given `status` here? - Maxim Khutornenko On Sept. 20, 2016, 12:25 a.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51876/ > --- > > (Updated Sept. 20, 2016, 12:25 a.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Bugs: AURORA-1225 > https://issues.apache.org/jira/browse/AURORA-1225 > > > Repository: aurora > > > Description > --- > > Modify executor state transition logic to rely on health checks (if enabled). > > [Summary] > Executor needs to start executing user content in STARTING and transition to > RUNNING when a successful required number of health checks is reached. > > This review contains a series of executor changes that implement the health > check driven updates. It gives more context of the design of this feature. > > [Background] > Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# > for more details and background. > > [Description] > If health check is enabled on vCurrent executor, the health checker will send > a "TASK_RUNNING" message when a successful required number of health checks > is reached within the initial_interval_secs. On the other hand, a > "TASK_FAILED" message was sent if the health checker fails to reach the > required number of health checks within that period, or a maximum number of > failed health check limit is reached after the initital_interval_secs. > > If health check is disabled on the vCurrent executor, it will sends > "TASK_RUNNING" message to scheduler after the thermos runner was started. In > this scenario, the behavior of vCurrent executor will be the same as the > vPrev executor. > > [Change List] > The current change set includes: > 1. Removed the status memoization in ChainedStatusChecker. > 2. Modified the StatusManager to be edge triggered. > 3. Changed the Aurora Executor callback function. > 4. Modified the Health Checker and redefined the meaning > initial_interval_secs. > &g
Re: Review Request 51929: Scheduling multiple tasks per round.
> On Sept. 20, 2016, 6:49 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java, > > lines 93-96 > > <https://reviews.apache.org/r/51929/diff/3/?file=1500698#file1500698line93> > > > > Regarding your notes in the RB description: I don't see a problem if we > > set this to a slightly higher value such as `10` or `15`. > > > > It seems like we will maintain the basic taskgroup round robin > > scheduling fairness even with slightly larger batch sizes, so I am ok with > > bumping the value. The higher this count is the less fair do we treat lower instance count jobs in case the capacity of the cluster is constrained. It's hard to come up with an ideal default here but I feel 10-15 would be a bit too extreme for the general case. > On Sept. 20, 2016, 6:49 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, lines > > 174-175 > > <https://reviews.apache.org/r/51929/diff/3/?file=1500700#file1500700line174> > > > > If I understand things correctly, I believe this line could have a > > performance bug in it: > > > > `batchWorker.execute` acquires the global storage lock before calling > > the `taskScheduler`. For the latter, we use the following definition: > > > > ``` > > this.taskScheduler = (store, taskIds) -> { > > settings.rateLimiter.acquire(); > > return taskScheduler.schedule(store, taskIds); > > }; > > ``` > > > > In combination, we will be throttled by the `rateLimiter` while holding > > the storage lock. > > > > Instead, we should try to acquire the log within the `Runnable` in > > `startGroup` before calling `batchWorker.execute` so that the global lock > > is not hold longer than absolutely necessary. This is a valid concern. It's certainly more possible than before to hit our default 40 tasks per second limit now. Fixed. - Maxim ----------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/#review149681 --- On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51929/ > --- > > (Updated Sept. 16, 2016, 9:53 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is phase 2 of scheduling perf improvement effort started in > https://reviews.apache.org/r/51759/. > > We can now take multiple (configurable) number of task IDs from a given > `TaskGroup` per scheduling. The idea is to go deeper through the offer queue > and assign more than one task if possible. This approach delivers > substantially better MTTA and still ensures fairness across multiple > `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 > tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be > set even higher if the majority of cluster jobs have large number of > instances and/or update batch sizes. > > As far as a single round perf goes, we can consider the following 2 > worst-case scenarios: > - master: single task scheduling fails after trying all offers in the queue > - this patch: N tasks launched with the very last N offers in the queue + `(N > x single_task_launch_latency)` > > Assuming that matching N tasks against M offers takes exactly the same time > as 1 task against M offers (as they all share the same `TaskGroup`), the only > measurable difference comes from the additional `N x > single_task_launch_latency` overhead. Based on real cluster observations, the > `single_task_launch_latency` is less than 1% of a single task scheduling > attempt, which is << than the savings from avoided additional scheduling > rounds. > > As far as jmh results go, the new approach (batching + multiple tasks per > round) is only slightly more demanding (~8%). Both results though are MUCH > higher than the real cluster perf, which just confirms we are not bound by > CPU time here: > > Master: > ``` > Benchmark > Mode Cnt Score Error Units > SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark > thrpt
Re: Review Request 51929: Scheduling multiple tasks per round.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/ --- (Updated Sept. 20, 2016, 10:02 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Stephan's comments. Repository: aurora Description --- This is phase 2 of scheduling perf improvement effort started in https://reviews.apache.org/r/51759/. We can now take multiple (configurable) number of task IDs from a given `TaskGroup` per scheduling. The idea is to go deeper through the offer queue and assign more than one task if possible. This approach delivers substantially better MTTA and still ensures fairness across multiple `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be set even higher if the majority of cluster jobs have large number of instances and/or update batch sizes. As far as a single round perf goes, we can consider the following 2 worst-case scenarios: - master: single task scheduling fails after trying all offers in the queue - this patch: N tasks launched with the very last N offers in the queue + `(N x single_task_launch_latency)` Assuming that matching N tasks against M offers takes exactly the same time as 1 task against M offers (as they all share the same `TaskGroup`), the only measurable difference comes from the additional `N x single_task_launch_latency` overhead. Based on real cluster observations, the `single_task_launch_latency` is less than 1% of a single task scheduling attempt, which is << than the savings from avoided additional scheduling rounds. As far as jmh results go, the new approach (batching + multiple tasks per round) is only slightly more demanding (~8%). Both results though are MUCH higher than the real cluster perf, which just confirms we are not bound by CPU time here: Master: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 17126.183 ± 488.425 ops/s ``` This patch: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 15838.051 ± 187.890 ops/s ``` Diffs (updated) - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 6f1cbfbc4510a037cffc95fee54f62f463d2b534 src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 87b9e1928ab2d44668df1123f32ffdc4197c0c70 src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 664bc6cf964ede2473a4463e58bcdbcb65bc7413 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 5d319557057e27fd5fc6d3e553e9ca9139399c50 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java d390c07522d22e43d79ce4370985f3643ef021ca src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 207d38d1ddfd373892602218a98c1daaf4a1325f src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java ece476b918e6f2c128039e561eea23a94d8ed396 src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 209f9298a1d55207b9b41159f2ab366f92c1eb70 src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java ee5c6528af89cc62a35fdb314358c489556d8131 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 98048fabc00f233925b6cca015c2525980556e2b src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 88729626de5fa87b45472792c59cc0ff1ade3e93 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java a4e87d2216401f344dca64d69b945de7bcf8159a src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java b4d27f69ad5d4cce03da9f04424dc35d30e8af29 Diff: https://reviews.apache.org/r/51929/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51929: Scheduling multiple tasks per round.
> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line > > 197 > > <https://reviews.apache.org/r/51929/diff/1/?file=1499323#file1499323line197> > > > > Side show: Isn't that `if` unnecessary here and we can adjust the > > penality in any case? We will remove the group if `hasMore()` returns > > false, so any penality should be fine. > > Maxim Khutornenko wrote: > Not sure I follow. This is the place that applies penalty accrued inside > the `startGroup()` or removes the group if it's empty. > > Stephan Erb wrote: > Let me print some more code to make clearer what I meant. > > This is the code where we compute penaltyMs depending on `group.hasMore` > ``` > scheduledTaskPenalties.accumulate(group.getPenaltyMs()); > group.remove(scheduled); > if (group.hasMore()) { > penaltyMs = firstScheduleDelay; > } > } > } > > group.setPenaltyMs(penaltyMs); > evaluateGroupLater(this, group); > ``` > > > Later on we then drop empty groups in `evaluateGroupLater`: > ``` > private synchronized void evaluateGroupLater(Runnable evaluate, > TaskGroup group) { > // Avoid check-then-act by holding the intrinsic lock. If not done > atomically, we could > // remove a group while a task is being added to it. > if (group.hasMore()) { > executor.execute(evaluate, Amount.of(group.getPenaltyMs(), > Time.MILLISECONDS)); > } else { > groups.remove(group.getKey()); > } > } > ``` > > What I tried to say is that we could unconditionally write `penaltyMs = > firstScheduleDelay;` without the `if (group.hasMore())` in the first snippet. > If `group.hasMore()` returns false we will remove the group anyway, so it > does not matter if we set a new penality or not. > > This is completely unreleated to the change in this RB so feel free to > ignore it. It was more about checking my own understanding of the code. I see what you meant now. The slight problem with the simplification you propose is that it _may_ result in a penalty where it would not happen today: `startGroup()` calculates the penalty even though there are no more groups left but the call to `evaluateGroupLater()` is delayed for some reason AND a new task is added into the group thus recharging it. The penalty would be applied the moment `evaluateGroupLater()` is finally reached. It's certainly an edge case but I'd prefer not changing the current behavior in the this patch. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/#review149179 --- On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51929/ > --- > > (Updated Sept. 16, 2016, 9:53 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is phase 2 of scheduling perf improvement effort started in > https://reviews.apache.org/r/51759/. > > We can now take multiple (configurable) number of task IDs from a given > `TaskGroup` per scheduling. The idea is to go deeper through the offer queue > and assign more than one task if possible. This approach delivers > substantially better MTTA and still ensures fairness across multiple > `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 > tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be > set even higher if the majority of cluster jobs have large number of > instances and/or update batch sizes. > > As far as a single round perf goes, we can consider the following 2 > worst-case scenarios: > - master: single task scheduling fails after trying all offers in the queue > - this patch: N tasks launched with the very last N offers in the queue + `(N > x single_task_launch_latency)` > > Assuming that matching N tasks against M offers takes exactly the same time > as 1 task against M offers (as they all share the same `TaskGroup`), the only > measurable difference comes from the additional `N x > single_task_launch_latency` overhead. Based on real cluster obse
Re: Review Request 52087: Fix host maintenance commands to properly initialize the api client.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52087/#review149690 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 20, 2016, 5:32 p.m., Joshua Cohen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52087/ > --- > > (Updated Sept. 20, 2016, 5:32 p.m.) > > > Review request for Aurora, Maxim Khutornenko and Zameer Manji. > > > Repository: aurora > > > Description > --- > > Fix host maintenance commands to properly initialize the api client. > > > Diffs > - > > src/main/python/apache/aurora/admin/admin.py > 9fc89a2842d4651ac10aa82118531c450f051712 > src/main/python/apache/aurora/admin/admin_util.py > 394deb57af9ad8832a02ceab15f33b3c1e5c902b > src/main/python/apache/aurora/admin/host_maintenance.py > 677f870c50d1e19d34de36a3c677bb1dcd255ba0 > src/main/python/apache/aurora/admin/maintenance.py > bf446515222a1246b7fe32acd67315f8145a1212 > src/test/python/apache/aurora/admin/test_admin.py > f720742c50f774b9f5a41ba57ef251d08d79e8d1 > src/test/python/apache/aurora/admin/test_admin_sla.py > 54b5a823903b82c2082353ade132eb7b63e518db > src/test/python/apache/aurora/admin/util.py > d0a915cd8edee6b36d671b41f43a3af4fe751ae7 > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > e36726e0c0154e51bba40704e13eebec5c0f0d65 > > Diff: https://reviews.apache.org/r/52087/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > ran e2e tests. > > > Thanks, > > Joshua Cohen > >
Re: Review Request 51980: Refactor of Webhook and no longer posting entire task state via webhook on scheduler restart
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51980/#review149674 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 19, 2016, 6:28 p.m., Dmitriy Shirchenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51980/ > --- > > (Updated Sept. 19, 2016, 6:28 p.m.) > > > Review request for Aurora, Maxim Khutornenko, Stephan Erb, and Zameer Manji. > > > Bugs: AURORA-1772 > https://issues.apache.org/jira/browse/AURORA-1772 > > > Repository: aurora > > > Description > --- > > This is a refactor with addition of HttpClient injected into Webhook class as > opposed to previous usage of lower level HtttpURLConnection objects. > Additionally due to peformance issues, it is unncessary to POST entire task > state to webhook endpoint on every scheduler restart so that is removed in > this patch. > > > Diffs > - > > build.gradle d5a3a7a3bdb8349de6fc01d4a6271b32d942e531 > src/main/java/org/apache/aurora/scheduler/events/Webhook.java > e54aa19d67278fcb5586f07c399f09062f845a18 > src/main/java/org/apache/aurora/scheduler/events/WebhookInfo.java > e4193f7518e66d210a6d5aae22a9f04e2d4984b3 > src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java > eaa70533a64740c74cebf15739b7142e2d3a4d11 > src/main/resources/org/apache/aurora/scheduler/webhook.json > 00787985510d7d415b8074bef06d28b635c78b09 > src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java > 488eefd14c3e67a41a75c809397c8d19f83cc08a > > Diff: https://reviews.apache.org/r/51980/diff/ > > > Testing > --- > > Part of reason for refactor is to allow easier testing so cleaner unit tests > were added with more code coverage. > > > Thanks, > > Dmitriy Shirchenko > >
Re: Review Request 51993: Added the 'reason' to the /pendingTasks endpoint
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51993/#review149487 --- src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java (line 138) <https://reviews.apache.org/r/51993/#comment217136> I don't think this is the right approach. `TaskGroups` has no business in accessing the `NearestFit` functionality. Please, consider extracting [this](https://github.com/apache/aurora/blob/795a2728c623c35bd509d582c24684a6921c74ad/src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java#L194-L208) and making it a public method of the `NearestFit` itself (e.g.: `getPendingReason(Query.Builder query)`. You can query that method and marry the results right within the `PendingTasks`. - Maxim Khutornenko On Sept. 17, 2016, 9:36 p.m., Pradyumna Kaushik wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51993/ > --- > > (Updated Sept. 17, 2016, 9:36 p.m.) > > > Review request for Aurora. > > > Repository: aurora > > > Description > --- > > Added the 'reason' to the /pendingTasks endpoint > > > Diffs > - > > src/main/java/org/apache/aurora/scheduler/http/PendingTasks.java > c80e0c8adf80e12082a6952ae79b7d9cc960c5b6 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java > 5d319557057e27fd5fc6d3e553e9ca9139399c50 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java > d390c07522d22e43d79ce4370985f3643ef021ca > src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java > 88729626de5fa87b45472792c59cc0ff1ade3e93 > > Diff: https://reviews.apache.org/r/51993/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > > > Thanks, > > Pradyumna Kaushik > >
Re: Review Request 51929: Scheduling multiple tasks per round.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/#review149290 --- @ReviewBot retry - Maxim Khutornenko On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51929/ > --- > > (Updated Sept. 16, 2016, 9:53 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is phase 2 of scheduling perf improvement effort started in > https://reviews.apache.org/r/51759/. > > We can now take multiple (configurable) number of task IDs from a given > `TaskGroup` per scheduling. The idea is to go deeper through the offer queue > and assign more than one task if possible. This approach delivers > substantially better MTTA and still ensures fairness across multiple > `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 > tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be > set even higher if the majority of cluster jobs have large number of > instances and/or update batch sizes. > > As far as a single round perf goes, we can consider the following 2 > worst-case scenarios: > - master: single task scheduling fails after trying all offers in the queue > - this patch: N tasks launched with the very last N offers in the queue + `(N > x single_task_launch_latency)` > > Assuming that matching N tasks against M offers takes exactly the same time > as 1 task against M offers (as they all share the same `TaskGroup`), the only > measurable difference comes from the additional `N x > single_task_launch_latency` overhead. Based on real cluster observations, the > `single_task_launch_latency` is less than 1% of a single task scheduling > attempt, which is << than the savings from avoided additional scheduling > rounds. > > As far as jmh results go, the new approach (batching + multiple tasks per > round) is only slightly more demanding (~8%). Both results though are MUCH > higher than the real cluster perf, which just confirms we are not bound by > CPU time here: > > Master: > ``` > Benchmark > Mode Cnt Score Error Units > SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark > thrpt 10 17126.183 ± 488.425 ops/s > ``` > > This patch: > ``` > Benchmark > Mode Cnt Score Error Units > SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark > thrpt 10 15838.051 ± 187.890 ops/s > ``` > > > Diffs > - > > src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java > 6f1cbfbc4510a037cffc95fee54f62f463d2b534 > src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java > 87b9e1928ab2d44668df1123f32ffdc4197c0c70 > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java > 664bc6cf964ede2473a4463e58bcdbcb65bc7413 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java > 5d319557057e27fd5fc6d3e553e9ca9139399c50 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java > d390c07522d22e43d79ce4370985f3643ef021ca > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java > 207d38d1ddfd373892602218a98c1daaf4a1325f > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java > 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 > > src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java > ece476b918e6f2c128039e561eea23a94d8ed396 > > src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java > 209f9298a1d55207b9b41159f2ab366f92c1eb70 > > src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java > 0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 > src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java > c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 > > src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java > ee5c6528af89cc62a35fdb314358c489556d8131 > src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java > 98048fabc00f233925b6cca015c2525980556e2b > > src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java > 2c3e5f32c774
Re: Review Request 51929: Scheduling multiple tasks per round.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/ --- (Updated Sept. 16, 2016, 9:53 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Repository: aurora Description (updated) --- This is phase 2 of scheduling perf improvement effort started in https://reviews.apache.org/r/51759/. We can now take multiple (configurable) number of task IDs from a given `TaskGroup` per scheduling. The idea is to go deeper through the offer queue and assign more than one task if possible. This approach delivers substantially better MTTA and still ensures fairness across multiple `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be set even higher if the majority of cluster jobs have large number of instances and/or update batch sizes. As far as a single round perf goes, we can consider the following 2 worst-case scenarios: - master: single task scheduling fails after trying all offers in the queue - this patch: N tasks launched with the very last N offers in the queue + `(N x single_task_launch_latency)` Assuming that matching N tasks against M offers takes exactly the same time as 1 task against M offers (as they all share the same `TaskGroup`), the only measurable difference comes from the additional `N x single_task_launch_latency` overhead. Based on real cluster observations, the `single_task_launch_latency` is less than 1% of a single task scheduling attempt, which is << than the savings from avoided additional scheduling rounds. As far as jmh results go, the new approach (batching + multiple tasks per round) is only slightly more demanding (~8%). Both results though are MUCH higher than the real cluster perf, which just confirms we are not bound by CPU time here: Master: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 17126.183 ± 488.425 ops/s ``` This patch: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 15838.051 ± 187.890 ops/s ``` Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 6f1cbfbc4510a037cffc95fee54f62f463d2b534 src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 87b9e1928ab2d44668df1123f32ffdc4197c0c70 src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 664bc6cf964ede2473a4463e58bcdbcb65bc7413 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 5d319557057e27fd5fc6d3e553e9ca9139399c50 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java d390c07522d22e43d79ce4370985f3643ef021ca src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 207d38d1ddfd373892602218a98c1daaf4a1325f src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java ece476b918e6f2c128039e561eea23a94d8ed396 src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 209f9298a1d55207b9b41159f2ab366f92c1eb70 src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java ee5c6528af89cc62a35fdb314358c489556d8131 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 98048fabc00f233925b6cca015c2525980556e2b src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 88729626de5fa87b45472792c59cc0ff1ade3e93 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java a4e87d2216401f344dca64d69b945de7bcf8159a src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java b4d27f69ad5d4cce03da9f04424dc35d30e8af29 Diff: https://reviews.apache.org/r/51929/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51929: Scheduling multiple tasks per round.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/ --- (Updated Sept. 16, 2016, 9:53 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Rebased over master. Repository: aurora Description --- This is phase 2 of scheduling perf improvement effort started in https://reviews.apache.org/r/51759/. We can now take multiple (configurable) number of task IDs from a given `TaskGroup` per scheduling. The idea is to go deeper through the offer queue and assign more than one task if possible. This approach delivers substantially better MTTA and still ensures fairness across multiple `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be set even higher if the majority of cluster jobs have large number of instances and/or update batch sizes. As far as a single round perf goes, we can consider the following 2 worst-case scenarios: - master: single task scheduling fails after trying all offers in the queue - this patch: N tasks launched with the very last N offers in the queue + `(N x single_task_launch_latency)` Assuming that matching N tasks against M offers takes exactly the same time as 1 task against M offers (as they all share the same `TaskGroup`), the only measurable difference comes from the additional `N x single_task_launch_latency` overhead. Based on real cluster observations, the `single_task_launch_latency` is less than 1% of a single task scheduling attempt, which is << than the savings from avoided additional scheduling rounds. As far as jmh results go, the new approach (batching + multiple tasks per round) is only slightly more demanding (~8%). Both results though are MUCH higher than the real cluster perf, which just confirms we are not bound by CPU time here: Master: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 17126.183 ± 488.425 ops/s ``` This patch: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 15838.051 ± 187.890 ops/s ``` NOTE: this will not apply cleanly as it branched off of https://reviews.apache.org/r/51765, which itself depends on https://reviews.apache.org/r/51759/. Diffs (updated) - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 6f1cbfbc4510a037cffc95fee54f62f463d2b534 src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 87b9e1928ab2d44668df1123f32ffdc4197c0c70 src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 664bc6cf964ede2473a4463e58bcdbcb65bc7413 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 5d319557057e27fd5fc6d3e553e9ca9139399c50 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java d390c07522d22e43d79ce4370985f3643ef021ca src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 207d38d1ddfd373892602218a98c1daaf4a1325f src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java ece476b918e6f2c128039e561eea23a94d8ed396 src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 209f9298a1d55207b9b41159f2ab366f92c1eb70 src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java ee5c6528af89cc62a35fdb314358c489556d8131 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 98048fabc00f233925b6cca015c2525980556e2b src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 88729626de5fa87b45472792c59cc0ff1ade3e93 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java a4e87d2216401f344dca64d69b945de7bcf8159a src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java b4d27f69ad5d4cce03da9f04424dc35d30e8af29 Diff: https://reviews.apache.org/r/51929/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51763/ --- (Updated Sept. 16, 2016, 9:04 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Rebased. Repository: aurora Description --- This is the second part of the `BatchWorker` conversion work that moves cron jobs to use non-blocking kill followups and reduces the number of trigger threads. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem The current implementation of the cron scheduling relies on a large number of threads (`cron_scheduler_num_threads`=100) to support cron triggering and killing existing tasks according to `KILL_EXISTING` collision policy. This creates large spikes of activities at synchronized intervals as users tend to schedule their cron runs around similar schedules. Moreover, the current implementation re-acquires write locks multiple times to deliver on `KILL_EXISTING` policy. #Remediation Trigger level batching is still done in a blocking way but multiple cron triggers may be bundled together to share the same write transaction. Any followups, however, are performed in a non-blocking way by relying on a `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In order to still ensure non-concurrent execution of a given job key trigger, a token (job key) is saved within the trigger itself. A concurrent trigger will bail if a kill followup is still in progress (token is set AND no entry in `killFollowups` set exists yet). #Results The above approach allowed reducing the number of cron threads to 10 and likely can be reduced even further. See https://reviews.apache.org/r/51759 for the lock contention results. Diffs (updated) - commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java bc30990d57f444f7d64805ed85c363f1302736d0 src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java c07551e94f9221b5b21c5dc9715e82caa290c2e8 src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 155d702d68367b247dd066f773c662407f0e3b5b src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5c64ff2994e200b3453603ac5470e8e152cebc55 src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f Diff: https://reviews.apache.org/r/51763/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51973/#review149271 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 16, 2016, 8:49 p.m., Renan DelValle wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51973/ > --- > > (Updated Sept. 16, 2016, 8:49 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > Change in the thrift API to make thee cronSchedule string in JobConfiguration > an optional. > > > Diffs > - > > RELEASE-NOTES.md 411872b0244698b4ca74228bc21da608dcb98ae0 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > a045a21585fe40e63e2094fa103f205e7883eb35 > > Diff: https://reviews.apache.org/r/51973/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > > > Thanks, > > Renan DelValle > >
Re: Review Request 51973: Fix for AURORA-1739, enables golang thrift bindings to create jobs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51973/#review149266 --- Not that it changes much but since it's technically a schema change a release note would be great. - Maxim Khutornenko On Sept. 16, 2016, 8:36 p.m., Renan DelValle wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51973/ > --- > > (Updated Sept. 16, 2016, 8:36 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > Change in the thrift API to make thee cronSchedule string in JobConfiguration > an optional. > > > Diffs > - > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > a045a21585fe40e63e2094fa103f205e7883eb35 > > Diff: https://reviews.apache.org/r/51973/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > > > Thanks, > > Renan DelValle > >
Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote: > > config/findbugs/excludeFilter.xml, line 123 > > <https://reviews.apache.org/r/51763/diff/4/?file=1498752#file1498752line123> > > > > If I'm reading the code correctly we need this because we have a > > `CompletableFuture` and to give it a value we use `null`. > > > > Have you considered using `new Object()` instead so we don't have to > > add this exception to our rules? > > Maxim Khutornenko wrote: > I am not comfortable substituting a `Void` with a random `Object` just to > avoid analyzer error. This opens up a potential for much more confusing cases. > > John Sirois wrote: > Generally a `SideEffect` type with 1 instance would be handy for these > things. Something to consider outside this review. > > IE: > ```java > package org.apache.aurora.lang; > > public enum SideEffect { > COMPLETE; > } > ``` > > I'm not sure if others would find `CompletableFuture` and the > corresponding `return SideEffect.COMPLETE` an improvement in these sorts of > situations. It is explicit. I don't think enum is useful here as it does not deliver any additional signal. How about using a `NoResult` interface instance instead? Updated the parent RB. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51763/#review149093 --- On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51763/ > --- > > (Updated Sept. 14, 2016, 11:12 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the second part of the `BatchWorker` conversion work that moves cron > jobs to use non-blocking kill followups and reduces the number of trigger > threads. See https://reviews.apache.org/r/51759 for more background on the > `BatchWorker`. > > #Problem > The current implementation of the cron scheduling relies on a large number of > threads (`cron_scheduler_num_threads`=100) to support cron triggering and > killing existing tasks according to `KILL_EXISTING` collision policy. This > creates large spikes of activities at synchronized intervals as users tend to > schedule their cron runs around similar schedules. Moreover, the current > implementation re-acquires write locks multiple times to deliver on > `KILL_EXISTING` policy. > > #Remediation > Trigger level batching is still done in a blocking way but multiple cron > triggers may be bundled together to share the same write transaction. Any > followups, however, are performed in a non-blocking way by relying on a > `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. > In order to still ensure non-concurrent execution of a given job key trigger, > a token (job key) is saved within the trigger itself. A concurrent trigger > will bail if a kill followup is still in progress (token is set AND no entry > in `killFollowups` set exists yet). > > #Results > The above approach allowed reducing the number of cron threads to 10 and > likely can be reduced even further. See https://reviews.apache.org/r/51759 > for the lock contention results. > > > Diffs > - > > commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java > 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 > commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java > bc30990d57f444f7d64805ed85c363f1302736d0 > config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java > c07551e94f9221b5b21c5dc9715e82caa290c2e8 > src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java > 155d702d68367b247dd066f773c662407f0e3b5b > > src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java > 5c64ff2994e200b3453603ac5470e8e152cebc55 > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java > 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f > > Diff: https://reviews.apache.org/r/51763/diff/ > > > Testing > --- > > All types of testing including deploying to test and production clusters. > > > Thanks, > > Maxim Khutornenko > >
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
g/apache/aurora/scheduler/BatchWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 15, 2016, 10:58 a.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 166-167 > > <https://reviews.apache.org/r/51759/diff/7/?file=1498729#file1498729line166> > > > > super nitpicky: mind swapping the order of these args to keep inline > > with `execute(Work work)`? This ordering is deliberate to improve reading at the call site. The verbose lambda is coming last in the `executeWithReplay` subjectively making it easier to read (https://reviews.apache.org/r/51763/). If you disagree I can reorder though. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review149044 --- On Sept. 16, 2016, 8:03 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 16, 2016, 8:03 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > single write transaction thus reducing the number of potential write lock > requests. To minimize wait-in-queue time, items are dispatched immediately > and the max number of items is bounded. There are a few `BatchWorker` > instances specialized on particular workload types: task even processing, > cron scheduling and task scheduling. Every instance can be tuned > independently (max batch size) and provides specialized metrics helping to > monitor each workload type perf. > > #Results > The proposed approach has been heavily tested in production and delivered the > best results. The lock contention latencies got down between 2x and 5x > depending on the cluster load. A number of other approaches tried but > discarded as not performing well or even performing much worse than the > current master: > - Clock-driven batch execution - every batch is dispatched on a time schedule > - Max batch with a deadline - a batch is dispatched when max size is reached > OR a timeout expires > - Various combinations of the above - some `BatchWorkers` are using > clock-driven execution while others are using max batch with a deadline > - Completely non-blocking (event-based) completion notification - all call > sites are notified of item completion via a `BatchWorkCompleted` event > > Happy to provide more details on the above if interested. > > #Upcoming > The introduction of the `BatchWorker` by itself was not enough to > substantially improve the MTTA. It, however, paves the way for the next phase > of scheduling perf improvement - taking more than 1 task from a given > `TaskGroup` in a single scheduling r
Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
> On Sept. 16, 2016, 6:28 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java, > > line 91 > > <https://reviews.apache.org/r/51765/diff/5/?file=1498765#file1498765line91> > > > > Curious, arbitrary or derrived? Arbitrary derived :) There is no perfect default here though. Just trying to be conservative especially given the upcoming https://reviews.apache.org/r/51929/. After all is done, we will have 15x improvement in scheduling rate derived from a single `write()` transaction. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51765/#review149229 --- On Sept. 14, 2016, 11:18 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51765/ > --- > > (Updated Sept. 14, 2016, 11:18 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the final part of the `BatchWorker` conversion work that converts > `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background > on the `BatchWorker`. > > #Problem > See https://reviews.apache.org/r/51759 > > #Remediation > Task scheduling is one of the most dominant users of the write lock. It's > also one of the heaviest and the most latency-sensitive. As such, the default > max batch size is chosen conservatively low (3) and batch items are executed > in a blocking way. > > BTW, attempting to make task scheduling non-blocking resulted in a much worse > scheduling performance. The way our `DBTaskStore` is wired, all async > activities, including `EventBus` are bound to use a single async `Executor`, > which is currently limited at 8 threads [1]. Relying on the same `EventBus` > to deliver scheduling completion events resulted in slower scheduling perf as > those events were backed up behind all other activities, including tasks > status events, reconciliation and etc. Increasing the executor thread pool > size to a larger number on the other side, also increased the lock contention > defeating the whole purpose of this work. > > #Results > See https://reviews.apache.org/r/51759 for the lock contention results. > > https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54 > > > Diffs > - > > src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java > 9d0d40b82653fb923bed16d06546288a1576c21d > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java > 11e8033438ad0808e446e41bb26b3fa4c04136c7 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java > c044ebe6f72183a67462bbd8e5be983eb592c3e9 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java > d266f6a25ae2360db2977c43768a19b1f1efe8ff > src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java > c2ceb4e7685a9301f8014a9183e02fbad65bca26 > src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java > 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 > > src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java > 72562e6bd9a9860c834e6a9faa094c28600a8fed > > Diff: https://reviews.apache.org/r/51765/diff/ > > > Testing > --- > > All types of testing including deploying to test and production clusters. > > > Thanks, > > Maxim Khutornenko > >
Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
> On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote: > > config/findbugs/excludeFilter.xml, line 123 > > <https://reviews.apache.org/r/51763/diff/4/?file=1498752#file1498752line123> > > > > If I'm reading the code correctly we need this because we have a > > `CompletableFuture` and to give it a value we use `null`. > > > > Have you considered using `new Object()` instead so we don't have to > > add this exception to our rules? I am not comfortable substituting a `Void` with a random `Object` just to avoid analyzer error. This opens up a potential for much more confusing cases. > On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, > > line 78 > > <https://reviews.apache.org/r/51763/diff/4/?file=1498753#file1498753line78> > > > > Just to be clear, this annotation is needed to ensure the data in > > `JobDataMap` is persisted properly? Correct. > On Sept. 15, 2016, 6:25 p.m., Zameer Manji wrote: > > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java, line 100 > > <https://reviews.apache.org/r/51763/diff/4/?file=1498756#file1498756line100> > > > > For integration tests I was under the impression we used > > `org.apache.aurora.scheduler.testing.FakeStatsProvider` The `FakeStatsProvider` is only useful when we want to get a reading on updated metrics. Since everything else that can post stats is mocked, there is no reason to use a real object here. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51763/#review149093 --- On Sept. 14, 2016, 11:12 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51763/ > --- > > (Updated Sept. 14, 2016, 11:12 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the second part of the `BatchWorker` conversion work that moves cron > jobs to use non-blocking kill followups and reduces the number of trigger > threads. See https://reviews.apache.org/r/51759 for more background on the > `BatchWorker`. > > #Problem > The current implementation of the cron scheduling relies on a large number of > threads (`cron_scheduler_num_threads`=100) to support cron triggering and > killing existing tasks according to `KILL_EXISTING` collision policy. This > creates large spikes of activities at synchronized intervals as users tend to > schedule their cron runs around similar schedules. Moreover, the current > implementation re-acquires write locks multiple times to deliver on > `KILL_EXISTING` policy. > > #Remediation > Trigger level batching is still done in a blocking way but multiple cron > triggers may be bundled together to share the same write transaction. Any > followups, however, are performed in a non-blocking way by relying on a > `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. > In order to still ensure non-concurrent execution of a given job key trigger, > a token (job key) is saved within the trigger itself. A concurrent trigger > will bail if a kill followup is still in progress (token is set AND no entry > in `killFollowups` set exists yet). > > #Results > The above approach allowed reducing the number of cron threads to 10 and > likely can be reduced even further. See https://reviews.apache.org/r/51759 > for the lock contention results. > > > Diffs > - > > commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java > 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 > commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java > bc30990d57f444f7d64805ed85c363f1302736d0 > config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java > c07551e94f9221b5b21c5dc9715e82caa290c2e8 > src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java > 155d702d68367b247dd066f773c662407f0e3b5b > > src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java > 5c64ff2994e200b3453603ac5470e8e152cebc55 > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java > 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f > > Diff: https://reviews.apache.org/r/51763/diff/ > > > Testing > --- > > All types of testing including deploying to test and production clusters. > > > Thanks, > > Maxim Khutornenko > >
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
chWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 16, 2016, 6:18 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 213 > > <https://reviews.apache.org/r/51759/diff/7/?file=1498729#file1498729line213> > > > > I think this is a misuse of `CompletalbleFuture` I discovered this as I > > was reviewing https://reviews.apache.org/r/51765/. > > > > I think here we need to use > > https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html#completeExceptionally-java.lang.Throwable- > > here and set the exception into the future. > > > > Then the exception will propagate to the caller (like > > `TaskGroupBatchWorker`). > > > > Then the user of the batch worker can handle the exception. Good catch. This is a leftover after refactoring `AwaitableResult` into `CompletableFuture` and it makes perfect sense to channel that error upstream. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review149227 ------- On Sept. 14, 2016, 10:41 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 14, 2016, 10:41 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > single write transaction thus reducing the number of potential write lock > requests. To minimize wait-in-queue time, items are dispatched immediately > and the max number of items is bounded. There are a few `BatchWorker` > instances specialized on particular workload types: task even processing, > cron scheduling and task scheduling. Every instance can be tuned > independently (max batch size) and provides specialized metrics helping to > monitor each workload type perf. > > #Results > The proposed approach has been heavily tested in production and delivered the > best results. The lock contention latencies got down between 2x and 5x > depending on the cluster load. A number of other approaches tried but > discarded as not performing well or even performing much worse than the > current master: > - Clock-driven batch execution - every batch is dispatched on a time schedule > - Max batch with a deadline - a batch is dispatched when max size is reached > OR a timeout expires > - Various combinations of the above - some `BatchWorkers` are using > clock-driven execution while others are using max batch with a deadline > - Completely non-blocking (event-based) completion notification - all call > sites are notified of item completion via a `Bat
Re: Review Request 51929: Scheduling multiple tasks per round.
/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51929: Scheduling multiple tasks per round.
> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line > > 197 > > <https://reviews.apache.org/r/51929/diff/1/?file=1499323#file1499323line197> > > > > Side show: Isn't that `if` unnecessary here and we can adjust the > > penality in any case? We will remove the group if `hasMore()` returns > > false, so any penality should be fine. Not sure I follow. This is the place that applies penalty accrued inside the `startGroup()` or removes the group if it's empty. > On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, > > lines 208-213 > > <https://reviews.apache.org/r/51929/diff/1/?file=1499324#file1499324line208> > > > > I would have expected that we only request preemption if we failed to > > schedule all tasks in the current group. If I remember correctly, > > preemption slot search only happens on a per-group basis anyway. > > > > You have probably thought about this, so I would like to understand > > your reasoning. We still want to attempt a preemption if _some_ but not _all_ tasks are scheduled within a given round, right? Otherwise, preemption becomes an all-or-nothing feature and we have to wait for another scheduling cycle to request a reservation. > On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, line 174 > > <https://reviews.apache.org/r/51929/diff/1/?file=1499325#file1499325line174> > > > > Shouldn't that only happen if `launchTask` has succeeded? I've debated that as well and decided it's more logical to finish accessing offer details before it's being launched with and removed from the `OfferManager`. BTW, I just realized I was missing a `break` statement to bail out from the scheduling round if a `LaunchException` is thrown. While theoretically we _could_ continue matching even after the `LaunchException`, it's hard to reason about the state of the storage and the last assigned item and as such it's safer to terminate than continue. Fixed and added a test case. This should now be resolved. Thanks for asking :) > On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, lines > > 214-220 > > <https://reviews.apache.org/r/51929/diff/1/?file=1499325#file1499325line214> > > > > There is additional complexity here and in other places because > > `schedule`, `scheduleTask` and `maybeAssign` return a `Map > Boolean>` encoding if scheduling was successful for all given tasks. > > > > I'd propose to change the signature so that all methods mentioned above > > return `Set` containing only the successfully launched tasks. > > > > If a caller is really interested in tasks failed to schedule, he can > > compute the set difference between passed and returned task ids. I was fighting with myself over this initially and decided to keep the original contract. Now that the aproach fully shaped up, I can see how it can be simplified by not returning the exhaustive map of things. Refactored to return only what's needed. Thanks for bringing this up! - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/#review149179 --- On Sept. 16, 2016, 12:51 a.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51929/ > --- > > (Updated Sept. 16, 2016, 12:51 a.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is phase 2 of scheduling perf improvement effort started in > https://reviews.apache.org/r/51759/. > > We can now take multiple (configurable) number of task IDs from a given > `TaskGroup` per scheduling. The idea is to go deeper through the offer queue > and assign more than one task if possible. This approach delivers > substantially better MTTA and still ensures fairness across multiple > `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 > tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be > set even higher if the majority of cluster jobs have large number of > instances and/or
Re: Review Request 51874: Change framework_name default value from 'TwitterScheduler' to 'Aurora'
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51874/#review149216 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 15, 2016, 7:02 p.m., Santhosh Kumar Shanmugham wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51874/ > --- > > (Updated Sept. 15, 2016, 7:02 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1688 > https://issues.apache.org/jira/browse/AURORA-1688 > > > Repository: aurora > > > Description > --- > > Change framework_name default value from 'TwitterScheduler' to 'Aurora' > > > Diffs > - > > RELEASE-NOTES.md ad2c68a6defe07c94480d7dee5b1496b50dc34e5 > > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java > 8a386bd208956eb0c8c2f48874b0c6fb3af58872 > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > 97677f24a50963178a123b420d7ac136e4fde3fe > > Diff: https://reviews.apache.org/r/51874/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > > Testing to make sure backward compatibility: > > # HEAD of master: > > # Case 1: Rolling forward does not impact running tasks: > Renaming framework from 'TwitterScheduler' to 'Aurora': > > The framework re-registers after restart (treated by master as failover) and > gets the same framework-id. Running task remain unaffected. > > Master log: > I0914 16:48:28.408182 9815 master.cpp:1297] Giving framework > 071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at > scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 3weeks to > failover > I0914 16:48:28.408226 9815 hierarchical.cpp:382] Deactivated framework > 071c44a1-b4d4-4339-a727-03a79f725851- > E0914 16:48:28.408617 9819 process.cpp:2105] Failed to shutdown socket with > fd 28: Transport endpoint is not connected > I0914 16:48:43.722126 9813 master.cpp:2424] Received SUBSCRIBE call for > framework 'Aurora' at > scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 > I0914 16:48:43.722190 9813 master.cpp:2500] Subscribing framework Aurora > with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, > GPU_RESOURCES ] > I0914 16:48:43.75 9813 master.cpp:2564] Updating info for framework > 071c44a1-b4d4-4339-a727-03a79f725851- > I0914 16:48:43.722256 9813 master.cpp:2577] Framework > 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at > scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over > I0914 16:48:43.722429 9813 hierarchical.cpp:348] Activated framework > 071c44a1-b4d4-4339-a727-03a79f725851- > I0914 16:48:43.722595 9813 master.cpp:5709] Sending 1 offers to framework > 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at > scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 > > Scheduler log: > I0914 16:48:44.157 [Thread-10, MesosSchedulerImpl:151] Registered with ID > value: "071c44a1-b4d4-4339-a727-03a79f725851-" > , master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae" > ip: 119646400 > port: 5050 > pid: "master@192.168.33.7:5050" > hostname: "aurora.local" > version: "1.0.0" > address { > hostname: "aurora.local" > ip: "192.168.33.7" > port: 5050 > } > > # Case 2: Rolling backward does not impact running tasks: > Rolling back framework name from 'Aurora' to 'TwitterScheduler': > > The framework re-registers after restart (treated by master as failover) and > gets the same framework-id. Running task remain unaffected. > > Master log: > I0914 16:51:33.203495 9812 master.cpp:1297] Giving framework > 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at > scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 3weeks to > failover > I0914 16:51:33.203526 9812 hierarchical.cpp:382] Deactivated framework > 071c44a1-b4d4-4339-a727-03a79f725851- > I0914 16:51:49.614074 9813 master.cpp:2424] Received SUBSCRIBE call for > framework 'TwitterScheduler' at > scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083 > I0914 16:51:49.614215 9813 master.cpp:2500] Subscribing framework > TwitterScheduler with checkpointing enabled and capabi
Re: Review Request 51929: Scheduling multiple tasks per round.
> On Sept. 16, 2016, 1:20 a.m., Aurora ReviewBot wrote: > > Master (783baae) is red with this patch. > > ./build-support/jenkins/build.sh > > > > [1m# Create file stdout for capturing output. > > We can't use StringIO mock[0m > > [1m# because TestProcess is running fork.[0m > > [1mwith open(os.path.join(td, 'sys_stdout'), > > 'w+') as stdout:[0m > > [1m with open(os.path.join(td, > > 'sys_stderr'), 'w+') as stderr:[0m > > [1mwith mutable_sys():[0m > > [1m sys.stdout, sys.stderr = stdout, > > stderr[0m > > [1m[0m > > [1m p = TestProcess('process', 'echo > > hello world; echo >&2 hello stderr', 0,[0m > > [1m taskpath, sandbox, > > logger_destination=LoggerDestination.BOTH)[0m > > [1m p.start()[0m > > [1m rc = > > wait_for_rc(taskpath.getpath('process_checkpoint'))[0m > > [1m[0m > > [1m assert rc == 0[0m > > [1m # Check log files were created in > > std path with correct content[0m > > [1m> assert_log_content(taskpath, > > 'stdout', 'hello world\n')[0m > > > > > > src/test/python/apache/thermos/core/test_process.py:487: > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > > > > taskpath = > at 0x7fdd3cd73b10> > > log_name = 'stdout' > > expected_content = 'hello world\n' > > > > [1mdef assert_log_content(taskpath, log_name, > > expected_content):[0m > > [1m log = > > taskpath.with_filename(log_name).getpath('process_logdir')[0m > > [1m assert os.path.exists(log)[0m > > [1m with open(log, 'r') as fp:[0m > > [1m> assert fp.read() == expected_content[0m > > [1m[31mE assert '' == 'hello world\n'[0m > > [1m[31mE + hello world[0m > > > > > > src/test/python/apache/thermos/core/test_process.py:313: AssertionError > > generated xml file: > > /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml > > > > [1m[31m 1 failed, 710 passed, 6 skipped, 1 warnings > > in 226.09 seconds [0m > > > > FAILURE > > > > > > 01:19:57 04:18 [complete][31m > >FAILURE[0m > > > > > > I will refresh this build result if you post a review containing > > "@ReviewBot retry" @ReviewBot retry - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/#review149162 --- On Sept. 16, 2016, 12:51 a.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51929/ > --- > > (Updated Sept. 16, 2016, 12:51 a.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is phase 2 of scheduling perf improvement effort started in > https://reviews.apache.org/r/51759/. > > We can now take multiple (configurable) number of task IDs from a given > `TaskGroup` per scheduling. The idea is to go deeper through the offer queue > and assign more than one task if possible. This approach delivers > substantially better MTTA and still ensures fairness across multiple > `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 > tasks per round), which suggest the `max_tasks_pe
Review Request 51929: Scheduling multiple tasks per round.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51929/ --- Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Repository: aurora Description --- This is phase 2 of scheduling perf improvement effort started in https://reviews.apache.org/r/51759/. We can now take multiple (configurable) number of task IDs from a given `TaskGroup` per scheduling. The idea is to go deeper through the offer queue and assign more than one task if possible. This approach delivers substantially better MTTA and still ensures fairness across multiple `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be set even higher if the majority of cluster jobs have large number of instances and/or update batch sizes. As far as a single round perf goes, we can consider the following 2 worst-case scenarios: - master: single task scheduling fails after trying all offers in the queue - this patch: N tasks launched with the very last N offers in the queue + `(N x single_task_launch_latency)` Assuming that matching N tasks against M offers takes exactly the same time as 1 task against M offers (as they all share the same `TaskGroup`), the only measurable difference comes from the additional `N x single_task_launch_latency` overhead. Based on real cluster observations, the `single_task_launch_latency` is less than 1% of a single task scheduling attempt, which is << than the savings from avoided additional scheduling rounds. As far as jmh results go, the new approach (batching + multiple tasks per round) is only slightly more demanding (~8%). Both results though are MUCH higher than the real cluster perf, which just confirms we are not bound by CPU time here: Master: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 17126.183 ± 488.425 ops/s ``` This patch: ``` Benchmark Mode Cnt Score Error Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark thrpt 10 15838.051 ± 187.890 ops/s ``` NOTE: this will not apply cleanly as it branched off of https://reviews.apache.org/r/51765, which itself depends on https://reviews.apache.org/r/51759/. Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 9d0d40b82653fb923bed16d06546288a1576c21d src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 87b9e1928ab2d44668df1123f32ffdc4197c0c70 src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 11e8033438ad0808e446e41bb26b3fa4c04136c7 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 5d319557057e27fd5fc6d3e553e9ca9139399c50 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java c044ebe6f72183a67462bbd8e5be983eb592c3e9 src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java d266f6a25ae2360db2977c43768a19b1f1efe8ff src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java ece476b918e6f2c128039e561eea23a94d8ed396 src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 209f9298a1d55207b9b41159f2ab366f92c1eb70 src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c2ceb4e7685a9301f8014a9183e02fbad65bca26 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java ee5c6528af89cc62a35fdb314358c489556d8131 src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 98048fabc00f233925b6cca015c2525980556e2b src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 72562e6bd9a9860c834e6a9faa094c28600a8fed src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java b4d27f69ad5d4cce03da9f04424dc35d30e8af29 Diff: https://reviews.apache.org/r/51929/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51924: Remove --release-threshold option from aurora job restart.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51924/#review149097 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 15, 2016, 6:13 p.m., Joshua Cohen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51924/ > --- > > (Updated Sept. 15, 2016, 6:13 p.m.) > > > Review request for Aurora and Maxim Khutornenko. > > > Bugs: AURORA-1681 > https://issues.apache.org/jira/browse/AURORA-1681 > > > Repository: aurora > > > Description > --- > > Remove --release-threshold option from aurora job restart. > > > Diffs > - > > RELEASE-NOTES.md ad2c68a6defe07c94480d7dee5b1496b50dc34e5 > src/main/python/apache/aurora/client/cli/jobs.py > 7b4c2692334acfddb53a52a602a5f07e94b4bd86 > > Diff: https://reviews.apache.org/r/51924/diff/ > > > Testing > --- > > > Thanks, > > Joshua Cohen > >
Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51765/ --- (Updated Sept. 14, 2016, 11:18 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Really rebasing. Repository: aurora Description --- This is the final part of the `BatchWorker` conversion work that converts `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem See https://reviews.apache.org/r/51759 #Remediation Task scheduling is one of the most dominant users of the write lock. It's also one of the heaviest and the most latency-sensitive. As such, the default max batch size is chosen conservatively low (3) and batch items are executed in a blocking way. BTW, attempting to make task scheduling non-blocking resulted in a much worse scheduling performance. The way our `DBTaskStore` is wired, all async activities, including `EventBus` are bound to use a single async `Executor`, which is currently limited at 8 threads [1]. Relying on the same `EventBus` to deliver scheduling completion events resulted in slower scheduling perf as those events were backed up behind all other activities, including tasks status events, reconciliation and etc. Increasing the executor thread pool size to a larger number on the other side, also increased the lock contention defeating the whole purpose of this work. #Results See https://reviews.apache.org/r/51759 for the lock contention results. https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54 Diffs (updated) - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 9d0d40b82653fb923bed16d06546288a1576c21d src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 11e8033438ad0808e446e41bb26b3fa4c04136c7 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java c044ebe6f72183a67462bbd8e5be983eb592c3e9 src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java d266f6a25ae2360db2977c43768a19b1f1efe8ff src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c2ceb4e7685a9301f8014a9183e02fbad65bca26 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 72562e6bd9a9860c834e6a9faa094c28600a8fed Diff: https://reviews.apache.org/r/51765/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51765/ --- (Updated Sept. 14, 2016, 11:17 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Rebased. Repository: aurora Description --- This is the final part of the `BatchWorker` conversion work that converts `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem See https://reviews.apache.org/r/51759 #Remediation Task scheduling is one of the most dominant users of the write lock. It's also one of the heaviest and the most latency-sensitive. As such, the default max batch size is chosen conservatively low (3) and batch items are executed in a blocking way. BTW, attempting to make task scheduling non-blocking resulted in a much worse scheduling performance. The way our `DBTaskStore` is wired, all async activities, including `EventBus` are bound to use a single async `Executor`, which is currently limited at 8 threads [1]. Relying on the same `EventBus` to deliver scheduling completion events resulted in slower scheduling perf as those events were backed up behind all other activities, including tasks status events, reconciliation and etc. Increasing the executor thread pool size to a larger number on the other side, also increased the lock contention defeating the whole purpose of this work. #Results See https://reviews.apache.org/r/51759 for the lock contention results. https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54 Diffs (updated) - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 9d0d40b82653fb923bed16d06546288a1576c21d src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 11e8033438ad0808e446e41bb26b3fa4c04136c7 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java c044ebe6f72183a67462bbd8e5be983eb592c3e9 src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java d266f6a25ae2360db2977c43768a19b1f1efe8ff src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c2ceb4e7685a9301f8014a9183e02fbad65bca26 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 72562e6bd9a9860c834e6a9faa094c28600a8fed Diff: https://reviews.apache.org/r/51765/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51763/ --- (Updated Sept. 14, 2016, 11:12 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Rebasing and comments. Repository: aurora Description --- This is the second part of the `BatchWorker` conversion work that moves cron jobs to use non-blocking kill followups and reduces the number of trigger threads. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem The current implementation of the cron scheduling relies on a large number of threads (`cron_scheduler_num_threads`=100) to support cron triggering and killing existing tasks according to `KILL_EXISTING` collision policy. This creates large spikes of activities at synchronized intervals as users tend to schedule their cron runs around similar schedules. Moreover, the current implementation re-acquires write locks multiple times to deliver on `KILL_EXISTING` policy. #Remediation Trigger level batching is still done in a blocking way but multiple cron triggers may be bundled together to share the same write transaction. Any followups, however, are performed in a non-blocking way by relying on a `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In order to still ensure non-concurrent execution of a given job key trigger, a token (job key) is saved within the trigger itself. A concurrent trigger will bail if a kill followup is still in progress (token is set AND no entry in `killFollowups` set exists yet). #Results The above approach allowed reducing the number of cron threads to 10 and likely can be reduced even further. See https://reviews.apache.org/r/51759 for the lock contention results. Diffs (updated) - commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java bc30990d57f444f7d64805ed85c363f1302736d0 config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java c07551e94f9221b5b21c5dc9715e82caa290c2e8 src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 155d702d68367b247dd066f773c662407f0e3b5b src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5c64ff2994e200b3453603ac5470e8e152cebc55 src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f Diff: https://reviews.apache.org/r/51763/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
> On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, > > lines 164-172 > > <https://reviews.apache.org/r/51763/diff/3/?file=1498135#file1498135line164> > > > > I probably don't completely understand the logic, so stupid question > > ahead: > > > > Couldn't we do that cleanup of JobDataMap in the handler where you > > currently do the `killFollowups.add(key)`? Having this spread out over the > > large `doExecute` function doesn't help understandability very much. Unfortunately, we can't. That was my first thought but the context and its detail map are reconstructed for every trigger and are gone the moment a trigger is released. > On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, > > line 174 > > <https://reviews.apache.org/r/51763/diff/3/?file=1498135#file1498135line174> > > > > I will probably find this message confusing if I see it in the logs, in > > particular if I have configured the job with KILL_EXISTING. Any suggestions? There are lots of moving parts addressed in this message, so I am not sure how to better explain it without describing how quartz works :) > On Sept. 14, 2016, 5:40 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java, > > line 274 > > <https://reviews.apache.org/r/51763/diff/3/?file=1498135#file1498135line274> > > > > The term "followup" is rather generic and difficuelt to understand if > > seen in isolation in the log. You should at least mention it is about > > killing a cron job. Good point. Rephrased. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51763/#review148933 --- On Sept. 14, 2016, 12:23 a.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51763/ > --- > > (Updated Sept. 14, 2016, 12:23 a.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the second part of the `BatchWorker` conversion work that moves cron > jobs to use non-blocking kill followups and reduces the number of trigger > threads. See https://reviews.apache.org/r/51759 for more background on the > `BatchWorker`. > > #Problem > The current implementation of the cron scheduling relies on a large number of > threads (`cron_scheduler_num_threads`=100) to support cron triggering and > killing existing tasks according to `KILL_EXISTING` collision policy. This > creates large spikes of activities at synchronized intervals as users tend to > schedule their cron runs around similar schedules. Moreover, the current > implementation re-acquires write locks multiple times to deliver on > `KILL_EXISTING` policy. > > #Remediation > Trigger level batching is still done in a blocking way but multiple cron > triggers may be bundled together to share the same write transaction. Any > followups, however, are performed in a non-blocking way by relying on a > `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. > In order to still ensure non-concurrent execution of a given job key trigger, > a token (job key) is saved within the trigger itself. A concurrent trigger > will bail if a kill followup is still in progress (token is set AND no entry > in `killFollowups` set exists yet). > > #Results > The above approach allowed reducing the number of cron threads to 10 and > likely can be reduced even further. See https://reviews.apache.org/r/51759 > for the lock contention results. > > > Diffs > - > > commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java > 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 > commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java > bc30990d57f444f7d64805ed85c363f1302736d0 > config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java > c07551e94f9221b5b21c5dc9715e82caa290c2e8 > src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java > 155d702d68367b247dd066f773c662407f0e3b5b > > src/test/java/org/apache/au
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 14, 2016, 6:36 p.m., Zameer Manji wrote: > > This patch LGTM. Just a single concern here and some questions. I will be > > moving on to the subsequent patches shortly. > > Please do not commit this until all parts are shipped. Also, please don't > > forget to rebase/update subsequent patches if changes are made here. > > > > Also, does this cover all state change consumers that interact on storage > > when the event is fired or is this a subset? Just wondering if this is > > supposed to apply to all of them or just a small set that you think need > > this change. > > Maxim Khutornenko wrote: > > Also, does this cover all state change consumers that interact on > storage when the event is fired or is this a subset > > This covers all known high frequency consumers that require a write > transaction. The list is derived by running a stack-dumping version of the > `CallOrderEnforcingStorage.write()` in a live cluster for awhile and > analysing top offenders. It may be not fully comprehensive though, so if you > find any other candidates feel free to report them. > > Zameer Manji wrote: > I think some comments at the consumers to say that batch worker is > required for high throughput would be greatly appreciated for future > contributers. I'd rather avoid comment copypasta everywhere we use `BatchWorker`. I thought it's enough to trace the execution from the callsite to the `BatchWorker` itself to see how/why it's used, no? - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148950 --- On Sept. 14, 2016, 10:41 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 14, 2016, 10:41 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > single write transaction thus reducing the number of potential write lock > requests. To minimize wait-in-queue time, items are dispatched immediately > and the max number of items is bounded. There are a few `BatchWorker` > instances specialized on particular workload types: task even processing, > cron scheduling and task scheduling. Every instance can be tuned > independently (max batch size) and provides specialized metrics helping to > monitor each workload type perf. > > #Results > The proposed approach has been heavily tested in production and delivered the > best results. The lock contention latencies got down between 2x and 5x > depending on the cluster lo
Re: Review Request 51874: Change framework_name default value from 'TwitterScheduler' to 'Aurora'
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51874/#review148988 --- src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java (line 82) <https://reviews.apache.org/r/51874/#comment216516> Did you try to rollback to pre 0.15 scheduler while changing the framework name? Trying to see if we can drop this 'backwards incompatible' statement now. - Maxim Khutornenko On Sept. 14, 2016, 8:58 p.m., Santhosh Kumar Shanmugham wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51874/ > --- > > (Updated Sept. 14, 2016, 8:58 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1688 > https://issues.apache.org/jira/browse/AURORA-1688 > > > Repository: aurora > > > Description > --- > > Change framework_name default value from 'TwitterScheduler' to 'Aurora' > > > Diffs > - > > RELEASE-NOTES.md ad2c68a6defe07c94480d7dee5b1496b50dc34e5 > > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java > 8a386bd208956eb0c8c2f48874b0c6fb3af58872 > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > 97677f24a50963178a123b420d7ac136e4fde3fe > > Diff: https://reviews.apache.org/r/51874/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > > Testing to make sure backward compatibility: > > Case 1: Rolling forward does not impact running tasks: > Renaming framework from 'TwitterScheduler' to 'Aurora': > > The framework re-registers after restart (treated by master as failover) and > gets the same framework-id. Running task remain unaffected. > > Master log: > I0914 16:48:28.408182 9815 master.cpp:1297] Giving framework > 071c44a1-b4d4-4339-a727-03a79f725851- (TwitterScheduler) at > scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 3weeks to > failover > I0914 16:48:28.408226 9815 hierarchical.cpp:382] Deactivated framework > 071c44a1-b4d4-4339-a727-03a79f725851- > E0914 16:48:28.408617 9819 process.cpp:2105] Failed to shutdown socket with > fd 28: Transport endpoint is not connected > I0914 16:48:43.722126 9813 master.cpp:2424] Received SUBSCRIBE call for > framework 'Aurora' at > scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 > I0914 16:48:43.722190 9813 master.cpp:2500] Subscribing framework Aurora > with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, > GPU_RESOURCES ] > I0914 16:48:43.75 9813 master.cpp:2564] Updating info for framework > 071c44a1-b4d4-4339-a727-03a79f725851- > I0914 16:48:43.722256 9813 master.cpp:2577] Framework > 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at > scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over > I0914 16:48:43.722429 9813 hierarchical.cpp:348] Activated framework > 071c44a1-b4d4-4339-a727-03a79f725851- > I0914 16:48:43.722595 9813 master.cpp:5709] Sending 1 offers to framework > 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at > scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 > > Scheduler log: > I0914 16:48:44.157 [Thread-10, MesosSchedulerImpl:151] Registered with ID > value: "071c44a1-b4d4-4339-a727-03a79f725851-" > , master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae" > ip: 119646400 > port: 5050 > pid: "master@192.168.33.7:5050" > hostname: "aurora.local" > version: "1.0.0" > address { > hostname: "aurora.local" > ip: "192.168.33.7" > port: 5050 > } > > Case 2: Rolling backward does not impact running tasks: > Rolling back framework name from 'Aurora' to 'TwitterScheduler': > > The framework re-registers after restart (treated by master as failover) and > gets the same framework-id. Running task remain unaffected. > > Master log: > I0914 16:51:33.203495 9812 master.cpp:1297] Giving framework > 071c44a1-b4d4-4339-a727-03a79f725851- (Aurora) at > scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 3weeks to > failover > I0914 16:51:33.203526 9812 hierarchical.cpp:382] Deactivated framework > 071c44a1-b4d4-4339-a727-03a79f725851- > I0914 16:51:49.614074 9813 master.cpp:2424] Recei
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
/aurora/scheduler/updater/JobUpdateControllerImpl.java 594bb6219294dcc77d48dcad14e2a6f9caa0c534 src/test/java/org/apache/aurora/scheduler/BatchWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 14, 2016, 6:36 p.m., Zameer Manji wrote: > > This patch LGTM. Just a single concern here and some questions. I will be > > moving on to the subsequent patches shortly. > > Please do not commit this until all parts are shipped. Also, please don't > > forget to rebase/update subsequent patches if changes are made here. > > > > Also, does this cover all state change consumers that interact on storage > > when the event is fired or is this a subset? Just wondering if this is > > supposed to apply to all of them or just a small set that you think need > > this change. > Also, does this cover all state change consumers that interact on storage > when the event is fired or is this a subset This covers all known high frequency consumers that require a write transaction. The list is derived by running a stack-dumping version of the `CallOrderEnforcingStorage.write()` in a live cluster for awhile and analysing top offenders. It may be not fully comprehensive though, so if you find any other candidates feel free to report them. > On Sept. 14, 2016, 6:36 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 233 > > <https://reviews.apache.org/r/51759/diff/6/?file=1498583#file1498583line233> > > > > Instead of having an `Optional` and then later > > checking it's present when we compute the backoff, we could simplify the > > internal implementation by having non repeatable work items default to some > > sort of `NoopBackoffStrategy` instead. I don't really see how that would be a better solution in this particular case. We do need a `BackoffStrategy` in the repeatable work case it better be non-fake (and hard fail if not provided). Having a noop strategy softens that constraint. > On Sept. 14, 2016, 6:36 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/SchedulerModule.java, line 69 > > <https://reviews.apache.org/r/51759/diff/6/?file=1498584#file1498584line69> > > > > Just curious, is this default arbitrary or something that was derrived > > from your deployment in production? It's semi-arbitrary but proven in production :). - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148950 --- On Sept. 14, 2016, 5:25 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 14, 2016, 5:25 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > sin
Re: Review Request 51876: @ReviewBot retry Modify executor state transition logic to rely on health checks (if enabled)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51876/#review148937 --- src/main/python/apache/aurora/executor/aurora_executor.py (line 125) <https://reviews.apache.org/r/51876/#comment216462> I don't think we should include `Task is healthy.` message for non-health-check-enabled tasks as we have no visibility to the health of such tasks. src/main/python/apache/aurora/executor/aurora_executor.py (lines 133 - 141) <https://reviews.apache.org/r/51876/#comment216461> This code is largely a repetition of what alrady exists in health_checker.py. I'd be great to reuse it (say from task_info.py). src/main/python/apache/aurora/executor/common/health_checker.py (lines 40 - 43) <https://reviews.apache.org/r/51876/#comment216503> Do we really need this enum? A task that is in the middle of the `initial_interval_secs` timeout technically is still healthy, right? I feel having a boolean `isHealthy` should be enough to cover all possible scenarios as `StatusManager` checks for `isHealthy` AND the task status itself to invoke a callback. src/main/python/apache/aurora/executor/common/health_checker.py (line 69) <https://reviews.apache.org/r/51876/#comment216477> This comment needs update. src/main/python/apache/aurora/executor/common/health_checker.py (line 103) <https://reviews.apache.org/r/51876/#comment216474> `...if is_healthy else...` src/main/python/apache/aurora/executor/common/health_checker.py (lines 136 - 150) <https://reviews.apache.org/r/51876/#comment216506> If you get rid of the `HealthCheckStatus` enum I think all we need is this: ``` if self.initial_in_progress: if self.current_consecutive_health_checks >= self.min_consecutive_health_checks: self.healthy = True else: self.healthy = self.current_consecutive_failures > self.max_consecutive_failures self.reason = reason @property def initial_in_progress(self): if not self._expired: if self.elapsed_time <= self.initial_interval: return true else: self._expired = True return !self._expired ``` src/main/python/apache/aurora/executor/common/health_checker.py (line 149) <https://reviews.apache.org/r/51876/#comment216482> This should be log.info instead. src/main/python/apache/aurora/executor/common/status_checker.py (line 104) <https://reviews.apache.org/r/51876/#comment216500> Not obvious to me: what's the purpose of this condition here? src/main/python/apache/aurora/executor/status_manager.py (line 60) <https://reviews.apache.org/r/51876/#comment216473> How about extending the `StatusManager` to take two callbacks instead? The old callback would remain a default one and the new would be an optional one to exclusively process 'TASK_RUNNING'. Since it's now status aware, it would make sense to consolidate status management here and avoid the if-else switch upstream. Also, I'd use `mesos_pb2.TASK_RUNNING` instead of relying on string const. - Maxim Khutornenko On Sept. 14, 2016, 4:43 a.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51876/ > ------- > > (Updated Sept. 14, 2016, 4:43 a.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Bugs: AURORA-1225 > https://issues.apache.org/jira/browse/AURORA-1225 > > > Repository: aurora > > > Description > --- > > Modify executor state transition logic to rely on health checks (if enabled). > > [Summary] > Executor needs to start executing user content in STARTING and transition to > RUNNING when a successful required number of health checks is reached. > > This review contains a series of executor changes that implement the health > check driven updates. It gives more context of the design of this feature. > > [Background] > Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# > for more details and background. > > [Description] > If health check is enabled on vCurrent executor, the health checker will send > a "TASK_RUNNING" message when a successful required number of health checks > is reached within the initial_interval_secs. On the other hand, a > "TASK_FAILED" message was sent if the health checker fails to reach the > requ
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
uler/BatchWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 14, 2016, 4:56 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskThrottler.java, > > lines 76-90 > > <https://reviews.apache.org/r/51759/diff/4/?file=1498120#file1498120line76> > > > > This code is exposing a slight design smell in my eye: We are using the > > `DelayedExecutor` to trigger the `batchWorker` when a backoff timer has > > passed. However, the latter already provides an internal backoff mechanism. > > > > I'd rather see if we have just one defined way how to handle those > > time-based backoffs: > > > > * either the BatchWorker does not perform any backoff handling at all. > > This seems to be in line with what Zameer has proposed in his last comment, > > * or we make the backoff mechanism a first-class feature of the > > batchWorker so that I can say I can inject `backoffStrategy` and > > `lastBackoffMsec` to be used for the `WorkItem` created in > > `batchWorker.execute`. > > Stephan Erb wrote: > The same applies to the `startGroup(final TaskGroup group)` in Part 3: A > backoff mechanism is used that calls the batchWorker that could provide a > backoff on its own. The `BatchWorker` is currently supporting only 2 use cases: 1. Simple no-repeat work item execution (`BatchWorker.execute()`) 2. Repeatable work item execution (`BatchWorker.executeWithReplay()`) I don't see why `BatchWorker` has to use `BackoffStrategy` everywhere or not use it at all. There is a clear use case for it in #2 and absolutely no purpose in #1. What you are suggesting adds yet another case of something like `BatchWorker.scheduleExecution()`, which goes beyond of what's immediately necessary to address the above cited goals. I don't mind considering further improvements to the `BatchWorker` outside of this patch scope but I'd like to keep it concise to avoid feature creep. > On Sept. 14, 2016, 4:56 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 228 > > <https://reviews.apache.org/r/51759/diff/4/?file=1498117#file1498117line228> > > > > Do you have any particular usecase for this metric? > > > > We already have `batchesProcessed` and `itemsProcessed` and can compute > > the rate of processed items and batches. This seems more robust to me than > > just looking at `lastBatchSize`. It's more of a convenience metric, similar to how `TimedInterceptor` exposes both `total_nanos` and `total_events` in addition to `events_per_sec`. If you feel strong about removing redundancy here I am ok dropping it. Just thought having a single metric to track the effective batch size may be helpful. > On Sept. 14, 2016, 4:56 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 197 > > <https://reviews.apache.org/r/51759/diff/4/?file=1498117#file1498117line197> > > > > `readyItems` is still a left over from the previous patch version. > > Could be renamed to just `batch`. Thanks. Fixed. - Maxim ------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148918 --- On Sept. 14, 2016, 4:47 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 14, 2016, 4:47 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput &
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
g/apache/aurora/scheduler/BatchWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 14, 2016, 1:29 a.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 110 > > <https://reviews.apache.org/r/51759/diff/4/?file=1498117#file1498117line110> > > > > The documentation implies we are returning a boolean but we are > > returning a `Result` object. I think you need to update that. I think what > > you are saying is that the `Result` object signals through `isCompleted` > > that the work should be repeated right? > > > > (As an aside, maybe we should name this to `shouldReschedule` or > > similar). > > > > > > If work needs to be repeated once the result has been computed, why > > can't the caller just access the `BatchWorker` in the `Work` it submits > > and do that itself? > > > > That is why can't we extend `apply` to also accept a `BatchWorker` and > > the caller can submit another task within the `Work` if that's required? > > > > Alternatively now that `execute` returns a `CompletableFuture`, the > > caller could use one of the many methods to execute code when the future is > > complete and have that extra code add more work. Fixed the javadoc and converted `NoResult` into a static field to simplify callsite consumption. As for the second suggestion, I don't like disseminating `BatchWorkers` or work item state maintenance outside the `BatchWorker` itself. That would put the burden of maintaining and passing around the state to the callsite and may create hard to reason/troubleshoot use cases. Consider the https://reviews.apache.org/r/51763. We would have to use an explicit wrapper object to hold on to local data to be able to re-queue it again as the trigger context is gone the moment `BatchWorker` call is placed. Also, queueing recurrent homogenuous events via `CompletableStage` chaining does not strike me as the right use of the functional style. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148837 --- On Sept. 14, 2016, 12:16 a.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 14, 2016, 12:16 a.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > single write transaction thus reducing the number of potential write lock > requests. To minimize wait-in-queue time, items are dispatched immediately > and the max number of items is bounded. There are a few `BatchWorker` > instances specialized on p
Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51765/ --- (Updated Sept. 14, 2016, 12:29 a.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Rebased over 51759. Repository: aurora Description --- This is the final part of the `BatchWorker` conversion work that converts `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem See https://reviews.apache.org/r/51759 #Remediation Task scheduling is one of the most dominant users of the write lock. It's also one of the heaviest and the most latency-sensitive. As such, the default max batch size is chosen conservatively low (3) and batch items are executed in a blocking way. BTW, attempting to make task scheduling non-blocking resulted in a much worse scheduling performance. The way our `DBTaskStore` is wired, all async activities, including `EventBus` are bound to use a single async `Executor`, which is currently limited at 8 threads [1]. Relying on the same `EventBus` to deliver scheduling completion events resulted in slower scheduling perf as those events were backed up behind all other activities, including tasks status events, reconciliation and etc. Increasing the executor thread pool size to a larger number on the other side, also increased the lock contention defeating the whole purpose of this work. #Results See https://reviews.apache.org/r/51759 for the lock contention results. https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54 Diffs (updated) - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 9d0d40b82653fb923bed16d06546288a1576c21d src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 11e8033438ad0808e446e41bb26b3fa4c04136c7 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java c044ebe6f72183a67462bbd8e5be983eb592c3e9 src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java d266f6a25ae2360db2977c43768a19b1f1efe8ff src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c2ceb4e7685a9301f8014a9183e02fbad65bca26 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 72562e6bd9a9860c834e6a9faa094c28600a8fed Diff: https://reviews.apache.org/r/51765/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51763/ --- (Updated Sept. 14, 2016, 12:23 a.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Rebasing over 51759. Repository: aurora Description --- This is the second part of the `BatchWorker` conversion work that moves cron jobs to use non-blocking kill followups and reduces the number of trigger threads. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem The current implementation of the cron scheduling relies on a large number of threads (`cron_scheduler_num_threads`=100) to support cron triggering and killing existing tasks according to `KILL_EXISTING` collision policy. This creates large spikes of activities at synchronized intervals as users tend to schedule their cron runs around similar schedules. Moreover, the current implementation re-acquires write locks multiple times to deliver on `KILL_EXISTING` policy. #Remediation Trigger level batching is still done in a blocking way but multiple cron triggers may be bundled together to share the same write transaction. Any followups, however, are performed in a non-blocking way by relying on a `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In order to still ensure non-concurrent execution of a given job key trigger, a token (job key) is saved within the trigger itself. A concurrent trigger will bail if a kill followup is still in progress (token is set AND no entry in `killFollowups` set exists yet). #Results The above approach allowed reducing the number of cron threads to 10 and likely can be reduced even further. See https://reviews.apache.org/r/51759 for the lock contention results. Diffs (updated) - commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java bc30990d57f444f7d64805ed85c363f1302736d0 config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java c07551e94f9221b5b21c5dc9715e82caa290c2e8 src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 155d702d68367b247dd066f773c662407f0e3b5b src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5c64ff2994e200b3453603ac5470e8e152cebc55 src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f Diff: https://reviews.apache.org/r/51763/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
va/org/apache/aurora/scheduler/BatchWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 11, 2016, 11:02 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 257-259 > > <https://reviews.apache.org/r/51759/diff/1/?file=1495031#file1495031line257> > > > > 1) Doesn't that effectively result in a busy loop where we are queuing > > the same item over and over until it is can finally be executed? This > > sounds rather problematic. > > > > 2) This changes the order of work items. Are you sure that no storage > > client requires writes to be in order? > > Maxim Khutornenko wrote: > > 1) Doesn't that effectively result in a busy loop where we are queuing > the same item over and over until it is can finally be executed? This sounds > rather problematic. > > There is no busy loop in a pure sense as every `RepeatableWork` item must > have a backoff strategy associated with it. Any item scheduled for > re-processing is delayed according to the supplied backoff. > > > 2) This changes the order of work items. Are you sure that no storage > client requires writes to be in order? > > Order has never been implied or expected as far as cross-transaction > boundaries go. That said, the order is preserved for a given work type. We > re-queue incomplete items in the same order they were received. > > Stephan Erb wrote: > Regarding question 1) > > It seems to me that the `run()` loop will drain a batch of WorkItems from > the `workQueue`. We then call `getReadyItems` and re-queue everything that is > currently in backoff. This repeats over and over again until the backoff time > has passed. > > Have you considered using a `@AsyncExecutor DelayExecutor` just like in > `TaskThrottler` for everything that is in backoff and cannot be executed > immediately? In a real cluster, there is no shortage of items pending evaluation. As such, the FOLLOWUP item "tax" is almost never a problem as BatchWorker will always be in a "busy loop" processing new and old items. I can see how this can become too wasteful in a really small cluster with not too many repeatable work items though. I just realized how I can improve on the above and simplify the work queue processing. Instead of relying on a queue to drive backoff, we can use backoff as a delay on re-queueing incomplete items. This will let us get rid of the `getReadyItems()` entirely. Thanks for bringing this up! - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148433 --- On Sept. 13, 2016, 11:14 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 13, 2016, 11:14 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock conten
Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51765/ --- (Updated Sept. 13, 2016, 11:14 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Rebased over 51759. Repository: aurora Description --- This is the final part of the `BatchWorker` conversion work that converts `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem See https://reviews.apache.org/r/51759 #Remediation Task scheduling is one of the most dominant users of the write lock. It's also one of the heaviest and the most latency-sensitive. As such, the default max batch size is chosen conservatively low (3) and batch items are executed in a blocking way. BTW, attempting to make task scheduling non-blocking resulted in a much worse scheduling performance. The way our `DBTaskStore` is wired, all async activities, including `EventBus` are bound to use a single async `Executor`, which is currently limited at 8 threads [1]. Relying on the same `EventBus` to deliver scheduling completion events resulted in slower scheduling perf as those events were backed up behind all other activities, including tasks status events, reconciliation and etc. Increasing the executor thread pool size to a larger number on the other side, also increased the lock contention defeating the whole purpose of this work. #Results See https://reviews.apache.org/r/51759 for the lock contention results. https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54 Diffs (updated) - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 9d0d40b82653fb923bed16d06546288a1576c21d src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 11e8033438ad0808e446e41bb26b3fa4c04136c7 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java c044ebe6f72183a67462bbd8e5be983eb592c3e9 src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java d266f6a25ae2360db2977c43768a19b1f1efe8ff src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c2ceb4e7685a9301f8014a9183e02fbad65bca26 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 72562e6bd9a9860c834e6a9faa094c28600a8fed Diff: https://reviews.apache.org/r/51765/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
9caa0c534 src/test/java/org/apache/aurora/scheduler/BatchWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51763/ --- (Updated Sept. 13, 2016, 11:14 p.m.) Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Changes --- Rebased over 51759. Repository: aurora Description --- This is the second part of the `BatchWorker` conversion work that moves cron jobs to use non-blocking kill followups and reduces the number of trigger threads. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem The current implementation of the cron scheduling relies on a large number of threads (`cron_scheduler_num_threads`=100) to support cron triggering and killing existing tasks according to `KILL_EXISTING` collision policy. This creates large spikes of activities at synchronized intervals as users tend to schedule their cron runs around similar schedules. Moreover, the current implementation re-acquires write locks multiple times to deliver on `KILL_EXISTING` policy. #Remediation Trigger level batching is still done in a blocking way but multiple cron triggers may be bundled together to share the same write transaction. Any followups, however, are performed in a non-blocking way by relying on a `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In order to still ensure non-concurrent execution of a given job key trigger, a token (job key) is saved within the trigger itself. A concurrent trigger will bail if a kill followup is still in progress (token is set AND no entry in `killFollowups` set exists yet). #Results The above approach allowed reducing the number of cron threads to 10 and likely can be reduced even further. See https://reviews.apache.org/r/51759 for the lock contention results. Diffs (updated) - commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java bc30990d57f444f7d64805ed85c363f1302736d0 config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java c07551e94f9221b5b21c5dc9715e82caa290c2e8 src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 155d702d68367b247dd066f773c662407f0e3b5b src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5c64ff2994e200b3453603ac5470e8e152cebc55 src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f Diff: https://reviews.apache.org/r/51763/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51712/#review148762 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 13, 2016, 6:33 p.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51712/ > --- > > (Updated Sept. 13, 2016, 6:33 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1764 > https://issues.apache.org/jira/browse/AURORA-1764 > > > Repository: aurora > > > Description > --- > > This extends getJobUpdateDetails to return a list of details instead of being > scoped to a single update. > > > Diffs > - > > RELEASE-NOTES.md 4476d52c9e2363342d0c22c7d136e0e481633686 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java > 4202f0eec59ef16668fca6b6ebb925335ad5dea1 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > src/main/python/apache/aurora/client/cli/update.py > d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 > src/main/resources/scheduler/assets/js/services.js > 315fc17894c2261333715fbb9c45fc02505f2942 > > src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java > a7d1f74acdfe5f58eb822a4d826e920cad53dced > src/test/python/apache/aurora/client/api/test_api.py > 7a0797d5402e931006d4f215e2d9c0dbbd113257 > src/test/python/apache/aurora/client/cli/test_supdate.py > 92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 > > Diff: https://reviews.apache.org/r/51712/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51712/#review148643 --- src/main/python/apache/aurora/client/api/__init__.py (line 266) <https://reviews.apache.org/r/51712/#comment216139> This should send both key and query for both v-1 and v schedulers to work correctly. src/main/python/apache/aurora/client/cli/update.py (lines 529 - 533) <https://reviews.apache.org/r/51712/#comment216138> Are you sure the `detailsList` is going to be set when a response is coming from the v-1 scheduler? Even if that was the case, it would be completely empty. I'd expect something like this instead: ``` if detailsList is not None: # process detailsList else # process details ``` - Maxim Khutornenko On Sept. 12, 2016, 10:26 p.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51712/ > --- > > (Updated Sept. 12, 2016, 10:26 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1764 > https://issues.apache.org/jira/browse/AURORA-1764 > > > Repository: aurora > > > Description > --- > > This extends getJobUpdateDetails to return a list of details instead of being > scoped to a single update. > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java > 4202f0eec59ef16668fca6b6ebb925335ad5dea1 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > src/main/python/apache/aurora/client/cli/update.py > d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 > src/main/resources/scheduler/assets/js/services.js > 315fc17894c2261333715fbb9c45fc02505f2942 > > src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java > a7d1f74acdfe5f58eb822a4d826e920cad53dced > src/test/python/apache/aurora/client/api/test_api.py > 7a0797d5402e931006d4f215e2d9c0dbbd113257 > src/test/python/apache/aurora/client/cli/test_supdate.py > 92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 > > Diff: https://reviews.apache.org/r/51712/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 13, 2016, 1:14 a.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 188 > > <https://reviews.apache.org/r/51759/diff/2/?file=1497462#file1497462line188> > > > > Our current metrics use 'nanos' to indicate nanoseconds instead of > > 'ns'. Mind using that here for consistency? We actually have both "nanos" and "ns". I agree that "nanos" looks more prevalent as it's what `TimedInterceptor` is using. > On Sept. 13, 2016, 1:14 a.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 105 > > <https://reviews.apache.org/r/51759/diff/2/?file=1497462#file1497462line105> > > > > I agree we cannot use `java.util.concurrent.FutureTask` here. However, > > this seems to be pretty close to > > `java.util.concurrent.CompletableFuture` (new in Java 8). > > > > It's not clear to me here if you have investigated it or not, but if > > you have I would like a brief explanation as to why it is not suitable. If > > you haven't please take a look at it as it seems to provide all of the > > funtionality you need. > > > > Further if it is possible to use it, then we do not need pubsub for > > notification for when work completes, if item is it's own future, we can > > use the `CompletionStage` functionality to execute code after a future has > > been fulfilled. > > > > It has a `get()` method that can replace `waitForResult()` and a > > `complete()` method that allows you to set the value. I did evaluate the `CompletableFuture` for the initial version of the `BatchWorker` and did not find it up to the task. Looking at it again though, I don't see any reasons why it would not satisfy the current version! Thanks for the nudge, the diff is incoming. - Maxim ------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148633 --- On Sept. 12, 2016, 10:51 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 12, 2016, 10:51 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > single write transaction thus reducing the number of potential write lock > requests. To minimize wait-in-queue time, items are dispatched immediately > and the max number of items is bounded. There are a few `BatchWorker` > instances specialized on particular workload types: task even pr
Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
> On Sept. 11, 2016, 11:18 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, > > lines 115-128 > > <https://reviews.apache.org/r/51765/diff/1/?file=1495174#file1495174line115> > > > > The `scheduleTask` method is doing some staggering amount of work, such > > as iterating over offers and matching constraints. Only a small part of > > this actually requires an open write transaction. > > > > It seems to me as if moving part of this computation out of the write > > transaction could significantly improve write throughput. You have probably > > considered that, so what am I missing here? :-) It's true that only a small part requires writing into the store. Unfortunately, we do use write transaction as a global system lock to freeze the state for the duration of the logical change. It's very hard to reason about that logic if things change underneath. The assumptions about being inside of a "locked" state begin with computing the `AttributeAggregate` in the `TaskScheduler` and follow along until we are ready to launch task or consider a preemption. There are MANY things that can go wrong if we move the transaction boundary. Starting from trivial nullref and concurrent collection modification exceptions to very hard to troubleshoot logical errors (offers reused, limit constraints are not enforced and etc.). While it's theoretically possible to implement an optimistic task/offer matching it would require substantial rewrite of what we currently have and is definitely outside the scope of this change. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51765/#review148434 --- On Sept. 9, 2016, 6:52 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51765/ > --- > > (Updated Sept. 9, 2016, 6:52 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the final part of the `BatchWorker` conversion work that converts > `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background > on the `BatchWorker`. > > #Problem > See https://reviews.apache.org/r/51759 > > #Remediation > Task scheduling is one of the most dominant users of the write lock. It's > also one of the heaviest and the most latency-sensitive. As such, the default > max batch size is chosen conservatively low (3) and batch items are executed > in a blocking way. > > BTW, attempting to make task scheduling non-blocking resulted in a much worse > scheduling performance. The way our `DBTaskStore` is wired, all async > activities, including `EventBus` are bound to use a single async `Executor`, > which is currently limited at 8 threads [1]. Relying on the same `EventBus` > to deliver scheduling completion events resulted in slower scheduling perf as > those events were backed up behind all other activities, including tasks > status events, reconciliation and etc. Increasing the executor thread pool > size to a larger number on the other side, also increased the lock contention > defeating the whole purpose of this work. > > #Results > See https://reviews.apache.org/r/51759 for the lock contention results. > > https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54 > > > Diffs > - > > src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java > 9d0d40b82653fb923bed16d06546288a1576c21d > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java > 11e8033438ad0808e446e41bb26b3fa4c04136c7 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java > c044ebe6f72183a67462bbd8e5be983eb592c3e9 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java > d266f6a25ae2360db2977c43768a19b1f1efe8ff > src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java > c2ceb4e7685a9301f8014a9183e02fbad65bca26 > src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java > 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 > > src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java > 72562e6bd9a9860c834e6a9faa094c28600a8fed > > Diff: https://reviews.apache.org/r/51765/diff/ > > > Testing > --- > > All types of testing including deploying to test and production clusters. > > > Thanks, > > Maxim Khutornenko > >
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
r/JobUpdateControllerImpl.java 594bb6219294dcc77d48dcad14e2a6f9caa0c534 src/test/java/org/apache/aurora/scheduler/BatchWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
> On Sept. 12, 2016, 6:26 p.m., Joshua Cohen wrote: > > Along the lines of the question Stephan raised, what happens in the event > > of a failover mid-batch, especially w.r.t. repeatable work? Same as today: the transaction either happens or does not. In case of a cron job (the only user of `RepeatableWork` at the moment), the cron job will not be triggered (again, same as today). > On Sept. 12, 2016, 6:26 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 106-107 > > <https://reviews.apache.org/r/51759/diff/1/?file=1495031#file1495031line106> > > > > I think this should be "does not initialize"? Good catch. Fixed. > On Sept. 12, 2016, 6:26 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 257-259 > > <https://reviews.apache.org/r/51759/diff/1/?file=1495031#file1495031line257> > > > > Even though there's only a single callsite where we initialize a > > `WorkItem` in `FOLLOWUP` state, there's no guarantee that `laskBackoffMsec` > > will be present just because the state is `FOLLOWUP`. > > > > We should add some state checks to the `WorkItem` constructor so that > > if state is `FOLLOWUP` we'll throw if `lastBackoffMsec` is not present to > > protect against future mistakes? (Or if that's a valid scenario we should > > perform an `isPresent` check or use `or` here...). I assert both `backoffStrategy` and `lastBackoffMsec` inside the `backoffFor()` method on line 315. I think it should be sufficient given that `WorkItem` is a private class? > On Sept. 12, 2016, 6:26 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 273-275 > > <https://reviews.apache.org/r/51759/diff/1/?file=1495031#file1495031line273> > > > > Should we do this after the batch is processed rather than before? It's largely irrelevant but I guess it's more logical to expect it at the end while going through the code. Moved. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148501 --- On Sept. 9, 2016, 5:29 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 9, 2016, 5:29 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > single write transaction thus reducing the number of potential write lock > requests. To minimize wait-in-queue time, items are d
Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
59/diff/1/?file=1495031#file1495031line240> > > > > A queue could hold vital information that we should not loose during > > scheduler failover (in particular, as we do this once per day). > > > > Would it be possible to drain the queue completely before exiting here? This is no different than terminating the main process while threads are waiting on a write lock. Transaction boundaries and reconciliation on restart is what ensures data consistency. > On Sept. 11, 2016, 11:02 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 257-259 > > <https://reviews.apache.org/r/51759/diff/1/?file=1495031#file1495031line257> > > > > 1) Doesn't that effectively result in a busy loop where we are queuing > > the same item over and over until it is can finally be executed? This > > sounds rather problematic. > > > > 2) This changes the order of work items. Are you sure that no storage > > client requires writes to be in order? > 1) Doesn't that effectively result in a busy loop where we are queuing the > same item over and over until it is can finally be executed? This sounds > rather problematic. There is no busy loop in a pure sense as every `RepeatableWork` item must have a backoff strategy associated with it. Any item scheduled for re-processing is delayed according to the supplied backoff. > 2) This changes the order of work items. Are you sure that no storage client > requires writes to be in order? Order has never been implied or expected as far as cross-transaction boundaries go. That said, the order is preserved for a given work type. We re-queue incomplete items in the same order they were received. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148433 --- On Sept. 9, 2016, 5:29 p.m., Maxim Khutornenko wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > --- > > (Updated Sept. 9, 2016, 5:29 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel [1]. While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > single write transaction thus reducing the number of potential write lock > requests. To minimize wait-in-queue time, items are dispatched immediately > and the max number of items is bounded. There are a few `BatchWorker` > instances specialized on particular workload types: task even processing, > cron scheduling and task scheduling. Every instance can be tuned > independently (max batch size) and provides specialized metrics helping to > monitor each workload type perf. > &
Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources
> On Sept. 12, 2016, 7:31 p.m., Maxim Khutornenko wrote: > > src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java, > > lines 22-33 > > <https://reviews.apache.org/r/51807/diff/1/?file=1496737#file1496737line22> > > > > This is somewhat non-standard way to define cmd flags. We tend to > > declare them in modules and use 'setting' classes to act as pure wrappers. > > Perhaps it's time to introduce a `ResourceModule` class (if only to bind > > the `ResourceSettings` instance for now)? > > Stephan Erb wrote: > I believe there would be no way to inject the settings into the > `ResourceType` enum at compile time with guice. That's why I've adopted this > approach here. Happy to give it a try if you have some pointers for me how > to make it work. Oh, that's right. I forgot java enums are extremely picky when it comes to field injections. Ignore this comment. Perhaps having an extra note in this class javadoc will help avoiding this type of questions in the future. > On Sept. 12, 2016, 7:31 p.m., Maxim Khutornenko wrote: > > src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java, line > > 75 > > <https://reviews.apache.org/r/51807/diff/1/?file=1496738#file1496738line75> > > > > Would be great to have a e2e test when this functionality is available > > in Mesos. Perhaps drop a TODO here? > > Stephan Erb wrote: > Acutally, it would work in Mesos out of the box even today. All we have > to do is adapt our Mesos config to also pass along memory resources > https://github.com/apache/aurora/blob/master/examples/vagrant/mesos_config/etc_mesos-slave/modules#L8. > > I didn't do it because I felt like it wouldn't add any additional test > coverage, now that the entire resource handling within Aurora is generic. > > Stephan Erb wrote: > Additional info: We already have an e2e test using revocable CPU > resources. It's true that things should now "just work". Reality proves that's not always the case, so any test coverage helping to validate we can actually run tasks against offers with more than one revocable resource would be welcome. Does not have to be in this RB though. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51807/#review148563 --- On Sept. 12, 2016, 1:38 p.m., Stephan Erb wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51807/ > --- > > (Updated Sept. 12, 2016, 1:38 p.m.) > > > Review request for Aurora and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > We plan to open source a very simple Mesos ResourceEstimator and > QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We > have been using it internally with a patched Aurora version where the > hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch > makes this behaviour configurable. > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > docs/features/resource-isolation.md > 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 > docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 > docs/reference/scheduler-configuration.md > 87d2cded0780ffa34884b52cc049c6a0ad808f0a > src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java > 4c102a3d4c4bdd27a1d0536b689acd6257e77788 > > Diff: https://reviews.apache.org/r/51807/diff/ > > > Testing > --- > > ./gradlew -Pq build > > > Thanks, > > Stephan Erb > >
Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51807/#review148600 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 12, 2016, 1:38 p.m., Stephan Erb wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51807/ > --- > > (Updated Sept. 12, 2016, 1:38 p.m.) > > > Review request for Aurora and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > We plan to open source a very simple Mesos ResourceEstimator and > QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We > have been using it internally with a patched Aurora version where the > hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch > makes this behaviour configurable. > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > docs/features/resource-isolation.md > 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 > docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 > docs/reference/scheduler-configuration.md > 87d2cded0780ffa34884b52cc049c6a0ad808f0a > src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java > 4c102a3d4c4bdd27a1d0536b689acd6257e77788 > > Diff: https://reviews.apache.org/r/51807/diff/ > > > Testing > --- > > ./gradlew -Pq build > > > Thanks, > > Stephan Erb > >
Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery
> On Sept. 12, 2016, 7:48 p.m., Maxim Khutornenko wrote: > > src/main/python/apache/aurora/client/api/__init__.py, line 264 > > <https://reviews.apache.org/r/51712/diff/4/?file=1497099#file1497099line264> > > > > Why not going all the way to the client command and initializing both > > values in this changelist? Otherwise, there is a risk of cutting a release > > in between and delaying removal by extra release (until both client and > > scheduler are dual writing/reading). > > Zameer Manji wrote: > We can't send both values (it doesn't make sense). I will upgrade the > client to send the query object instead of the key object. I was not aware > when we made an API change we would also change the clients in the same > commit. > > Zameer Manji wrote: > Actually never mind, if I make this change (updating the client) it will > force us to deploy the client and the scheduler together. I think we need to > keep it as is for now right? > We can't send both values (it doesn't make sense). This will work just fine. The newer scheduler will prefer the query (and collection in the response), while the old one will continue using the key completely unaware of the new value (thrift drops unknown fields). This way no functional changes will be needed on the client side, just removal of the deprecated fields. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51712/#review148565 --- On Sept. 12, 2016, 7:39 p.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51712/ > ----------- > > (Updated Sept. 12, 2016, 7:39 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1764 > https://issues.apache.org/jira/browse/AURORA-1764 > > > Repository: aurora > > > Description > --- > > This extends getJobUpdateDetails to return a list of details instead of being > scoped to a single update. > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java > 4202f0eec59ef16668fca6b6ebb925335ad5dea1 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > src/main/resources/scheduler/assets/js/services.js > 315fc17894c2261333715fbb9c45fc02505f2942 > > src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java > a7d1f74acdfe5f58eb822a4d826e920cad53dced > src/test/python/apache/aurora/client/api/test_api.py > 7a0797d5402e931006d4f215e2d9c0dbbd113257 > > Diff: https://reviews.apache.org/r/51712/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51712/#review148565 --- src/main/python/apache/aurora/client/api/__init__.py (line 264) <https://reviews.apache.org/r/51712/#comment216022> Why not going all the way to the client command and initializing both values in this changelist? Otherwise, there is a risk of cutting a release in between and delaying removal by extra release (until both client and scheduler are dual writing/reading). - Maxim Khutornenko On Sept. 12, 2016, 7:39 p.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51712/ > --- > > (Updated Sept. 12, 2016, 7:39 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1764 > https://issues.apache.org/jira/browse/AURORA-1764 > > > Repository: aurora > > > Description > --- > > This extends getJobUpdateDetails to return a list of details instead of being > scoped to a single update. > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java > 4202f0eec59ef16668fca6b6ebb925335ad5dea1 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > src/main/resources/scheduler/assets/js/services.js > 315fc17894c2261333715fbb9c45fc02505f2942 > > src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java > a7d1f74acdfe5f58eb822a4d826e920cad53dced > src/test/python/apache/aurora/client/api/test_api.py > 7a0797d5402e931006d4f215e2d9c0dbbd113257 > > Diff: https://reviews.apache.org/r/51712/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51662/#review148564 --- Fix it, then Ship it! RELEASE-NOTES.md (line 37) <https://reviews.apache.org/r/51662/#comment216018> s/reconcile/reconcile_tasks src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java (lines 651 - 657) <https://reviews.apache.org/r/51662/#comment216019> No need to have an extra method here. Can be inlined. - Maxim Khutornenko On Sept. 12, 2016, 5:49 p.m., Karthik Anantha Padmanabhan wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51662/ > --- > > (Updated Sept. 12, 2016, 5:49 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > AURORA-1602 Aurora admin commands for reconcilation > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java > 3275d72a0a74909e635bce615e2036ec72aa38ee > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/admin/admin.py > 76009b9c1c7a130c25abad544a176dc590dafb12 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > > src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java > b9317dc20456f90057ec2bf4d10619a5ae986187 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 779dc302602ae8842084807ca868a491ea99b676 > src/test/python/apache/aurora/admin/test_admin.py > eb193c4a33f5275f05a995338446bec0e19bc1cf > src/test/python/apache/aurora/client/api/test_scheduler_client.py > afac2500551af2fce406bb906aa4e33f353e90a1 > > Diff: https://reviews.apache.org/r/51662/diff/ > > > Testing > --- > > * Manually tested on my local vagrant installation > * ./build-support/jenkins/build.sh > > > Thanks, > > Karthik Anantha Padmanabhan > >
Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51807/#review148563 --- src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java (lines 22 - 33) <https://reviews.apache.org/r/51807/#comment216016> This is somewhat non-standard way to define cmd flags. We tend to declare them in modules and use 'setting' classes to act as pure wrappers. Perhaps it's time to introduce a `ResourceModule` class (if only to bind the `ResourceSettings` instance for now)? src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java (line 75) <https://reviews.apache.org/r/51807/#comment216017> Would be great to have a e2e test when this functionality is available in Mesos. Perhaps drop a TODO here? - Maxim Khutornenko On Sept. 12, 2016, 1:38 p.m., Stephan Erb wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51807/ > --- > > (Updated Sept. 12, 2016, 1:38 p.m.) > > > Review request for Aurora and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > We plan to open source a very simple Mesos ResourceEstimator and > QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We > have been using it internally with a patched Aurora version where the > hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch > makes this behaviour configurable. > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > docs/features/resource-isolation.md > 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 > docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 > docs/reference/scheduler-configuration.md > 87d2cded0780ffa34884b52cc049c6a0ad808f0a > src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java > 4c102a3d4c4bdd27a1d0536b689acd6257e77788 > > Diff: https://reviews.apache.org/r/51807/diff/ > > > Testing > --- > > ./gradlew -Pq build > > > Thanks, > > Stephan Erb > >
Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation
> On Sept. 9, 2016, 12:26 a.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java, > > line 125 > > <https://reviews.apache.org/r/51662/diff/8/?file=1494851#file1494851line125> > > > > This smells to me and seems awkward. I guess this stemms from the fact > > that the `I*` wrappers don't have Optional fields for integers. Instead of > > this here, why not have the signature be > > `triggerExplicitReconciliation(Optional batchSize)`? > > > > You can then do `int trueBatchSize = > > batchSize.orElse(settings.explicitBatchSize)`. > > > > You can then push validation of the value to the RPC/API layer. +1. Validating at RPC layer and sending down an `Optional` seems a better solution here. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51662/#review148292 --- On Sept. 9, 2016, 6:43 p.m., Karthik Anantha Padmanabhan wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51662/ > ------- > > (Updated Sept. 9, 2016, 6:43 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > AURORA-1602 Aurora admin commands for reconcilation > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java > 3275d72a0a74909e635bce615e2036ec72aa38ee > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/admin/admin.py > 76009b9c1c7a130c25abad544a176dc590dafb12 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > > src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java > b9317dc20456f90057ec2bf4d10619a5ae986187 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 779dc302602ae8842084807ca868a491ea99b676 > src/test/python/apache/aurora/admin/test_admin.py > eb193c4a33f5275f05a995338446bec0e19bc1cf > src/test/python/apache/aurora/client/api/test_scheduler_client.py > afac2500551af2fce406bb906aa4e33f353e90a1 > > Diff: https://reviews.apache.org/r/51662/diff/ > > > Testing > --- > > * Manually tested on my local vagrant installation > * ./build-support/jenkins/build.sh > > > Thanks, > > Karthik Anantha Padmanabhan > >
Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51765/ --- Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Repository: aurora Description --- This is the final part of the `BatchWorker` conversion work that converts `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem See https://reviews.apache.org/r/51759 #Remediation Task scheduling is one of the most dominant users of the write lock. It's also one of the heaviest and the most latency-sensitive. As such, the default max batch size is chosen conservatively low (3) and batch items are executed in a blocking way. BTW, attempting to make task scheduling non-blocking resulted in a much worse scheduling performance. The way our `DBTaskStore` is wired, all async activities, including `EventBus` are bound to use a single async `Executor`, which is currently limited at 8 threads [1]. Relying on the same `EventBus` to deliver scheduling completion events resulted in slower scheduling perf as those events were backed up behind all other activities, including tasks status events, reconciliation and etc. Increasing the executor thread pool size to a larger number on the other side, also increased the lock contention defeating the whole purpose of this work. #Results See https://reviews.apache.org/r/51759 for the lock contention results. https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54 Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 9d0d40b82653fb923bed16d06546288a1576c21d src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 11e8033438ad0808e446e41bb26b3fa4c04136c7 src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java c044ebe6f72183a67462bbd8e5be983eb592c3e9 src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java d266f6a25ae2360db2977c43768a19b1f1efe8ff src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c2ceb4e7685a9301f8014a9183e02fbad65bca26 src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 72562e6bd9a9860c834e6a9faa094c28600a8fed Diff: https://reviews.apache.org/r/51765/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Review Request 51763: Batching writes - Part 2 (of 3): Converting cron jobs to use BatchWorker.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51763/ --- Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. Repository: aurora Description --- This is the second part of the `BatchWorker` conversion work that moves cron jobs to use non-blocking kill followups and reduces the number of trigger threads. See https://reviews.apache.org/r/51759 for more background on the `BatchWorker`. #Problem The current implementation of the cron scheduling relies on a large number of threads (`cron_scheduler_num_threads`=100) to support cron triggering and killing existing tasks according to `KILL_EXISTING` collision policy. This creates large spikes of activities at synchronized intervals as users tend to schedule their cron runs around similar schedules. Moreover, the current implementation re-acquires write locks multiple times to deliver on `KILL_EXISTING` policy. #Remediation Trigger level batching is still done in a blocking way but multiple cron triggers may be bundled together to share the same write transaction. Any followups, however, are performed in a non-blocking way by relying on a `BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In order to still ensure non-concurrent execution of a given job key trigger, a token (job key) is saved within the trigger itself. A concurrent trigger will bail if a kill followup is still in progress (token is set AND no entry in `killFollowups` set exists yet). #Results The above approach allowed reducing the number of cron threads to 10 and likely can be reduced even further. See https://reviews.apache.org/r/51759 for the lock contention results. Diffs - commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java bc30990d57f444f7d64805ed85c363f1302736d0 src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java c07551e94f9221b5b21c5dc9715e82caa290c2e8 src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 155d702d68367b247dd066f773c662407f0e3b5b src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 5c64ff2994e200b3453603ac5470e8e152cebc55 src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f Diff: https://reviews.apache.org/r/51763/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.
g/apache/aurora/scheduler/BatchWorkerTest.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 99c27e8012f10a67ce5f1b84d258e7a5608995c7 src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java 7d104aa2ea4a4d99be4711f666d18beca238284e src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java 94f5ca565476f62d72879837a0e7dafabcf30432 src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 196df4754b553f05e50b66ad2f84271901bc9eba Diff: https://reviews.apache.org/r/51759/diff/ Testing --- All types of testing including deploying to test and production clusters. Thanks, Maxim Khutornenko
Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation
> On Sept. 8, 2016, 6:18 p.m., Maxim Khutornenko wrote: > > src/main/python/apache/aurora/admin/admin.py, line 351 > > <https://reviews.apache.org/r/51662/diff/7/?file=1494553#file1494553line351> > > > > Needs clarifcation that `reconciliation_explicit_batch_size` is a > > scheduler setting. > > Karthik Anantha Padmanabhan wrote: > Good question. It does not show up in the docs. However it is a command > line argument in the `ReconciliationModule`. So should this be a scheduler > setting ? By 'setting' I meant the command line argument. Without any explanation it's hard to grok what that means. > On Sept. 8, 2016, 6:18 p.m., Maxim Khutornenko wrote: > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java, > > line 123 > > <https://reviews.apache.org/r/51662/diff/7/?file=1494551#file1494551line123> > > > > Prefer using immutable `IExplicit...` wrapper generated for you by the > > build. > > Karthik Anantha Padmanabhan wrote: > With the `IExplicit...` version I do not have the ability to check if the > field is set or not using `isSet...`. I use that in order to provide the > default values for the batch size. What if it's set to 0 or negative value instead? Don't you want to avoid that scenario? - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51662/#review148225 --- On Sept. 8, 2016, 5:27 p.m., Karthik Anantha Padmanabhan wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51662/ > ------- > > (Updated Sept. 8, 2016, 5:27 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > AURORA-1602 Aurora admin commands for reconcilation > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java > 3275d72a0a74909e635bce615e2036ec72aa38ee > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/admin/admin.py > 76009b9c1c7a130c25abad544a176dc590dafb12 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > > src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java > b9317dc20456f90057ec2bf4d10619a5ae986187 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 779dc302602ae8842084807ca868a491ea99b676 > src/test/python/apache/aurora/admin/test_admin.py > eb193c4a33f5275f05a995338446bec0e19bc1cf > src/test/python/apache/aurora/client/api/test_scheduler_client.py > afac2500551af2fce406bb906aa4e33f353e90a1 > > Diff: https://reviews.apache.org/r/51662/diff/ > > > Testing > --- > > * Manually tested on my local vagrant installation > * ./build-support/jenkins/build.sh > > > Thanks, > > Karthik Anantha Padmanabhan > >
Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation
> On Sept. 8, 2016, 6:43 p.m., Santhosh Kumar Shanmugham wrote: > > src/test/python/apache/aurora/admin/test_admin.py, line 285 > > <https://reviews.apache.org/r/51662/diff/7/?file=1494557#file1494557line285> > > > > Should this also be assert_called_once_with(None) ? > > Karthik Anantha Padmanabhan wrote: > No - I set the option as 500 in the test and I want to test that we call > with the provided option Don't use `assert_called_with`. Use `mock_calls` instead as a generally more robust way to assert the exact number of calls. There are plenty of examples around. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51662/#review148230 --- On Sept. 8, 2016, 5:27 p.m., Karthik Anantha Padmanabhan wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51662/ > --- > > (Updated Sept. 8, 2016, 5:27 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > AURORA-1602 Aurora admin commands for reconcilation > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java > 3275d72a0a74909e635bce615e2036ec72aa38ee > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/admin/admin.py > 76009b9c1c7a130c25abad544a176dc590dafb12 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > > src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java > b9317dc20456f90057ec2bf4d10619a5ae986187 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 779dc302602ae8842084807ca868a491ea99b676 > src/test/python/apache/aurora/admin/test_admin.py > eb193c4a33f5275f05a995338446bec0e19bc1cf > src/test/python/apache/aurora/client/api/test_scheduler_client.py > afac2500551af2fce406bb906aa4e33f353e90a1 > > Diff: https://reviews.apache.org/r/51662/diff/ > > > Testing > --- > > * Manually tested on my local vagrant installation > * ./build-support/jenkins/build.sh > > > Thanks, > > Karthik Anantha Padmanabhan > >
Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51662/#review148225 --- api/src/main/thrift/org/apache/aurora/gen/api.thrift (line 1215) <https://reviews.apache.org/r/51662/#comment215688> Typos in `reconciliation`. Here and in other places. Also, please use `task` in the RPC name, e.g.: `triggerExplicitTaskReconciliation`. src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java (line 123) <https://reviews.apache.org/r/51662/#comment215694> Prefer using immutable `IExplicit...` wrapper generated for you by the build. src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java (lines 124 - 125) <https://reviews.apache.org/r/51662/#comment215696> Should be formatted as: ``` int trueBatchSize = reconcilationSettings.isSetBatchSize() ? reconcilationSettings.getBatchSize() : settings.explicitBatchSize; ``` src/main/python/apache/aurora/admin/admin.py (line 345) <https://reviews.apache.org/r/51662/#comment215687> Please be explicit in the name, e.g.: `reconcile_tasks` src/main/python/apache/aurora/admin/admin.py (line 351) <https://reviews.apache.org/r/51662/#comment215699> Needs clarifcation that `reconciliation_explicit_batch_size` is a scheduler setting. - Maxim Khutornenko On Sept. 8, 2016, 5:27 p.m., Karthik Anantha Padmanabhan wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51662/ > --- > > (Updated Sept. 8, 2016, 5:27 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > AURORA-1602 Aurora admin commands for reconcilation > > > Diffs > - > > RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java > 3275d72a0a74909e635bce615e2036ec72aa38ee > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/admin/admin.py > 76009b9c1c7a130c25abad544a176dc590dafb12 > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > > src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java > b9317dc20456f90057ec2bf4d10619a5ae986187 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 779dc302602ae8842084807ca868a491ea99b676 > src/test/python/apache/aurora/admin/test_admin.py > eb193c4a33f5275f05a995338446bec0e19bc1cf > src/test/python/apache/aurora/client/api/test_scheduler_client.py > afac2500551af2fce406bb906aa4e33f353e90a1 > > Diff: https://reviews.apache.org/r/51662/diff/ > > > Testing > --- > > * Manually tested on my local vagrant installation > * ./build-support/jenkins/build.sh > > > Thanks, > > Karthik Anantha Padmanabhan > >
Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51712/#review148208 --- This will most certainly break the UI. The client needs to be updated as well to pass both (old and new) arguments. Please, test in vagrant with different combinations of scheduler and client versions. - Maxim Khutornenko On Sept. 8, 2016, 12:03 a.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51712/ > --- > > (Updated Sept. 8, 2016, 12:03 a.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1764 > https://issues.apache.org/jira/browse/AURORA-1764 > > > Repository: aurora > > > Description > --- > > This extends getJobUpdateDetails to return a list of details instead of being > scoped to a single update. > > > Diffs > - > > RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java > 4202f0eec59ef16668fca6b6ebb925335ad5dea1 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > > src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java > a7d1f74acdfe5f58eb822a4d826e920cad53dced > > Diff: https://reviews.apache.org/r/51712/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation
> On Sept. 7, 2016, 9:45 p.m., Santhosh Kumar Shanmugham wrote: > > api/src/main/thrift/org/apache/aurora/gen/api.thrift, lines 1209-1214 > > <https://reviews.apache.org/r/51662/diff/4/?file=1493667#file1493667line1209> > > > > Did we consider combining the "explicit" and "implicit" reconciliation > > RPCs into a single "reconcile" RPC with an optional batchSize argument, so > > that it maps to the mesos master API. > > > > http://mesos.apache.org/documentation/latest/reconciliation/ > > Karthik Anantha Padmanabhan wrote: > We have more than one option on the scheduler for the explicit reconcile. > If at all we want to expose those to the admin CLI in the future it may be > better to have them as two separate API's. > > Santhosh Kumar Shanmugham wrote: > "We have more than one option on the scheduler for the explicit > reconcile." - can you given an example of what you mean by this. > > Karthik Anantha Padmanabhan wrote: > There is another releavant parameter called `explicitBatchDelaySeconds` > which dictates how long we wait in between batches before we call reconcile > on mesos. > > But it is possible that we may add another option in the future ? In that > case wouldn't it be better to have two separate API's ? > > Karthik Anantha Padmanabhan wrote: > Hmm actually if we are passing in a struct of `ExplicitReconcileSettings` > , then adding options will be easier. So I can check `isNull` on the settings > object on the scheduler to identify an implicit reconcile. > > I can go either way. I just don't know if there is a strong reason to > make it a single API as opposed to two. Please, don't save a few lines by merging the RPCs or forking off of the presence of the `batchSize`. Explicit and implicit reconciliations have very different scopes and cover different failure scenarios. I'd advocate in favor of Karthik's original proposal with separate call paths (RPCs and client commands) that clearly state their purpose in the admin API. > On Sept. 7, 2016, 9:45 p.m., Santhosh Kumar Shanmugham wrote: > > src/main/python/apache/aurora/admin/admin.py, line 342 > > <https://reviews.apache.org/r/51662/diff/4/?file=1493670#file1493670line342> > > > > 'default=0' here and the usage doc reads default as 1000. > > Karthik Anantha Padmanabhan wrote: > We want the default value to match the one from > settings.explicitBatchSize on the scheduler. So on the scheduler we use the > default value if the value here is 0. Although I agree that the 0 here is a > bit misleading. > > I like the optional batchSize argument for the explicit reconcile RPC. > > I am thinking there can be a > > struct ExplicitReconcileSettings { > 1. optional i32 batchSize; > } > > This can let us add more explicit reconcile settings in the future. > > Also does thrift lets us declare optional parameters in a function > (unless it is wrapped in a struct) ? > > Santhosh Kumar Shanmugham wrote: > AFAIK, Thrift does not support optional function parameters, due to the > limitations of the languages that it gets translated to. You are on the > correct path, in making this a 'struct'. Don't really see much value in overcomplicating this scenario. You can leave the `int` on the API side by have your command_option as `default=None` and traslate `None` to `0` in the client/api call. Your call. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51662/#review148089 --- On Sept. 7, 2016, 9:08 p.m., Karthik Anantha Padmanabhan wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51662/ > --- > > (Updated Sept. 7, 2016, 9:08 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > AURORA-1602 Aurora admin commands for reconcilation > > > Diffs > - > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java > 3275d72a0a74909e635bce615e2036ec72aa38ee > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfa
Re: Review Request 51536: Allow watch_secs to be set to 0
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51536/#review148107 --- RELEASE-NOTES.md (line 37) <https://reviews.apache.org/r/51536/#comment215533> IMO, this statement is useless to our users and could add to confusion. Client still does not allow `watch_secs` for health check enabled updates and scheduler always allowed zeros in the `SchedulerThriftInterface.startJobUpdate` for this value. It's only this assertion that was not in sync. Suggest dropping this line and adding it into the client RB that will actually allow zero `watch_secs`. - Maxim Khutornenko On Sept. 7, 2016, 10:36 p.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51536/ > --- > > (Updated Sept. 7, 2016, 10:36 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Bugs: AURORA-894 > https://issues.apache.org/jira/browse/AURORA-894 > > > Repository: aurora > > > Description > --- > > - Allow watch_secs to be set to 0. > > This feature intends to improve reliability and performance of the Aurora > scheduler job updater by relying on health check status rather than > watch_secs timeout when deciding an individual instance update state. > > See this epic: https://issues.apache.org/jira/browse/AURORA-894 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# > for more details and background. > > > Diffs > - > > RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 > src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java > ac8df3e5a2da8cf22e1ba8a90944546e19ccdcaa > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 04551f17999d742c53dfb1b36286b119b448550e > > Diff: https://reviews.apache.org/r/51536/diff/ > > > Testing > --- > > ./gradlew build > > ./gradlew :test --tests "org.apache.aurora.scheduler.updater.JobUpdaterIT" > > ./build-support/jenkins/build.sh > > > Thanks, > > Kai Huang > >
Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51662/#review148046 --- api/src/main/thrift/org/apache/aurora/gen/api.thrift (lines 944 - 950) <https://reviews.apache.org/r/51662/#comment215425> I don't think there is much value in having this struct and the related `reconcileStatus` RPC. Cluster operators are normally relying on scheduler stats to monitor activities like this. api/src/main/thrift/org/apache/aurora/gen/api.thrift (line 1223) <https://reviews.apache.org/r/51662/#comment215428> How about `triggerExplicitTaskReconciliation` and `triggerImplicitTaskReconciliation`? This would be self-explanatory to any API consumer. nit: please end all comments with '.' src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java (line 135) <https://reviews.apache.org/r/51662/#comment215429> Suggest defaulting to the configured `settings.explicitBatchSize` in case the provided one is 0. This will allow the correspondent admin command to make batch size completely optional. src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java (lines 135 - 143) <https://reviews.apache.org/r/51662/#comment215435> This only prevents overlapping explicit or implicit runs but does not prevent explicit and implicit overlapping runs. Also, this does not cancel any calls already waiting for a lock. Given that we deal with admin-triggered events (who are expected to know what they do), I suggest we simplify the logic here by simply scheduling an out of band run and not try to synchronize anything around reconciliation runs. This would let you cut the bulk of the changes below. src/main/python/apache/aurora/admin/admin.py (line 355) <https://reviews.apache.org/r/51662/#comment215437> The default here would be 0 if you accept my suggestion above. - Maxim Khutornenko On Sept. 6, 2016, 11:34 p.m., Karthik Anantha Padmanabhan wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51662/ > --- > > (Updated Sept. 6, 2016, 11:34 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > AURORA-1602 Aurora admin commands for reconcilation > > > Diffs > - > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > > src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java > 3275d72a0a74909e635bce615e2036ec72aa38ee > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > src/main/python/apache/aurora/admin/admin.py > 76009b9c1c7a130c25abad544a176dc590dafb12 > src/main/python/apache/aurora/admin/admin_util.py > 394deb57af9ad8832a02ceab15f33b3c1e5c902b > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > > src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java > b9317dc20456f90057ec2bf4d10619a5ae986187 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 779dc302602ae8842084807ca868a491ea99b676 > src/test/python/apache/aurora/admin/test_admin.py > eb193c4a33f5275f05a995338446bec0e19bc1cf > src/test/python/apache/aurora/client/api/test_scheduler_client.py > afac2500551af2fce406bb906aa4e33f353e90a1 > > Diff: https://reviews.apache.org/r/51662/diff/ > > > Testing > --- > > * Manually tested on my local vagrant installation > * ./build-support/jenkins/build.sh > > > Thanks, > > Karthik Anantha Padmanabhan > >
Re: Review Request 51384: Introduce UpdateMetadata fields in JobUpdateRequest.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51384/#review148035 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 7, 2016, 3:29 a.m., Santhosh Kumar Shanmugham wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51384/ > --- > > (Updated Sept. 7, 2016, 3:29 a.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, and Maxim > Khutornenko. > > > Bugs: AURORA-1711 > https://issues.apache.org/jira/browse/AURORA-1711 > > > Repository: aurora > > > Description > --- > > Allow custom metadata to be stored with the job updates. This feature > is to help case the reconciliation on job update request on the clients. > > Today when a client's requests are proxied via a middle-man, > if the middle-man times out before the scheduler responds with success, > the client is left the only option of retry. In the case of updates, > since multiple updates are not allowed in parallel, the retries fail, > with INVALID_REQUEST. Although the original update is already accepted > by the scheduler and is in-progress. > > With this feature, clients can reconcile the job updates, > - by marking updates them with a unique id in the metadata field > - scheduler returns the in-progress job update in its response > - client can match the client-generated ids to reconcile its state. > > > Diffs > - > > RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/jmh/java/org/apache/aurora/benchmark/JobUpdates.java > f4f8d0037751c9c2096747264c19f6292461b308 > src/jmh/java/org/apache/aurora/benchmark/UpdateStoreBenchmarks.java > e5228ae9baadb8cad1e5ce143df09426d6107583 > src/main/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStore.java > d2673e6b328cb1e249fbe91d18e0d9e935636eaa > > src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java > a3b04949f1726f110638e4f93ef45947cdb9e7f8 > > src/main/java/org/apache/aurora/scheduler/storage/db/migration/V008_CreateUpdateMetadataTable.java > PRE-CREATION > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > 594bb6219294dcc77d48dcad14e2a6f9caa0c534 > > src/main/java/org/apache/aurora/scheduler/updater/UpdateInProgressException.java > PRE-CREATION > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > src/main/python/apache/aurora/client/cli/update.py > d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 > src/main/python/apache/aurora/client/hooks/hooked_api.py > 542f165add0f1d01a782fe6d6089bff3e736fb82 > > src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml > 8563630a179cb6d1e3b0e22c730ccbfe1c9291e2 > src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql > a40830fd668aa650c22a48cbc757b45aef27e961 > > src/test/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStoreTest.java > 08530397ff75081bde6f07f9d53317b5486e0da4 > > src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java > a7d1f74acdfe5f58eb822a4d826e920cad53dced > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 779dc302602ae8842084807ca868a491ea99b676 > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 04551f17999d742c53dfb1b36286b119b448550e > src/test/python/apache/aurora/client/cli/test_supdate.py > 92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 > src/test/python/apache/aurora/client/cli/util.py > aac9f9c802e4ad1f06cee8cf3f56749760b33af5 > > Diff: https://reviews.apache.org/r/51384/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > ./pants test src/test/python/apache/aurora/client/cli:cli > > > Thanks, > > Santhosh Kumar Shanmugham > >
Re: Review Request 51384: Introduce UpdateMetadata fields in JobUpdateRequest.
> On Sept. 2, 2016, 9:37 p.m., Maxim Khutornenko wrote: > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java, > > line 913 > > <https://reviews.apache.org/r/51384/diff/7/?file=1489641#file1489641line913> > > > > This will be a null ref if scheduler is called from a v-1 client that > > does not know anything about update metadata. You need to do something like > > this: > > ``` > > .setMetadata(request.isSetMetadata() > > ? toBuildersSet(request.getMetadata()) > > : ImmutableSet.of()) > > ``` > > Santhosh Kumar Shanmugham wrote: > Thrift does not expose isSetMetadata() method for optional fileds of > primitive type in the IJobUpdateRequest. It is available however in > JobUpdateRequest.java and the earlier code that converts from > JobUpdateRequest to IJobUpdateRequest does the check, via > IJobUpdateRequest.build(). > > // SchedulerThrift.java > request = IJobUpdateRequest.build(new > JobUpdateRequest(mutableRequest).setTaskConfig( > configurationManager.validateAndPopulate( > > ITaskConfig.build(mutableRequest.getTaskConfig())).newBuilder())); > > // IJobUpdateRequest.java > private IJobUpdateRequest(JobUpdateRequest wrapped) { > ... > this.metadata = wrapped.isSetMetadata() > ? FluentIterable.from(wrapped.getMetadata()) > .transform(IMetadata::build) > .toSet() > : ImmutableSet.of(); > } > > public static IJobUpdateRequest build(JobUpdateRequest wrapped) { > return new IJobUpdateRequest(wrapped); > } That's right, I was mistakenly assuming that was a `JobUpdateRequest` but that is tracked under `mutableRequest`. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51384/#review147694 --- On Sept. 7, 2016, 3:29 a.m., Santhosh Kumar Shanmugham wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51384/ > --- > > (Updated Sept. 7, 2016, 3:29 a.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, and Maxim > Khutornenko. > > > Bugs: AURORA-1711 > https://issues.apache.org/jira/browse/AURORA-1711 > > > Repository: aurora > > > Description > --- > > Allow custom metadata to be stored with the job updates. This feature > is to help case the reconciliation on job update request on the clients. > > Today when a client's requests are proxied via a middle-man, > if the middle-man times out before the scheduler responds with success, > the client is left the only option of retry. In the case of updates, > since multiple updates are not allowed in parallel, the retries fail, > with INVALID_REQUEST. Although the original update is already accepted > by the scheduler and is in-progress. > > With this feature, clients can reconcile the job updates, > - by marking updates them with a unique id in the metadata field > - scheduler returns the in-progress job update in its response > - client can match the client-generated ids to reconcile its state. > > > Diffs > - > > RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/jmh/java/org/apache/aurora/benchmark/JobUpdates.java > f4f8d0037751c9c2096747264c19f6292461b308 > src/jmh/java/org/apache/aurora/benchmark/UpdateStoreBenchmarks.java > e5228ae9baadb8cad1e5ce143df09426d6107583 > src/main/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStore.java > d2673e6b328cb1e249fbe91d18e0d9e935636eaa > > src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java > a3b04949f1726f110638e4f93ef45947cdb9e7f8 > > src/main/java/org/apache/aurora/scheduler/storage/db/migration/V008_CreateUpdateMetadataTable.java > PRE-CREATION > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > 594bb6219294dcc77d48dcad14e2a6f9caa0c534 > > src/main/java/org/apache/aurora/scheduler/updater/UpdateInProgressException.java > PRE-CREATION > src/main/python/apa
Re: Review Request 51536: Scheduler updater will not use watch_sec if health check is enabled
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51536/#review147922 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 6, 2016, 6:46 p.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51536/ > --- > > (Updated Sept. 6, 2016, 6:46 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Bugs: AURORA-894 > https://issues.apache.org/jira/browse/AURORA-894 > > > Repository: aurora > > > Description > --- > > - Scheduler updater will not use watch_sec if health check is enabled. > > This feature intends to improve reliability and performance of the Aurora > scheduler job updater by relying on health check status rather than > watch_secs timeout when deciding an individual instance update state. > > See this epic: https://issues.apache.org/jira/browse/AURORA-894 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# > for more details and background. > > After discussion on Aurora dev list, we decided to keep the watch_secs > infrastructure intact on scheduler side. Our final conclusion is that we > adopt the following implementation: > > 1. If the users want purely health checking driven updates they can set > watch_secs to 0 and enable health checks. > > 2. If they want to have both health checking and time driven updates they can > set watch_secs to the time that they care about, and doing health checks at > STARTING state as well. > > 3. If they just want time driven updates they can disable health checking and > set watch_secs to the time that they care about. > > In this review, there will be only one scheduler change: > Currently scheduler does not accept zero value for watch_secs, we need to > relax this constraint. > > Executor change to do (in a separate review): > The executor starts health check at STARTING, if a successful health check is > performed before initial_interval_sec expires, the executor will sends a > status message for RUNNING. > > > Diffs > - > > src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java > ac8df3e5a2da8cf22e1ba8a90944546e19ccdcaa > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 04551f17999d742c53dfb1b36286b119b448550e > > Diff: https://reviews.apache.org/r/51536/diff/ > > > Testing > --- > > ./gradlew build > > ./gradlew :test --tests "org.apache.aurora.scheduler.updater.JobUpdaterIT" > > ./build-support/jenkins/build.sh > > > Thanks, > > Kai Huang > >
Re: Review Request 51536: Scheduler updater will not use watch_sec if health check is enabled
> On Sept. 6, 2016, 8:56 p.m., Stephan Erb wrote: > > The current default of `watch_secs` is 45 seconds. Should we drop that to 0 > > and also adapt the docs accordingly? That would optimize the default > > settings for the health check driven updates, which I think would be a good > > thing. If we drop it to 0 now it will break many clients currently relying on it. I think we'll have to wait until the next release if we want to set a new default. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51536/#review147917 --- On Sept. 6, 2016, 6:46 p.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51536/ > --- > > (Updated Sept. 6, 2016, 6:46 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Bugs: AURORA-894 > https://issues.apache.org/jira/browse/AURORA-894 > > > Repository: aurora > > > Description > --- > > - Scheduler updater will not use watch_sec if health check is enabled. > > This feature intends to improve reliability and performance of the Aurora > scheduler job updater by relying on health check status rather than > watch_secs timeout when deciding an individual instance update state. > > See this epic: https://issues.apache.org/jira/browse/AURORA-894 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# > for more details and background. > > After discussion on Aurora dev list, we decided to keep the watch_secs > infrastructure intact on scheduler side. Our final conclusion is that we > adopt the following implementation: > > 1. If the users want purely health checking driven updates they can set > watch_secs to 0 and enable health checks. > > 2. If they want to have both health checking and time driven updates they can > set watch_secs to the time that they care about, and doing health checks at > STARTING state as well. > > 3. If they just want time driven updates they can disable health checking and > set watch_secs to the time that they care about. > > In this review, there will be only one scheduler change: > Currently scheduler does not accept zero value for watch_secs, we need to > relax this constraint. > > Executor change to do (in a separate review): > The executor starts health check at STARTING, if a successful health check is > performed before initial_interval_sec expires, the executor will sends a > status message for RUNNING. > > > Diffs > - > > src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java > ac8df3e5a2da8cf22e1ba8a90944546e19ccdcaa > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 04551f17999d742c53dfb1b36286b119b448550e > > Diff: https://reviews.apache.org/r/51536/diff/ > > > Testing > --- > > ./gradlew build > > ./gradlew :test --tests "org.apache.aurora.scheduler.updater.JobUpdaterIT" > > ./build-support/jenkins/build.sh > > > Thanks, > > Kai Huang > >
Re: Review Request 51580: Add MEDIAN_TIME_TO_STARTING as a new metric
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51580/#review147853 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 5, 2016, 6:56 p.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51580/ > --- > > (Updated Sept. 5, 2016, 6:56 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > A new MTTS (Median Time To Starting) metric is added to the sla module in > addition to MTTA and MTTR. > > This review request is related to my previous review request: > https://reviews.apache.org/r/51536 > > In the new implementation, the executor starts health check at STARTING, if a > successful health check is performed before initial_interval_sec expires, it > transitions into RUNNING state. Therefore, MTTS gives us an idea of how long > it takes for a task to become active, whereas the difference between MTTR and > MTTS represents the warm-up period for a task. > > See the following issues for more backgrounds: > > https://issues.apache.org/jira/browse/AURORA-1221 > > https://issues.apache.org/jira/browse/AURORA-1222 > > The new metrics represents the median time spent waiting for a set of tasks > to reach STARTING status within a time frame(including the tasks turning into > RUNNING state within the time frame). > > Here I regard STARTING as an active state. However, STARTING state is account > for platform and job uptime calculations. > > > Diffs > - > > docs/features/sla-metrics.md 932b5dceb7e356175c7e55c75c5546ecde7ad2c4 > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java > 3ddac8b9c0adb0e2e7d02b1a741e9ff6976b3c9e > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java > 4f243aab5a2c2f86ec795025e86302a09f864e2d > src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java > 90ea3a169dadc72e7d7493544ab865ec59d4d425 > > Diff: https://reviews.apache.org/r/51580/diff/ > > > Testing > --- > > ./gradlew build > > ./gradlew :test > > ./build-support/jenkins/build.sh > > > Thanks, > > Kai Huang > >
Re: Review Request 51595: Add Job Store and Dynamic Reservations design docs
> On Sept. 2, 2016, 7:13 p.m., Zameer Manji wrote: > > AFAIK, the Job Store design doc is on hold right? > > > > Maybe we shouldn't link to it until someone works on it. +1. That design doc needs some love before we can start linking it. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51595/#review147687 --- On Sept. 2, 2016, 9:14 a.m., Stephan Erb wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51595/ > --- > > (Updated Sept. 2, 2016, 9:14 a.m.) > > > Review request for Aurora and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > Add Job Store and Dynamic Reservations design docs > > > Diffs > - > > docs/development/design-documents.md > 50aeafbc5f96a8c443c90269730cd8b4c4bfd6d7 > > Diff: https://reviews.apache.org/r/51595/diff/ > > > Testing > --- > > > Thanks, > > Stephan Erb > >
Re: Review Request 51602: Extend the resource isolation and oversubscription documentation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51602/#review147701 --- Ship it! Ship It! - Maxim Khutornenko On Sept. 2, 2016, 2:55 p.m., Stephan Erb wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51602/ > --- > > (Updated Sept. 2, 2016, 2:55 p.m.) > > > Review request for Aurora, Maxim Khutornenko and Zameer Manji. > > > Repository: aurora > > > Description > --- > > I had to answer a couple of questions regarding these over the recent weeks > and thought it might make sense to update the docs accordingly. > > > Diffs > - > > docs/features/multitenancy.md cb45beb213eb74296194f8cad9e1aefbde07af02 > docs/features/resource-isolation.md > 59da8231a3e25e64b4be8c30a44c1043188aff2c > docs/operations/configuration.md 350ea77e30ce445ba9e37ea95a9e4f593dbbe402 > > Diff: https://reviews.apache.org/r/51602/diff/ > > > Testing > --- > > Rendered version available at > https://github.com/StephanErb/aurora/tree/isolation-docs > > > Thanks, > > Stephan Erb > >
Re: Review Request 51384: Introduce UpdateMetadata fields in JobUpdateRequest.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51384/#review147694 --- Looks great! Only a few comments left. api/src/main/thrift/org/apache/aurora/gen/api.thrift (line 774) <https://reviews.apache.org/r/51384/#comment214905> nit: all comments should be ended with '.'. Here and everywhere. src/main/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStore.java (line 102) <https://reviews.apache.org/r/51384/#comment214907> `checkNotBlank` is redundant here as you enter this block only if it's not empty. src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java (line 913) <https://reviews.apache.org/r/51384/#comment214908> This will be a null ref if scheduler is called from a v-1 client that does not know anything about update metadata. You need to do something like this: ``` .setMetadata(request.isSetMetadata() ? toBuildersSet(request.getMetadata()) : ImmutableSet.of()) ``` src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java (lines 932 - 934) <https://reviews.apache.org/r/51384/#comment214909> tabbing is off here src/main/python/apache/aurora/client/cli/context.py (line 79) <https://reviews.apache.org/r/51384/#comment214911> This is only used in update.py, please move it there. src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml (lines 357 - 358) <https://reviews.apache.org/r/51384/#comment214915> Instead of sub-selecting from a left join, have you considered a collection with a subselect instead? See `details.updateEvents` for example. It should be more compact and most importantly, perform better on large metadata counts. You can validate the last assumption via `./gradlew jmh -Pbenchmarks='UpdateStoreBenchmarks.JobDetailsBenchmark'` or create a new benchmark to test fetching job summaries only. - Maxim Khutornenko On Sept. 1, 2016, 5:58 a.m., Santhosh Kumar Shanmugham wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51384/ > --- > > (Updated Sept. 1, 2016, 5:58 a.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, and Maxim > Khutornenko. > > > Bugs: AURORA-1711 > https://issues.apache.org/jira/browse/AURORA-1711 > > > Repository: aurora > > > Description > --- > > Allow custom metadata to be stored with the job updates. This feature > is to help case the reconciliation on job update request on the clients. > > Today when a client's requests are proxied via a middle-man, > if the middle-man times out before the scheduler responds with success, > the client is left the only option of retry. In the case of updates, > since multiple updates are not allowed in parallel, the retries fail, > with INVALID_REQUEST. Although the original update is already accepted > by the scheduler and is in-progress. > > With this feature, clients can reconcile the job updates, > - by marking updates them with a unique id in the metadata field > - scheduler returns the in-progress job update in its response > - client can match the client-generated ids to reconcile its state. > > > Diffs > - > > RELEASE-NOTES.md d79aaadc197697d09a71c83494a01765d6a983d4 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStore.java > d2673e6b328cb1e249fbe91d18e0d9e935636eaa > > src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java > a3b04949f1726f110638e4f93ef45947cdb9e7f8 > > src/main/java/org/apache/aurora/scheduler/storage/db/migration/V008_CreateUpdateMetadataTable.java > PRE-CREATION > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > 594bb6219294dcc77d48dcad14e2a6f9caa0c534 > > src/main/java/org/apache/aurora/scheduler/updater/UpdateInProgressException.java > PRE-CREATION > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > src/main/python/apache/aurora/client/cli/context.py > 9cf58396dc266e97f18f761d110e564ac02bb15d > src/main/python/apache/aurora/client/cli/update.py > d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 > src/main/python/
Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled
> On Sept. 1, 2016, 7:51 p.m., Zameer Manji wrote: > > api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 467 > > <https://reviews.apache.org/r/51536/diff/3/?file=1489406#file1489406line467> > > > > This is not sufficent to determine if healthchecking occurs. > > > > We now support shell healthchecking, so a job may not have any port > > named health but it will still have it's health checked by thermos. > > > > Why can't enabling of this feature be a property of the Job or Update? > > Kai Huang wrote: > We can infer if the health check is enabled from the Job Config, so I > think rather than modifying the Job, we should modify the Update to direct > the Instance Updater to skip watch_secs(e.g. creating a boolean named > "isWatchSecsSkippable" in InstanceUpdater). > > Kai Huang wrote: > A second thought which seems to be a better solution: We just modify the > executor to send a message in its healthChecker thread, and on scheduler side > we check the presence of the task status update message string to determine > if watch_secs should be skipped. > > There is no need to check if health check is enabled at scheduler side > nor modifying the Update. > > The watch_secs is skipped by the vCurrent scheduler if and only if the > vCurrent executor is doing a health check, and sends an non-empty message > string during the TASK_RUNNING state transition. I think Kai's latest proposal should be sufficient to determine what policy to apply. Iff RUNNING gets back with our "special" message added, the updater will skip watch_secs. In all other cases, it will honor watch_secs. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51536/#review147597 --- On Aug. 31, 2016, 10:08 p.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51536/ > --- > > (Updated Aug. 31, 2016, 10:08 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > - Scheduler updater will not use watch_sec if health check is enabled > - Add unit tests for successful updates. > > This feature intends to improve reliability and performance of the Aurora > scheduler job updater by relying on health check status rather than > watch_secs timeout when deciding an individual instance update state. > > See this epic: https://issues.apache.org/jira/browse/AURORA-894 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# > for more details and background. > > > Assumptions on executor change: > The executor starts health check at STARTING, if a successful health check is > performed before initial_interval_sec expires, the executor will sends a > status message for RUNNING. > > What I try to implement: > vCurrent updater will infer the executor version from the presence of a > status message sent by the executor during RUNNING status update. > > For vPrev executor: > The vCurrent updater will use watch_sec for instance update, this is > independent from the health check. > For vCurrent executor: > If health check is disabled on vCurrent executor, the vCurrent updater > will use watch_sec for instance update. > If health check is enabled on vCurrent executor, the vCurrent updater > will not use watch_sec for instance update. > > > Diffs > - > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java > c129896d8cd54abd2634e2a339c27921042b0162 > src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java > c78c7fbd7d600586136863c99ce3d7387895efee > > Diff: https://reviews.apache.org/r/51536/diff/ > > > Testing > --- > > ./gradlew build > > ./gradlew :test --tests > "org.apache.aurora.scheduler.updater.InstanceUpdaterTest" > > ./build-support/jenkins/build.sh > > > Thanks, > > Kai Huang > >
Re: Review Request 51536: @ReviewBot retry Scheduler updater will not use watch_sec if health check is enabled
> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, > > line 143 > > <https://reviews.apache.org/r/51536/diff/1/?file=1488791#file1488791line143> > > > > We should add a constant to api.thrift for `health` and use it here as > > well as in the executor. > > Stephan Erb wrote: > We need to look for a more general solution here. The presence of a > health port does not account for the potential existence of a > `.healthchecksnooze` file disabling health checks. > > Kai Huang wrote: > Thanks, that's a good point. We could combine this logic into "executor > capabilities" as Joshua mentioned. I disagree that we should account for the `.healthchecksnooze` for this feature. This approach has always been considered extremely risky and "proceed at your own risk" way to debug end user issues. If someone drops a `.healthchecksnooze` _after_ task entring `STARTING` and _before_ it reaches `RUNNING` I believe the following should be enough to address such case (from design doc): > 2. Instance fails to respond OK and the initial_interval_secs expires ? task > moves into FAILED. > On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, > > line 130 > > <https://reviews.apache.org/r/51536/diff/1/?file=1488791#file1488791line130> > > > > Doing this based simply on the presence of a string feels brittle to > > me. Would it make sense to define some basic form of "executor > > capabilities" passing? Even if it's something simple like a comma delimited > > list of constants that we define in api.thrift (so that both the scheduler > > and the executor can reuse the values)? > > Kai Huang wrote: > Do you mean adding executor capabilities as one attribute of TaskEvent in > api.thrift? > > I think on the executor side , we can put (isHealthCheckEnabled() && > isExecutorUpdated()) into one attribute called "watch_sec_skippable", and > send it to the scheduler. > > Zameer Manji wrote: > "I think on the executor side , we can put (isHealthCheckEnabled() && > isExecutorUpdated()) into one attribute called "watch_sec_skippable", and > send it to the scheduler." > > We currently have no communication via the executor and scheduler outside > of status updates. If you want to do this, please discuss this idea on the > mailing list. The design doc makes no reference to executor capabilities or > any communication between the executor and scheduler. Joshua, how did you see the "executor capabilities" working here? As Zameer pointed out, there is currently no way for us to communicate real-time executor capabilities to the scheduler. I don't think adding this type of info to every status update justifies the amount of work and additional storage space that would require. Besides, we'd have to deprecate this capability and remove from the schema in the next release or two. I agree relying on the presence of the message may appear brittle at first but as far as I can see it's the most straightforward approach that will not require any special handling or deprecation followups. We are in full control of the executor and it's guaranteed through code the message will remain absent for any V-n versions running out there. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51536/#review147358 --- On Aug. 30, 2016, 8:52 p.m., Kai Huang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51536/ > --- > > (Updated Aug. 30, 2016, 8:52 p.m.) > > > Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji. > > > Repository: aurora > > > Description > --- > > - Scheduler updater will not use watch_sec if health check is enabled > - Add unit tests for successful updates. > > This feature intends to improve reliability and performance of the Aurora > scheduler job updater by relying on health check status rather than > watch_secs timeout when deciding an individual instance update state. > > See this epic: https://issues.apache.org/jira/browse/AURORA-894 > and the design doc: > https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit# &
Re: Review Request 51535: Fix a Python unittest that is not asserting anything
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51535/#review147346 --- Ship it! Good catch! - Maxim Khutornenko On Aug. 30, 2016, 7:15 p.m., Stephan Erb wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51535/ > --- > > (Updated Aug. 30, 2016, 7:15 p.m.) > > > Review request for Aurora and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > I discovered this one during a failed attempt to update to a new mock > version. I have only aimed for a minimal fix: > > * incorrect usage of PropertyMock: > https://docs.python.org/dev/library/unittest.mock.html#unittest.mock.PropertyMock > > * assert_called_once() does not exist: > https://engineeringblog.yelp.com/2015/02/assert_called_once-threat-or-menace.html > > > Diffs > - > > src/test/python/apache/aurora/admin/test_admin.py > fd3b91938b5b67ea59d3955dd94a046b3181be2d > > Diff: https://reviews.apache.org/r/51535/diff/ > > > Testing > --- > > Made sure it fails if the production method is tempered with. > ./pants test.pytest src/test/python/apache/aurora/admin:: > > > Thanks, > > Stephan Erb > >
Re: Review Request 51506: Enable -zk_use_curator by default and deprecate.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51506/#review147326 --- src/main/java/org/apache/aurora/scheduler/discovery/FlaggedZooKeeperConfig.java (line 78) <https://reviews.apache.org/r/51506/#comment214449> I suggest against placing a version number into the warning message. We've had a few past examples when we did not follow up on deprecation warnings as promised. - Maxim Khutornenko On Aug. 29, 2016, 11:37 p.m., John Sirois wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51506/ > --- > > (Updated Aug. 29, 2016, 11:37 p.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Bugs: AURORA-1669 > https://issues.apache.org/jira/browse/AURORA-1669 > > > Repository: aurora > > > Description > --- > > The flag is noted as deprecated for removal in 0.17.0. > > RELEASE-NOTES.md >| 2 ++ > docs/reference/scheduler-configuration.md >| 4 ++-- > examples/vagrant/upstart/aurora-scheduler.conf >| 1 - > > src/main/java/org/apache/aurora/scheduler/discovery/FlaggedZooKeeperConfig.java > | 11 +-- > 4 files changed, 13 insertions(+), 5 deletions(-) > > > Diffs > - > > RELEASE-NOTES.md 1819eaa20cf5014228643a1e120316d646cc2824 > docs/reference/scheduler-configuration.md > c0c39442725c970e5f9811cd7b4ab3104364c671 > examples/vagrant/upstart/aurora-scheduler.conf > dd60981564286e5b25546737fdd0ce08b0168ed4 > > src/main/java/org/apache/aurora/scheduler/discovery/FlaggedZooKeeperConfig.java > 36ad18c49c5693031136440ab163070f9ffa9405 > > Diff: https://reviews.apache.org/r/51506/diff/ > > > Testing > --- > > Locally green: > ``` > ./gradlew -Pq build > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > ``` > > > Thanks, > > John Sirois > >
Re: Review Request 51307: Catch IOError.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51307/#review147179 --- Ship it! Ship It! - Maxim Khutornenko On Aug. 23, 2016, 9:12 p.m., David Robinson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51307/ > --- > > (Updated Aug. 23, 2016, 9:12 p.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1752 > https://issues.apache.org/jira/browse/AURORA-1752 > > > Repository: aurora > > > Description > --- > > Catch IOError. > > > Diffs > - > > src/main/python/apache/thermos/monitoring/process_collector_psutil.py > bb7c90206791309772c4bb8e2ccf6e62a3991403 > src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py > PRE-CREATION > > Diff: https://reviews.apache.org/r/51307/diff/ > > > Testing > --- > > $ ./pants test src/test/python/apache/thermos/monitoring > > 13:59:36 00:00 [main] >(To run a reporting server: ./pants server) > 13:59:36 00:00 [setup] > 13:59:36 00:00 [parse] >Executing tasks in goals: test > 13:59:36 00:00 [test] > 13:59:36 00:00 [test-prep-command] > 13:59:36 00:00 [test] > 13:59:36 00:00 [pytest] > 13:59:36 00:00 [run] > == test session starts === > platform linux2 -- Python 2.7.11 -- py-1.4.31 -- > pytest-2.6.4 > plugins: cov, timeout > collected 12 items > > src/test/python/apache/thermos/monitoring/test_disk.py . > > src/test/python/apache/thermos/monitoring/test_detector.py . > > src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py . > > src/test/python/apache/thermos/monitoring/test_resource.py . > > === 12 passed in 0.17 seconds > > 13:59:37 00:01 [complete] >SUCCESS > > > Thanks, > > David Robinson > >
Re: Review Request 51469: Remove static Stats method `exportSize`.
> On Aug. 29, 2016, 4:33 p.m., Maxim Khutornenko wrote: > > Not against this change but rather curious what your thoughts are wrt the > > larger picture here. Are you intending to get rid of all static helps in > > `Stats` in favor of the `StatsProvider`? > > Zameer Manji wrote: > Yes, I think the static dependency on stats is tech debt that we need to > fix. There is also lots of unused code in Stats that I will be removing as > well. Not sure I agree about eliminating all of `Stats` static helpers. Having methods like `exportLong` have proven to help us reduce friction around adding new stats and generally help quick prototyping and perf investigations. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51469/#review147138 --- On Aug. 26, 2016, 10:11 p.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51469/ > --- > > (Updated Aug. 26, 2016, 10:11 p.m.) > > > Review request for Aurora and Joshua Cohen. > > > Repository: aurora > > > Description > --- > > Remove static Stats method `exportSize`. > > > Diffs > - > > commons/src/main/java/org/apache/aurora/common/stats/Stats.java > 538e8077ba762ec9a8f14f759740bb82144a5dc4 > commons/src/main/java/org/apache/aurora/common/stats/StatsProvider.java > 6b1fa4b35c54ab3e37a80c91664c6e8cfd4696c2 > src/main/java/org/apache/aurora/scheduler/TaskStatusHandlerImpl.java > a83f18307fd8e1d266f3d3ae8682333575eda884 > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java > a1ae7c7231d15d2f9715a7de30dfcde0a2f4d730 > src/test/java/org/apache/aurora/scheduler/TaskStatusHandlerImplTest.java > 6d4934b6b6de5273687041174315c4e3d151ac11 > src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java > e7534c4325afa29b580d890cffdc26bdec7ef938 > > Diff: https://reviews.apache.org/r/51469/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 51384: Introduce UpdateMetadata fields in JobUpdateRequest.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51384/#review147146 --- api/src/main/thrift/org/apache/aurora/gen/api.thrift (lines 795 - 796) <https://reviews.apache.org/r/51384/#comment214251> How about moving this into `JobUpdateSummary` instead? As it stands now, the access to update metadata requires exposing a potentially much heavier `JobUpdate` struct through our thrift interface. Bonus point: if you move it into the `JobUpdateSummary` the metadata will be automatically exposed via the existing `getJobUpdateSummaries` RPC. - Maxim Khutornenko On Aug. 28, 2016, 3:58 a.m., Santhosh Kumar Shanmugham wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51384/ > --- > > (Updated Aug. 28, 2016, 3:58 a.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, and Maxim > Khutornenko. > > > Bugs: AURORA-1711 > https://issues.apache.org/jira/browse/AURORA-1711 > > > Repository: aurora > > > Description > --- > > Allow custom metadata to be stored with the job updates. This feature > is to help case the reconciliation on job update request on the clients. > > Today when a client's requests are proxied via a middle-man, > if the middle-man times out before the scheduler responds with success, > the client is left the only option of retry. In the case of updates, > since multiple updates are not allowed in parallel, the retries fail, > with INVALID_REQUEST. Although the original update is already accepted > by the scheduler and is in-progress. > > With this feature, clients can reconcile the job updates, > - by marking updates them with a unique id in the metadata field > - scheduler returns the in-progress job update in its response > - client can match the client-generated ids to reconcile its state. > > > Diffs > - > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > c5765b70501c101f0535b4eed94e9948c36808f9 > src/main/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStore.java > d2673e6b328cb1e249fbe91d18e0d9e935636eaa > > src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java > a3b04949f1726f110638e4f93ef45947cdb9e7f8 > > src/main/java/org/apache/aurora/scheduler/storage/db/migration/V008_CreateUpdateMetadataTable.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/db/views/DbJobUpdate.java > 78703e92c932cd5e93ff0b70f2a0b6928f6b4003 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 929d021a336c6a3438613c9340c84a1096dc9069 > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > 594bb6219294dcc77d48dcad14e2a6f9caa0c534 > > src/main/java/org/apache/aurora/scheduler/updater/UpdateInProgressException.java > PRE-CREATION > src/main/python/apache/aurora/client/api/__init__.py > 9149c3018ae58d405f284fcbd4076d251ccc8192 > src/main/python/apache/aurora/client/cli/context.py > f1a256a8d09d23d8d4d4ee7d264be0fe376398c4 > src/main/python/apache/aurora/client/cli/options.py > 1245ff15a69a4b4347672f7b556985521e813a00 > src/main/python/apache/aurora/client/cli/update.py > 23aaa2c1b67599420408633733e4581553f7151b > src/main/python/apache/aurora/client/hooks/hooked_api.py > 542f165add0f1d01a782fe6d6089bff3e736fb82 > > src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml > 8563630a179cb6d1e3b0e22c730ccbfe1c9291e2 > src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql > a40830fd668aa650c22a48cbc757b45aef27e961 > > src/test/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStoreTest.java > 08530397ff75081bde6f07f9d53317b5486e0da4 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 779dc302602ae8842084807ca868a491ea99b676 > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 04551f17999d742c53dfb1b36286b119b448550e > src/test/python/apache/aurora/client/cli/test_supdate.py > 92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 > src/test/python/apache/aurora/client/cli/util.py > aac9f9c802e4ad1f06cee8cf3f56749760b33af5 > > Diff: https://reviews.apache.org/r/51384/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > ./pants test src/test/python/apache/aurora/client/cli:cli > > > Thanks, > > Santhosh Kumar Shanmugham > >
Re: Review Request 51469: Remove static Stats method `exportSize`.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51469/#review147138 --- Not against this change but rather curious what your thoughts are wrt the larger picture here. Are you intending to get rid of all static helps in `Stats` in favor of the `StatsProvider`? - Maxim Khutornenko On Aug. 26, 2016, 10:11 p.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51469/ > --- > > (Updated Aug. 26, 2016, 10:11 p.m.) > > > Review request for Aurora and Joshua Cohen. > > > Repository: aurora > > > Description > --- > > Remove static Stats method `exportSize`. > > > Diffs > - > > commons/src/main/java/org/apache/aurora/common/stats/Stats.java > 538e8077ba762ec9a8f14f759740bb82144a5dc4 > commons/src/main/java/org/apache/aurora/common/stats/StatsProvider.java > 6b1fa4b35c54ab3e37a80c91664c6e8cfd4696c2 > src/main/java/org/apache/aurora/scheduler/TaskStatusHandlerImpl.java > a83f18307fd8e1d266f3d3ae8682333575eda884 > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java > a1ae7c7231d15d2f9715a7de30dfcde0a2f4d730 > src/test/java/org/apache/aurora/scheduler/TaskStatusHandlerImplTest.java > 6d4934b6b6de5273687041174315c4e3d151ac11 > src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java > e7534c4325afa29b580d890cffdc26bdec7ef938 > > Diff: https://reviews.apache.org/r/51469/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 51474: Uber
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51474/#review147136 --- Ship it! Ship It! - Maxim Khutornenko On Aug. 27, 2016, 5:53 a.m., Tarun Gogineni wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51474/ > --- > > (Updated Aug. 27, 2016, 5:53 a.m.) > > > Review request for Aurora and Joshua Cohen. > > > Repository: aurora > > > Description > --- > > Added Uber to the relevant user list > > > Diffs > - > > README.md 77fb77063eb96da9574794f989641c7746d2c35c > > Diff: https://reviews.apache.org/r/51474/diff/ > > > Testing > --- > > Their latest blog post regarding their engineering stack mentioned their use > of aurora for long running jobs. > > > Thanks, > > Tarun Gogineni > >
Re: Review Request 51307: Catch IOError.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51307/#review146459 --- Thanks for fixing this! A very minor unit test would be great here. - Maxim Khutornenko On Aug. 23, 2016, 12:04 a.m., David Robinson wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51307/ > --- > > (Updated Aug. 23, 2016, 12:04 a.m.) > > > Review request for Aurora, Joshua Cohen and Maxim Khutornenko. > > > Bugs: AURORA-1752 > https://issues.apache.org/jira/browse/AURORA-1752 > > > Repository: aurora > > > Description > --- > > Catch IOError. > > > Diffs > - > > src/main/python/apache/thermos/monitoring/process_collector_psutil.py > bb7c90206791309772c4bb8e2ccf6e62a3991403 > > Diff: https://reviews.apache.org/r/51307/diff/ > > > Testing > --- > > > Thanks, > > David Robinson > >
Re: Review Request 51298: A few executor fixes for filesystem isolation:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51298/#review146441 --- Ship it! Ship It! - Maxim Khutornenko On Aug. 22, 2016, 8:25 p.m., Joshua Cohen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51298/ > --- > > (Updated Aug. 22, 2016, 8:25 p.m.) > > > Review request for Aurora and Maxim Khutornenko. > > > Repository: aurora > > > Description > --- > > - Add an option to skip the groupadd/useradd calls into the task's filesystem. > - Mount any configured volumes into the task's filesystem. > - Clean up http server script used by appc e2e tests. > > > Diffs > - > > src/main/python/apache/aurora/executor/aurora_executor.py > dde19a6914c7c7b2178220707f242f61f11f38bd > src/main/python/apache/aurora/executor/bin/thermos_executor_main.py > 65a495d5c50d91b38c4328bab3bfec667f6a7ba9 > src/main/python/apache/aurora/executor/common/sandbox.py > 5f091af7636bd94f028f15d63437e305b02f741c > src/test/python/apache/aurora/executor/common/test_sandbox.py > ce989b1ccda0f1bc7ba9e15dfe4be20116db3491 > src/test/python/apache/aurora/executor/test_thermos_executor.py > 06601df3bc355a690ff1789b2e2e34484fadefe9 > src/test/sh/org/apache/aurora/e2e/run-server.sh > 76939888bed2e8138671d97f7bc56fd5641008e4 > > Diff: https://reviews.apache.org/r/51298/diff/ > > > Testing > --- > > ./build-support/jenkins/build.sh > e2e tests > > > Thanks, > > Joshua Cohen > >
Re: Review Request 51107: Thrift API change to support instance-specific task config in JobConfiguration
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51107/#review145805 --- -1 to this change. Please, see here for reasons behind it: http://markmail.org/message/d4j6lkncqkua5a23 - Maxim Khutornenko On Aug. 15, 2016, 9:11 p.m., Min Cai wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51107/ > --- > > (Updated Aug. 15, 2016, 9:11 p.m.) > > > Review request for Aurora. > > > Repository: aurora > > > Description > --- > > Implement instance-specific task config in JobConfiguration > > > Diffs > - > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > b799cce665c99ee20fba84a9ddcb5a895ff5685e > > src/main/java/org/apache/aurora/scheduler/configuration/SanitizedConfiguration.java > 264cb7cb38809a3a37b13b76e93f423ec03fbacc > src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java > c07551e94f9221b5b21c5dc9715e82caa290c2e8 > src/main/java/org/apache/aurora/scheduler/state/StateManager.java > d395104af84477b03649ddd65276fbd1dc1e9210 > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java > ffa9481fa4d2f5e1cf74f5819b398e3cb74f1010 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > b534abf95bab6e1657e3ef993cf34c0d6ec460be > > src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java > 11ffa663e4e0fa5fda0ebb343d11d2485a83c7c6 > > Diff: https://reviews.apache.org/r/51107/diff/ > > > Testing > --- > > > Thanks, > > Min Cai > >