-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33689/
-----------------------------------------------------------
(Updated May 11, 2015, 6:55 p.m.)


Review request for Aurora, Maxim Khutornenko and Bill Farner.


Changes
-------

Maxim's feedback.


Bugs: AURORA-1228
    https://issues.apache.org/jira/browse/AURORA-1228


Repository: aurora


Description
-------

Now the processing of status updates is done asynchronously with batching to insulate throughput from the expensive storage resource.

Updates are placed into a queue and consumed by another thread. If many updates arrive while we're storing a batch of updates, these will be processed together in batch rather than individually.


Diffs (updated)
-----

  src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 7bb64dd913f0fe2fede95d50a061043dbb794ab4
  src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java 45de15a57baf7a2f7d437b590935714e28777f35
  src/main/java/org/apache/aurora/scheduler/SchedulerModule.java d3ac176e9402a33fd2074b0737313458120da9e2
  src/main/java/org/apache/aurora/scheduler/UserTaskLauncher.java 0ce9c9d4cf75f9add260f285115b1d60786ded57
  src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 4d589a33a2933b0cb6caf85abfae45c5e635c3ce
  src/main/java/org/apache/aurora/scheduler/mesos/Driver.java c7e45a89ceaa2c310feb610091eec0b04187860e
  src/main/java/org/apache/aurora/scheduler/mesos/MesosSchedulerImpl.java 9b8ab7c1027731f9d3f6cae77b85272ea63354d4
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java da2d5df2e053e6e1b8fb08d6813dff9eac9777f8
  src/test/java/org/apache/aurora/scheduler/UserTaskLauncherTest.java 32432322753799562d671db39c0d7fa308d962ff
  src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 422d5a9a42310979752eb7282658316c2b772419
  src/test/java/org/apache/aurora/scheduler/mesos/MesosSchedulerImplTest.java abdeee49858fc439c27911c4eb544bf8e8c931d4

Diff: https://reviews.apache.org/r/33689/diff/


Testing
-------

Ran the benchmark to confirm that this improves status update throughput substantially:

Before: Around 100 updates per second for a 5ms storage latency. Much worse for higher latencies.

After: Around 4k-5k updates per second for a 5ms storage latency, down to 3k updates per second for 100ms storage latency.

Updated unit tests for the new invariants:
  * TaskLaunchers are responsible for acknowledging updates.
  * UserTaskLauncher processes updates asynchronously.


Thanks,

Ben Mahler
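
For readers who want a picture of the queue-and-batch pattern from the Description, here is a minimal sketch. It is not the code in this diff: the class name BatchingUpdateProcessor, its methods, and the storeBatch callback are all hypothetical, and acknowledging updates back to the driver is left out. It only shows how one consumer thread can block for the first update and then pick up everything else that queued while the previous batch was being stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of asynchronous, batched update processing. */
final class BatchingUpdateProcessor<T> {
  // Updates waiting to be stored; producers never block on storage.
  private final BlockingQueue<T> pending = new LinkedBlockingQueue<>();
  // Assumed stand-in for the expensive storage write; takes one whole batch.
  private final Consumer<List<T>> storeBatch;
  private final Thread worker;

  BatchingUpdateProcessor(Consumer<List<T>> storeBatch) {
    this.storeBatch = storeBatch;
    this.worker = new Thread(this::drainLoop, "status-update-worker");
    this.worker.setDaemon(true);
  }

  void start() {
    worker.start();
  }

  /** Called from the producer side (e.g. a driver callback); returns immediately. */
  void enqueue(T update) {
    pending.add(update);
  }

  /** Interrupts the worker and waits for it to exit. */
  void stop() throws InterruptedException {
    worker.interrupt();
    worker.join();
  }

  private void drainLoop() {
    try {
      while (true) {
        List<T> batch = new ArrayList<>();
        // Block until at least one update is available.
        batch.add(pending.take());
        // Then grab everything that arrived in the meantime, without blocking.
        pending.drainTo(batch);
        // One storage operation for the whole batch.
        storeBatch.accept(batch);
      }
    } catch (InterruptedException e) {
      // Restore the interrupt flag and let the thread exit.
      Thread.currentThread().interrupt();
    }
  }
}
```

With this shape the batch size adapts on its own: when storage is fast, batches stay small, and when storage is slow, more updates accumulate in the queue and are stored together on the next pass.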