> On Sept. 13, 2016, 6:29 p.m., Zameer Manji wrote: > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 110 > > <https://reviews.apache.org/r/51759/diff/4/?file=1498117#file1498117line110> > > > > The documentation implies we are returning a boolean but we are > > returning a `Result` object. I think you need to update that. I think what > > you are saying is that the `Result` object signals through `isCompleted` > > that the work should be repeated right? > > > > (As an aside, maybe we should name this to `shouldReschedule` or > > similar). > > > > > > If work needs to be repeated once the result has been computed, why > > can't the caller just access the `BatchWorker` in the `Work<T>` it submits > > and do that itself? > > > > That is why can't we extend `apply` to also accept a `BatchWorker` and > > the caller can submit another task within the `Work<T>` if that's required? > > > > Alternatively now that `execute` returns a `CompletableFuture`, the > > caller could use one of the many methods to execute code when the future is > > complete and have that extra code add more work. > > Maxim Khutornenko wrote: > Fixed the javadoc and converted `NoResult` into a static field to > simplify callsite consumption. > > As for the second suggestion, I don't like disseminating `BatchWorkers` > or work item state maintenance outside the `BatchWorker` itself. That would > put the burden of maintaining and passing around the state to the callsite > and may create hard to reason/troubleshoot use cases. Consider the > https://reviews.apache.org/r/51763. We would have to use an explicit wrapper > object to hold on to local data to be able to re-queue it again as the > trigger context is gone the moment `BatchWorker` call is placed. Also, > queueing recurrent homogenuous events via `CompletableStage` chaining does > not strike me as the right use of the functional style.
Understood. I spent some time about an alternative and I didn't get one that seemed to work well so I'll accept this design. Most solutions I looked that involved recursive closures. They would look like `queue work` -> `add call back on the future` -> `call back calls back into queuing work`. I think that's pretty messy and something that could be touched upon later. - Zameer ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51759/#review148837 ----------------------------------------------------------- On Sept. 14, 2016, 10:25 a.m., Maxim Khutornenko wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51759/ > ----------------------------------------------------------- > > (Updated Sept. 14, 2016, 10:25 a.m.) > > > Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji. > > > Repository: aurora > > > Description > ------- > > This is the first (out of 3) patches intending to reduce storage write lock > contention and as such improve overall system write throughput. It introduces > the `BatchWorker` and migrates the majority of storage writes due to task > status change events to use `TaskEventBatchWorker`. > > #####Problem > Our current storage system writes effectively behave as `SERIALIZABLE` > transaction isolation level in SQL terms. This means all writes require > exclusive access to the storage and no two transactions can happen in > parallel . While it certainly simplifies our implementation, it creates a > single hotspot where multiple threads are competing for the storage write > access. This type of contention only worsens as the cluster size grows, more > tasks are scheduled, more status updates are processed, more subscribers are > listening to status updates and etc. Eventually, the scheduler throughput > (and especially task scheduling) becomes degraded to the extent that certain > operations wait much longer (4x and more) for the lock acquisition than it > takes to process their payload when inside the transaction. Some ops (like > event processing) are generally tolerant of these types of delays. Others - > not as much. The task scheduling suffers the most as backing up the > scheduling queue directly affects the Median Time To Assigned (MTTA). > > #####Remediation > Given the above, it's natural to assume that reducing the number of write > transactions should help reducing the lock contention. This patch introduces > a generic `BatchWorker` service that delivers a "best effort" batching > approach by redirecting multiple individual write requests into a single FIFO > queue served non-stop by a single dedicated thread. Every batch shares a > single write transaction thus reducing the number of potential write lock > requests. To minimize wait-in-queue time, items are dispatched immediately > and the max number of items is bounded. There are a few `BatchWorker` > instances specialized on particular workload types: task even processing, > cron scheduling and task scheduling. Every instance can be tuned > independently (max batch size) and provides specialized metrics helping to > monitor each workload type perf. > > #####Results > The proposed approach has been heavily tested in production and delivered the > best results. The lock contention latencies got down between 2x and 5x > depending on the cluster load. A number of other approaches tried but > discarded as not performing well or even performing much worse than the > current master: > - Clock-driven batch execution - every batch is dispatched on a time schedule > - Max batch with a deadline - a batch is dispatched when max size is reached > OR a timeout expires > - Various combinations of the above - some `BatchWorkers` are using > clock-driven execution while others are using max batch with a deadline > - Completely non-blocking (event-based) completion notification - all call > sites are notified of item completion via a `BatchWorkCompleted` event > > Happy to provide more details on the above if interested. > > #####Upcoming > The introduction of the `BatchWorker` by itself was not enough to > substantially improve the MTTA. It, however, paves the way for the next phase > of scheduling perf improvement - taking more than 1 task from a given > `TaskGroup` in a single scheduling round (coming soon). That improvement > wouldn't deliver without decreasing the lock contention first. > > Note: it wasn't easy to have a clean diff split, so some functionality in > `BatchWorker` (e.g.: `executeWithReplay`) appears to be unused in the current > patch but will become obvious in the part 2 (coming out shortly). > >  - > https://github.com/apache/aurora/blob/f6ac13b169aaad5aad73ef3cc72873781e30a705/src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java#L540-L555 > > > Diffs > ----- > > src/main/java/org/apache/aurora/scheduler/BatchWorker.java PRE-CREATION > src/main/java/org/apache/aurora/scheduler/SchedulerModule.java > 4a7ef0b1b90607f68d89fe8e207f42c42a8c56a0 > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java > f07746c2b990c1c2235e99f9e4775fc84f9c27b1 > src/main/java/org/apache/aurora/scheduler/scheduling/TaskThrottler.java > bbd971a2aa8a96cf79edd879ad60e1bebd933d79 > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > 3c7cda09ab292d696070ca4d9dfedb1f6f71b0fe > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > 594bb6219294dcc77d48dcad14e2a6f9caa0c534 > src/test/java/org/apache/aurora/scheduler/BatchWorkerTest.java PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java > 99c27e8012f10a67ce5f1b84d258e7a5608995c7 > src/test/java/org/apache/aurora/scheduler/scheduling/TaskThrottlerTest.java > 7d104aa2ea4a4d99be4711f666d18beca238284e > > src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java > 94f5ca565476f62d72879837a0e7dafabcf30432 > src/test/java/org/apache/aurora/scheduler/testing/BatchWorkerUtil.java > PRE-CREATION > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 196df4754b553f05e50b66ad2f84271901bc9eba > > Diff: https://reviews.apache.org/r/51759/diff/ > > > Testing > ------- > > All types of testing including deploying to test and production clusters. > > > Thanks, > > Maxim Khutornenko > >