> On Sept. 11, 2016, 11:18 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, 
> > lines 115-128
> > <https://reviews.apache.org/r/51765/diff/1/?file=1495174#file1495174line115>
> >
> >     The `scheduleTask` method is doing a staggering amount of work, such 
> > as iterating over offers and matching constraints. Only a small part of 
> > this actually requires an open write transaction.
> >     
> >     It seems to me as if moving part of this computation out of the write 
> > transaction could significantly improve write throughput. You have probably 
> > considered that, so what am I missing here? :-)

It's true that only a small part requires writing into the store. 
Unfortunately, we use the write transaction as a global system lock to freeze 
the state for the duration of the logical change. It's very hard to reason 
about that logic if things change underneath. The assumptions about being 
inside of a "locked" state begin with computing the `AttributeAggregate` in the 
`TaskScheduler` and continue until we are ready to launch a task or consider 
a preemption. There are MANY things that can go wrong if we move the 
transaction boundary, ranging from trivial null reference and concurrent 
collection modification exceptions to very hard to troubleshoot logical errors 
(offers reused, limit constraints not enforced, etc.). While it's theoretically 
possible to implement optimistic task/offer matching, it would require a 
substantial rewrite of what we currently have and is definitely outside the 
scope of this change.
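
To make the "offers reused" case concrete, here is a minimal sketch of the check-then-act pattern that the write transaction protects. It is not Aurora code: `LockedMatchingSketch`, its fields, and the offer/task names are all hypothetical stand-ins, and a plain `ReentrantLock` plays the role of the write transaction. Picking an offer and recording the assignment happen under the same lock, so no two tasks can ever end up on the same offer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy model; none of these names are Aurora classes.
final class LockedMatchingSketch {
  // Stand-in for the storage write transaction used as a global lock.
  private final ReentrantLock writeLock = new ReentrantLock();
  // Both collections are only touched while holding writeLock.
  private final List<String> offers = new ArrayList<>();
  private final Map<String, String> assignments = new HashMap<>();

  LockedMatchingSketch(List<String> initialOffers) {
    offers.addAll(initialOffers);
  }

  // Matching and assignment share one critical section, so the chosen offer
  // cannot be taken by another thread between the check and the write.
  Optional<String> scheduleTask(String taskId) {
    writeLock.lock();
    try {
      if (offers.isEmpty()) {
        return Optional.empty();
      }
      String offer = offers.remove(0); // stand-in for the expensive matching
      assignments.put(taskId, offer);
      return Optional.of(offer);
    } finally {
      writeLock.unlock();
    }
  }

  // Returns a copy of the task -> offer assignments, taken under the lock.
  Map<String, String> snapshot() {
    writeLock.lock();
    try {
      return new HashMap<>(assignments);
    } finally {
      writeLock.unlock();
    }
  }

  // Four tasks compete for two offers from four threads.
  static Map<String, String> runDemo() {
    LockedMatchingSketch store =
        new LockedMatchingSketch(List.of("offer-1", "offer-2"));
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 4; i++) {
      String taskId = "task-" + i;
      pool.execute(() -> store.scheduleTask(taskId));
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return store.snapshot();
  }

  public static void main(String[] args) {
    System.out.println(runDemo());
  }
}
```

If the matching step moved outside the lock, two threads could both pick `offer-1` before either recorded its assignment. Avoiding that would need a re-validation step at commit time, which is the optimistic approach mentioned above.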


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51765/#review148434
-----------------------------------------------------------


On Sept. 9, 2016, 6:52 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51765/
> -----------------------------------------------------------
> 
> (Updated Sept. 9, 2016, 6:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is the final part of the `BatchWorker` conversion work that converts 
> `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background 
> on the `BatchWorker`.
> 
> #####Problem
> See https://reviews.apache.org/r/51759
> 
> #####Remediation
> Task scheduling is one of the most dominant users of the write lock. It's 
> also one of the heaviest and the most latency-sensitive. As such, the default 
> max batch size is chosen conservatively low (3) and batch items are executed 
> in a blocking way. 
> 
> BTW, attempting to make task scheduling non-blocking resulted in much worse 
> scheduling performance. The way our `DBTaskStore` is wired, all async 
> activities, including `EventBus`, are bound to use a single async `Executor`, 
> which is currently limited to 8 threads [1]. Relying on the same `EventBus` 
> to deliver scheduling completion events resulted in slower scheduling perf as 
> those events were backed up behind all other activities, including task 
> status events, reconciliation, etc. Increasing the executor thread pool 
> size, on the other hand, also increased the lock contention, defeating the 
> whole purpose of this work.
> 
> #####Results
> See https://reviews.apache.org/r/51759 for the lock contention results.
> 
> [1] https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 9d0d40b82653fb923bed16d06546288a1576c21d 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 11e8033438ad0808e446e41bb26b3fa4c04136c7 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> c044ebe6f72183a67462bbd8e5be983eb592c3e9 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> d266f6a25ae2360db2977c43768a19b1f1efe8ff 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c2ceb4e7685a9301f8014a9183e02fbad65bca26 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
> 72562e6bd9a9860c834e6a9faa094c28600a8fed 
> 
> Diff: https://reviews.apache.org/r/51765/diff/
> 
> 
> Testing
> -------
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>
