> On Sept. 12, 2016, 1:18 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, 
> > lines 115-128
> > <https://reviews.apache.org/r/51765/diff/1/?file=1495174#file1495174line115>
> >
>     The `scheduleTask` method does a staggering amount of work, such 
> > as iterating over offers and matching constraints. Only a small part of 
> > this actually requires an open write transaction.
> >     
> >     It seems to me as if moving part of this computation out of the write 
> > transaction could significantly improve write throughput. You have probably 
> > considered that, so what am I missing here? :-)
> 
> Maxim Khutornenko wrote:
>     It's true that only a small part requires writing into the store. 
> Unfortunately, we do use the write transaction as a global system lock to 
> freeze the state for the duration of the logical change. It's very hard to 
> reason about that logic if things change underneath. The assumptions about 
> being inside of a "locked" state begin with computing the 
> `AttributeAggregate` in the `TaskScheduler` and follow along until we are 
> ready to launch a task or consider a preemption. There are MANY things that 
> can go wrong if we move the transaction boundary, ranging from trivial null 
> reference and concurrent collection modification exceptions to very 
> hard-to-troubleshoot logical errors (offers reused, limit constraints not 
> enforced, etc.). While it's theoretically possible to implement optimistic 
> task/offer matching, it would require a substantial rewrite of what we 
> currently have and is definitely outside the scope of this change.
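
For readers of the archive, the trade-off described above can be sketched roughly as follows. This is only an illustration and not Aurora code: `OptimisticMatchSketch`, `scheduleLocked`, `matchCandidate` and `commit` are hypothetical names, and a plain `ReentrantLock` stands in for the storage write transaction. The pessimistic path matches and assigns inside one critical section; the optimistic path matches first and has to re-validate the offer before committing, which is where the "offers reused" class of bugs would otherwise appear.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a lock stands in for the write transaction.
class OptimisticMatchSketch {
  private final ReentrantLock writeLock = new ReentrantLock();
  private final TreeSet<String> offers = new TreeSet<>();
  private final Map<String, String> assignments = new HashMap<>();

  OptimisticMatchSketch(Collection<String> offerIds) {
    offers.addAll(offerIds);
  }

  // Pessimistic: matching and assignment happen under one lock acquisition,
  // so the set of offers cannot change between the two steps.
  Optional<String> scheduleLocked(String taskId) {
    writeLock.lock();
    try {
      if (offers.isEmpty()) {
        return Optional.empty();
      }
      String offer = offers.pollFirst();
      assignments.put(taskId, offer);
      return Optional.of(offer);
    } finally {
      writeLock.unlock();
    }
  }

  // Optimistic step 1: pick a candidate offer. The lock is held only long
  // enough to read; the result may be stale by the time it is used.
  Optional<String> matchCandidate() {
    writeLock.lock();
    try {
      return offers.isEmpty() ? Optional.empty() : Optional.of(offers.first());
    } finally {
      writeLock.unlock();
    }
  }

  // Optimistic step 2: commit only if the candidate is still available.
  // A false result means another task took the offer and the caller must re-match.
  boolean commit(String taskId, String offer) {
    writeLock.lock();
    try {
      if (!offers.remove(offer)) {
        return false;
      }
      assignments.put(taskId, offer);
      return true;
    } finally {
      writeLock.unlock();
    }
  }

  // Returns a copy of the current task-to-offer assignments.
  Map<String, String> assignments() {
    writeLock.lock();
    try {
      return new HashMap<>(assignments);
    } finally {
      writeLock.unlock();
    }
  }
}
```

If two tasks both obtain the same candidate from `matchCandidate()` before either commits, only the first `commit` succeeds. Every assumption made between the two steps needs that kind of re-validation, which hints at why the rewrite would be substantial.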

Thanks for the clarification!


- Stephan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51765/#review148434
-----------------------------------------------------------


On Sept. 9, 2016, 8:52 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51765/
> -----------------------------------------------------------
> 
> (Updated Sept. 9, 2016, 8:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is the final part of the `BatchWorker` conversion work that converts 
> `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background 
> on the `BatchWorker`.
> 
> #####Problem
> See https://reviews.apache.org/r/51759
> 
> #####Remediation
> Task scheduling is one of the dominant users of the write lock. It's also 
> one of the heaviest and most latency-sensitive. As such, the default max 
> batch size is chosen conservatively low (3) and batch items are executed in 
> a blocking way. 
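
The batching idea above might look roughly like the sketch below. It is not the actual `BatchWorker` (see the linked review for that); `MiniBatchWorker`, `submit` and `runOneBatch` are hypothetical names, and a `ReentrantLock` stands in for the storage write lock. The point is only that up to `maxBatchSize` queued items share a single lock acquisition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of batching work items under one lock acquisition.
class MiniBatchWorker {
  // A queued unit of work together with the future its caller waits on.
  private static final class Item<T> {
    final Supplier<T> work;
    final CompletableFuture<T> result = new CompletableFuture<>();

    Item(Supplier<T> work) {
      this.work = work;
    }

    void run() {
      try {
        result.complete(work.get());
      } catch (RuntimeException e) {
        result.completeExceptionally(e);
      }
    }
  }

  private final LinkedBlockingQueue<Item<?>> queue = new LinkedBlockingQueue<>();
  private final ReentrantLock writeLock = new ReentrantLock();
  private final int maxBatchSize;
  private int lockAcquisitions = 0; // only modified while holding writeLock

  MiniBatchWorker(int maxBatchSize) {
    this.maxBatchSize = maxBatchSize;
  }

  // Enqueues work; a blocking caller would call join() on the returned future.
  <T> CompletableFuture<T> submit(Supplier<T> work) {
    Item<T> item = new Item<>(work);
    queue.add(item);
    return item.result;
  }

  // Drains at most maxBatchSize items and runs them under a single lock
  // acquisition. Returns the number of items processed.
  int runOneBatch() {
    List<Item<?>> batch = new ArrayList<>();
    queue.drainTo(batch, maxBatchSize);
    if (batch.isEmpty()) {
      return 0;
    }
    writeLock.lock();
    try {
      lockAcquisitions++;
      for (Item<?> item : batch) {
        item.run();
      }
    } finally {
      writeLock.unlock();
    }
    return batch.size();
  }

  int lockAcquisitions() {
    return lockAcquisitions;
  }
}
```

In a real setup a dedicated thread would presumably loop on something like `runOneBatch()`, while each scheduling caller blocks on its future. A small batch size keeps any single lock hold short, which matters for latency-sensitive work.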
> 
> BTW, attempting to make task scheduling non-blocking resulted in much worse 
> scheduling performance. The way our `DBTaskStore` is wired, all async 
> activities, including `EventBus`, are bound to use a single async `Executor`, 
> which is currently limited to 8 threads [1]. Relying on the same `EventBus` 
> to deliver scheduling completion events resulted in slower scheduling perf as 
> those events were backed up behind all other activities, including task 
> status events, reconciliation, etc. Increasing the executor thread pool size, 
> on the other hand, also increased the lock contention, defeating the whole 
> purpose of this work.
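
The backlog effect described above can be reproduced in miniature. The sketch below is an assumption-laden illustration, not Aurora's `AsyncModule` wiring: `SharedExecutorSketch` and `deliveryOrder` are hypothetical names, and the demo uses a pool of one thread (rather than 8) so that the ordering is deterministic. A completion event submitted to a shared fixed-size executor waits behind everything already queued.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: events sharing one bounded executor are served FIFO.
class SharedExecutorSketch {
  // Queues `backlog` unrelated events, then one completion event, and
  // returns the order in which the executor actually ran them.
  static List<String> deliveryOrder(int threads, int backlog) {
    ExecutorService shared = Executors.newFixedThreadPool(threads);
    List<String> order = Collections.synchronizedList(new ArrayList<>());
    for (int i = 0; i < backlog; i++) {
      final String name = "status-event-" + i;
      shared.execute(() -> order.add(name));
    }
    // The event the scheduler is waiting for goes to the back of the queue.
    shared.execute(() -> order.add("scheduling-complete"));
    shared.shutdown();
    try {
      if (!shared.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("executor did not drain in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(order);
  }
}
```

With `deliveryOrder(1, 5)`, the completion event is delivered last, after all five queued events. With more threads the ordering is no longer strict, but a completion event still cannot start until a pool thread frees up.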
> 
> #####Results
> See https://reviews.apache.org/r/51759 for the lock contention results.
> 
> [1] https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 9d0d40b82653fb923bed16d06546288a1576c21d 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 11e8033438ad0808e446e41bb26b3fa4c04136c7 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> c044ebe6f72183a67462bbd8e5be983eb592c3e9 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> d266f6a25ae2360db2977c43768a19b1f1efe8ff 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c2ceb4e7685a9301f8014a9183e02fbad65bca26 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
> 72562e6bd9a9860c834e6a9faa094c28600a8fed 
> 
> Diff: https://reviews.apache.org/r/51765/diff/
> 
> 
> Testing
> -------
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>
