-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59030/#review174086
-----------------------------------------------------------



@ReviewBot retry

- Mehrdad Nurolahzade


On May 5, 2017, 2:36 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59030/
> -----------------------------------------------------------
> 
> (Updated May 5, 2017, 2:36 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1869
>     https://issues.apache.org/jira/browse/AURORA-1869
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> `TaskStatusHandlerImpl` acquires the `LogStorage` write lock to process every 
> status update received from the Mesos master. During implicit and explicit 
> reconciliations, this means one write lock acquisition per task in the 
> cluster (tens of thousands in our cluster).
> 
> According to data extracted from one of our production clusters, over 99.9% 
> of reconciliation status update events are in fact `NOOP` status updates. The 
> storage write lock contention induced by these status updates can be 
> eliminated by adopting the double-checked locking pattern (as was done in 
> [AURORA-1820](https://issues.apache.org/jira/browse/AURORA-1820)).
> 
> This explains why the combination of reconciliation status update processing 
> and other expensive processes such as snapshots can be fatal for the 
> scheduler. As the lock is not fair, it does not guarantee any particular 
> access order. 
> Therefore, snapshot structures might need to sit on the heap for a few 
> seconds before they can be written to `LogStorage` and garbage collected.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/TaskStatusHandlerImpl.java 1aacecf3c2597a3f91dbc7da4c99fd1e80970f04 
>   src/test/java/org/apache/aurora/scheduler/TaskStatusHandlerImplTest.java 56a6b0c9ae8da18e9a47428b8ed37a559cfd04e7 
>   src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 21d26b3930ea965487b2dec48a48a98677ba022b 
> 
> 
> Diff: https://reviews.apache.org/r/59030/diff/1/
> 
> 
> Testing
> -------
> 
> TBD under a test cluster
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>
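
For readers unfamiliar with the pattern, here is a minimal sketch of the double-checked locking idea mentioned in the quoted description. It is not the code under review and does not use Aurora's storage API: the class `DoubleCheckedStatusHandler`, its in-memory map, and the `writeLockAcquisitions` counter are all hypothetical stand-ins, and a plain `ReentrantReadWriteLock` is assumed in place of the real storage lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a status handler; NOT Aurora's TaskStatusHandlerImpl.
class DoubleCheckedStatusHandler {
  // Non-fair read/write lock, assumed here in place of the real storage lock.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Hypothetical in-memory task state, keyed by task id.
  private final Map<String, String> taskStates = new HashMap<>();
  // Counts write lock acquisitions so the effect of the pattern is observable.
  final AtomicInteger writeLockAcquisitions = new AtomicInteger();

  /** Returns true if the update changed stored state, false if it was a NOOP. */
  boolean handleStatusUpdate(String taskId, String newState) {
    // First check: take only the shared read lock and bail out on a NOOP.
    lock.readLock().lock();
    try {
      if (newState.equals(taskStates.get(taskId))) {
        return false;
      }
    } finally {
      lock.readLock().unlock();
    }

    // Second check: the state may have changed between releasing the read
    // lock and acquiring the write lock, so re-check before mutating.
    lock.writeLock().lock();
    try {
      writeLockAcquisitions.incrementAndGet();
      if (newState.equals(taskStates.get(taskId))) {
        return false;
      }
      taskStates.put(taskId, newState);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    DoubleCheckedStatusHandler handler = new DoubleCheckedStatusHandler();
    handler.handleStatusUpdate("task-1", "RUNNING");
    // Simulate a reconciliation burst that repeats the already-known state.
    for (int i = 0; i < 10_000; i++) {
      handler.handleStatusUpdate("task-1", "RUNNING");
    }
    System.out.println("write lock acquisitions: "
        + handler.writeLockAcquisitions.get());
  }
}
```

In this sketch the repeated updates return after the read-lock check, so only the first update takes the write lock. The read lock is released before the write lock is requested because `ReentrantReadWriteLock` does not support upgrading a read lock to a write lock, which is why the second check is needed.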
