-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59030/#review174086
-----------------------------------------------------------

@ReviewBot retry

- Mehrdad Nurolahzade


On May 5, 2017, 2:36 p.m., Mehrdad Nurolahzade wrote:
> 
> (Updated May 5, 2017, 2:36 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1869
>     https://issues.apache.org/jira/browse/AURORA-1869
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> `TaskStatusHandlerImpl` acquires the `LogStorage` write lock to process every
> status update received from the Mesos master. During implicit and explicit
> reconciliations, this means one write-lock acquisition per task in the cluster
> (tens of thousands of acquisitions in our cluster).
> 
> According to data extracted from one of our production clusters, over 99.9%
> of reconciliation status update events are in fact `NOOP` status updates. The
> storage write lock contention induced by these status updates can simply be
> eliminated by adopting the double-checked locking pattern (as was done in
> [AURORA-1820](https://issues.apache.org/jira/browse/AURORA-1820)).
> 
> This contention explains why combining reconciliation status update processing
> with other expensive operations, such as snapshotting, can be fatal for the
> scheduler. Because the lock is not fair, it does not guarantee any particular
> access order. Snapshot structures may therefore have to sit on the heap for a
> few seconds before they can be written to `LogStorage` and garbage collected.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/TaskStatusHandlerImpl.java 1aacecf3c2597a3f91dbc7da4c99fd1e80970f04 
>   src/test/java/org/apache/aurora/scheduler/TaskStatusHandlerImplTest.java 56a6b0c9ae8da18e9a47428b8ed37a559cfd04e7 
>   src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 21d26b3930ea965487b2dec48a48a98677ba022b 
> 
> Diff: https://reviews.apache.org/r/59030/diff/1/
> 
> 
> Testing
> -------
> 
> TBD on a test cluster.
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
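The double-checked locking idea from the description can be sketched as follows. This is a minimal, hypothetical illustration that assumes a simplified in-memory store guarded by a `java.util.concurrent.locks.ReentrantReadWriteLock`. `DoubleCheckedStatusSketch`, its methods, and the plain-string statuses are invented for illustration. They are not Aurora's `TaskStatusHandlerImpl` or its storage API. The first check runs under the shared read lock and returns early for NOOP updates. Only updates that appear to change state take the exclusive write lock, and the status is re-checked there before anything is written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a status-update handler backed by a shared store.
// Not Aurora's TaskStatusHandlerImpl or Storage API; statuses are plain strings.
class DoubleCheckedStatusSketch {
  // Non-fair read/write lock (the default), so waiting threads get no ordering guarantee.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, String> statusByTaskId = new HashMap<>();
  // Counts write-lock acquisitions made by handleStatusUpdate; guarded by lock.
  private int updateWriteLocks = 0;

  void seed(String taskId, String status) {
    lock.writeLock().lock();
    try {
      statusByTaskId.put(taskId, status);
    } finally {
      lock.writeLock().unlock();
    }
  }

  private String currentStatus(String taskId) {
    lock.readLock().lock();
    try {
      return statusByTaskId.get(taskId);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Applies the update and returns true, or returns false if it was a NOOP. */
  boolean handleStatusUpdate(String taskId, String newStatus) {
    // First check, under the shared read lock only: an update that repeats
    // the stored status returns here without contending for the write lock.
    if (newStatus.equals(currentStatus(taskId))) {
      return false;
    }
    lock.writeLock().lock();
    try {
      updateWriteLocks++;
      // Second check, under the write lock: another thread may have applied
      // the same transition after the first check released the read lock.
      if (newStatus.equals(statusByTaskId.get(taskId))) {
        return false;
      }
      statusByTaskId.put(taskId, newStatus);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  int updateWriteLockAcquisitions() {
    lock.readLock().lock();
    try {
      return updateWriteLocks;
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    DoubleCheckedStatusSketch handler = new DoubleCheckedStatusSketch();
    handler.seed("task-1", "RUNNING");
    // Reconciliation echoes the stored state: NOOP, no write lock taken.
    System.out.println(handler.handleStatusUpdate("task-1", "RUNNING"));  // false
    // Genuine transition: write lock taken and state updated.
    System.out.println(handler.handleStatusUpdate("task-1", "FINISHED")); // true
    System.out.println(handler.updateWriteLockAcquisitions());            // 1
  }
}
```

Running `main` prints `false`, `true`, and `1`. The NOOP echo never touches the write lock, and only the real transition does.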
