> On May 5, 2017, 2:46 p.m., Aurora ReviewBot wrote:
> > Master (8a6e01c) is red with this patch.
> >   ./build-support/jenkins/build.sh
> > 
> > Test coverage missing for org/apache/aurora/scheduler/filter/ConstraintMatcher
> > Test coverage missing for org/apache/aurora/scheduler/filter/SchedulingFilterImpl$1
> > Test coverage missing for org/apache/aurora/scheduler/http/api/security/AuthorizeHeaderToken
> > Test coverage missing for org/apache/aurora/scheduler/state/TaskAssigner$TaskAssignerImpl
> > Test coverage missing for org/apache/aurora/scheduler/state/UUIDGenerator$UUIDGeneratorImpl
> > Test coverage missing for org/apache/aurora/scheduler/discovery/CommonsServiceGroupMonitor
> > Test coverage missing for org/apache/aurora/scheduler/http/api/GsonMessageBodyHandler
> > Test coverage missing for org/apache/aurora/scheduler/http/api/ApiBeta
> > Test coverage missing for org/apache/aurora/scheduler/http/api/GsonMessageBodyHandler$1
> > Test coverage missing for org/apache/aurora/scheduler/cron/quartz/AuroraCronJob
> > Test coverage missing for org/apache/aurora/scheduler/offers/OfferSettings
> > Test coverage missing for org/apache/aurora/scheduler/mesos/TaskStatusStats$3
> > Test coverage missing for org/apache/aurora/scheduler/mesos/TaskStatusStats$2
> > Test coverage missing for org/apache/aurora/scheduler/mesos/TaskStatusStats$1
> > Test coverage missing for org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule
> > Test coverage missing for org/apache/aurora/scheduler/mesos/TaskStatusStats
> > Test coverage missing for org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptor$1
> > Test coverage missing for org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptor$2
> > Test coverage missing for org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptor
> > Test coverage missing for org/apache/aurora/scheduler/thrift/aop/ThriftWorkload$ThriftWorkloadCounterImpl
> > Test coverage missing for org/apache/aurora/scheduler/preemptor/BiCache$1
> > Test coverage missing for org/apache/aurora/scheduler/events/PubsubEvent$DriverDisconnected
> > Test coverage missing for org/apache/aurora/scheduler/events/PubsubEvent$TaskStatusReceived
> > Test coverage missing for org/apache/aurora/scheduler/events/NotifyingSchedulingFilter
> > Test coverage missing for org/apache/aurora/scheduler/events/PubsubEvent$DriverRegistered
> > Test coverage missing for org/apache/aurora/scheduler/storage/backup/TemporaryStorage$TemporaryStorageFactory$1
> > Test coverage missing for org/apache/aurora/scheduler/storage/backup/Recovery$RecoveryImpl
> > Test coverage missing for org/apache/aurora/scheduler/storage/backup/TemporaryStorage$TemporaryStorageFactory
> > Test coverage missing for org/apache/aurora/scheduler/storage/backup/Recovery$RecoveryImpl$PendingRecovery
> > Test coverage missing for org/apache/aurora/scheduler/HostOffer$1
> > Test coverage missing for org/apache/aurora/scheduler/TaskIdGenerator$TaskIdGeneratorImpl
> > Test coverage missing for org/apache/aurora/scheduler/TaskStatusHandlerImpl$1
> > 
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
> > ==============================================================================
> > 
> > BUILD FAILED
> > 
> > Total time: 5 mins 21.546 secs
> > 
> > 
> > I will refresh this build result if you post a review containing "@ReviewBot retry"

@ReviewBot retry


- Mehrdad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59030/#review174082
-----------------------------------------------------------


On May 5, 2017, 2:36 p.m., Mehrdad Nurolahzade wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59030/
> -----------------------------------------------------------
> 
> (Updated May 5, 2017, 2:36 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1869
>     https://issues.apache.org/jira/browse/AURORA-1869
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> `TaskStatusHandlerImpl` acquires the `LogStorage` write lock to process every
> status update received from the Mesos master. During implicit and explicit
> reconciliations, the number of acquisitions equals the number of tasks in the
> cluster (tens of thousands in our cluster).
> 
> According to data extracted from one of our production clusters, over 99.9%
> of reconciliation status update events are in fact `NOOP` status updates. The
> storage write lock contention induced by these status updates can be
> eliminated by adopting the double-checked locking pattern (as was done in
> [AURORA-1820](https://issues.apache.org/jira/browse/AURORA-1820)).
> 
> This explains why the combination of reconciliation status update processing
> and other expensive processes such as snapshots can be fatal for the
> scheduler. Because the lock is not fair, it does not guarantee any particular
> access order. Snapshot structures might therefore need to sit on the heap for
> a few seconds before they can be written to `LogStorage` and garbage
> collected.
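
For readers unfamiliar with the pattern named in the description, here is a minimal sketch of double-checked locking applied to status updates. It is not the code in this patch and does not use Aurora's storage API: `StatusUpdateSketch`, `handleStatusUpdate`, and the in-memory map are hypothetical stand-ins, and a plain `ReentrantReadWriteLock` is assumed in place of the `LogStorage` lock. The idea is to detect a no-op update under the shared read lock, and to take the exclusive write lock (then re-check) only when the state would actually change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a task store guarded by a single storage-wide lock. */
final class StatusUpdateSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, String> taskStates = new HashMap<>();
  private int writeLockAcquisitions = 0;

  /** Returns true if the update changed stored state, false if it was a no-op. */
  boolean handleStatusUpdate(String taskId, String newState) {
    // First check: shared read lock only, so no-op updates never contend
    // with each other for the exclusive lock.
    lock.readLock().lock();
    try {
      if (newState.equals(taskStates.get(taskId))) {
        return false;
      }
    } finally {
      lock.readLock().unlock();
    }

    // Second check: another thread may have applied the same update between
    // releasing the read lock and acquiring the write lock.
    lock.writeLock().lock();
    try {
      writeLockAcquisitions++;
      if (newState.equals(taskStates.get(taskId))) {
        return false;
      }
      taskStates.put(taskId, newState);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Number of times the exclusive lock was taken (for illustration only). */
  int writeLockAcquisitions() {
    lock.readLock().lock();
    try {
      return writeLockAcquisitions;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

With this shape, repeated reconciliation updates that report a task's already-stored state return from the read-locked branch, so the write lock is taken only for the small fraction of updates that change state.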
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/TaskStatusHandlerImpl.java 1aacecf3c2597a3f91dbc7da4c99fd1e80970f04 
>   src/test/java/org/apache/aurora/scheduler/TaskStatusHandlerImplTest.java 56a6b0c9ae8da18e9a47428b8ed37a559cfd04e7 
>   src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 21d26b3930ea965487b2dec48a48a98677ba022b 
> 
> 
> Diff: https://reviews.apache.org/r/59030/diff/1/
> 
> 
> Testing
> -------
> 
> TBD under a test cluster
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>
