> On Dec. 8, 2017, 1:34 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java > > Line 258 (original), 154 (patched) > > <https://reviews.apache.org/r/64286/diff/2/?file=1910716#file1910716line258> > > > > Why do we grab the writeLock here? Who else is going to grab it if this > > method is synchronized? Additionally, writes can't occur unless the Storage > > is READY (I think). > > > > If we do have to write, I would prefer it be wrapped in the `write` > > method like it was before in order to simplify `writeLock` usages.
> Why do we grab the writeLock here? The write lock has historically been relied upon to guarantee visibility of changes across threads. Given that underlying stores are now concurrent, it may no longer be necessary for visibility guarantees. However, i do still think it's a good defensive measure to eliminate the possibility of simultaneous calls to `start` and `write`. > Who else is going to grab it if this method is synchronized I believe the `synchronized` is unnecessary, and will remove it. > Additionally, writes can't occur unless the Storage is READY This is correct. > On Dec. 8, 2017, 1:34 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java > > Lines 334-339 (original) > > <https://reviews.apache.org/r/64286/diff/2/?file=1910716#file1910716line336> > > > > Ah I see, you don't want to persist so you grab the write lock directly > > above as opposed to through this method. Seems like a nuanced interaction. > Seems like a nuanced interaction Do you agree that it's less nuanced than before? I found the previous code quite difficult to reason about. I'm open to suggestions! > On Dec. 8, 2017, 1:34 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java > > Lines 125 (patched) > > <https://reviews.apache.org/r/64286/diff/2/?file=1910717#file1910717line125> > > > > New Op types are not backwards compatible -- is this written to > > persistant storage or just created for future use? It is not. This is called out in a comment in `storage.thrift` and mentioned in reviewer notes. However, i've grown a greater distaste for solving the snapshot-reset case this way and will take the alternative approach of layering over `Op`. > On Dec. 8, 2017, 1:34 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java > > Lines 82-88 (patched) > > <https://reviews.apache.org/r/64286/diff/2/?file=1910721#file1910721line82> > > > > How do snapshot failures propogate, do they kill the scheduler or just > > the service? If everything stays up, can we increment a counter for failed > > snapshots? > > > > We probably don't want to kill the service if a snapshot fails since > > that means it will never try to snapshot again. Service failures terminate the scheduler, behavior provided by a [listener](https://github.com/apache/aurora/blob/301f066369a9ef7262c1702c77004d12bc8eac00/src/main/java/org/apache/aurora/GuavaUtils.java#L53-L57) we add to `ServiceManager`. I was able to observe this directly by throwing an exception from the `SnapshotService`: ```console SEVERE: Service SnapshotService$$EnhancerByGuice$$9e5ef4c0 [FAILED] has failed in the RUNNING state. java.lang.RuntimeException: FAILURE! at org.apache.aurora.scheduler.storage.log.SnapshotService.snapshot(SnapshotService.java:82) at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) at org.apache.aurora.scheduler.storage.log.SnapshotService.runOneIteration(SnapshotService.java:59) at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188) at com.google.common.util.concurrent.Callables$4.run(Callables.java:122) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) E1212 17:14:46.570 [SnapshotService$$EnhancerByGuice$$9e5ef4c0 RUNNING, GuavaUtils$LifecycleShutdownListener] Service: SnapshotService$$EnhancerByGuice$$9e5ef4c0 [FAILED] failed unexpectedly. Tr iggering shutdown. ``` - Bill ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64286/#review193211 ----------------------------------------------------------- On Dec. 5, 2017, 9:58 a.m., Bill Farner wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/64286/ > ----------------------------------------------------------- > > (Updated Dec. 5, 2017, 9:58 a.m.) > > > Review request for Aurora, Jordan Ly and Stephan Erb. > > > Repository: aurora > > > Description > ------- > > This cleans up the various interfaces around persisting and recovering from > `Snapshot`s. Most importantly, `LogPersistence` no longer bypasses the > `recover()` `Op` stream to apply snapshots. As a result, it should be > straightforward to build a migration utility that clones `LogPersistence` > state into another `Persistence` implementation. > > **Reviewers** > Apologies for the large patch. These components were circular, and i did not > find a clean way to break the important pieces of this change apart. > > > Diffs > ----- > > api/src/main/thrift/org/apache/aurora/gen/storage.thrift > 22104979ce8844929f439c44b0f4c63bf90f07d7 > src/jmh/java/org/apache/aurora/benchmark/SnapshotBenchmarks.java > 755582da0e6ae9fcbfb191c195f86a003d00eb2a > > src/main/java/org/apache/aurora/scheduler/storage/DistributedSnapshotStore.java > 0c6a955279f4cadac87c765d91602ac307162f7d > src/main/java/org/apache/aurora/scheduler/storage/SnapshotStore.java > 6b5e5dd5b4d1e28f6d604bcc87c983453574adbf > src/main/java/org/apache/aurora/scheduler/storage/Snapshotter.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/backup/BackupModule.java > 7eaae89d63bf45d6ddc3cffd8ee62d4549e0cc10 > src/main/java/org/apache/aurora/scheduler/storage/backup/Recovery.java > 3a62f02212971f1f73933018ac11702a7041b7d4 > src/main/java/org/apache/aurora/scheduler/storage/backup/StorageBackup.java > 2d61678c76d350286415265fef5056d5377d9788 > > src/main/java/org/apache/aurora/scheduler/storage/backup/TemporaryStorage.java > 18296b0d9674c05ce4b6b36dbcdc060bc2d17b0a > > src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java > 85b2113631586f43d854c4d2812f43b7b864d452 > src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java > PRE-CREATION > > src/main/java/org/apache/aurora/scheduler/storage/durability/WriteAheadStorage.java > 667db062e791a916853edb7fabd6235a2e9bc272 > src/main/java/org/apache/aurora/scheduler/storage/log/LogPersistence.java > e70e6051582ca90ae72014626b983bbf4b8d5b48 > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java > 75ec42aad0b822d6c3dcd5b1307a4fcb86caa5c0 > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java > PRE-CREATION > > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java > 739fad75c4d569a7f2ec01875a4ae63893ed7ac8 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > a519b0728b02b5ccdaaa5566566aae4691f07ffa > src/test/java/org/apache/aurora/scheduler/app/local/LocalSchedulerMain.java > aeb8685e4e2d65c70636965a3a615e426c7a5ec2 > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java > 5cb5310ed096ca1fb47b980401e3712948271ac4 > src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java > 09560f4929a252568ed487fbf27aac5a83e62c79 > > src/test/java/org/apache/aurora/scheduler/storage/backup/StorageBackupTest.java > fff376fabba37ea450ade469e48c5e85d6b3f96a > > src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java > 07912b6a8ffb4bd3f87861f8c17242f6056aaf49 > > src/test/java/org/apache/aurora/scheduler/storage/durability/WriteAheadStorageTest.java > e8b564b0c7fb82d4057f6ee8dcf7b327f1c5e18c > > src/test/java/org/apache/aurora/scheduler/storage/log/LogPersistenceTest.java > PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/storage/log/NonVolatileStorageTest.java > eb966d722dc01d1760566bc57358afac722d5fec > > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotServiceTest.java > PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java > 5634f921cc3d9b1773af0a7ab5a5150756501df3 > > src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java > fd81bff5315d290f2b2c36959d566d5c595ef9c0 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 883738446a093e464630d06f1241832129165fcc > src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java > bb0fd89b9edee2048746876da9edbaf5c9e881ca > > > Diff: https://reviews.apache.org/r/64286/diff/2/ > > > Testing > ------- > > end-to-end tests pass > > > Thanks, > > Bill Farner > >
