> On Oct. 12, 2017, 2:05 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemAttributeStore.java > > Lines 43-52 (patched) > > <https://reviews.apache.org/r/62869/diff/4/?file=1852507#file1852507line43> > > > > With multiple accesses of `hostAttributes` this class is subject to > > race conditions. > > > > Should we add `synchronized` to all public methods?
Historically, all stores have assumed the presence of a centralized storage write lock. This is true (`LogStorage#writeLock`), but distant and apparently no longer documented. Since this type of concurrency concern is sure to exist in all stores, i propose restoring the documentation in `Storage` for now. > On Oct. 12, 2017, 2:05 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStore.java > > Lines 146-161 (patched) > > <https://reviews.apache.org/r/62869/diff/4/?file=1852508#file1852508line146> > > > > As indicated in the other comment, this will miss the case when > > something is in the LockStore but not reflected in the corresponding update. > > > > I think we should handle that case for additonal robustness. I agree with you; the only thing holding me back is practical longevity of `LockStore`. I'd like to remove it ASAP (its only purpose is to indirectly answer "Is this job updating?"); but tight coupling to `DbJobUpdateStore` severly complicated that. Since i'm pulling that whole thread now, i'd like to avoid any kind of cross-store synchronization unless correctness is otherwise at risk. With that said, `refreshLocks()` is used locally in what appear to be read paths, but ultimately part of write paths. How would you feel about moving `fetchAllJobUpdateDetails()` and `getLockToken()` to the `JobUpdateStore.Mutable` interface, making them subject to the now-documented write lock guarantee? > On Oct. 12, 2017, 2:05 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStore.java > > Lines 235-240 (patched) > > <https://reviews.apache.org/r/62869/diff/4/?file=1852508#file1852508line235> > > > > What happens for the case when `lockToken` is not present but have have > > a token in the LockStore? > > > > It seems like `refreshLocks` does not handle that case either. This is precisely where the coupling between `LockStore` and `JobUpdateStore` gets ugly - they both assume calling code is coordinating use of the stores in a specific way. The implementation attempts to keep the stores in sync to prevent this case, but it's technically possible. The same is true of the DB implementations. However, this is tempting me to go back and take another stab at removing `LockStore` since carrying over the store coupling feels error-prone. > On Oct. 12, 2017, 2:05 p.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStore.java > > Lines 345-360 (patched) > > <https://reviews.apache.org/r/62869/diff/4/?file=1852508#file1852508line345> > > > > This relies on the assumption that all `UpdateEvent`s and > > `InstanceUpdateEvent`s are sorted chronologically. > > > > I think we should enforce this invariant in `saveJobUpdateEvent` and > > `saveJobInstanceUpdateEvent` to prevent surprises. Good call, done. - Bill ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/62869/#review187861 ----------------------------------------------------------- On Oct. 10, 2017, 12:35 p.m., Bill Farner wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/62869/ > ----------------------------------------------------------- > > (Updated Oct. 10, 2017, 12:35 p.m.) > > > Review request for Aurora, Jordan Ly and Stephan Erb. > > > Repository: aurora > > > Description > ------- > > This patch introduces map-based volatile stores, most of which were revived > from git history with minimal changes. The DB storage system is now only > used in a temporary storage when replaying a snapshot containing the > `dbScript` field. > > Please take special care to double-check my work in `SnapshotStoreImpl`, as > it must function for backwards compatibility with `Snapshot`s in the wild. > Another area of interest is the implementation of `MemJobUpdateStore`, which > must have nuanced behavior for compatibility with `DbJobUpdateStore`. Most > of this stems from behind-the-scenes interaction with `LockStore` and use of > cascading deletes. > > Note that this change removes the transactional nature of in-memory storage > operations as well as the `READ COMMITTED` transaction isolation previously > available to some stores (proven in necessary changes to > `StorageTransactionTest`). This means some stores will permit dirty reads > when they previously did not. `TaskStore` has always had this > non-transactional behavior by default, as the DB task store was never deemed > suitable for production. Nonetheless, this non-transactional behavior should > be considered safe as the scheduler fails over on a storage operation > failure, and relies on the persistent log storage for transaction atomicity. > > See the [mailing list > discussion](https://lists.apache.org/thread.html/0bb74c78da7fa12954e2438e8011b3bff73fe3bfdafeaaa2ff00a98e@%3Cdev.aurora.apache.org%3E) > for context. > > > Diffs > ----- > > RELEASE-NOTES.md c58e68080beb1740d92639601ef7cc29c63be37e > api/src/main/thrift/org/apache/aurora/gen/storage.thrift > 9ee9eff6c3d657decd93458dc3e6792a30614a60 > docs/reference/scheduler-configuration.md > 4e3f90713c307e3b9e9f84c29343af7f014f0165 > examples/vagrant/upstart/aurora-scheduler-kerberos.conf > 04e3d311c6845689cd3bd813aa8dcf9df55f1199 > examples/vagrant/upstart/aurora-scheduler.conf > 63fcc87be653835cb3c3f25dae4164f1d7c8d4da > src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java > 7d3766829161457b1b3ba50bce128047d10b2c58 > src/jmh/java/org/apache/aurora/benchmark/SnapshotBenchmarks.java > ae59f3def75d0e1d3866c8d2f85063456643e9a6 > src/jmh/java/org/apache/aurora/benchmark/StateManagerBenchmarks.java > c293a9fd7c435e17dfc09f9ccc180154be8c7b2d > src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java > 45c2ab96d8fa65ab12ba710954128da987f0544b > src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java > f5e44dfd6d2c81e90859eaf24cf904a9662c1c72 > src/jmh/java/org/apache/aurora/benchmark/ThriftApiBenchmarks.java > 7b4050637433ab656fc09958bac1a050fec02bed > src/jmh/java/org/apache/aurora/benchmark/UpdateStoreBenchmarks.java > cac02a55d26b2934099a2b03457c703600296292 > src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java > dd0e480c628f6bb44de0ce758c8af2bb54db1bc6 > src/main/java/org/apache/aurora/scheduler/config/CliOptions.java > 64733e5b38f238e64424127a53591e6546a3e9a8 > src/main/java/org/apache/aurora/scheduler/http/JettyServerModule.java > 93cd20adc28ed700719e472bb2331137a93d1d9d > src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java > 52c3c6618a3cf1009435ca8a9cece36365913e55 > src/main/java/org/apache/aurora/scheduler/storage/Storage.java > 6c676693e531541ef9aec7f7a130c119ebf35df3 > src/main/java/org/apache/aurora/scheduler/storage/Util.java PRE-CREATION > > src/main/java/org/apache/aurora/scheduler/storage/backup/TemporaryStorage.java > bad05f5bf1976bb590e8dd7af9b43d5d30e892a9 > src/main/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStore.java > cbe5a0deff1ebc38a9618e7d89ab073dfaf78d36 > src/main/java/org/apache/aurora/scheduler/storage/db/DbModule.java > fb3dadb4b4c7036112e423b2bc1b2540d793b4d6 > src/main/java/org/apache/aurora/scheduler/storage/db/DbStorage.java > 7904e3880d214aac1013ce18265fed924ef7897e > src/main/java/org/apache/aurora/scheduler/storage/db/DbUtil.java > 942d180fe1a8b7a583e812f9106ee0e8db1bea55 > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java > 6fdf31d6ed430f3dc0a0117442fc8b532aa40230 > > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java > fdf9c33f80be3eff03f50732b8c4735cb9ff3b91 > > src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java > d145259653dd4df90e3877fc3e744e24c7a15d13 > > src/main/java/org/apache/aurora/scheduler/storage/mem/InMemStoresModule.java > 24296b6c61e62a09c30f01dfb84448f73a5f1e44 > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemAttributeStore.java > PRE-CREATION > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/mem/MemLockStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/mem/MemQuotaStore.java > PRE-CREATION > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemSchedulerStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/mem/Util.java > c28fb65010af5e3db925487929d4e0e12b4101a4 > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > 54202359382bfe39e7cbaec0bf4c7d65d10ca13b > src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java > a363e70176881518653d0774e9d0c4be0f7f6d78 > src/test/java/org/apache/aurora/scheduler/app/local/LocalSchedulerMain.java > 3dee55a38752faea52c0c6ace90db8352ef9b2ea > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java > 9b4f2ad15ab5b61d4cccfad38ba48f17e7853425 > > src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java > 2b817076d6beb09586d4711bc10bceb31f3b74e1 > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java > 459d6bebcd7d6341dac2aead7e3dd8ce87bc9ed6 > > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronJobManagerImplTest.java > 81440f5689f9538a4c7a9e6700bf03ca89c4ba85 > src/test/java/org/apache/aurora/scheduler/http/MaintenanceTest.java > f94b58b77b7c6ce824914af7e1147e73ad5a7eed > > src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java > 09f443bca90f154b547d28ca5fd5be08177fdf99 > src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java > 19f9de312596395eff81bbfd073a5f617d2ef84c > src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java > 24176fe089703386e8246f6add1965c111c136ed > src/test/java/org/apache/aurora/scheduler/stats/ResourceCounterTest.java > d73656dc8fb8764c7a66eed3ea789554ce0ecc52 > src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java > 04ec7715ba9a9d47c99d4ccb92939f3e561e1a60 > > src/test/java/org/apache/aurora/scheduler/storage/db/AttributeStoreTest.java > PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/storage/db/DbAttributeStoreTest.java > f47f4a8a492fb43bacd909dc520256ed028531dd > > src/test/java/org/apache/aurora/scheduler/storage/db/DbCronJobStoreTest.java > b68298ab50a3621da8596947619f2d1c34d2d194 > > src/test/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStoreTest.java > 8fca54becde34a0d10d60e05d3809fc1c41233bf > src/test/java/org/apache/aurora/scheduler/storage/db/DbLockStoreTest.java > 8ed58e01eea09ab9f1aa4d269a5e59ce9c9c2191 > src/test/java/org/apache/aurora/scheduler/storage/db/DbQuotaStoreTest.java > 275d0fd93819169550ce39c8ddfa6c4fa0e4d920 > > src/test/java/org/apache/aurora/scheduler/storage/db/DbSchedulerStoreTest.java > a6320bf687aa41c698bdc6a2c0b5208624cff4a3 > src/test/java/org/apache/aurora/scheduler/storage/db/DbTaskStoreTest.java > bcf5be43ab69ef565bb68f9235b4c43d385898aa > > src/test/java/org/apache/aurora/scheduler/storage/db/JobUpdateStoreTest.java > PRE-CREATION > src/test/java/org/apache/aurora/scheduler/storage/db/LockStoreTest.java > PRE-CREATION > src/test/java/org/apache/aurora/scheduler/storage/db/QuotaStoreTest.java > PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/storage/db/SchedulerStoreTest.java > PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java > b2c333eabbd3e3ae8f119170266ce6054b53fcae > > src/test/java/org/apache/aurora/scheduler/storage/mem/InMemTaskStoreTest.java > 84fa3ff48a0957760f2002ed026e72b882171109 > > src/test/java/org/apache/aurora/scheduler/storage/mem/MemAttributeStoreTest.java > PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/storage/mem/MemCronJobStoreTest.java > 91e591f3c84489386a7c20d33278ba691b3ae7ba > > src/test/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStoreTest.java > PRE-CREATION > src/test/java/org/apache/aurora/scheduler/storage/mem/MemLockStoreTest.java > PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/storage/mem/MemQuotaStoreTest.java > PRE-CREATION > src/test/java/org/apache/aurora/scheduler/storage/mem/MemStorageTest.java > PRE-CREATION > > src/test/java/org/apache/aurora/scheduler/storage/mem/StorageTransactionTest.java > 25f34e2bc26c6d4754c1591fad7f2165dd465d32 > src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java > 2cd19d572d2945e28630b9bce5ba58eb9753630a > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 453366ef4e4b6db45643b1927887771dbd795bf9 > > > Diff: https://reviews.apache.org/r/62869/diff/4/ > > > Testing > ------- > > Still need to complete (but should not block review): > > * Run `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` from scratch > at this patch. > * Run `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` on master, > then this patch, then on master. This verifies some level of replay > compatibility between versions. > > *Most relevant benchmark* > > On master: > ``` > Benchmark (instanceOverrides) > (instances) (metadata) Mode Cnt Score Error Units > UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A > 1000 N/A thrpt 5 696.186 ± 67.296 ops/s > UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A > 5000 N/A thrpt 5 160.567 ± 22.365 ops/s > UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A > 10000 N/A thrpt 5 82.897 ± 5.384 ops/s > UpdateStoreBenchmarks.JobInstructionsBenchmark.run 1 > N/A N/A thrpt 5 166.567 ± 20.929 ops/s > UpdateStoreBenchmarks.JobInstructionsBenchmark.run 10 > N/A N/A thrpt 5 161.149 ± 6.016 ops/s > UpdateStoreBenchmarks.JobInstructionsBenchmark.run 100 > N/A N/A thrpt 5 141.993 ± 7.119 ops/s > UpdateStoreBenchmarks.JobInstructionsBenchmark.run 1000 > N/A N/A thrpt 5 65.481 ± 0.861 ops/s > UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run N/A > N/A 10 thrpt 5 167.387 ± 10.009 ops/s > UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run N/A > N/A 100 thrpt 5 153.032 ± 19.067 ops/s > UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run N/A > N/A 1000 thrpt 5 140.656 ± 16.893 ops/s > UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run N/A > N/A 10000 thrpt 5 45.039 ± 4.840 ops/s > ``` > > With this patch: > ``` > Benchmark (instanceOverrides) > (instances) (metadata) Mode Cnt Score Error Units > UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A > 1000 N/A thrpt 5 43700548.771 ± 4491654.631 ops/s > UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A > 5000 N/A thrpt 5 44234835.787 ± 6078260.236 ops/s > UpdateStoreBenchmarks.JobDetailsBenchmark.run N/A > 10000 N/A thrpt 5 43362266.895 ± 7546660.236 ops/s > UpdateStoreBenchmarks.JobInstructionsBenchmark.run 1 > N/A N/A thrpt 5 45796640.959 ± 3698698.216 ops/s > UpdateStoreBenchmarks.JobInstructionsBenchmark.run 10 > N/A N/A thrpt 5 46402021.197 ± 4800377.641 ops/s > UpdateStoreBenchmarks.JobInstructionsBenchmark.run 100 > N/A N/A thrpt 5 44416822.563 ± 9591398.027 ops/s > UpdateStoreBenchmarks.JobInstructionsBenchmark.run 1000 > N/A N/A thrpt 5 44004609.998 ± 20332063.977 ops/s > UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run N/A > N/A 10 thrpt 5 47403264.761 ± 4787575.678 ops/s > UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run N/A > N/A 100 thrpt 5 45629630.921 ± 7654875.770 ops/s > UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run N/A > N/A 1000 thrpt 5 44150253.103 ± 9376206.202 ops/s > UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run N/A > N/A 10000 thrpt 5 44215602.085 ± 8326237.269 ops/s > ``` > > > Thanks, > > Bill Farner > >
