Re: Review Request 26478: Add a flag to deduplicate storage snapshots
On Oct. 15, 2014, 11:54 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 56 https://reviews.apache.org/r/26478/diff/3/?file=721240#file721240line56 'reduplicate' doesn't sit well with me. Perhaps 'normalize' and 'denormalize' are more standard terms that apply? I don't feel too strongly, so don't change it if they seem equally good to you. I am decidedly ambivalent about the name On Oct. 15, 2014, 11:54 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 87 https://reviews.apache.org/r/26478/diff/3/?file=721240#file721240line87 This line is not covered in tests. Please address. However, i suggest you implement this as below, and inline. ScheduledTask partialScheduledTask = scheduledTask.deepCopy(); partialScheduledTask.getAssignedTask().unsetTaskConfig(); return partialScheduledTask; Inlined. On Oct. 15, 2014, 11:54 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 110 https://reviews.apache.org/r/26478/diff/2-3/?file=716380#file716380line110 Please use a better variable name. Inlined, so no variable to name. On Oct. 15, 2014, 11:54 a.m., Bill Farner wrote: docs/scheduler-storage.md, line 13 https://reviews.apache.org/r/26478/diff/3/?file=721234#file721234line13 Most users will want to enable both compression and deduplication. I suggest you yank this sentence out of this section, and add to the opening paragraph: The scheduler has two optimizations to reduce the size of snapshots and thus improve snapshot performance: compression and deduplication. Most users will want to enable both compression and deduplication. good idea, added - Kevin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review56762 --- On Oct. 14, 2014, 6:32 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 14, 2014, 6:32 p.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 docs/scheduler-storage.md PRE-CREATION src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote: src/main/thrift/org/apache/aurora/gen/storage.thrift, line 205 https://reviews.apache.org/r/26478/diff/2/?file=716383#file716383line205 Please, document fields. What is taskConfigId here? Documented all fields. On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 35 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line35 This javadoc would highly benefit from some details about the source of duplication and a proposed solution. It's not obvious for a newcomer why TaskConfigs are duplicated. Added documentation elsewhere, happy to add more here if you think it's needed. On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 126 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line126 Why result field here? refactored this code to be less performant, more readable On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote: src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java, line 71 https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line71 How about a roundtrip test with no tasks in a snapshot? Good idea, added null-checking. On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 155 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line155 Inverse log message of a hydration ratio would be useful here along with a Starting redupulication. Added. - Kevin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review56010 --- On Oct. 15, 2014, 12:17 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 15, 2014, 12:17 p.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 docs/scheduler-storage.md PRE-CREATION src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review56795 --- Ship it! Ship It! - Maxim Khutornenko On Oct. 15, 2014, 7:52 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 15, 2014, 7:52 p.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 docs/scheduler-storage.md PRE-CREATION src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 15, 2014, 2:32 p.m.) Review request for Aurora, Maxim Khutornenko, Bill Farner, and Zameer Manji. Changes --- -David, who is on vacation Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 docs/scheduler-storage.md PRE-CREATION src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 15, 2014, 5:05 p.m.) Review request for Aurora, Maxim Khutornenko, Bill Farner, and Zameer Manji. Changes --- Bill's feedback + fix markdown syntax Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs (updated) - config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 docs/deploying-aurora-scheduler.md 20f5f389f55e800cc5e6638e62e25cfb6e2d72b4 docs/scheduler-storage.md PRE-CREATION src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review56562 --- Ping? Any progress here? - Bill Farner On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 9, 2014, 2:39 a.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 14, 2014, 6:32 p.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Changes --- Documentation Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs (updated) - config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 docs/scheduler-storage.md PRE-CREATION src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java, line 86 https://reviews.apache.org/r/26478/diff/2/?file=716376#file716376line86 Method interceptors should work fine for package-priviate methods [1]. Just for my understanding - are you going with protected because package-private class + protected method is slightly more restrictive? [1] https://github.com/google/guice/wiki/AOP#limitations I'm going with protected because it's more indicative of why it's visible at all (I want guice's dynamic proxy subclass to override it and add timing information). It's also more restrictive than package-private in this context. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 61 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line61 Tasks.SCHEDULED_TO_INFO This operates on the mutable structures, SCHEDULED_TO_INFO operates on the immutable ones. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 71 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line71 Do you have numbers on how much time this routine saves when compared to a full deep copy and unsetting the field you're trying to clear? Unless it's a significant contributor to overall snapshot performance, i'd prefer the more concise code of the latter approach. My hunch is that this one might save O(100 ms), but the ones below are noise and not worth the code. I don't have data for this specific optimization - my gut is that we should avoid deepCopy on Snapshots due to them respresenting essentially the entire scheduler heap. Happy to remove if you think it's not warranted. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java, line 84 https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line84 The assertion error will be much more useful if you do `assertEquals(emptySet, b)` rather than `assertEquals(n, b.size())`. It's actually `null` here, but `.size()` in thrift reports `0` for `null`. Changed to explicit `assertNull`. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java, line 89 https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line89 Isn't this check redundant to the one below? not quite, since the latter is making a Set out of a List, and therefore potentially doing deduplication for me. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java, line 98 https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line98 Can you try for `assertEquals` here as well, with an expected list of `DeduplicatedScheduledTask` objects? It will catch classes of bugs missed with this check (e.g. extra entries in the `partialTasks` list), and make it easier to diagnose a failed test. Since the input is a Snapshot (SetScheduledTask) it's not possible to know the expected order of the output list ahead-of-time. Added an explicit check against extra entries. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java, line 80 https://reviews.apache.org/r/26478/diff/2/?file=716379#file716379line80 First sentence contains mixed thoughts for an odd sentence, rewrite? Rewrote. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java, line 81 https://reviews.apache.org/r/26478/diff/2/?file=716379#file716379line81 Please make this more verbose to explain what exactly is backwards in compatible. Don't worry about being too wordy. I'd also like to see a TODO to remove this flag, since it should become the default. Perhaps this should be included in the 0.7.0 deprecations list. Moved to separate doc. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 110 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line110 Inline this on :114. done On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 116 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line116 It would be nice to see a brief comment here to give the reader an overview of what's going on in the routine. Added. On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review55986 --- src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java https://reviews.apache.org/r/26478/#comment96339 Method interceptors should work fine for package-priviate methods [1]. Just for my understanding - are you going with protected because package-private class + protected method is slightly more restrictive? [1] https://github.com/google/guice/wiki/AOP#limitations src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java https://reviews.apache.org/r/26478/#comment96331 First sentence contains mixed thoughts for an odd sentence, rewrite? src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java https://reviews.apache.org/r/26478/#comment96332 Please make this more verbose to explain what exactly is backwards in compatible. Don't worry about being too wordy. I'd also like to see a TODO to remove this flag, since it should become the default. Perhaps this should be included in the 0.7.0 deprecations list. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96334 Tasks.SCHEDULED_TO_INFO src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96335 Do you have numbers on how much time this routine saves when compared to a full deep copy and unsetting the field you're trying to clear? Unless it's a significant contributor to overall snapshot performance, i'd prefer the more concise code of the latter approach. My hunch is that this one might save O(100 ms), but the ones below are noise and not worth the code. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96336 Inline this on :114. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96337 It would be nice to see a brief comment here to give the reader an overview of what's going on in the routine. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96338 remove newline src/main/thrift/org/apache/aurora/gen/storage.thrift https://reviews.apache.org/r/26478/#comment96340 The mention of taskConfigId is vague. There should be a doc somewhere indicating that `taskConfigId` is referencing the _index_ of an entry in `taskConfigs`. src/main/thrift/org/apache/aurora/gen/storage.thrift https://reviews.apache.org/r/26478/#comment96341 remove newline src/main/thrift/org/apache/aurora/gen/storage.thrift https://reviews.apache.org/r/26478/#comment96342 Please doc. src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java https://reviews.apache.org/r/26478/#comment96343 The assertion error will be much more useful if you do `assertEquals(emptySet, b)` rather than `assertEquals(n, b.size())`. src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java https://reviews.apache.org/r/26478/#comment96344 Isn't this check redundant to the one below? src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java https://reviews.apache.org/r/26478/#comment96345 Can you try for `assertEquals` here as well, with an expected list of `DeduplicatedScheduledTask` objects? It will catch classes of bugs missed with this check (e.g. extra entries in the `partialTasks` list), and make it easier to diagnose a failed test. - Bill Farner On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 9, 2014, 2:39 a.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review56010 --- src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96362 This javadoc would highly benefit from some details about the source of duplication and a proposed solution. It's not obvious for a newcomer why TaskConfigs are duplicated. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96365 Why result field here? src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96360 There is a remote possibility of numOutputTasks to be zero in a snapshot for an empty cluster. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96363 Inverse log message of a hydration ratio would be useful here along with a Starting redupulication. src/main/thrift/org/apache/aurora/gen/storage.thrift https://reviews.apache.org/r/26478/#comment96364 Please, document fields. What is taskConfigId here? src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java https://reviews.apache.org/r/26478/#comment96368 How about a roundtrip test with no tasks in a snapshot? - Maxim Khutornenko On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 9, 2014, 2:39 a.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 133 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line133 There is a remote possibility of numOutputTasks to be zero in a snapshot for an empty cluster. hmm - there is, and that would result in NaN appearing in the log. Which is technically accurate - Kevin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review56010 --- On Oct. 8, 2014, 7:39 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 8, 2014, 7:39 p.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerFactory.java 52b2e83efc0c289366b5246aff32553a546b1a70 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 8, 2014, 7:39 p.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Changes --- Reduce visibility of some classes Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs (updated) - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review55943 --- Ship it! Ship It! - Zameer Manji On Oct. 8, 2014, 7:39 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 8, 2014, 7:39 p.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney