Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-15 Thread Kevin Sweeney


 On Oct. 15, 2014, 11:54 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 56
  https://reviews.apache.org/r/26478/diff/3/?file=721240#file721240line56
 
  'reduplicate' doesn't sit well with me.  Perhaps 'normalize' and 
  'denormalize' are more standard terms that apply?  I don't feel too 
  strongly, so don't change it if they seem equally good to you.

I am decidedly ambivalent about the name


 On Oct. 15, 2014, 11:54 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 87
  https://reviews.apache.org/r/26478/diff/3/?file=721240#file721240line87
 
  This line is not covered in tests.  Please address.
  
  However, i suggest you implement this as below, and inline.
  
  ScheduledTask partialScheduledTask = scheduledTask.deepCopy();
  partialScheduledTask.getAssignedTask().unsetTaskConfig();
  return partialScheduledTask;

Inlined.


 On Oct. 15, 2014, 11:54 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 110
  https://reviews.apache.org/r/26478/diff/2-3/?file=716380#file716380line110
 
  Please use a better variable name.

Inlined, so no variable to name.


 On Oct. 15, 2014, 11:54 a.m., Bill Farner wrote:
  docs/scheduler-storage.md, line 13
  https://reviews.apache.org/r/26478/diff/3/?file=721234#file721234line13
 
   Most users will want to enable both compression and deduplication.
  
  I suggest you yank this sentence out of this section, and add to the 
  opening paragraph:
  
   The scheduler has two optimizations to reduce the size of snapshots 
  and thus improve snapshot performance: compression and deduplication.  Most 
  users will want to enable both compression and deduplication.

good idea, added


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review56762
---


On Oct. 14, 2014, 6:32 p.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 14, 2014, 6:32 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 
   docs/scheduler-storage.md PRE-CREATION 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-15 Thread Kevin Sweeney


 On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote:
  src/main/thrift/org/apache/aurora/gen/storage.thrift, line 205
  https://reviews.apache.org/r/26478/diff/2/?file=716383#file716383line205
 
  Please, document fields. What is taskConfigId here?

Documented all fields.


 On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 35
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line35
 
  This javadoc would highly benefit from some details about the source of 
  duplication and a proposed solution. It's not obvious for a newcomer why 
  TaskConfigs are duplicated.

Added documentation elsewhere, happy to add more here if you think it's needed.


 On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 126
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line126
 
  Why result field here?

refactored this code to be less performant, more readable


 On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote:
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java,
   line 71
  https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line71
 
  How about a roundtrip test with no tasks in a snapshot?

Good idea, added null-checking.


 On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 155
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line155
 
  Inverse log message of a hydration ratio would be useful here along 
  with a Starting redupulication.

Added.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review56010
---


On Oct. 15, 2014, 12:17 p.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 15, 2014, 12:17 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 
   docs/scheduler-storage.md PRE-CREATION 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-15 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review56795
---

Ship it!


Ship It!

- Maxim Khutornenko


On Oct. 15, 2014, 7:52 p.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 15, 2014, 7:52 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 
   docs/scheduler-storage.md PRE-CREATION 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-15 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/
---

(Updated Oct. 15, 2014, 2:32 p.m.)


Review request for Aurora, Maxim Khutornenko, Bill Farner, and Zameer Manji.


Changes
---

-David, who is on vacation


Bugs: AURORA-722
https://issues.apache.org/jira/browse/AURORA-722


Repository: aurora


Description
---

Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x 
deduplication ratio on Twitter's production snapshots.

This format is backwards-incompatible, so this patch introduces a flag to 
control its use (defaulting off).

This only changes the format used to write to the replicated log (where time is 
of the essence since all writes are done holding the global storage lock) - the 
format of backups written to disk is unchanged, as backups don't hold the lock.


Diffs
-

  config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 
  docs/scheduler-storage.md PRE-CREATION 
  src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
65e986eaa2c4193431ca048425a1ed3ab60f5882 
  src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
7239a6a5eb5479e395e16423c83fdf80a77e5a83 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
4b50e2069407dc263b4fc93f1827d3a8836253bf 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
f806297d1d0700155c976743f936b2b8a3a390fb 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
769348e6b8a5c701734afff391b1c77de35222c6 
  
src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
22db80eaf34fe736fa5a3a9289836c9ac9e59906 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
  src/main/thrift/org/apache/aurora/gen/storage.thrift 
5350ec945fbe028ee4641683815a068ce00b5efc 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
39729b374fe4e383f9b5ada7d016923766df9af7 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
7a8c3b882633376a1bf6a78616d55aaa7401d13f 
  
src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/26478/diff/


Testing
---

./gradlew -Pq build


Thanks,

Kevin Sweeney



Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-15 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/
---

(Updated Oct. 15, 2014, 5:05 p.m.)


Review request for Aurora, Maxim Khutornenko, Bill Farner, and Zameer Manji.


Changes
---

Bill's feedback + fix markdown syntax


Bugs: AURORA-722
https://issues.apache.org/jira/browse/AURORA-722


Repository: aurora


Description
---

Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x 
deduplication ratio on Twitter's production snapshots.

This format is backwards-incompatible, so this patch introduces a flag to 
control its use (defaulting off).

This only changes the format used to write to the replicated log (where time is 
of the essence since all writes are done holding the global storage lock) - the 
format of backups written to disk is unchanged, as backups don't hold the lock.


Diffs (updated)
-

  config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 
  docs/deploying-aurora-scheduler.md 20f5f389f55e800cc5e6638e62e25cfb6e2d72b4 
  docs/scheduler-storage.md PRE-CREATION 
  src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
65e986eaa2c4193431ca048425a1ed3ab60f5882 
  src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
7239a6a5eb5479e395e16423c83fdf80a77e5a83 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
4b50e2069407dc263b4fc93f1827d3a8836253bf 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
f806297d1d0700155c976743f936b2b8a3a390fb 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
769348e6b8a5c701734afff391b1c77de35222c6 
  
src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
22db80eaf34fe736fa5a3a9289836c9ac9e59906 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
  src/main/thrift/org/apache/aurora/gen/storage.thrift 
5350ec945fbe028ee4641683815a068ce00b5efc 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
39729b374fe4e383f9b5ada7d016923766df9af7 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
7a8c3b882633376a1bf6a78616d55aaa7401d13f 
  
src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/26478/diff/


Testing
---

./gradlew -Pq build


Thanks,

Kevin Sweeney



Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-14 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review56562
---


Ping?  Any progress here?

- Bill Farner


On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 9, 2014, 2:39 a.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-14 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/
---

(Updated Oct. 14, 2014, 6:32 p.m.)


Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.


Changes
---

Documentation


Bugs: AURORA-722
https://issues.apache.org/jira/browse/AURORA-722


Repository: aurora


Description
---

Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x 
deduplication ratio on Twitter's production snapshots.

This format is backwards-incompatible, so this patch introduces a flag to 
control its use (defaulting off).

This only changes the format used to write to the replicated log (where time is 
of the essence since all writes are done holding the global storage lock) - the 
format of backups written to disk is unchanged, as backups don't hold the lock.


Diffs (updated)
-

  config/legacy_untested_classes.txt 3af99867eb25a7e44bb3520e82b1def125bd6e15 
  docs/scheduler-storage.md PRE-CREATION 
  src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
65e986eaa2c4193431ca048425a1ed3ab60f5882 
  src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
7239a6a5eb5479e395e16423c83fdf80a77e5a83 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
4b50e2069407dc263b4fc93f1827d3a8836253bf 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
f806297d1d0700155c976743f936b2b8a3a390fb 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
769348e6b8a5c701734afff391b1c77de35222c6 
  
src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
22db80eaf34fe736fa5a3a9289836c9ac9e59906 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
  src/main/thrift/org/apache/aurora/gen/storage.thrift 
5350ec945fbe028ee4641683815a068ce00b5efc 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
39729b374fe4e383f9b5ada7d016923766df9af7 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
7a8c3b882633376a1bf6a78616d55aaa7401d13f 
  
src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/26478/diff/


Testing
---

./gradlew -Pq build


Thanks,

Kevin Sweeney



Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-14 Thread Kevin Sweeney


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java, 
  line 86
  https://reviews.apache.org/r/26478/diff/2/?file=716376#file716376line86
 
  Method interceptors should work fine for package-priviate methods [1].  
  Just for my understanding - are you going with protected because 
  package-private class + protected method is slightly more restrictive?
  
  [1] https://github.com/google/guice/wiki/AOP#limitations

I'm going with protected because it's more indicative of why it's visible at 
all (I want guice's dynamic proxy subclass to override it and add timing 
information). It's also more restrictive than package-private in this context.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 61
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line61
 
  Tasks.SCHEDULED_TO_INFO

This operates on the mutable structures, SCHEDULED_TO_INFO operates on the 
immutable ones.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 71
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line71
 
  Do you have numbers on how much time this routine saves when compared 
  to a full deep copy and unsetting the field you're trying to clear?  Unless 
  it's a significant contributor to overall snapshot performance, i'd prefer 
  the more concise code of the latter approach.
  
  My hunch is that this one might save O(100 ms), but the ones below are 
  noise and not worth the code.

I don't have data for this specific optimization - my gut is that we should 
avoid deepCopy on Snapshots due to them respresenting essentially the entire 
scheduler heap. Happy to remove if you think it's not warranted.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java,
   line 84
  https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line84
 
  The assertion error will be much more useful if you do 
  `assertEquals(emptySet, b)` rather than `assertEquals(n, b.size())`.

It's actually `null` here, but `.size()` in thrift reports `0` for `null`. 
Changed to explicit `assertNull`.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java,
   line 89
  https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line89
 
  Isn't this check redundant to the one below?

not quite, since the latter is making a Set out of a List, and therefore 
potentially doing deduplication for me.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java,
   line 98
  https://reviews.apache.org/r/26478/diff/2/?file=716386#file716386line98
 
  Can you try for `assertEquals` here as well, with an expected list of 
  `DeduplicatedScheduledTask` objects?  It will catch classes of bugs missed 
  with this check (e.g. extra entries in the `partialTasks` list), and make 
  it easier to diagnose a failed test.

Since the input is a Snapshot (SetScheduledTask) it's not possible to know 
the expected order of the output list ahead-of-time. Added an explicit check 
against extra entries.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java,
   line 80
  https://reviews.apache.org/r/26478/diff/2/?file=716379#file716379line80
 
  First sentence contains mixed thoughts for an odd sentence, rewrite?

Rewrote.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java,
   line 81
  https://reviews.apache.org/r/26478/diff/2/?file=716379#file716379line81
 
  Please make this more verbose to explain what exactly is backwards in 
  compatible.  Don't worry about being too wordy.
  
  I'd also like to see a TODO to remove this flag, since it should become 
  the default.  Perhaps this should be included in the 0.7.0 deprecations 
  list.

Moved to separate doc.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 110
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line110
 
  Inline this on :114.

done


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 116
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line116
 
  It would be nice to see a brief comment here to give the reader an 
  overview of what's going on in the routine.

Added.


 On Oct. 9, 2014, 8:30 a.m., Bill Farner wrote:
  

Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-09 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review55986
---



src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java
https://reviews.apache.org/r/26478/#comment96339

Method interceptors should work fine for package-priviate methods [1].  
Just for my understanding - are you going with protected because 
package-private class + protected method is slightly more restrictive?

[1] https://github.com/google/guice/wiki/AOP#limitations



src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java
https://reviews.apache.org/r/26478/#comment96331

First sentence contains mixed thoughts for an odd sentence, rewrite?



src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java
https://reviews.apache.org/r/26478/#comment96332

Please make this more verbose to explain what exactly is backwards in 
compatible.  Don't worry about being too wordy.

I'd also like to see a TODO to remove this flag, since it should become the 
default.  Perhaps this should be included in the 0.7.0 deprecations list.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96334

Tasks.SCHEDULED_TO_INFO



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96335

Do you have numbers on how much time this routine saves when compared to a 
full deep copy and unsetting the field you're trying to clear?  Unless it's a 
significant contributor to overall snapshot performance, i'd prefer the more 
concise code of the latter approach.

My hunch is that this one might save O(100 ms), but the ones below are 
noise and not worth the code.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96336

Inline this on :114.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96337

It would be nice to see a brief comment here to give the reader an overview 
of what's going on in the routine.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96338

remove newline



src/main/thrift/org/apache/aurora/gen/storage.thrift
https://reviews.apache.org/r/26478/#comment96340

The mention of taskConfigId is vague.  There should be a doc somewhere 
indicating that `taskConfigId` is referencing the _index_ of an entry in 
`taskConfigs`.



src/main/thrift/org/apache/aurora/gen/storage.thrift
https://reviews.apache.org/r/26478/#comment96341

remove newline



src/main/thrift/org/apache/aurora/gen/storage.thrift
https://reviews.apache.org/r/26478/#comment96342

Please doc.



src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
https://reviews.apache.org/r/26478/#comment96343

The assertion error will be much more useful if you do 
`assertEquals(emptySet, b)` rather than `assertEquals(n, b.size())`.



src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
https://reviews.apache.org/r/26478/#comment96344

Isn't this check redundant to the one below?



src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
https://reviews.apache.org/r/26478/#comment96345

Can you try for `assertEquals` here as well, with an expected list of 
`DeduplicatedScheduledTask` objects?  It will catch classes of bugs missed with 
this check (e.g. extra entries in the `partialTasks` list), and make it easier 
to diagnose a failed test.


- Bill Farner


On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 9, 2014, 2:39 a.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
   

Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-09 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review56010
---



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96362

This javadoc would highly benefit from some details about the source of 
duplication and a proposed solution. It's not obvious for a newcomer why 
TaskConfigs are duplicated.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96365

Why result field here?



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96360

There is a remote possibility of numOutputTasks to be zero in a snapshot 
for an empty cluster.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96363

Inverse log message of a hydration ratio would be useful here along with a 
Starting redupulication.



src/main/thrift/org/apache/aurora/gen/storage.thrift
https://reviews.apache.org/r/26478/#comment96364

Please, document fields. What is taskConfigId here?



src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
https://reviews.apache.org/r/26478/#comment96368

How about a roundtrip test with no tasks in a snapshot?


- Maxim Khutornenko


On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 9, 2014, 2:39 a.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-09 Thread Kevin Sweeney


 On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 133
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line133
 
  There is a remote possibility of numOutputTasks to be zero in a 
  snapshot for an empty cluster.

hmm - there is, and that would result in NaN appearing in the log. Which is 
technically accurate


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review56010
---


On Oct. 8, 2014, 7:39 p.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 8, 2014, 7:39 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney
 




Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-08 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/
---

Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.


Bugs: AURORA-722
https://issues.apache.org/jira/browse/AURORA-722


Repository: aurora


Description
---

Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x 
deduplication ratio on Twitter's production snapshots.

This format is backwards-incompatible, so this patch introduces a flag to 
control its use (defaulting off).

This only changes the format used to write to the replicated log (where time is 
of the essence since all writes are done holding the global storage lock) - the 
format of backups written to disk is unchanged, as backups don't hold the lock.


Diffs
-

  build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
  src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
65e986eaa2c4193431ca048425a1ed3ab60f5882 
  src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
7239a6a5eb5479e395e16423c83fdf80a77e5a83 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
4b50e2069407dc263b4fc93f1827d3a8836253bf 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
f806297d1d0700155c976743f936b2b8a3a390fb 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
769348e6b8a5c701734afff391b1c77de35222c6 
  
src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
22db80eaf34fe736fa5a3a9289836c9ac9e59906 
  
src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerFactory.java 
52b2e83efc0c289366b5246aff32553a546b1a70 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
  src/main/thrift/org/apache/aurora/gen/storage.thrift 
5350ec945fbe028ee4641683815a068ce00b5efc 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
39729b374fe4e383f9b5ada7d016923766df9af7 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
7a8c3b882633376a1bf6a78616d55aaa7401d13f 
  
src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/26478/diff/


Testing
---

./gradlew -Pq build


Thanks,

Kevin Sweeney



Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-08 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/
---

(Updated Oct. 8, 2014, 7:39 p.m.)


Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.


Changes
---

Reduce visibility of some classes


Bugs: AURORA-722
https://issues.apache.org/jira/browse/AURORA-722


Repository: aurora


Description
---

Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x 
deduplication ratio on Twitter's production snapshots.

This format is backwards-incompatible, so this patch introduces a flag to 
control its use (defaulting off).

This only changes the format used to write to the replicated log (where time is 
of the essence since all writes are done holding the global storage lock) - the 
format of backups written to disk is unchanged, as backups don't hold the lock.


Diffs (updated)
-

  build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
  src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
65e986eaa2c4193431ca048425a1ed3ab60f5882 
  src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
7239a6a5eb5479e395e16423c83fdf80a77e5a83 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
4b50e2069407dc263b4fc93f1827d3a8836253bf 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
f806297d1d0700155c976743f936b2b8a3a390fb 
  src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
769348e6b8a5c701734afff391b1c77de35222c6 
  
src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
22db80eaf34fe736fa5a3a9289836c9ac9e59906 
  src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
  src/main/thrift/org/apache/aurora/gen/storage.thrift 
5350ec945fbe028ee4641683815a068ce00b5efc 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
39729b374fe4e383f9b5ada7d016923766df9af7 
  src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
7a8c3b882633376a1bf6a78616d55aaa7401d13f 
  
src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/26478/diff/


Testing
---

./gradlew -Pq build


Thanks,

Kevin Sweeney



Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-08 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review55943
---

Ship it!


Ship It!

- Zameer Manji


On Oct. 8, 2014, 7:39 p.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 8, 2014, 7:39 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney