> On Dec. 27, 2016, 4:19 p.m., Stephan Erb wrote:
> > api/src/main/thrift/org/apache/aurora/gen/storage.thrift, lines 156-157
> > <https://reviews.apache.org/r/54883/diff/2/?file=1591086#file1591086line156>
> >
> >     Is it somehow necessary to use a serialized position here? I have the 
> > gut feeling that a plain long would probably be better as it reduces 
> > coupling to replicated log interna.

Actually, this way reduces coupling to the Mesos log. After all, it's an 
implementation detail of the Mesos replicated log that it uses longs for 
ordering. In other implementations of distributed logs you might need a tuple 
as the id - e.g. with BookKeeper, you have a ledger id and sequence id.

Whoever wrote the Mesos code (identity is the only public method to get at the 
underlying value) and our log code (Position is an empty interface) did a good 
job with the abstractions there.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54883/#review160165
-----------------------------------------------------------


On Dec. 22, 2016, 9:26 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54883/
> -----------------------------------------------------------
> 
> (Updated Dec. 22, 2016, 9:26 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, John Sirois, Santhosh Kumar 
> Shanmugham, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> See here for more details: 
> https://docs.google.com/document/d/1QVSEfeoCyt2D6cCmTCxy8-epufcuzIfnqRUkyT1betY/edit?usp=sharing
> 
> The motivation for this patch is the realisation that if we store Snapshots 
> outside of the main transaction log, we only ever to need to hold the storage 
> lock during snapshot creation. This would reduce the time writes are blocked 
> during snapshots by 80% for a large production cluster. I'm also doing this 
> with a view to future work, which is described in the document above. 
> 
> As long as everything works fine, this change is as simple as adding two new 
> args to the scheduler JVM. But it's worth noting that once this change is 
> deployed and the first snapshot is written, you will no longer be able to 
> rollback without first restoring from a backup.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> 9e4213f13255a182df938bea44ca87fa03a25318 
>   examples/vagrant/aurorabuild.sh dbec54d3e677db8cb0f02849e5373e619629fc7c 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> e68a801017ae02e0ed581129e12a645ccc1e35d4 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 43cc5b4645bc26b0fc6b23726ad3292699048ded 
>   src/main/java/org/apache/aurora/scheduler/log/Log.java 
> dc77eb435e5f8fdce56727a2f679e8e1907e977c 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/DualLogModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java 
> 21855e184fe20dc339713978b32344b6950701ec 
>   
> src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java 
> 6704a328a4023a178ed8f86ae4772cb04eb2fa8e 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
> cfa9c56c909bac2e885e33a9fe772cb41cbd9fa9 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
> 387350c7667a5fb8ee674ad0d3dd17529232b25b 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
>  9733ffe74b107f336858657550156ddb1f1dd215 
>   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
> ea147c0ba6aaa6d113144be0a8330bd2f73d2453 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
> baf2647c54f1f9918139584e5151873a3853e83c 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 29a3b4a57925f31e59a49d4bfa630e724cadcb04 
>   src/test/java/org/apache/aurora/scheduler/log/mesos/MesosLogTest.java 
> f142f545799d64f9352b0ac6c51942eedf5e9ced 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
> 3f445595a81a5655c6c486791a9b55d8dc7f2f76 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
> 34c24aac9339e59c435925bd2357ce86998f5f02 
> 
> Diff: https://reviews.apache.org/r/54883/diff/
> 
> 
> Testing
> -------
> 
> I currently have only tested this on one node (wrote this code on a flight). 
> Will have to verify this in our test clusters before shipping.
> 
> 
> Example log output on my vagrant image:
> 
> I1222 07:18:36.471 [qtp1117266705-37, LoggingInterceptor] snapshot() 
> I1222 07:18:36.479 [qtp1117266705-37, LogStorage] Creating snapshot at 
> position: Position 6 
> I1222 07:18:36.485 [qtp1117266705-37, SnapshotStoreImpl] Saving dbsnapshot 
> I1222 07:18:36.591 [qtp1117266705-37, LogStorage] Write lock released. 
> I1222 07:18:36.591 [pool-12-thread-1, LogStorage] Persisting snapshot. 
> I1222 07:18:36.592 [pool-12-thread-1, 
> SnapshotDeduplicator$SnapshotDeduplicatorImpl] Starting deduplication of a 
> snapshot with 0 tasks. 
> I1222 07:18:36.595 [pool-12-thread-1, 
> SnapshotDeduplicator$SnapshotDeduplicatorImpl] Finished deduplicating 
> snapshot. Deduplication ratio: 0/0 = NaN%. 
> I1222 07:18:36.599218  4155 log.cpp:577] Attempting to append 6261 bytes to 
> the log
> I1222 07:18:36.599321  4155 coordinator.cpp:348] Coordinator attempting to 
> write APPEND action at position 5
> I1222 07:18:36.599573  4160 replica.cpp:537] Replica received write request 
> for position 5 from (11)@192.168.33.7:8083
> I1222 07:18:36.600693  4160 leveldb.cpp:341] Persisting action (6280 bytes) 
> to leveldb took 1.061504ms
> I1222 07:18:36.600733  4160 replica.cpp:712] Persisted action at 5
> I1222 07:18:36.600844  4160 replica.cpp:691] Replica received learned notice 
> for position 5 from @0.0.0.0:0
> I1222 07:18:36.601541  4160 leveldb.cpp:341] Persisting action (6282 bytes) 
> to leveldb took 619657ns
> I1222 07:18:36.601569  4160 replica.cpp:712] Persisted action at 5
> I1222 07:18:36.601582  4160 replica.cpp:697] Replica learned APPEND action at 
> position 5
> I1222 07:18:36.603044  4160 log.cpp:596] Attempting to truncate the log to 5
> I1222 07:18:36.603173  4160 coordinator.cpp:348] Coordinator attempting to 
> write TRUNCATE action at position 6
> I1222 07:18:36.603330  4160 replica.cpp:537] Replica received write request 
> for position 6 from (12)@192.168.33.7:8083
> I1222 07:18:36.603890  4160 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 502063ns
> I1222 07:18:36.603919  4160 replica.cpp:712] Persisted action at 6
> I1222 07:18:36.603991  4160 replica.cpp:691] Replica received learned notice 
> for position 6 from @0.0.0.0:0
> I1222 07:18:36.604279  4160 leveldb.cpp:341] Persisting action (18 bytes) to 
> leveldb took 217834ns
> I1222 07:18:36.604357  4160 leveldb.cpp:399] Deleting ~3 keys from leveldb 
> took 13629ns
> I1222 07:18:36.604372  4160 replica.cpp:712] Persisted action at 6
> I1222 07:18:36.604383  4160 replica.cpp:697] Replica learned TRUNCATE action 
> at position 6
> I1222 07:18:36.604581  4160 log.cpp:596] Attempting to truncate the log to 6
> I1222 07:18:36.604682  4160 coordinator.cpp:348] Coordinator attempting to 
> write TRUNCATE action at position 7
> I1222 07:18:36.604778  4160 replica.cpp:537] Replica received write request 
> for position 7 from (13)@192.168.33.7:8083
> I1222 07:18:36.605317  4160 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 485544ns
> I1222 07:18:36.605356  4160 replica.cpp:712] Persisted action at 7
> I1222 07:18:36.605439  4160 replica.cpp:691] Replica received learned notice 
> for position 7 from @0.0.0.0:0
> I1222 07:18:36.605736  4160 leveldb.cpp:341] Persisting action (18 bytes) to 
> leveldb took 215503ns
> I1222 07:18:36.605773  4160 leveldb.cpp:399] Deleting ~3 keys from leveldb 
> took 9620ns
> I1222 07:18:36.605787  4160 replica.cpp:712] Persisted action at 7
> I1222 07:18:36.605798  4160 replica.cpp:697] Replica learned TRUNCATE action 
> at position 7
> I1222 07:18:36.605 [pool-12-thread-1, LogStorage] Snapshot complete. host 
> attrs: 1, cron jobs: 0, locks: 0, quota confs: 0, tasks: 0, updates: 0
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Reply via email to