> On Dec. 27, 2016, 7:42 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java, line 
> > 529
> > <https://reviews.apache.org/r/54883/diff/2/?file=1591095#file1591095line529>
> >
> >     I'm not following this sentence, and i can't connect it to the 
> > limitation of the truncate implementation above.  Can you take a stab at 
> > rephrasing?  Seems important to convey correctly.

So now I'm not sure if the description is bad or I'm worrying about nothing. 

FWIW, to explain the problem in more verbose terms:

If we have a sequence of 
Ops(https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/storage.thrift#L95)
 and we have a log like:

    OP 4
    OP 3
    OP 2
    OP 1
    
Then after the whole snapshot process is complete (assuming no further entries 
during asynchronous snapshotting), we'd have two logs like:

    OP 4        SNAPSHOT (POSITION = 4)
    

And then when replaying, we'll apply the snapshot and then apply operation 4 
again. If we can guarantee applying the same op twice in a row would always 
result in the same storage state as applying it once, then appending a noop is 
not needed.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54883/#review160184
-----------------------------------------------------------


On Dec. 22, 2016, 9:26 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54883/
> -----------------------------------------------------------
> 
> (Updated Dec. 22, 2016, 9:26 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, John Sirois, Santhosh Kumar 
> Shanmugham, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> See here for more details: 
> https://docs.google.com/document/d/1QVSEfeoCyt2D6cCmTCxy8-epufcuzIfnqRUkyT1betY/edit?usp=sharing
> 
> The motivation for this patch is the realisation that if we store Snapshots 
> outside of the main transaction log, we only ever to need to hold the storage 
> lock during snapshot creation. This would reduce the time writes are blocked 
> during snapshots by 80% for a large production cluster. I'm also doing this 
> with a view to future work, which is described in the document above. 
> 
> As long as everything works fine, this change is as simple as adding two new 
> args to the scheduler JVM. But it's worth noting that once this change is 
> deployed and the first snapshot is written, you will no longer be able to 
> rollback without first restoring from a backup.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> 9e4213f13255a182df938bea44ca87fa03a25318 
>   examples/vagrant/aurorabuild.sh dbec54d3e677db8cb0f02849e5373e619629fc7c 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> e68a801017ae02e0ed581129e12a645ccc1e35d4 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 43cc5b4645bc26b0fc6b23726ad3292699048ded 
>   src/main/java/org/apache/aurora/scheduler/log/Log.java 
> dc77eb435e5f8fdce56727a2f679e8e1907e977c 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/DualLogModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java 
> 21855e184fe20dc339713978b32344b6950701ec 
>   
> src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java 
> 6704a328a4023a178ed8f86ae4772cb04eb2fa8e 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
> cfa9c56c909bac2e885e33a9fe772cb41cbd9fa9 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
> 387350c7667a5fb8ee674ad0d3dd17529232b25b 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
>  9733ffe74b107f336858657550156ddb1f1dd215 
>   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
> ea147c0ba6aaa6d113144be0a8330bd2f73d2453 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
> baf2647c54f1f9918139584e5151873a3853e83c 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 29a3b4a57925f31e59a49d4bfa630e724cadcb04 
>   src/test/java/org/apache/aurora/scheduler/log/mesos/MesosLogTest.java 
> f142f545799d64f9352b0ac6c51942eedf5e9ced 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
> 3f445595a81a5655c6c486791a9b55d8dc7f2f76 
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
> 34c24aac9339e59c435925bd2357ce86998f5f02 
> 
> Diff: https://reviews.apache.org/r/54883/diff/
> 
> 
> Testing
> -------
> 
> I currently have only tested this on one node (wrote this code on a flight). 
> Will have to verify this in our test clusters before shipping.
> 
> 
> Example log output on my vagrant image:
> 
> I1222 07:18:36.471 [qtp1117266705-37, LoggingInterceptor] snapshot() 
> I1222 07:18:36.479 [qtp1117266705-37, LogStorage] Creating snapshot at 
> position: Position 6 
> I1222 07:18:36.485 [qtp1117266705-37, SnapshotStoreImpl] Saving dbsnapshot 
> I1222 07:18:36.591 [qtp1117266705-37, LogStorage] Write lock released. 
> I1222 07:18:36.591 [pool-12-thread-1, LogStorage] Persisting snapshot. 
> I1222 07:18:36.592 [pool-12-thread-1, 
> SnapshotDeduplicator$SnapshotDeduplicatorImpl] Starting deduplication of a 
> snapshot with 0 tasks. 
> I1222 07:18:36.595 [pool-12-thread-1, 
> SnapshotDeduplicator$SnapshotDeduplicatorImpl] Finished deduplicating 
> snapshot. Deduplication ratio: 0/0 = NaN%. 
> I1222 07:18:36.599218  4155 log.cpp:577] Attempting to append 6261 bytes to 
> the log
> I1222 07:18:36.599321  4155 coordinator.cpp:348] Coordinator attempting to 
> write APPEND action at position 5
> I1222 07:18:36.599573  4160 replica.cpp:537] Replica received write request 
> for position 5 from (11)@192.168.33.7:8083
> I1222 07:18:36.600693  4160 leveldb.cpp:341] Persisting action (6280 bytes) 
> to leveldb took 1.061504ms
> I1222 07:18:36.600733  4160 replica.cpp:712] Persisted action at 5
> I1222 07:18:36.600844  4160 replica.cpp:691] Replica received learned notice 
> for position 5 from @0.0.0.0:0
> I1222 07:18:36.601541  4160 leveldb.cpp:341] Persisting action (6282 bytes) 
> to leveldb took 619657ns
> I1222 07:18:36.601569  4160 replica.cpp:712] Persisted action at 5
> I1222 07:18:36.601582  4160 replica.cpp:697] Replica learned APPEND action at 
> position 5
> I1222 07:18:36.603044  4160 log.cpp:596] Attempting to truncate the log to 5
> I1222 07:18:36.603173  4160 coordinator.cpp:348] Coordinator attempting to 
> write TRUNCATE action at position 6
> I1222 07:18:36.603330  4160 replica.cpp:537] Replica received write request 
> for position 6 from (12)@192.168.33.7:8083
> I1222 07:18:36.603890  4160 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 502063ns
> I1222 07:18:36.603919  4160 replica.cpp:712] Persisted action at 6
> I1222 07:18:36.603991  4160 replica.cpp:691] Replica received learned notice 
> for position 6 from @0.0.0.0:0
> I1222 07:18:36.604279  4160 leveldb.cpp:341] Persisting action (18 bytes) to 
> leveldb took 217834ns
> I1222 07:18:36.604357  4160 leveldb.cpp:399] Deleting ~3 keys from leveldb 
> took 13629ns
> I1222 07:18:36.604372  4160 replica.cpp:712] Persisted action at 6
> I1222 07:18:36.604383  4160 replica.cpp:697] Replica learned TRUNCATE action 
> at position 6
> I1222 07:18:36.604581  4160 log.cpp:596] Attempting to truncate the log to 6
> I1222 07:18:36.604682  4160 coordinator.cpp:348] Coordinator attempting to 
> write TRUNCATE action at position 7
> I1222 07:18:36.604778  4160 replica.cpp:537] Replica received write request 
> for position 7 from (13)@192.168.33.7:8083
> I1222 07:18:36.605317  4160 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 485544ns
> I1222 07:18:36.605356  4160 replica.cpp:712] Persisted action at 7
> I1222 07:18:36.605439  4160 replica.cpp:691] Replica received learned notice 
> for position 7 from @0.0.0.0:0
> I1222 07:18:36.605736  4160 leveldb.cpp:341] Persisting action (18 bytes) to 
> leveldb took 215503ns
> I1222 07:18:36.605773  4160 leveldb.cpp:399] Deleting ~3 keys from leveldb 
> took 9620ns
> I1222 07:18:36.605787  4160 replica.cpp:712] Persisted action at 7
> I1222 07:18:36.605798  4160 replica.cpp:697] Replica learned TRUNCATE action 
> at position 7
> I1222 07:18:36.605 [pool-12-thread-1, LogStorage] Snapshot complete. host 
> attrs: 1, cron jobs: 0, locks: 0, quota confs: 0, tasks: 0, updates: 0
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Reply via email to