Re: Review Request 54774: Avoid double writing job updates to the Scheduler Snapshot

Joshua Cohen Thu, 15 Dec 2016 07:40:09 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54774/#review159305
-----------------------------------------------------------

I think there's some relevant context in the original review for this feature: 
https://reviews.apache.org/r/44471/. There was concern in that review with the 
removal of the native fields from snapshots for the reasons you mentioned in 
the associated ticket. I think that's still a concern, and there's the 
possibility that tooling exists outside of Aurora that depends on these fields 
existing in the snapshot.

I'm generally in favor of the changes you've outlined, as they should 
dramatically improve snapshot performance, however perhaps it makes sense to 
put this behind a command line flag? Maybe a single flag that allows for 
control over which snapshot fields are emitted in addition to the dbScript 
(this would require adding something like `getName` to `SnapshotField`). I'd 
also be fine with providing some tooling that converts the dbScript in a 
snapshot to a fully hydrated thrift snapshot. Either way, I don't think we 
should ship this without some accommodation for that use case. Maybe start with 
the flag (as it's much easier to add), and follow up with the additional 
tooling at a later date (at which point we could kill the flag and all of the 
`SnapshotField`s that are already encompassed by the `dbScript`.)

src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
(lines 315 - 317)
<https://reviews.apache.org/r/54774/#comment230279>

    Calling `hasDbSnapshot` has nothing to do with whether or not there's a 
memory store. It has to do with whether or not there's a `dbScript` in the 
snapshot. That's a relatively recent addition to snapshots. Prior to that we 
fetched everything from H2 and dumped it to the snapshot as individual thrift 
entries.

- Joshua Cohen

On Dec. 15, 2016, 7:57 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54774/
> -----------------------------------------------------------
> 
> (Updated Dec. 15, 2016, 7:57 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Santhosh Kumar Shanmugham, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1861
>     https://issues.apache.org/jira/browse/AURORA-1861
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Motivation: Thanks to the mybatis query metrics we added, we found that the 
> redundant query I'm removing here currently adds 20s to the snapshot creation 
> time. This is an extra 20s unavailability every time we snapshot on our prod 
> clusters. 
> 
> ===
> 
> Avoid double writing job updates to the Scheduler Snapshot. This should be 
> low-risk because:
> 
> 1) Job Updates have *never* had an in-memory store implementation. 
> 2) They are also relatively new, so the chances of some old code processing 
> job updates from backups is fairly low?
> 
> 
> Diffs
> -----
> 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> d2c859055f905ac76ee8eb387dca103b9857ddbe 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 7a11850e217dcb0148e4a4d33542c95b2e53a726 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  cf0a8f3ea11e9c48d1f16441af54dc781b33bdfc 
> 
> Diff: https://reviews.apache.org/r/54774/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew test
> 
> I'll apply this to one of our test clusters before merging to master too.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Re: Review Request 54774: Avoid double writing job updates to the Scheduler Snapshot

Reply via email to