> On Aug 4, 2017, at 11:28 AM, Nico Williams <n...@cryptonector.com> wrote:
> Imagine a mode where there is only a WAL, and to checkpoint is to write
> a new WAL with only live contents and... rename(2) into place.

What you’re describing is exactly how CouchDB’s storage engine works, as well 
as descendants like Couchbase Server’s CouchStore and ForestDB. (Note: I work 
for Couchbase.)

Efficient lookups in a file like this require a fair amount of auxiliary 
metadata, such as interior B-tree nodes. This metadata changes constantly as 
records are written*, so much of it has to be written out along with every 
transaction, resulting in substantial write amplification.
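To put rough numbers on it (purely illustrative; the page size and tree depth here are assumptions, not CouchDB's actual figures):

```python
# Illustrative write-amplification arithmetic for a copy-on-write B-tree:
# updating one small record rewrites every node on the root-to-leaf path.
PAGE_SIZE = 4096    # assumed page/node size in bytes
DEPTH = 3           # assumed tree depth: leaf + parent + root
RECORD_SIZE = 100   # the record actually being modified

bytes_written = DEPTH * PAGE_SIZE          # whole path is re-serialized
amplification = bytes_written / RECORD_SIZE
print(f"{bytes_written} bytes written, ~{amplification:.0f}x amplification")
```

Real engines amortize this by batching many updates per commit, but a lightly loaded or fsync-heavy workload sees something close to the worst case.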

The other big drawback is that compaction (the checkpoint step you describe) is 
very expensive in terms of I/O. I’ve known of CouchDB systems that took many 
hours to compact their databases, and since every write that occurs during a 
compaction has to be replayed onto the new file before the compaction can 
complete, a busy database can get into a state where it either never actually 
finishes compacting, or has to temporarily block all writers just so it can get 
the damn job done without interruption. (It’s a similar problem to GC thrash.)
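The copy-then-replay shape can be sketched with a toy in-memory model (all names here are invented for illustration; a real compactor streams from disk and tracks sequence numbers rather than holding lists in memory):

```python
from collections import deque

class ToyDB:
    """Toy append-only store: a log of (key, value) writes; latest wins."""
    def __init__(self):
        self.log = []                      # the "file": every write appended
    def put(self, key, value):
        self.log.append((key, value))
    def live(self):
        return dict(self.log)              # latest value per key

def compact(db, incoming=()):
    """Copy only live records to a new log, then replay writes that
    arrived mid-compaction -- the catch-up loop described above."""
    new_log = list(db.live().items())      # long bulk copy of live data
    pending = deque(incoming)              # writes that landed during the copy
    while pending:                         # if writers outpace this loop,
        new_log.append(pending.popleft())  # compaction never catches up
    db.log = new_log                       # "rename(2) into place"

db = ToyDB()
for i in range(3):
    db.put("k", i)                         # "k" overwritten: 2 dead records
compact(db, incoming=[("k", 99)])          # one write arrives mid-compaction
print(db.live())                           # {'k': 99}
```

The failure mode in the text falls out of the `while pending` loop: if new writes arrive faster than they can be replayed, the queue never drains unless writers are blocked.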

We’ve also seen that, on low-end hardware like mobile devices, I/O bandwidth is 
limited enough that a running compaction can really harm the responsiveness of 
the _entire OS_, as well as cause significant battery drain.


* Modifying/rewriting a single record requires rewriting the leaf node that 
points to it, which requires rewriting the parent node that points to the leaf, 
and this ripples all the way up to the root node.
sqlite-users mailing list
