> On Aug 4, 2017, at 11:28 AM, Nico Williams <n...@cryptonector.com> wrote:
> Imagine a mode where there is only a WAL, and to checkpoint is to write
> a new WAL with only live contents and... rename(2) into place.
What you’re describing is exactly how CouchDB’s storage engine works, as well
as descendants like Couchbase Server’s CouchStore and ForestDB. (Note: I work
Efficient lookups in a file like this require the existence of a bunch of
extraneous metadata like interior B-tree nodes. This metadata changes all the
time as records are written*, so a lot of it has to be written out too along
with every transaction, resulting in substantial write amplification.
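To put a rough number on that write amplification, here is a back-of-envelope sketch. The fanout, node size, and record size are made-up illustrative values, not measurements from CouchDB or CouchStore, and the function names are hypothetical. It assumes one node per tree level gets rewritten for each single-record update.

```python
def tree_height(num_records, fanout):
    """Levels needed (leaf to root) for a tree with the given fanout."""
    height, capacity = 1, fanout
    while capacity < num_records:
        capacity *= fanout
        height += 1
    return height


def estimated_write_amplification(num_records, fanout, node_bytes, record_bytes):
    """Bytes appended per byte of user data for one single-record update.

    Assumes (for illustration only) that one whole node is rewritten at
    every level of the tree, in addition to the record itself.
    """
    height = tree_height(num_records, fanout)
    bytes_written = record_bytes + height * node_bytes
    return bytes_written / record_bytes


# Assumed example: 1M records, fanout 100, 4 KiB nodes, 200-byte records.
print(tree_height(1_000_000, 100))
print(estimated_write_amplification(1_000_000, 100, 4096, 200))
```

Under these assumptions a 200-byte update appends roughly 12 KB. Batching many updates into one transaction lets them share the rewritten interior nodes, which lowers the per-record cost.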
The other big drawback is that compaction (the checkpoint step you describe) is
very expensive in terms of I/O. I’ve known of CouchDB systems that took many
hours to compact their databases. Every write that occurs during a
compaction has to be replayed onto the new file once the copy finishes, before
the compaction can complete, so one can get into a state where a busy database
either never actually finishes compacting, or has to temporarily block all
writers just so it can get the damn job done without interruption. (It’s a
similar problem to GC thrash.)
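The catch-up problem can be illustrated with a toy model. This is only a sketch under simplifying assumptions (constant copy and write rates, and a hypothetical function name), not how any particular engine schedules compaction. Each round copies the current backlog, and whatever was written during that round becomes the next backlog.

```python
def compaction_rounds(live_bytes, copy_rate, write_rate, max_rounds=1000):
    """Count catch-up rounds until the backlog drops below one byte.

    Returns None if the compactor has not caught up after max_rounds,
    which is what happens when writers are at least as fast as the copy.
    Rates are in bytes per second; all values are illustrative.
    """
    backlog = live_bytes
    for rounds in range(1, max_rounds + 1):
        elapsed = backlog / copy_rate   # time to copy/replay the backlog
        backlog = write_rate * elapsed  # new writes that arrived meanwhile
        if backlog < 1:
            return rounds
    return None


print(compaction_rounds(1000, copy_rate=100, write_rate=50))   # converges
print(compaction_rounds(1000, copy_rate=100, write_rate=100))  # never catches up
```

In this model the backlog shrinks by a factor of write_rate / copy_rate each round, so it converges only when writers are slower than the compactor. Otherwise the only way out is to block writers for the final round.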
We’ve also seen that, on low-end hardware like mobile devices, I/O bandwidth is
limited enough that a running compaction can really harm the responsiveness of
the _entire OS_, as well as cause significant battery drain.
* Modifying/rewriting a single record requires rewriting the leaf node that
points to it, which requires rewriting the parent node that points to the leaf,
and this ripples all the way up to the root node.
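The ripple described in this footnote can be seen in a small sketch. It swaps in a plain binary search tree instead of a B-tree to keep the code short, and uses a Python list as a stand-in for the append-only file. The class and method names are hypothetical, not CouchDB's.

```python
class AppendOnlyTree:
    """Toy append-only binary search tree; `log` stands in for the file."""

    def __init__(self):
        self.log = []     # entries: (key, value, left_offset, right_offset)
        self.root = None  # offset of the current root node in the log

    def _append(self, node):
        self.log.append(node)
        return len(self.log) - 1

    def _put(self, offset, key, value):
        # Existing nodes are never modified; every node on the path from
        # the root to the changed key is re-appended with new child offsets.
        if offset is None:
            return self._append((key, value, None, None))
        k, v, left, right = self.log[offset]
        if key == k:
            return self._append((key, value, left, right))
        if key < k:
            return self._append((k, v, self._put(left, key, value), right))
        return self._append((k, v, left, self._put(right, key, value)))

    def put(self, key, value):
        """Insert or update a key; return how many nodes were appended."""
        before = len(self.log)
        self.root = self._put(self.root, key, value)
        return len(self.log) - before

    def get(self, key):
        offset = self.root
        while offset is not None:
            k, v, left, right = self.log[offset]
            if key == k:
                return v
            offset = left if key < k else right
        return None


tree = AppendOnlyTree()
for key in [4, 2, 6, 1, 3, 5, 7]:
    tree.put(key, "v1")
appended = tree.put(1, "v2")  # update one leaf of a 3-level tree
print(appended, len(tree.log))
```

Updating a single leaf appends one node per level of the tree, and the superseded nodes stay in the log as garbage until a compaction drops them.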
sqlite-users mailing list