On 27.10.2020 18:52 Howard Chu wrote:
3. LMDB causes crashes if the database is corrupted
You can enable per-page checksums in LMDB 1.0, in which case you'll just
get an error code if a page is corrupted (and the checksum fails to
match). The DB will still be unusable if anything is corrupted.
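The idea of "an error code instead of a crash" can be sketched as follows. This is a minimal illustration, not LMDB's page format or checksum algorithm: the 4-byte CRC32 prefix, the names `seal_page`/`read_page`, and the `PAGE_CORRUPTED` value (a hypothetical stand-in for an error such as `MDB_CORRUPTED`) are all assumptions.

```python
import zlib

PAGE_OK = 0
PAGE_CORRUPTED = -1  # hypothetical stand-in for an error like MDB_CORRUPTED


def seal_page(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte little-endian CRC32 of its contents."""
    return zlib.crc32(payload).to_bytes(4, "little") + payload


def read_page(raw: bytes):
    """Return (error_code, payload); a page whose checksum fails is never parsed."""
    if len(raw) < 4:
        return PAGE_CORRUPTED, None
    stored = int.from_bytes(raw[:4], "little")
    payload = raw[4:]
    if zlib.crc32(payload) != stored:
        return PAGE_CORRUPTED, None
    return PAGE_OK, payload


page = seal_page(b"key=value")
print(read_page(page))                 # (0, b'key=value')
print(read_page(page[:-1] + b"X"))     # (-1, None)
```

The caller gets a status it can act on; nothing downstream ever walks a damaged page's contents.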
That would fix the problem properly. Does it also check that it is the
correct transaction (e.g. by putting a transid into the page, like
btrfs does)? Returning wrong results or MDB_CORRUPTED is something my
application can handle (but obviously not crashes).
The txnid is part of the page header, which is one of the incompatible
format changes from LMDB 0.9. This is what allows us to eliminate the
dirty bit. Not sure what you mean about the txnid being correct or not,
but certainly it is included in the checksum.
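A small sketch of what "the txnid is included in the checksum" buys, and what it does not. The header layout (CRC32, then an 8-byte txnid, then the payload) and the function names are hypothetical, not LMDB 1.0's actual format. Damage to the txnid field is caught, but an intact page from an older transaction still verifies on its own.

```python
import struct
import zlib

# Hypothetical header: 4-byte CRC32, then an 8-byte txnid, then the payload.
HEADER = struct.Struct("<IQ")


def write_page(txnid: int, payload: bytes) -> bytes:
    body = struct.pack("<Q", txnid) + payload
    return struct.pack("<I", zlib.crc32(body)) + body


def verify_page(raw: bytes):
    """Return the page's txnid if the checksum (which covers the txnid) matches, else None."""
    if len(raw) < HEADER.size:
        return None
    crc, txnid = HEADER.unpack_from(raw)
    if zlib.crc32(raw[4:]) != crc:
        return None
    return txnid


old = write_page(41, b"old contents")
new = write_page(42, b"new contents")
tampered = new[:4] + struct.pack("<Q", 43) + new[HEADER.size:]

print(verify_page(new))       # 42
print(verify_page(tampered))  # None: the txnid is covered by the checksum
print(verify_page(old))       # 41: a stale but intact page still passes
```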
A common problem that e.g. btrfs users encounter is that a disk silently
drops some writes. If a valid page was previously stored at the same
location, the checksum check still succeeds. But btrfs stores the transid
of each page in the page's parent, so it compares that as well (the
error message is "btrfs parent transid verify failed on OFFSET wanted
TRANSID found TRANSID"). I think ZFS stores the checksum of each page in
the page's parent as well (I'm not sure if this would work with LMDB's
B-tree).
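A rough sketch of the two parent-side checks described above: the parent's pointer records the txnid (btrfs-style) and the checksum (ZFS-style) it expects the child to have. Everything here is hypothetical: the page layout, `ChildPtr`, `read_child`, and the dict standing in for the disk. The error text only loosely imitates the btrfs message. For simplicity the page is rewritten in place rather than copy-on-write.

```python
import struct
import zlib
from dataclasses import dataclass


def write_page(txnid: int, payload: bytes) -> bytes:
    """Hypothetical page: 4-byte CRC32 over (8-byte txnid + payload)."""
    body = struct.pack("<Q", txnid) + payload
    return struct.pack("<I", zlib.crc32(body)) + body


@dataclass
class ChildPtr:
    """Pointer to a child page as stored in its parent (hypothetical layout)."""
    offset: int
    expected_txnid: int  # btrfs-style: txnid the parent expects the child to have
    expected_crc: int    # ZFS-style: checksum the parent expects the child to have


class Corrupted(Exception):
    pass


def read_child(disk: dict, ptr: ChildPtr) -> bytes:
    raw = disk[ptr.offset]
    crc, txnid = struct.unpack_from("<IQ", raw)
    # 1. The page's own checksum: catches torn or bit-rotted pages.
    if zlib.crc32(raw[4:]) != crc:
        raise Corrupted("page checksum mismatch")
    # 2. Parent-recorded txnid: catches an intact but stale page (lost write).
    if txnid != ptr.expected_txnid:
        raise Corrupted(
            f"parent transid verify failed: wanted {ptr.expected_txnid} found {txnid}")
    # 3. Parent-recorded checksum: catches any content the parent did not expect,
    #    even when the txnid happens to match.
    if crc != ptr.expected_crc:
        raise Corrupted("checksum does not match the one recorded in the parent")
    return raw[12:]


# Txn 7 wrote a page at offset 100; txn 8 rewrote it and updated the parent,
# but the drive silently dropped the child write.
disk = {100: write_page(7, b"v1")}
new_child = write_page(8, b"v2")
ptr = ChildPtr(100, 8, struct.unpack_from("<I", new_child)[0])
try:
    read_child(disk, ptr)
except Corrupted as e:
    print(e)  # parent transid verify failed: wanted 8 found 7
```

The stale page passes its own checksum, so only the information held by the parent exposes the lost write.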
I guess a simple check (which might already exist) is checking that the
page transid <= the root/meta page transid. But that doesn't catch the
case where the root page was updated but updates to other pages were
dropped (the disk might also drop a complete "transaction" without
reporting an error, in which case the next transaction then writes the
root page pointing to an incompletely written tree).
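The gap in the simple "page transid <= meta transid" check can be shown with a toy model. `Page`, `expected_child_txnid`, and both check functions are hypothetical illustrations, not LMDB structures. A page newer than the meta page is rejected, but a stale page left behind by a dropped write has an older txnid and slips through, whereas a parent-recorded txnid like the btrfs scheme mentioned above would flag it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    txnid: int                                   # txnid recorded in this page's header
    expected_child_txnid: Optional[int] = None   # hypothetical btrfs-style field


def txnid_not_from_future(page: Page, meta_txnid: int) -> bool:
    """Simple check: a page reachable from the meta page cannot be newer than it."""
    return page.txnid <= meta_txnid


def child_matches_parent(parent: Page, child: Page) -> bool:
    """Stricter check: the child must carry exactly the txnid its parent recorded."""
    return parent.expected_child_txnid == child.txnid


# Txn 10 wrote a new root and a new child, but the disk silently dropped the
# child write, so the location the root points to still holds a page from txn 6.
meta_txnid = 10
root = Page(txnid=10, expected_child_txnid=10)
stale_child = Page(txnid=6)

print(txnid_not_from_future(Page(txnid=12), meta_txnid))  # False: "future" page rejected
print(txnid_not_from_future(stale_child, meta_txnid))     # True: stale page slips through
print(child_matches_parent(root, stale_child))            # False: lost write detected
```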