On 08/08/2016 05:41 AM, Hallvard Breien Furuseth wrote:
> A transaction must not reuse data pages visible in the last snapshot
> known to be durable, since that's how far back LMDB may need to revert
> after abnormal termination. Like a crash after MDB_NOMETASYNC may do.
>
> Sync the data pages from a txn, write the metapage, eventually sync
> that metapage, wait out any older read-only transactions, and *then*
> you can reuse the pages the txn freed. Not before. So when you don't
> sync, or a read-only txn won't die, LMDB degenerates to append-only.
>
> ...except if you sync the metapage and exit, next LMDB run may not
> know you synced it and must assume the metapage isn't yet durable.
> So it might not reuse pages visible to the _previous_ durable
> metapage, until it syncs. I'm rather losing track at this point,
> but I think it may mean twice as many not-yet-usable pages as one
> might expect.

Concretely: say the current write transaction is number 10, and a long-lived reader is on number 7. Currently, MDB cannot reuse any pages still visible in transactions 7 and later until that reader ends.

Now say a third, durable root is added. For the sake of argument, assume no checksums are used and that, after a crash, only the last durable state is recovered. Say the durable transaction is number 2. Pages visible in transaction 2 obviously need to be preserved, and pages visible in transactions 7 and later still need to be preserved for the slow reader. But pages that were only visible in transactions 3-6 can be reused.

Note that the last durable transaction is controlled purely by the single writer, so tracking it is actually easier than tracking where each reader is.

If a crash happens before a new durable root is fully synced, there should still be a second, older durable root whose pages haven't been reused yet. In that case MDB recovers the way it does now.

Does this make sense? Thanks for bearing with me.
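To make the reuse rule above concrete, here is a rough sketch in a simplified MVCC model. It is not LMDB's freelist code: the `Page` type and the `reusable_current` / `reusable_proposed` helpers are hypothetical, and a freed page is assumed to be visible in snapshots `born <= s < freed`. The proposed rule protects the durable root(s) plus every snapshot from the oldest reader onward, so a page whose whole lifetime falls in the gap between them can be reused. Passing a list of durable roots covers the crash case, where the older durable root stays protected until the new one is synced.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Page:
    """A freed page in a simplified model (hypothetical, not LMDB's layout).

    born:  txn id that wrote the page.
    freed: txn id that made it obsolete; the page is visible in
           snapshots s with born <= s < freed.
    """
    born: int
    freed: int

    def visible_in(self, snapshot: int) -> bool:
        return self.born <= snapshot < self.freed


def reusable_current(page: Page, oldest_reader: int, last_durable: int) -> bool:
    """Behaviour as described in this thread: nothing visible in any snapshot
    at or after the older of (oldest reader, last durable) may be reused."""
    horizon = min(oldest_reader, last_durable)
    return page.freed <= horizon


def reusable_proposed(page: Page, oldest_reader: int,
                      durable_roots: Iterable[int]) -> bool:
    """Proposed rule: protect only the durable root(s) and every snapshot
    from the oldest reader onward; the gap in between is reusable."""
    # freed > oldest_reader means the page is visible in some snapshot
    # from the oldest reader onward, which a live reader may still use.
    if page.freed > oldest_reader:
        return False
    # Otherwise it is reusable unless a durable root can still see it.
    return not any(page.visible_in(d) for d in durable_roots)


if __name__ == "__main__":
    # Example from the text: writer at txn 10, reader on 7, durable root 2.
    gap_page = Page(born=3, freed=5)  # only visible in snapshots 3 and 4
    print(reusable_current(gap_page, oldest_reader=7, last_durable=2))      # False
    print(reusable_proposed(gap_page, oldest_reader=7, durable_roots=[2]))  # True
```

With the numbers from the example, the current rule blocks the page because the durable snapshot (2) drags the horizon back, while the proposed rule frees it because neither snapshot 2 nor anything from 7 onward can see it.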
