Re: (ITS#8475) Feature request: MDB low durability transactions

bentrask Mon, 08 Aug 2016 09:10:55 -0700

On 08/08/2016 05:41 AM, Hallvard Breien Furuseth wrote:
> A transaction must not reuse data pages visible in the last snapshot
> known to be durable, since that's how far back LDMB may need to revert
> after abnormal termination.  Like a crash after MDB_NOMETASYNC may do.
>
> Sync the data pages from a txn, write the metapage, eventually sync
> that metapage, wait out any older read-only transactions, and *then*
> you can reuse the pages the txn freed.  Not before.  So when you don't
> sync, or a read-only txn won't die, LMDB degenerates to append-only.
>
> ...except if you sync the metapage and exit, next LMDB run may not
> know you synced it and must assume the metapage isn't yet durable.
> So it might not reuse pages visible to the _previous_ durable
> metapage, until it syncs.  I'm rather losing track at this point,
> but I think it may mean twice as may not-yet-usable pages as one
> might expect.


Concretely: say the current write transaction is number 10, and a 
long-lived reader is on number 7. Currently, MDB will be unable to reuse 
any pages used in transactions 7+ until the reader ends.

Now say a 3rd, durable root is added. For the sake of argument, no 
checksums are used and in the event of a crash, only the last durable 
state is recovered. Say the durable transaction is number 2. Pages used 
in transaction 2 need to be preserved, obviously. 7+ still need to be 
preserved for the slow reader. But pages from transactions 3-6 can be 
reused.

Note that the last durable transaction is controlled purely by the 
single writer, so tracking it is actually easier than tracking which 
readers are where.

If a crash happens before a durable root is fully synced, then there 
should be a second, older durable root that hasn't been reused yet. In 
that case MDB recovers the way it does currently.

Does this make sense? Thanks for bearing with me.

Re: (ITS#8475) Feature request: MDB low durability transactions

Reply via email to