I wonder if an intermediate variation on this idea might be easier to
implement:

We can describe a subset of MDB_RDONLY transactions as 'long running', e.g.
when copying the database. Normal read-only transactions might then be
called 'ephemeral'. We might compute long-running transactions
heuristically, e.g. ((currentTxnID - rdOnlyTxnId) > 100) and/or we might
allow developers to mark them.

When deciding which transactions to free, we compute three numbers with a
one-pass scan of the reader lock table:

txnO = the oldest active transaction
txnE = the oldest active ephemeral transaction
txnL = the newest active long-running transaction

Normally, LMDB just computes txnO, and conservatively assumes that readers
might be operating anywhere between txnO and the current transaction.
Emmanuel's proposal for precise analysis is somewhat daunting. But I think
we can use this simple three-point analysis to find some sweet spots
between precise and conservative.

We know by definition that txnO <= txnL, and txnO <= txnE.

Under normal conditions, i.e. assuming that long-running transactions are
relatively infrequent, we also know that txnL < txnE will be true more
frequently than not.  Conservatively, we must protect the ranges from
[txnO,txnL] and [txnE,currentTxnId].  But in the common case, there will be
a non-empty range of transactions (txnL,txnE) that can be GC'd. And of
course anything below txnO can still be GC'd.

I'm not sure whether or not we can easily identify which pages are used
only within that gap, and may thus be collected. I haven't studied that
aspect of LMDB yet. But supposing that aspect is addressed, this scheme
should greatly mitigate the impact of infrequent long-running read-only
transactions. A lot more space from recent write transactions would be
reusable.

Reply via email to