Re: Question regarding MDB_NOLOCK

2015-01-30 Thread Howard Chu

David Barbour wrote:

Probably. Again, I really don't see 32-bit systems as being worthy
of consideration.



That's understandable. There are other embedded databases suitable for
32-bit systems.


If anyone ever runs a 32 bit server fast enough and long enough to 
process 4 billion write transactions, they can always do an mdb_copy to 
reset the txnIDs.
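
For a sense of scale, here is a back-of-the-envelope sketch of how long a writer must run before a 32-bit transaction counter wraps. It assumes, as the thread does, that the counter is 32 bits wide on such a system. The function name and the commit rates are illustrative only and not part of LMDB.

```c
#include <assert.h>

/* Number of distinct values of a 32-bit transaction counter (2^32). */
#define TXNID32_SPAN 4294967296.0

/* Days of continuous writing before a 32-bit counter wraps, given a
 * sustained rate in write transactions per second (hypothetical helper). */
double days_until_wrap32(double commits_per_sec)
{
    assert(commits_per_sec > 0.0);
    return TXNID32_SPAN / commits_per_sec / 86400.0;
}
```

At a sustained 1,000 commits per second this works out to roughly 49.7 days; at 10 commits per second, about 13.6 years.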


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Re: Question regarding MDB_NOLOCK

2015-01-30 Thread David Barbour
On Sat, Jan 31, 2015 at 12:50 AM, Howard Chu  wrote:

>
> Probably. Again, I really don't see 32-bit systems as being worthy of
> consideration.


That's understandable. There are other embedded databases suitable for
32-bit systems.


Re: Question regarding MDB_NOLOCK

2015-01-30 Thread Howard Chu

David Barbour wrote:


Okay, I think I see what's happening here:

mdb_find_oldest(): will return the most recent snapshot if no readers exist

mdb_page_alloc(): will search FREE_DBI for a transaction `last` that is
less than oldest, and will try to find a contiguous range of pages that
were free'd by said transaction, potentially merging free pages from
many transactions. If nothing is found, will instead grow the database.

Since `last` < `oldest` when we reuse any old pages, and we're only
using the 'freed' pages from last (not the data pages), we know that
the data pages for the eldest two transactions are protected.

Is this right?


Yep.
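
The rule confirmed above can be sketched as a toy model. This is not LMDB's code: the type and function names are hypothetical, and the real mdb_find_oldest() and mdb_page_alloc() work on the reader table and FREE_DBI rather than on a plain array.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long toy_txnid;   /* hypothetical stand-in for txnid_t */

/* Oldest snapshot still in use: the smallest snapshot id held by a
 * reader, or the last committed snapshot when no readers are known. */
toy_txnid toy_find_oldest(toy_txnid last_committed,
                          const toy_txnid *readers, size_t nreaders)
{
    assert(readers != NULL || nreaders == 0);
    toy_txnid oldest = last_committed;
    for (size_t i = 0; i < nreaders; i++)
        if (readers[i] < oldest)
            oldest = readers[i];
    return oldest;
}

/* Pages recorded as freed by transaction `freed_by` may be reused only
 * if that transaction is strictly older than the oldest snapshot. */
int toy_can_reuse(toy_txnid freed_by, toy_txnid oldest)
{
    return freed_by < oldest;
}
```

In this model, with no readers and last committed snapshot N, the pages freed by transaction N are the ones snapshot N-1 could still see, and they are not reusable because N < N is false. That is why both N and N-1 stay intact.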


My earlier assumption (before reading mdb_page_alloc) was that LMDB
would be aggressive about grabbing pages freed by transactions that are
not actively being read. If we're relying on `last < oldest` to create a
two-page discrepancy, this means that when we actually have readers on
older transactions, we're being a little more conservative than necessary.


More than necessary? I don't think so.


But it does protect the last two snapshots.


Yes, always.




Re: Question regarding MDB_NOLOCK

2015-01-30 Thread Howard Chu

David Barbour wrote:

Idle question: what happens with freeing old pages when the txnid_t
wraps around on a 32-bit system? Do the pages free'd by those
transactions just get stuck?


Probably. Again, I really don't see 32-bit systems as being worthy of 
consideration.





Re: Question regarding MDB_NOLOCK

2015-01-30 Thread David Barbour
Okay, I think I see what's happening here:

mdb_find_oldest(): will return the most recent snapshot if no readers exist

mdb_page_alloc(): will search FREE_DBI for a transaction `last` that is
less than oldest, and will try to find a contiguous range of pages that
were free'd by said transaction, potentially merging free pages from many
transactions. If nothing is found, will instead grow the database.

Since `last` < `oldest` when we reuse any old pages, and we're only using
the 'freed' pages from last (not the data pages), we know that the data
pages for the eldest two transactions are protected.

Is this right?

My earlier assumption (before reading mdb_page_alloc) was that LMDB would
be aggressive about grabbing pages freed by transactions that are not
actively being read. If we're relying on `last < oldest` to create a
two-page discrepancy, this means that when we actually have readers on
older transactions, we're being a little more conservative than necessary.
But it does protect the last two snapshots.

...

Idle question: what happens with freeing old pages when the txnid_t wraps
around on a 32-bit system? Do the pages free'd by those transactions just
get stuck?


Re: Question regarding MDB_NOLOCK

2015-01-30 Thread Hallvard Breien Furuseth

On 30/01/15 22:57, David Barbour wrote:

On Mon, Dec 1, 2014 at 6:55 AM, Hallvard Breien Furuseth
<h.b.furus...@usit.uio.no> wrote:
(...)
Last snapshot is never overwritten.  So readers which did begin/renew
after latest commit(write txn) are safe from txn_begin(write txn).

The same with the commit of the write txn before that.  I think.
MDB keeps the last two snapshots in the metapages.

I've been reading the MDB source code a bit more to verify this
assumption. It is not valid.

MDB does keep the last two metapages, but may begin to dismantle the
elder of the two for pages if there are no readers for it.


Yes, true.  The last two snapshots' *data pages* are never
overwritten, and any readers using them will have read the
metapage and do not need it again.

I confused myself because I was thinking of sync issues:
Even if the oldest metapage has been overwritten, that
does not mean it is gone yet: If it has not been synced
to disk, a system crash can bring it back.  And with it,
its refs to datapages.

Looking closer though, that's only relevant with
MDB_NOSYNC, where the previous metapage has not been
synced either.




Re: Question regarding MDB_NOLOCK

2015-01-30 Thread David Barbour
On Fri, Jan 30, 2015 at 6:21 PM, Howard Chu  wrote:

>
> That is supposed to be its current behavior already. I.e., no page that
> either of the two meta pages points to is ever allowed to be reclaimed.
>
>
Okay then. On Monday, I'll see if I can write a test demonstrating this
bug. And, if so, a patch to fix it.


Re: Question regarding MDB_NOLOCK

2015-01-30 Thread Howard Chu

David Barbour wrote:



On Fri, Jan 30, 2015 at 3:57 PM, David Barbour <dmbarb...@gmail.com> wrote:


For my current use case, I believe that I can still achieve a
sufficient level of parallelism even if limited to double-buffering
(whereas two snapshots would give me triple-buffering). I'm not
going to press for any changes at this time.


After having examined this further, I've changed my mind.

With triple buffering, I can guarantee that the writer *almost* never
waits on a short-running reader, and that the readers never wait on the
writer. With double buffering, the probability of the writer waiting on
even short-running readers, assuming they are frequent, is nearly 100%.
Triple buffering is thus a huge advantage for users of MDB_NOLOCK.

The update to support this is almost trivial: tweak `mdb_find_oldest`
such that both meta-page snapshots are considered to have active
readers. I'm willing to develop and submit a patch, but only if this
change also sounds good to the main LMDB developers.


That is supposed to be its current behavior already. I.e., no page that 
either of the two meta pages points to is ever allowed to be reclaimed.





Re: Question regarding MDB_NOLOCK

2015-01-30 Thread David Barbour
On Fri, Jan 30, 2015 at 3:57 PM, David Barbour  wrote:

>
> For my current use case, I believe that I can still achieve a sufficient
> level of parallelism even if limited to double-buffering (whereas two
> snapshots would give me triple-buffering). I'm not going to press for any
> changes at this time.
>
>
After having examined this further, I've changed my mind.

With triple buffering, I can guarantee that the writer *almost* never waits
on a short-running reader, and that the readers never wait on the writer.
With double buffering, the probability of the writer waiting on even
short-running readers, assuming they are frequent, is nearly 100%. Triple
buffering is thus a huge advantage for users of MDB_NOLOCK.
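
A minimal sketch of that argument, assuming the application tracks which snapshot each reader is on (MDB_NOLOCK leaves that to the application). The names are hypothetical and not LMDB API. `protected_snapshots` is 1 for double buffering (only the latest snapshot is safe from the next writer) and 2 for triple buffering.

```c
#include <assert.h>

typedef unsigned long snap_id;   /* hypothetical snapshot/transaction id */

/* Returns nonzero if the next writer has to wait for this reader, i.e.
 * the reader's snapshot lies outside the window of the most recent
 * `protected_snapshots` committed snapshots. */
int writer_must_wait(snap_id reader_snapshot, snap_id last_committed,
                     unsigned protected_snapshots)
{
    assert(reader_snapshot <= last_committed);
    assert(protected_snapshots >= 1);
    return reader_snapshot + protected_snapshots <= last_committed;
}
```

A reader that began just before the latest commit is on snapshot N-1. With one protected snapshot the writer must wait for it; with two it need not, and only readers that have outlived a whole write transaction (snapshot N-2 or older) can block the writer.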

The update to support this is almost trivial: tweak `mdb_find_oldest` such
that both meta-page snapshots are considered to have active readers. I'm
willing to develop and submit a patch, but only if this change also sounds
good to the main LMDB developers.

Regards,

Dave


Re: Question regarding MDB_NOLOCK

2015-01-30 Thread David Barbour
(This is a quick follow-up to an earlier discussion, a note left for anyone
confused by MDB_NOLOCK.)

On Mon, Dec 1, 2014 at 6:55 AM, Hallvard Breien Furuseth <
h.b.furus...@usit.uio.no> wrote:

>
> A write transaction frees pages which its new snapshot cannot see.
> A later writer will overwrite them, when no *known* readers can see
> them either.  But with MDB_NOLOCK, writers don't know about old
> readers and might overwrite pages which old readers can see.
>
> Last snapshot is never overwritten.  So readers which did begin/renew
> after latest commit(write txn) are safe from txn_begin(write txn).
>
> The same with the commit of the write txn before that.  I think.
> MDB keeps the last two snapshots in the metapages.


I've been reading the MDB source code a bit more to verify this assumption.
It is not valid.

MDB does keep the last two metapages, but may begin to dismantle the elder
of the two for pages if there are no readers for it. With MDB_NOLOCK, it
simply assumes there are no readers for it (cf. mdb_find_oldest). **Only
the most recent snapshot is preserved.**

For my current use case, I believe that I can still achieve a sufficient
level of parallelism even if limited to double-buffering (whereas two
snapshots would give me triple-buffering). I'm not going to press for any
changes at this time.