LMDB set range to return less than or equal key

Victor Baybekov Sat, 21 Feb 2015 15:38:57 -0800

Hi!

In my use case I have found that MDB_SET_RANGE and MDB_GET_BOTH_RANGE
queries are much more useful, in 90+% cases, when they return less than or
equal key, not greater or equal.


I have tested a quick implementation of "get less or equal", and it is
c.50% slower in memory. I must say that if someone told me, before I
learned about LMDB, that even with 50% less than original performance such
numbers are possible, I would not believe - both are millions ops per
seconds in my isolated in-memory microbenchmark. But still, such queries
are building blocks of N operations that could require non-linear (N^x, x >
1) number of queries and I am interested if I could easily speed this up
closer to the built-in MDB_SET_RANGE. The C code is in this gist
<https://gist.github.com/buybackoff/59dabc70fa5ec8351bbe#file-mdb_cursor_get_le>.
I call it from C# via P/Invoke, the test code is here
<https://gist.github.com/buybackoff/59dabc70fa5ec8351bbe#file-c-test-via-pinvoke>.
I made sure that only C methods are different, all other code is the same
in C#. But even with the P/Invoke overhead the difference is visible.

My naive approach is to use MDB_SET_RANGE as the first step and, if the key
from this query is not equal to the requested key, to call MOVE_PREV. Other
than an additional call to MOVE_PREV, it requires allocation of a copy of
the original key, because MDB_SET_RANGE overwrites it and I haven't found a
better way to compare the requested key with the key returned after
MDB_SET_RANGE. My experience with C (without # or ++) is 4 days less than 2
month, LMDB was the major reason I started, so my code could be really
stupid...

If it is not, are there other options? E.g. to recompile LMDB for "less or
equal" as default behavior. I have tried to change this code inside
`mdb_node_search`:

if (rc *>* 0) { /* Found entry is *less than* the key. */
i*++*; /* Skip to get the *smallest entry larger than* key. */
if (!IS_LEAF2(mp))
node = NODEPTR(mp, i);
}


to this code:

if (rc *<* 0) { /* Found entry is greater than the key. */
i*--*; /* Skip to get the *largest entry smaller than* key. */
if (!IS_LEAF2(mp))
node = NODEPTR(mp, i);
}


but it fails badly. Is "greater or equal" approach deeply rooted in the
design of LMDB, or I have missed just a small number of other places and it
is feasible to change the behavior globally? If that is possible, I would
like to do so and to use my current approach for "greater or equal" instead.

I cannot test this quickly on a larger-that-RAM data set (because I
struggle with my 128Gb laptop drive). But as far as I understand, in most
cases MDB_SET_RANGE will load the page with a previous key in RAM - usually
this is the same page or one that has been touched during the search? So in
the on-disk scenario the incremental MOVE_PREV will be negligible and I
should not worry about that?

Is there another, completely different and better, way to perform "less
than or equal" query?

Thanks!
Victor

LMDB set range to return less than or equal key

Reply via email to