Thanks! Looks like my specific issues are well on the radar.
On 22.10.2020 20:40 Howard Chu wrote:
[email protected] wrote:
This post outlines a few changes to LMDB I had to make to get it working in a
specific use case. I’d like to see those changes upstream, but I understand
that they may not be relevant for, e.g., OpenLDAP.
The use case is multiple databases on disk with long-running, large write
transactions.
1. Option to not use the custom memory allocator/page pool
LMDB has a custom malloc() implementation that reuses pages (me_dpages). I
understand that this improves performance a bit (depending on the malloc
implementation). But there should at least be an option to not do that (for
many reasons). I would even make not using it the default.
Not going to happen. But maybe it would be reasonable to allow configuring a
limit on how many pages it keeps hanging around before actually using libc
free() on them.
A limit would work.
2. Large transactions and spilling
In a large write transaction, LMDB will by default use a lot of memory (512 MiB),
which won’t get freed when the transaction commits (see 1.). With many
databases, this adds up to a lot of memory that never gets freed.
Alternatively, one can use MDB_WRITEMAP, but (i) by default Linux isn’t tuned
to delay writing pages to disk and (ii) before commit LMDB has to clear a
dirty bit on each page, so each page is written twice.
There is no more dirty bit in LMDB 1.0, and this double-write no longer happens.
That'd improve the MDB_WRITEMAP case. The general problem there is that
write-back tuning is system-wide (e.g. vm.dirty_expire_centisecs on Linux)
and mostly outside the application's control, so if one wants to be sure
that write-back happens only once enough changes have accumulated, one has
to avoid MDB_WRITEMAP.
In that case it would be great to also be able to control memory
usage/write-back behaviour by configuring the threshold at which LMDB spills
dirty pages.
3. LMDB causes crashes if the database is corrupted
You can enable per-page checksums in LMDB 1.0, in which case you'll just get an
error code if a page is corrupted (i.e. its checksum fails to match). The DB
will still be unusable if anything is corrupted.
That would fix the problem properly. Does it also check that the page belongs
to the correct transaction (e.g. by storing a transaction ID in the page, like
btrfs does)? Returning wrong results or MDB_CORRUPTED is something my
application can handle (but obviously not crashes).