Thanks! Looks like my specific issues are well on the radar.

On 22.10.2020 20:40 Howard Chu wrote:
[email protected] wrote:
This post outlines a few changes to LMDB I had to do to make it work in a 
specific use case. I’d like to see those changes upstream, but I understand 
that they may be/are not relevant for e.g. OpenLDAP.
The use case is multiple databases on disk with long-running, large write 
transactions.

1. Option to not use custom memory allocator/page pool

LMDB has a custom malloc() implementation that re-uses pages (me_dpages). I 
understand that this improves the performance a bit (depending on the malloc 
implementation). But there should at least be the option to not do that (for 
many reasons). I would even make not using it the default.
Not going to happen. But maybe it would be reasonable to allow configuring a 
limit on how many pages it keeps hanging around, before actually using libc 
free() on them.
A limit would work.
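
To make the "limit" idea concrete, here is a minimal sketch of a page pool that caches at most a configurable number of pages and hands anything beyond that back to libc free(). All names (page_pool_t, pool_alloc, pool_release, max_cached) are made up for illustration; this is not LMDB's me_dpages code and not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types and names; not taken from LMDB. */
typedef struct page {
    struct page *next;      /* free-list link, only valid while pooled */
} page_t;

typedef struct {
    page_t *free_list;      /* LIFO list of cached pages */
    size_t  cached;         /* number of pages currently cached */
    size_t  max_cached;     /* configurable limit; 0 disables caching */
    size_t  page_size;      /* size of every page handed out */
} page_pool_t;

void pool_init(page_pool_t *pool, size_t page_size, size_t max_cached)
{
    assert(page_size >= sizeof(page_t));
    pool->free_list = NULL;
    pool->cached = 0;
    pool->max_cached = max_cached;
    pool->page_size = page_size;
}

/* Reuse a cached page if one is available, otherwise fall back to malloc(). */
void *pool_alloc(page_pool_t *pool)
{
    page_t *p = pool->free_list;
    if (p != NULL) {
        pool->free_list = p->next;
        pool->cached--;
        return p;
    }
    return malloc(pool->page_size);
}

/* Keep the page for reuse while under the limit, otherwise free() it. */
void pool_release(page_pool_t *pool, void *mem)
{
    if (mem == NULL)
        return;
    if (pool->cached < pool->max_cached) {
        page_t *p = mem;
        p->next = pool->free_list;
        pool->free_list = p;
        pool->cached++;
    } else {
        free(mem);
    }
}

/* Return every cached page to libc. */
void pool_destroy(page_pool_t *pool)
{
    while (pool->free_list != NULL) {
        page_t *p = pool->free_list;
        pool->free_list = p->next;
        free(p);
    }
    pool->cached = 0;
}
```

With max_cached set to 0 this degenerates to plain malloc()/free(), which would cover my original request as well.
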
2. Large transactions and spilling

In a large write transaction, it will use a lot of memory by default (512MiB), 
which won’t get freed when the transaction commits (see 1.). If one has a lot 
of databases, it uses a lot of memory that never gets freed.

Alternatively, one can use MDB_WRITEMAP, but (i) by default Linux isn’t tuned 
to delay writing pages to disk and (ii) before commit LMDB has to remove a 
dirty bit, so each page is written twice.
There is no more dirty bit in LMDB 1.0, and this double-write no longer happens.
That'd improve the MDB_WRITEMAP case. The general problem there is that write-back tuning is system-wide (e.g. vm.dirty_expire_centisecs on Linux) and mostly not under the application's control. So if one wants to be sure that write-back happens only once sufficient changes have accumulated, one cannot use MDB_WRITEMAP. In that case it would be great to be able to control the memory usage/write-back behaviour by configuring the threshold at which it spills, as well.
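
To illustrate the "system wide" point: on Linux these knobs live under /proc/sys/vm, and about all an application can portably do is read them. A small sketch, assuming a Linux-style /proc; the helper name read_vm_sysctl is made up, and it returns -1 wherever the file does not exist or cannot be parsed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read an integer knob from /proc/sys/vm/<name>.
 * Returns the value, or -1 if the knob is missing or unreadable
 * (for example on a non-Linux system). */
long read_vm_sysctl(const char *name)
{
    char path[256];
    long value = -1;
    FILE *f;
    int n;

    /* Reject empty names and anything that could escape the directory. */
    if (name == NULL || name[0] == '\0' || strchr(name, '/') != NULL)
        return -1;

    n = snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_vm_sysctl("dirty_expire_centisecs") or read_vm_sysctl("dirty_writeback_centisecs") shows what the kernel is currently configured to do; changing the values needs root and affects every process on the machine, which is exactly why I would rather control spilling inside LMDB.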

3. LMDB causes crashes if database is corrupted
You can enable per-page checksums in LMDB 1.0, in which case you'll just get an 
error code if a page is corrupted (and the checksum fails to match). The DB 
will still be unusable if anything is corrupted.
That would fix the problem properly. Does it also check that the page belongs to the correct transaction (e.g. by putting a transid into the page, like btrfs)? Returning wrong results or MDB_CORRUPTED is something my application can handle (but not crashes, obviously).
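
What I mean by checking the transaction as well, as a rough sketch: each page carries a checksum plus the id of the transaction that wrote it, and the reader compares both against what the referencing page expects. The layout, the names (demo_page_t, page_seal, page_verify) and the choice of FNV-1a as the checksum are purely illustrative assumptions; I don't know what LMDB 1.0 actually stores or which checksum it uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; not LMDB error codes. */
enum { PAGE_OK = 0, PAGE_BAD_CHECKSUM = -1, PAGE_BAD_TXNID = -2 };

/* Hypothetical page layout: 8 + 8 + 48 bytes, no padding. */
typedef struct {
    uint64_t checksum;   /* covers txnid and data */
    uint64_t txnid;      /* transaction that wrote this page */
    uint8_t  data[48];
} demo_page_t;

/* 64-bit FNV-1a, used here only as a stand-in checksum. */
uint64_t fnv1a64(const uint8_t *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Checksum over everything after the checksum field itself. */
uint64_t page_checksum(const demo_page_t *p)
{
    const uint8_t *start = (const uint8_t *)&p->txnid;
    return fnv1a64(start, sizeof *p - offsetof(demo_page_t, txnid));
}

/* Stamp the page with its writer's txnid and a matching checksum. */
void page_seal(demo_page_t *p, uint64_t txnid)
{
    assert(p != NULL);
    p->txnid = txnid;
    p->checksum = page_checksum(p);
}

/* Return an error code instead of crashing on a damaged or stale page. */
int page_verify(const demo_page_t *p, uint64_t expected_txnid)
{
    assert(p != NULL);
    if (p->checksum != page_checksum(p))
        return PAGE_BAD_CHECKSUM;
    if (p->txnid != expected_txnid)
        return PAGE_BAD_TXNID;
    return PAGE_OK;
}
```

The txnid comparison catches the case a checksum alone cannot: an old but internally consistent page that is still sitting where a newer one should have been written.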
