https://bugs.openldap.org/show_bug.cgi?id=9291

          Issue ID: 9291
           Summary: Detection of corrupted database files
           Product: LMDB
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: ---
         Component: liblmdb
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Let's assume we have to deal with a corrupted database for whatever reason
(e.g. broken hardware or file system). Current behavior seems to be mostly
undefined, which is understandable as it's not known what is broken (e.g. there
are no checksums).

For example, I'm seeing a SIGBUS in mdb_page_touch because the cursor's top
page (mp) is pointing to invalid memory (0x7f99cf004000) during a commit:
mdb_page_touch mdb.c:2772
mdb_page_search mdb.c:6595
mdb_freelist_save mdb.c:3575
mdb_txn_commit mdb.c:4060

Cursor data at that point: mc_snum = 1, mc_top = 0; myki[0] = 0

A SIGBUS is troublesome as it crashes the process, and I wonder if there are
other ways to detect such inconsistencies. If that be possible there could be
user-specific handling in place. E.g. a user might start a new database file.

This issue was reported by our users, which also provided DB files: 
https://github.com/objectbox/objectbox-java/issues/859

I did not find a lot of consistency checks besides MDB_PAGE_NOTFOUND and
MDB_CORRUPTED. Also, I think there's no current way to thoroughly check a DB
file (e.g. like fsck for the DB file)?

My first idea other than checksums was to walk through the branch pages from
the root and check if the referenced pages are within reasonable bounds. Also
check the page content (e.g. nodes, flags). Additionally (optionally?), it
should be possible to check that the key values are actually sorted.

So, it boils down to 3 points in summary:
1.) If there no way to check the DB file for consistency yet(?), which approach
do you think would make sense? There might be two modes; one for a through
check through all data, and a quick check that does not take long and could be
e.g. done when opening the DB. Goal is to avoid process crashes and let users
handle the situation.
2.) In general, is it possible to add more consistency checks in regular DB
operations?
3.) Could the the particular situation (for which I provided the stack trace)
detected (e.g. is myki[0] = 0 legal here?)

I'd be happy to provide a patch if you provide some direction where you want to
take that.

-- 
You are receiving this mail because:
You are on the CC list for the issue.

Reply via email to