[Issue 10095] New: Race condition causing corruption of mutexes when closing the database

openldap-its Fri, 25 Aug 2023 12:44:40 -0700

https://bugs.openldap.org/show_bug.cgi?id=10095

Issue ID: 10095
Summary: Race condition causing corruption of mutexes when closing the database
Product: LMDB
Version: 0.9.30
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---

We're running into a race condition across multiple processes that corrupts the
mutexes when one process closes the database. The race was introduced by the
fix for https://bugs.openldap.org/show_bug.cgi?id=9278 (commit
https://git.openldap.org/openldap/openldap/-/commit/f683ffdc81d0edb20437cb7d655cf15a60e31249).

Here's the interleaving of two processes (p0 and p1) that can cause this
situation.

p0: Opens connection to database using mdb_env_create and mdb_env_open.

...some things happen in between...

p0: Begins closing the database using mdb_env_close:
p0: Calls mdb_env_close0:
p0: Acquires write lock on the file lock using mdb_env_excl_lock.
p0: Calls pthread_mutex_destroy on the mutexes.

SWITCH TO p1

p1: Begins opening the database using mdb_env_create, then calls mdb_env_open.
In mdb_env_open:
p1: Calls mdb_env_setup_locks:
p1: Calls mdb_env_excl_lock, which cannot acquire a write file lock because p0
holds it, so it blocks waiting for a read file lock.

SWITCH TO p0

p0: Calls close on the file descriptor which releases the write lock.

SWITCH TO p1

p1: Acquires the read file lock.
p1: Does NOT call pthread_mutex_init since it did not acquire a write file
lock.

...some things happen in between...

p1: Tries to lock the mutex using pthread_mutex_lock. The call fails with
EINVAL because the mutex has been destroyed.
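
The file-lock step that decides whether p1 re-initializes the mutexes can be
sketched roughly as follows. This is an illustration only, not LMDB's actual
code: the helper name lock_excl_or_shared and its return convention are
hypothetical, and it assumes POSIX fcntl record locks on the first byte of the
lock file. It tries for the write lock without blocking and, if another
process holds a conflicting lock, blocks until a read lock is granted.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of LMDB.
 * Returns 0 on success and sets *excl to 1 (write lock) or 0 (read lock).
 * Returns an errno value on failure and leaves *excl untouched. */
static int lock_excl_or_shared(int fd, int *excl)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;

    /* Non-blocking attempt at the write lock; retry if interrupted. */
    while ((rc = fcntl(fd, F_SETLK, &fl)) == -1 && errno == EINTR)
        ;
    if (rc == 0) {
        *excl = 1;      /* caller would initialize the mutexes here */
        return 0;
    }
    if (errno != EACCES && errno != EAGAIN)
        return errno;   /* a real error, not a lock conflict */

    /* Someone else holds a conflicting lock: block for a read lock. */
    fl.l_type = F_RDLCK;
    while ((rc = fcntl(fd, F_SETLKW, &fl)) == -1 && errno == EINTR)
        ;
    if (rc == -1)
        return errno;
    *excl = 0;          /* caller assumes the mutexes are already valid */
    return 0;
}
```

In the interleaving above, p1 takes the second path: it ends up with *excl set
to 0 and therefore trusts mutexes that p0 has already destroyed.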

I'm not sure how to actually solve this problem. We're currently mitigating
it by reverting the commit linked above (so no mutexes get destroyed).
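
To make the interleaving and the effect of the revert concrete, here is a
small single-process model of the shared state. It is a sketch under stated
assumptions, not LMDB code: all names (shared_state, close_begin, replay, and
so on) are hypothetical, and the model returns EINVAL for a destroyed mutex
only because that is what the report observed. POSIX leaves the result of
locking a destroyed mutex undefined.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of the state shared through the lock file (hypothetical). */
struct shared_state {
    bool mutex_valid;  /* initialized and not destroyed since */
    int  excl_holder;  /* id of the process holding the write lock, -1 if none */
};

/* p0, first half of close: take the write file lock, then destroy the
 * mutexes if the behavior described in the report is enabled. */
static void close_begin(struct shared_state *s, int pid, bool destroy_on_close)
{
    s->excl_holder = pid;
    if (destroy_on_close)
        s->mutex_valid = false;
}

/* p0, second half of close: closing the descriptor drops the file lock. */
static void close_finish(struct shared_state *s, int pid)
{
    if (s->excl_holder == pid)
        s->excl_holder = -1;
}

/* p1 during open: try for the write lock. Only a process that gets it
 * (re)initializes the mutexes. Returns true if the write lock was taken. */
static bool open_try_excl(struct shared_state *s, int pid)
{
    if (s->excl_holder != -1)
        return false;          /* falls back to waiting for a read lock */
    s->excl_holder = pid;
    s->mutex_valid = true;
    return true;
}

/* Stand-in for pthread_mutex_lock in this model. */
static int lock_mutex(const struct shared_state *s)
{
    return s->mutex_valid ? 0 : EINVAL;
}

/* Replays the reported schedule and returns the result of p1's lock call. */
static int replay(bool destroy_on_close)
{
    struct shared_state s = { true, -1 };   /* p0 has the env open */

    close_begin(&s, 0, destroy_on_close);   /* p0: write lock + destroy */
    (void)open_try_excl(&s, 1);             /* p1: write lock refused */
    close_finish(&s, 0);                    /* p0: close() releases lock */
    /* p1: read lock is now granted; it does not initialize anything. */
    return lock_mutex(&s);                  /* p1: first real use */
}
```

In this model replay(true) returns EINVAL, matching the failure described
above, while replay(false) returns 0, which corresponds to the mitigation of
never destroying the mutexes.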

