Re: LMDB killed process and LOCK_MUTEX_W()

Howard Chu Wed, 16 Jul 2014 06:17:42 -0700

Dimitrios Apostolou wrote:

Hello,


in my program using LMDB, I've experienced rare deadlocks in highly
concurrent mixed (read/write/cursor iteration) workloads. The end result
is that hundreds of threads are hanging waiting on LOCK_MUTEX_W().
Unfortunately I'm not quite sure why this happens.

If my understanding is correct, this mutex is locked from the beginning of
the transaction until the commit/abort, effectively serialising writers.
So I assume that somehow a writer dies or is violently killed, so it is
not able to run its atexit() cleanups, and this shared mutex remains
locked forever.

What would you suggest for such a situation? I'm thinking of patching LMDB
to lock with mutex_timedwait() and periodically check if the PID having
taken the mutex is still alive. Is the writer PID stored somewhere, or will
a change of format be needed? Any other ideas are welcome!

We have a patch to use robust mutexes. They're a few percent slower but will allow recovery from this situation.

But aside from that, either your software has a bug, or someone is messing with your system, and you need to find out what's going on and stop that.
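
As a rough illustration of what robust mutexes provide, here is a minimal sketch using the POSIX robust mutex calls (pthread_mutexattr_setrobust(), EOWNERDEAD, pthread_mutex_consistent()). It is not the LMDB patch referred to above. The helper names and the use of anonymous shared memory are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a process-shared robust mutex in
 * anonymous shared memory. Returns NULL on failure. */
static pthread_mutex_t *robust_mutex_create(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(m, sizeof(*m));
        return NULL;
    }
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(m, sizeof(*m));
        return NULL;
    }
    return m;
}

/* Lock the mutex. Returns 0 on a normal acquisition, 1 if the lock was
 * recovered from an owner that died while holding it, -1 on error. */
static int robust_mutex_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc == 0)
        return 0;
    if (rc == EOWNERDEAD) {
        /* We hold the lock now. A real application would repair any
         * half-written shared state before marking it consistent. */
        if (pthread_mutex_consistent(m) != 0)
            return -1;
        return 1;
    }
    return -1;
}

/* Simulate a writer that dies while holding the lock: the child locks
 * the mutex and exits without unlocking. Returns 0 on success. */
static int simulate_dead_writer(pthread_mutex_t *m)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);               /* exit with the mutex still held */
    }
    return waitpid(pid, NULL, 0) == pid ? 0 : -1;
}
```

After simulate_dead_writer() returns, the next robust_mutex_lock() call reports the dead owner instead of blocking forever, and later lock/unlock cycles behave normally.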


Thanks in advance,
Dimitris



--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
