Re: [LMDB] Lockups with robust mutexes and crashing processes

2014-11-25 Thread Marcos-David Dione
"openldap-devel" wrote on 25/11/2014 16:19:40:
> Fyi, this would not have been a bug in Solaris:
> 
> https://docs.oracle.com/cd/E19253-01/816-5168/pthread-mutexattr-setrobust-np-3c/index.html

You mean, because of the phrase «When the owner of a mutex with
the PTHREAD_MUTEX_ROBUST_NP robustness attribute dies, or when the process
containing such a locked mutex unmaps the memory containing the mutex or
performs one of the exec(2) functions, the mutex is unlocked.»? I'm not
sure that 'unmap' there means a call to munmap().

--
Marcos Dione
Astek Sud-Est
R&D-SSP-DTA-TAE-TDS
for Amadeus SAS
T: +33 (4)4 9704 1727
marcos-david.di...@amadeus.com

Re: [LMDB] Lockups with robust mutexes and crashing processes

2014-11-25 Thread Howard Chu

Marcos-David Dione wrote:

"openldap-devel" wrote on 25/11/2014 16:19:40:
 > Fyi, this would not have been a bug in Solaris:
 >
 > https://docs.oracle.com/cd/E19253-01/816-5168/pthread-mutexattr-setrobust-np-3c/index.html

 You mean, because of the phrase «When the owner of a mutex with
the PTHREAD_MUTEX_ROBUST_NP robustness attribute dies, or when the
process containing such a locked mutex unmaps the memory containing the
mutex or performs one of the exec(2) functions, the mutex is unlocked.»?
I'm not sure that 'unmap' there means a call to munmap().


There is no other meaning of the word. A process-shared mutex must 
reside in shared memory. When a process detaches from that shared 
memory, it is unmapped.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Re: [LMDB] Lockups with robust mutexes and crashing processes

2014-11-25 Thread Howard Chu

Marcos-David Dione wrote:

Marcos-David Dione/NCE/AMADEUS wrote on 24/11/2014 11:10:42:
 > Seen like that I'm not sure if there's a defined behaviour
 > for that. I'll ask in the glibc and/or kernel MLs and I'll come
 > back with the answer.

 and here's the answer:

 > On 11/24/2014 03:34 PM, Marcos Dione wrote:
 > > We found a situation where a robust mutex cannot be recovered
 > > from a stale lock and we're wondering if it's simply an undefined
 > > situation or a bug in the kernel. Attached you will find the sample
 > > code, which is loosely based on a glibc test case. The gist of it
 > > is as follows:
 > >
 > > 1. we open a file.
 > > 2. we mmap it and use that mem to store a robust mutex.
 > > 3. we lock the mutex.
 > > 4. we munmap the file.
 > > 5. we close the file.
 >
 > Undefined behaviour.
 >
 > This results in undefined behaviour since the allocated storage for
 > the mutex object has been lost. You need to keep that storage around
 > for the robust algorithms to work with. Without any data you can't
 > do anything.

 Full answer:

https://sourceware.org/ml/libc-help/2014-11/msg00035.html


Fyi, this would not have been a bug in Solaris:

https://docs.oracle.com/cd/E19253-01/816-5168/pthread-mutexattr-setrobust-np-3c/index.html

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Re: [LMDB] Lockups with robust mutexes and crashing processes

2014-11-25 Thread Marcos-David Dione
Marcos-David Dione/NCE/AMADEUS wrote on 24/11/2014 11:10:42:
> Seen like that I'm not sure if there's a defined behaviour 
> for that. I'll ask in the glibc and/or kernel MLs and I'll come 
> back with the answer. 

and here's the answer:

> On 11/24/2014 03:34 PM, Marcos Dione wrote:
> > We found a situation where a robust mutex cannot be recovered
> > from a stale lock and we're wondering if it's simply an undefined
> > situation or a bug in the kernel. Attached you will find the sample
> > code, which is loosely based on a glibc test case. The gist of it is
> > as follows:
> > 
> > 1. we open a file.
> > 2. we mmap it and use that mem to store a robust mutex.
> > 3. we lock the mutex.
> > 4. we munmap the file.
> > 5. we close the file.
> 
> Undefined behaviour.
> 
> This results in undefined behaviour since the allocated storage for
> the mutex object has been lost. You need to keep that storage around
> for the robust algorithms to work with. Without any data you can't
> do anything.

Full answer:

https://sourceware.org/ml/libc-help/2014-11/msg00035.html

--
Marcos Dione
Astek Sud-Est
R&D-SSP-DTA-TAE-TDS
for Amadeus SAS
T: +33 (4)4 9704 1727
marcos-david.di...@amadeus.com

Re: [LMDB] Lockups with robust mutexes and crashing processes

2014-11-25 Thread Marcos-David Dione
Howard Chu  wrote on 23/11/2014 15:11:29:
> env_close does an munmap of the memory containing the mutex. According 
> to the manpages, a robust mutex is supposed to automatically unlock when 
> unmapped. Since this is not happening, it appears you've found a kernel 
> bug. Regardless, the example is invalid. If you modify the code to just 
> exit/abort/die without the bogus call to env_close, the other process 
> wakes up correctly. E.g.
> 
> http://pastebin.com/9jieDnUz

OK, so now I have managed to mimic the situation with pure pthreads
functions. The gist of it is as follows:

1. we open a file.
2. we mmap the file and use the memory to store a mutex.
3. we lock the mutex.
4. we munmap the file.
5. we close the file.

Seen like that, I'm not sure whether there's a defined behaviour for
that. I'll ask on the glibc and/or kernel MLs and come back with the
answer.

Just for the record, here's the full program:

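/* The header names were lost in the archive; this is the set the program
   below needs.  The *_np names are the older glibc spellings of the
   POSIX.1-2008 pthread_mutexattr_setrobust() and
   pthread_mutex_consistent(); glibc declares them under _GNU_SOURCE. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>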

/* Child body: take the lock, then unmap the mutex and leave while still
   holding it. */
void *tf (int n, int f, pthread_mutex_t *m) {
    int err = pthread_mutex_lock (m);
    printf ("ml: %d\n", err);
    if (err == EOWNERDEAD) {
        /* the previous owner died holding the lock; make it usable again */
        err = pthread_mutex_consistent_np (m);
        printf ("mc: %d\n", err);
        if (err) {
            puts ("pthread_mutex_consistent_np");
            exit (1);
        }
    } else if (err) {
        puts ("pthread_mutex_lock");
        exit (1);
    }
    printf ("%d got the lock.\n", n);
    sleep (3);
    /* exit without unlock */
    munmap (m, sizeof (pthread_mutex_t));
    close (f);
    printf ("%d out\n", n);
    return NULL;
}

int main (void) {
    int err, f;
    pthread_mutex_t *m;
    pthread_mutexattr_t ma;

    /* robust + process-shared mutex attributes */
    pthread_mutexattr_init (&ma);
    err = pthread_mutexattr_setrobust_np (&ma, PTHREAD_MUTEX_ROBUST_NP);
    if (err) {
        puts ("pthread_mutexattr_setrobust_np");
        return 1;
    }
    err = pthread_mutexattr_setpshared (&ma, PTHREAD_PROCESS_SHARED);
    if (err) {
        puts ("pthread_mutexattr_setpshared");
        return 1;
    }
#ifdef ENABLE_PI
    if (pthread_mutexattr_setprotocol (&ma, PTHREAD_PRIO_INHERIT) != 0) {
        puts ("pthread_mutexattr_setprotocol failed");
        return 1;
    }
#endif

    /* O_CREAT requires an explicit mode argument */
    f = open ("mutex.mmap", O_CREAT|O_TRUNC|O_RDWR, 0600);
    if (f < 0) {
        puts ("open");
        return 1;
    }

    /* make the file large enough to hold the mutex */
    err = ftruncate (f, sizeof (pthread_mutex_t));
    if (err) {
        puts ("ftruncate");
        return 1;
    }

    m = (pthread_mutex_t *) mmap (NULL, sizeof (pthread_mutex_t),
                                  PROT_READ|PROT_WRITE, MAP_SHARED, f, 0);
    if (m == MAP_FAILED) {
        puts ("mmap");
        return 1;
    }

    err = pthread_mutex_init (m, &ma);
#ifdef ENABLE_PI
    if (err == ENOTSUP) {
        puts ("PI robust mutexes not supported");
        return 0;
    }
#endif
    if (err) {
        puts ("pthread_mutex_init");
        return 1;
    }

    /* two children compete for the mutex; the parent just leaves */
    err = fork ();
    if (err == 0) {
        tf (1, f, m);
    } else if (err > 0) {
        int err2 = fork ();
        if (err2 == 0) {
            tf (2, f, m);
        } else if (err2 > 0) {
            // sleep (1);
            // kill (err, SIGKILL);
            // printf ("child killed\n");
            // sleep (10);
            puts ("main out");
        } else {
            puts ("fork2");
            return 1;
        }
    } else {
        puts ("fork1");
        return 1;
    }
    return 0;
}

Re: [LMDB] Lockups with robust mutexes and crashing processes

2014-11-23 Thread Howard Chu

Marcos-David Dione wrote:

 I already posted this to the IRC channel, but there was no
response, so I repost this here.


... already followed up in IRC.


 I'm trying out lmdb from master, including the robust mutex
code. We're experiencing lock ups after the process holding the lock
dies, as if the robust lock was not recovered. I tried to come up with
an lmdb example that shows it and I got it, just a few lines. It uses
fork() just to automate it; see that the environment is opened in both
children. Here's the code:

http://pastebin.com/Cbbri6az


The example is broken; it does not mimic the behavior of a crashed 
process. In particular it does a clean call to mdb_env_close() but 
doesn't call mdb_txn_abort() first. An actual crashing process would not 
make the call to mdb_env_close(), and a cleanly exiting process would 
close all outstanding transactions before calling env_close.


 If I run this, I see that one of the children waits for the
write lock and is not awakened when the other child dies without closing
the txn (but notice I close the env). This is on purpose, to simulate a
crashing process. The worst part is that I can't reproduce it using
libpthread and mmap directly. Here is the code I came up with:

http://pastebin.com/ybR5L4cP

 It's a little bit more verbose because I based it on a glibc
test case.

 Are we missing anything? It seems to us that the code does not
break any of LMDB's caveats (especially the one about creating the envs
before fork()ing). Is it wrong to assume that the waiting process should
recover the lock from staleness?


env_close does an munmap of the memory containing the mutex. According 
to the manpages, a robust mutex is supposed to automatically unlock when 
unmapped. Since this is not happening, it appears you've found a kernel 
bug. Regardless, the example is invalid. If you modify the code to just 
exit/abort/die without the bogus call to env_close, the other process 
wakes up correctly. E.g.


http://pastebin.com/9jieDnUz



--
Marcos Dione
Astek Sud-Est
R&D-SSP-DTA-TAE-TDS
for Amadeus SAS
T: +33 (4)4 9704 1727
marcos-david.di...@amadeus.com



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/