Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-05 Thread Aurelien Jarno
On Fri, Jun 04, 2010 at 11:25:28AM +0200, Aurelien Jarno wrote:
 Aurelien Jarno wrote:
  On Thu, Jun 03, 2010 at 10:09:45PM +0300, Rémi Denis-Courmont wrote:
   On Thursday 3 June 2010 22:00:13, Aurelien Jarno wrote:
    I have found a machine with almost the same CPU, the only difference
    being the speed (3.00 GHz instead of 2.80 GHz). I am unable to reproduce
    the problem, I have run the testcase more than 20 times over last
    night.
   With SMT (HyperThread) support?

  Yes, with HyperThreading enabled.

    Maybe the problem is actually not in the GNU libc. What kernel are you
    running?
   Normally, I use upstream 2.6.32.15 at the moment.
   But I also hit the bug with Debian 2.6.32-5-686.

  I tried on a 2.6.26 kernel, I'll try to reproduce it with this kernel.

 I tried on a 2.6.32-5-686 kernel, and it hasn't failed in more than
 30 loops. There is probably something different on your system
 causing the issue.

I have slightly modified the testcase so that it runs in a loop, and I
removed all timing functions (see attached file). I am able to reproduce
the problem under some conditions:
- It fails after anywhere between 20 and 3 million iterations on a
  dual-core i386 CPU in lenny, squeeze and sid.
- It never fails on an HT CPU (tried P4 and Atom).
- It never fails when pinned to a single CPU using taskset.
- It never fails on amd64.
- It fails in lenny, testing and unstable.
- It seems to fail more quickly in a KVM instance (probably more
  timing variation).

This seems to confirm that there is a race condition, but one that is very
difficult to reproduce. My guess is that a P4 CPU running at 2.8 GHz with
HT enabled has the perfect timing to reproduce the bug.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



-- 
To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-05 Thread Aurelien Jarno
On Sat, Jun 05, 2010 at 06:39:24PM +0200, Aurelien Jarno wrote:
 I have slightly modified the testcase so that it runs in a loop, and I
 removed all timing functions (see attached file). I am able to reproduce
 the problem under some conditions:

This time the file is actually attached.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
/* gcc -O2 -Wall -lpthread condfail.c */
#define _GNU_SOURCE 1
#undef NDEBUG
#include <pthread.h>
#include <time.h>
#include <assert.h>
#include <stdio.h>

static pthread_cond_t wait = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
static long long int i=0;

/* Cleanup handler: the mutex must be held again when this runs. */
static void cleanup_lock(void *lock)
{
	int val;

	i++;
	val = pthread_mutex_unlock(lock);
	if (val != 0) {
		printf("failed after %lli iterations\n", i);
		assert (0);
	}
}

static void *entry(void *barrier)
{
	pthread_mutex_lock(&lock);
	pthread_cleanup_push(cleanup_lock, &lock);
	pthread_barrier_wait(barrier);
	/* Wait forever; the thread only leaves through cancellation. */
	for (;;)
		pthread_cond_wait(&wait, &lock);
	pthread_cleanup_pop(0);
	assert(0);
}

int main (void)
{
	for(;;) {

		pthread_t th;
		pthread_barrier_t barrier;

		pthread_barrier_init(&barrier, NULL, 2);
		pthread_create(&th, NULL, entry, &barrier);
		pthread_barrier_wait(&barrier);
		pthread_barrier_destroy(&barrier);

		/* Cancel, then compete for the mutex before the cleanup runs. */
		pthread_cancel(th);
		pthread_mutex_lock(&lock);
		pthread_mutex_unlock(&lock);
		pthread_join(th, NULL);
	}
	return 0;
}


Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-04 Thread Aurelien Jarno
Aurelien Jarno wrote:
 On Thu, Jun 03, 2010 at 10:09:45PM +0300, Rémi Denis-Courmont wrote:
  On Thursday 3 June 2010 22:00:13, Aurelien Jarno wrote:
   I have found a machine with almost the same CPU, the only difference
   being the speed (3.00 GHz instead of 2.80 GHz). I am unable to reproduce
   the problem, I have run the testcase more than 20 times over last
   night.
  With SMT (HyperThread) support?

 Yes, with HyperThreading enabled.

   Maybe the problem is actually not in the GNU libc. What kernel are you
   running?
  Normally, I use upstream 2.6.32.15 at the moment.
  But I also hit the bug with Debian 2.6.32-5-686.

 I tried on a 2.6.26 kernel, I'll try to reproduce it with this kernel.

I tried on a 2.6.32-5-686 kernel, and it hasn't failed in more than
30 loops. There is probably something different on your system
causing the issue.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-03 Thread Rémi Denis-Courmont
On Thursday 3 June 2010 00:32:14, Aurelien Jarno wrote:
 Does it mean it's a lot more difficult to reproduce it with this
 version?

Today the test case failed 3 out of 3 times already.

My VLC debug builds started triggering pthread_mutex_unlock() errors
pseudo-randomly again. I had not observed this behaviour since you had
presumably fixed the bug. Not a single occurrence in all those months.

 Have you tried to run so many iterations with the version
 built with gcc-4.3?

The test case, not that I remember. VLC debug builds, yes.

% cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 3
model name  : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping: 4
cpu MHz : 2800.000
cache size  : 1024 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 1
apicid  : 0
initial apicid  : 0
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pebs bts pni dtes64 monitor ds_cpl cid xtpr
bogomips: 5585.95
clflush size: 64
cache_alignment : 128
address sizes   : 36 bits physical, 32 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 15
model   : 3
model name  : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping: 4
cpu MHz : 2800.000
cache size  : 1024 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 1
apicid  : 1
initial apicid  : 1
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pebs bts pni dtes64 monitor ds_cpl cid xtpr
bogomips: 5586.01
clflush size: 64
cache_alignment : 128
address sizes   : 36 bits physical, 32 bits virtual
power management:


-- 
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-03 Thread Aurelien Jarno
On Thu, Jun 03, 2010 at 08:59:22PM +0300, Rémi Denis-Courmont wrote:
 On Thursday 3 June 2010 00:32:14, Aurelien Jarno wrote:
  Does it mean it's a lot more difficult to reproduce it with this
  version?
 
 Today the test case failed 3 out of 3 times already.
 
 My VLC debug builds started triggering pthread_mutex_unlock() errors
 pseudo-randomly again. I had not observed this behaviour since you had
 presumably fixed the bug. Not a single occurrence in all those months.
 
  Have you tried to run so many iterations with the version
  built with gcc-4.3?
 
 The test case, not that I remember. VLC debug builds, yes.
 
 % cat /proc/cpuinfo
 processor   : 0
 vendor_id   : GenuineIntel
 cpu family  : 15
 model   : 3
 model name  : Intel(R) Pentium(R) 4 CPU 2.80GHz
 stepping: 4

I have found a machine with almost the same CPU, the only difference
being the speed (3.00 GHz instead of 2.80 GHz). I am unable to reproduce
the problem: I ran the testcase more than 20 times overnight.

Maybe the problem is actually not in the GNU libc. What kernel are you
running?

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-03 Thread Rémi Denis-Courmont
On Thursday 3 June 2010 22:00:13, Aurelien Jarno wrote:
 I have found a machine with almost the same CPU, the only difference
 being the speed (3.00 GHz instead of 2.80 GHz). I am unable to reproduce
 the problem, I have run the testcase more than 20 times over last
 night.

With SMT (HyperThread) support?

 Maybe the problem is actually not in the GNU libc. What kernel are you
 running?

Normally, I use upstream 2.6.32.15 at the moment.
But I also hit the bug with Debian 2.6.32-5-686.

-- 
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-03 Thread Aurelien Jarno
On Thu, Jun 03, 2010 at 10:09:45PM +0300, Rémi Denis-Courmont wrote:
 On Thursday 3 June 2010 22:00:13, Aurelien Jarno wrote:
  I have found a machine with almost the same CPU, the only difference
  being the speed (3.00 GHz instead of 2.80 GHz). I am unable to reproduce
  the problem, I have run the testcase more than 20 times over last
  night.
 
 With SMT (HyperThread) support?

Yes, with HyperThreading enabled.

  Maybe the problem is actually not in the GNU libc. What kernel are you
  running?
 
 Normally, I use upstream 2.6.32.15 at the moment.
 But I also hit the bug with Debian 2.6.32-5-686.
 

I tested on a 2.6.26 kernel; I'll try to reproduce it with the Debian
2.6.32-5-686 kernel.

If I fail, would it be possible to get limited access to this machine
to debug the issue?

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-02 Thread Rémi Denis-Courmont
On Tuesday 1 June 2010 20:20:01, Aurelien Jarno wrote:
 I am therefore reopening this bug as it may still be present, though we
 now have a different version and a different compiler. As I am unable
 to reproduce the original problem, I am unable to test this new version.
 Could you please test if version 2.11.1-2 is affected or not?

It is. I hit the failure case after 8074 consecutive iterations.

-- 
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-02 Thread Aurelien Jarno
On Wed, Jun 02, 2010 at 11:28:50PM +0300, Rémi Denis-Courmont wrote:
 On Tuesday 1 June 2010 20:20:01, Aurelien Jarno wrote:
  I am therefore reopening this bug as it may still be present, though we
  now have a different version and a different compiler. As I am unable
  to reproduce the original problem, I am unable to test this new version.
  Could you please test if version 2.11.1-2 is affected or not?
 
 It is. I hit the failure case after 8074 consecutive iterations.
 

Ok, it's really bad, especially as I don't have a way to debug it...

gcc-4.4 miscompiles something in this bug, and gcc-4.3 miscompiles
something else in bug#583858...

Could you please give me more details about your CPU (cat
/proc/cpuinfo), so that I can try to find a machine with the same CPU?

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-02 Thread Aurelien Jarno
On Wed, Jun 02, 2010 at 11:28:50PM +0300, Rémi Denis-Courmont wrote:
 On Tuesday 1 June 2010 20:20:01, Aurelien Jarno wrote:
  I am therefore reopening this bug as it may still be present, though we
  now have a different version and a different compiler. As I am unable
  to reproduce the original problem, I am unable to test this new version.
  Could you please test if version 2.11.1-2 is affected or not?
 
 It is. I hit the failure case after 8074 consecutive iterations.
 

In the original bug report, you said:

 I don't know. It reproduces pretty much 100% here:
 
 % ./a.out
 1
 2
 a.out: test.c:18: cleanup_lock: Assertion `val == 0' failed.
 Abandon

Does this mean that it is a lot more difficult to reproduce with this
version? Have you tried running that many iterations with the version
built with gcc-4.3?


-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Processed: Re: Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2010-06-01 Thread Debian Bug Tracking System
Processing commands for cont...@bugs.debian.org:

 unarchive 551903
Bug #551903 {Done: Aurelien Jarno aure...@debian.org} [libc6-i686] libc6-i686 
pthread_cond_wait fails to reacquire mutex upon cancellation
Unarchived Bug 551903
 found 551903 2.11.1-2
Bug #551903 {Done: Aurelien Jarno aure...@debian.org} [libc6-i686] libc6-i686 
pthread_cond_wait fails to reacquire mutex upon cancellation
Bug Marked as found in versions eglibc/2.11.1-2 and reopened.
 thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
551903: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=551903
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems





Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-26 Thread Aurelien Jarno
On Wed, Oct 21, 2009 at 10:40:03PM +0300, Rémi Denis-Courmont wrote:
 On Wednesday 21 October 2009 22:33:56, you wrote:
  On Wed, Oct 21, 2009 at 07:11:40PM +0300, Remi Denis-Courmont wrote:
   Package: libc6-i686
   Version: 2.10.1-1
   Severity: critical
   Justification: breaks unrelated software
  
  
 Hello,
  
   With the upgrade to 2.10.1, pthread_cond_wait() fails to re-acquire the
   provided mutex when acting on a deferred cancellation event from
   another thread. This is seen if (and apparently, only if) another thread
   acquires the same mutex after cancellation is initiated, but before the
   cancelled thread executes cancellation cleanup handlers.
  
   I could not reproduce the problem with plain libc6. It only occurs with
   libc6-i686 installed.
  
   I wrote a simple test case at:
   http://www.remlab.net/files/divers/condfail.c
  
  This test shows the same behaviour on both lenny and sid version, that
  is it prints 1 and 2, but never triggers an assertion.
  
  Are there other conditions for this test to fail?
 
 I don't know. It reproduces pretty much 100% here:
 
 % ./a.out
 1
 2
 a.out: test.c:18: cleanup_lock: Assertion `val == 0' failed.
 Abandon
 
 I'm running on a single core SMT (P4/HT namely), so instruction cycle timing 
 might be very different from what an UP or non-SMT SMP gets :( In any case, 
 the fact that it only occurs with libc6-i686 hints at incorrect use of atomic
 ops, I guess...
 

Problems related to atomic ops often come from, or at least are triggered
by, gcc changes. I have rebuilt eglibc 2.10.1-2 using gcc-4.3 instead of
gcc-4.4. The packages are available at http://temp.aurel32.net/eglibc/
Could you please tell me if you have the same problem with them?

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-26 Thread Rémi Denis-Courmont
On Monday 26 October 2009 19:09:46, Aurelien Jarno wrote:
 Thanks for the test. It's the solution I'll use if I can't find the real
 problem. Looking at the recent upstream commits, the problem may be
 fixed by this commit:
 
 http://repo.or.cz/w/glibc.git?a=commit;h=e73e694e38b7b222eec3ec5897eb507d88bb8928
 
 As I can't reproduce the problem here, if I build packages with this
 patch, would it be possible for you to test them?

Yeah sure.

-- 
Rémi Denis-Courmont
http://www.remlab.net/






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-26 Thread Aurelien Jarno
On Mon, Oct 26, 2009 at 07:17:49PM +0200, Rémi Denis-Courmont wrote:
 On Monday 26 October 2009 19:09:46, Aurelien Jarno wrote:
  Thanks for the test. It's the solution I'll use if I can't find the real
  problem. Looking at the recent upstream commits, the problem may be
  fixed by this commit:
  
  http://repo.or.cz/w/glibc.git?a=commit;h=e73e694e38b7b222eec3ec5897eb507d88bb8928
  
  As I can't reproduce the problem here, if I build packages with this
  patch, would it be possible for you to test them?
 
 Yeah sure.
 

Forget about it, we already have this patch in our tree :( I'll switch
back to gcc 4.3 instead.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-26 Thread Rémi Denis-Courmont
On Monday 26 October 2009 10:10:45, Aurelien Jarno wrote:
  I'm running on a single core SMT (P4/HT namely), so instruction cycle
  timing might be very different from what an UP or non-SMT SMP gets :( In
  any case, the fact that it only occurs with libc6-i686 hints at incorrect
  use of atomic ops, I guess...

 Problems related to atomic ops often come from, or at least are triggered
 by, gcc changes. I have rebuilt eglibc 2.10.1-2 using gcc-4.3 instead of
 gcc-4.4. The packages are available on http://temp.aurel32.net/eglibc/
 Could you please tell me if you have the same problem with them?

Good catch. I could not reproduce the problem with 2.10.1-2+gcc4.3, either
with the test case or with VLC media player.

Thanks!

-- 
Rémi Denis-Courmont
http://www.remlab.net/






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-26 Thread Aurelien Jarno
On Mon, Oct 26, 2009 at 06:47:45PM +0200, Rémi Denis-Courmont wrote:
 On Monday 26 October 2009 10:10:45, Aurelien Jarno wrote:
   I'm running on a single core SMT (P4/HT namely), so instruction cycle
   timing might be very different from what an UP or non-SMT SMP gets :( In
   any case, the fact that it only occurs with libc6-i686 hints at incorrect
   use of atomic ops, I guess...

  Problems related to atomic ops often come from, or at least are triggered
  by, gcc changes. I have rebuilt eglibc 2.10.1-2 using gcc-4.3 instead of
  gcc-4.4. The packages are available on http://temp.aurel32.net/eglibc/
  Could you please tell me if you have the same problem with them?
 
 Good catch. I could not reproduce the problem with 2.10.1-2+gcc4.3, neither 
 with the test case nor with VLC media player.
 

Thanks for the test. It's the solution I'll use if I can't find the real
problem. Looking at the recent upstream commits, the problem may be
fixed by this commit:

http://repo.or.cz/w/glibc.git?a=commit;h=e73e694e38b7b222eec3ec5897eb507d88bb8928

As I can't reproduce the problem here, if I build packages with this 
patch, would it be possible for you to test them?

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-21 Thread Remi Denis-Courmont
Package: libc6-i686
Version: 2.10.1-1
Severity: critical
Justification: breaks unrelated software


Hello,

With the upgrade to 2.10.1, pthread_cond_wait() fails to re-acquire the
provided mutex when acting on a deferred cancellation event from
another thread. This is seen if (and apparently, only if) another thread
acquires the same mutex after cancellation is initiated, but before the
cancelled thread executes cancellation cleanup handlers.

I could not reproduce the problem with plain libc6. It only occurs with
libc6-i686 installed.

I wrote a simple test case at:
http://www.remlab.net/files/divers/condfail.c

This is a violation of POSIX threads semantics, and a regression from
earlier libc6-i686. This also renders VLC media player debug versions
almost completely unusable.

Best regards,

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (100, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.30.9 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libc6-i686 depends on:
ii  libc6 2.10.1-1   GNU C Library: Shared libraries

libc6-i686 recommends no packages.

libc6-i686 suggests no packages.

-- no debconf information






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-21 Thread Aurelien Jarno
On Wed, Oct 21, 2009 at 07:11:40PM +0300, Remi Denis-Courmont wrote:
 Package: libc6-i686
 Version: 2.10.1-1
 Severity: critical
 Justification: breaks unrelated software
 
 
   Hello,
 
 With the upgrade to 2.10.1, pthread_cond_wait() fails to re-acquire the
 provided mutex when acting on a deferred cancellation event from
 another thread. This is seen if (and apparently, only if) another thread
 acquires the same mutex after cancellation is initiated, but before the
 cancelled thread executes cancellation cleanup handlers.
 
 I could not reproduce the problem with plain libc6. It only occurs with
 libc6-i686 installed.
 
 I wrote a simple test case at:
 http://www.remlab.net/files/divers/condfail.c
 

This test shows the same behaviour on both the lenny and sid versions:
it prints 1 and 2, but never triggers an assertion.

Are there other conditions for this test to fail?

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-21 Thread Rémi Denis-Courmont
On Wednesday 21 October 2009 22:33:56, you wrote:
 On Wed, Oct 21, 2009 at 07:11:40PM +0300, Remi Denis-Courmont wrote:
  Package: libc6-i686
  Version: 2.10.1-1
  Severity: critical
  Justification: breaks unrelated software
 
 
  Hello,
 
  With the upgrade to 2.10.1, pthread_cond_wait() fails to re-acquire the
  provided mutex when acting on a deferred cancellation event from
  another thread. This is seen if (and apparently, only if) another thread
  acquires the same mutex after cancellation is initiated, but before the
  cancelled thread executes cancellation cleanup handlers.
 
  I could not reproduce the problem with plain libc6. It only occurs with
  libc6-i686 installed.
 
  I wrote a simple test case at:
  http://www.remlab.net/files/divers/condfail.c
 
 This test shows the same behaviour on both lenny and sid version, that
 is it prints 1 and 2, but never triggers an assertion.
 
 Are there other conditions for this test to fail?

I don't know. It reproduces pretty much 100% here:

% ./a.out
1
2
a.out: test.c:18: cleanup_lock: Assertion `val == 0' failed.
Abandon

I'm running on a single-core SMT CPU (namely a P4 with HT), so instruction
cycle timing might be very different from what a UP or non-SMT SMP system
gets :( In any case, the fact that it only occurs with libc6-i686 hints at
incorrect use of atomic ops, I guess...

-- 
Rémi Denis-Courmont
http://www.remlab.net/






Bug#551903: libc6-i686 pthread_cond_wait fails to reacquire mutex upon cancellation

2009-10-21 Thread Rémi Denis-Courmont
On Wednesday 21 October 2009 22:40:03, Rémi Denis-Courmont wrote:
 % ./a.out
 1
 2
 a.out: test.c:18: cleanup_lock: Assertion `val == 0' failed.
 Abandon

P.S.: For what it's worth, val is EPERM here. That's why I assume the lock
is not correctly re-acquired.

-- 
Rémi Denis-Courmont
http://www.remlab.net/


