Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-27 Thread Carlos O'Donell
On Mon, Oct 27, 2008 at 11:27 AM, Andrew Haley <[EMAIL PROTECTED]> wrote:
> I understand all that, but the question still stands: is the compiler
> really moving a memory write past a memory barrier?  ISTR we did have
> a discussion on gcc-list about that, but it was a while ago and should
> now be fixed.

This issue no longer affects the PA port, but I can't speak for s390.

The PA port is the only port for which I do regular gcc / glibc testing.

Cheers,
Carlos.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-27 Thread Andrew Haley
Carlos O'Donell wrote:
> On Mon, Oct 27, 2008 at 10:05 AM, Andrew Haley <[EMAIL PROTECTED]> wrote:
>>> I've seen this on-and-off again on the hppa-linux port. The issue has,
>>> in my experience, been a compiler problem. My standard operating
>>> procedure is to methodically add volatile to the atomic.h operations
>>> until it goes away, and then work out the compiler mis-optimization.
>>>
>>> The bug is almost always a situation where the lll_unlock is scheduled
>>> before owner = 0, and the assert catches the race condition where you
>>> unlock but have not yet cleared the owner.
>> Are you sure this is a compiler problem?  Unless you use explicit atomic
>> memory accesses or volatile the compiler is allowed to re-order memory
>> accesses.  Perhaps I'm misunderstanding you.
> 
> Sorry, parsing the above statement requires knowing something about
> how lll_unlock is implemented in glibc.
> 
> The lll_unlock function is supposed to be a memory barrier.
> 
> The function is usually an explicit atomic operation, or a volatile
> asm implementing the futex syscall i.e. INTERNAL_SYSCALL macro.

I understand all that, but the question still stands: is the compiler
really moving a memory write past a memory barrier?  ISTR we did have
a discussion on gcc-list about that, but it was a while ago and should
now be fixed.

Andrew.





Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-27 Thread Carlos O'Donell
On Mon, Oct 27, 2008 at 10:05 AM, Andrew Haley <[EMAIL PROTECTED]> wrote:
>> I've seen this on-and-off again on the hppa-linux port. The issue has,
>> in my experience, been a compiler problem. My standard operating
>> procedure is to methodically add volatile to the atomic.h operations
>> until it goes away, and then work out the compiler mis-optimization.
>>
>> The bug is almost always a situation where the lll_unlock is scheduled
>> before owner = 0, and the assert catches the race condition where you
>> unlock but have not yet cleared the owner.
>
> Are you sure this is a compiler problem?  Unless you use explicit atomic
> memory accesses or volatile the compiler is allowed to re-order memory
> accesses.  Perhaps I'm misunderstanding you.

Sorry, parsing the above statement requires knowing something about
how lll_unlock is implemented in glibc.

The lll_unlock function is supposed to be a memory barrier.

The function is usually an explicit atomic operation, or a volatile
asm implementing the futex syscall i.e. INTERNAL_SYSCALL macro.
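
To make the "volatile asm" point concrete, here is a minimal sketch of a
compiler-level barrier. It assumes GCC or Clang extended asm; the names
compiler_barrier, owner_field, lock_word and release_sketch are made up for
illustration and are not glibc identifiers. As far as I know, volatile on its
own only keeps the asm from being deleted; it is the "memory" clobber that
tells the compiler not to move loads and stores across the statement.

```c
/* Compiler-only barrier (GCC/Clang extension). It emits no instruction;
 * the "memory" clobber forbids the compiler from moving memory accesses
 * across it. It is NOT a hardware fence. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-ins for the mutex fields discussed in this thread. */
static int owner_field;
static int lock_word;

/* Sketch of the intended order on release: clear the owner first,
 * then release the lock word. */
static void release_sketch(void)
{
    owner_field = 0;     /* must be emitted before the release below */
    compiler_barrier();  /* the compiler may not sink the store past this */
    lock_word = 0;       /* stands in for the real unlock operation */
}
```

Without the barrier (or an equivalent clobber on the real unlock asm) the
compiler would be free to swap the two plain stores.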

Cheers,
Carlos.





Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-27 Thread Andrew Haley
Carlos O'Donell wrote:
> On Sat, Oct 25, 2008 at 1:21 PM, Julien Danjou <[EMAIL PROTECTED]> wrote:
>> Is there anything from an outsider that could help?
> 
> I've seen this on-and-off again on the hppa-linux port. The issue has,
> in my experience, been a compiler problem. My standard operating
> procedure is to methodically add volatile to the atomic.h operations
> until it goes away, and then work out the compiler mis-optimization.
> 
> The bug is almost always a situation where the lll_unlock is scheduled
> before owner = 0, and the assert catches the race condition where you
> unlock but have not yet cleared the owner.

Are you sure this is a compiler problem?  Unless you use explicit atomic
memory accesses or volatile the compiler is allowed to re-order memory
accesses.  Perhaps I'm misunderstanding you.

Andrew.





Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-27 Thread Carlos O'Donell
On Sat, Oct 25, 2008 at 1:21 PM, Julien Danjou <[EMAIL PROTECTED]> wrote:
> Is there anything from an outsider that could help?

I've seen this on-and-off again on the hppa-linux port. The issue has,
in my experience, been a compiler problem. My standard operating
procedure is to methodically add volatile to the atomic.h operations
until it goes away, and then work out the compiler mis-optimization.

The bug is almost always a situation where the lll_unlock is scheduled
before owner = 0, and the assert catches the race condition where you
unlock but have not yet cleared the owner.
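
A rough sketch of the ordering in question, written with C11 atomics rather
than glibc's internal macros. The toy_mutex type and the toy_lock/toy_unlock
functions are hypothetical and only illustrate the pattern; this is not the
code in pthread_mutex_lock.c.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy mutex, NOT glibc's pthread_mutex_t. */
struct toy_mutex {
    atomic_int lock;  /* 0 = free, 1 = held */
    int owner;        /* id of the holder, 0 when free (plain field) */
};

static void toy_lock(struct toy_mutex *m, int self)
{
    int expected = 0;
    /* Spin until we change lock from 0 to 1, with acquire ordering. */
    while (!atomic_compare_exchange_weak_explicit(&m->lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;  /* a failed CAS overwrote expected; reset it */

    /* Same check as the failing assertion: the previous holder must
     * have cleared owner before it released the lock. */
    assert(m->owner == 0);
    m->owner = self;
}

static void toy_unlock(struct toy_mutex *m)
{
    m->owner = 0;  /* clear the owner first ... */
    /* ... then release. The release store keeps the write above from
     * being reordered after it, by the compiler or by the CPU. */
    atomic_store_explicit(&m->lock, 0, memory_order_release);
}
```

If the two statements in toy_unlock end up swapped, a second thread can take
the lock and hit the assert while owner still holds the old value, which is
the race described above.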

$0.02.

Cheers,
Carlos.





Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-10-25 Thread Julien Danjou
At 1210458182 time_t, Aurelien Jarno wrote:
> Looking quickly at the code the problem is that LLL_MUTEX_LOCK (mutex)
> fails to acquire the mutex. It can be a bug in atomic.h or a bug in the
> futexes implementation of the kernel.
> 
> It would be nice to have an strace of the problem to see the futex
> syscall before this assertion.

Here's what I can get from #468793.
In this test, if the number of threads is <= 2, it's ok.
With something like ./tchmttest typical casket 3 1000 1000 it fails 50%
of the time.

I've tried to strace the test but unfortunately when stracing,
everything is fine.

Is there anything from an outsider that could help?

Cheers,
-- 
Julien Danjou
.''`.  Debian Developer
: :' : http://julien.danjou.info
`. `'  http://people.debian.org/~acid
  `-   9A0D 5FD9 EB42 22F6 8974  C95C A462 B51E C2FE E5CD




Re: Bug#479952: libc6/s390 - __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

2008-05-10 Thread Aurelien Jarno
On Wed, May 07, 2008 at 11:29:49AM +0200, Bastian Blank wrote:
> Package: libc6
> Version: 2.7-10
> Severity: important
> 
> On Wed, May 07, 2008 at 09:34:12AM +0200, Matthias Klose wrote:
> > the build failure on s390 is unexpected; is it possible to extract a
> > test case?
> 
> | java: pthread_mutex_lock.c:71: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
> 
> So another package failed because of that (after mono and libto$bla). It
> looks like a race condition somewhere in libpthread.
> 

Looking quickly at the code the problem is that LLL_MUTEX_LOCK (mutex)
fails to acquire the mutex. It can be a bug in atomic.h or a bug in the
futexes implementation of the kernel.

It would be nice to have an strace of the problem to see the futex
syscall before this assertion.

Also a small testcase of the problem would be really helpful to debug
it.
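
As a starting point, a small test case might look something like the sketch
below: several threads contending on one default mutex. All names in it
(stress_mutex, stress_worker, run_stress) are made up for illustration, and
there is no guarantee it triggers the assertion on an affected system.

```c
#include <pthread.h>
#include <stdlib.h>

/* One shared mutex and counter that every worker thread contends on. */
static pthread_mutex_t stress_mutex = PTHREAD_MUTEX_INITIALIZER;
static long stress_counter;

struct stress_arg { int iters; };

static void *stress_worker(void *p)
{
    const struct stress_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        /* On an affected libc the assertion would fire inside this call. */
        pthread_mutex_lock(&stress_mutex);
        stress_counter++;
        pthread_mutex_unlock(&stress_mutex);
    }
    return NULL;
}

/* Runs nthreads workers for iters iterations each. Returns the final
 * counter value, or -1 if allocation or thread creation failed. */
static long run_stress(int nthreads, int iters)
{
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    struct stress_arg arg = { iters };
    int started = 0;

    if (tids == NULL)
        return -1;
    stress_counter = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, stress_worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return started == nthreads ? stress_counter : -1;
}
```

Calling run_stress(3, 1000) from main and comparing the result with 3000
gives a simple pass/fail check; build with -pthread.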

-- 
  .''`.  Aurelien Jarno | GPG: 1024D/F1BCDB73
 : :' :  Debian developer   | Electrical Engineer
 `. `'   [EMAIL PROTECTED] | [EMAIL PROTECTED]
   `-people.debian.org/~aurel32 | www.aurel32.net

