Re: [PATCH?] Separate pthread patches, #2 take 2.

2009-06-05 Thread Dave Korn
Corinna Vinschen wrote:
 On Jun  4 02:52, Dave Korn wrote:
 Dave Korn wrote:
 Dave Korn wrote:
   The attached patch implements ilockexch and ilockcmpexch, using the 
 inline
 asm definition from __arch_compare_and_exchange_val_32_acq in
 glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline 
 rather
 than in its original preprocessor macro form.

   It generates incorrect code.
   This much looks like it's probably a compiler bug.  
   Let's see whether anyone else agrees:

 http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html
 
 When you checked in this change, I'll create a 1.7.0-49 test release.

  This is the final version I committed.  It is the glibc version, with the
addition of the memory clobber, which that discussion revealed was absolutely
required, and the use of a register asm var to feed the inline asm, which is
in accordance with documented practice in the gcc manual.  (This now leaves
only one difference between the glibc version and the version I posted, which
is the use of a +a write-only output constraint paired with a numeric 0
matching input constraint in glibc's version compared with a single output
operand using the read-write constraing =a in my version.  These should in
fact be exactly identical in terms of what they indicate to reload, in any 
case.)

  I have also manually inspected the generated assembly from thread.cc and
shared.cc in a cygwin DLL build and verified that it is correct and efficient,
and have installed the resulting DLL and retested all Thomas Stalder's
testcases and the previously intermittently failing pthread7-rope testcase
from libstdc++ testsuite.  Committed with this ChangeLog:

winsup/cygwin/ChangeLog

* winbase.h (ilockexch):  Fix asm constraints.
(ilockcmpexch):  Likewise.


  Libstdc++ plan after the weekend.  Cheers all!

cheers,
  DaveK


? winsup/cygwin/cygwin-cxx.h
? winsup/cygwin/mutex
Index: winsup/cygwin/winbase.h
===
RCS file: /cvs/src/src/winsup/cygwin/winbase.h,v
retrieving revision 1.14
diff -p -u -r1.14 winbase.h
--- winsup/cygwin/winbase.h	12 Jul 2008 18:09:17 -	1.14
+++ winsup/cygwin/winbase.h	5 Jun 2009 13:05:08 -
@@ -38,22 +38,31 @@ ilockdecr (volatile long *m)
 extern __inline__ long
 ilockexch (volatile long *t, long v)
 {
-  register int __res;
-  __asm__ __volatile__ (\n\
-1:	lock	cmpxchgl %3,(%1)\n\
-	jne 1b\n\
- 	: =a (__res), =q (t): 1 (t), q (v), 0 (*t): cc);
-  return __res;
+  return
+  ({
+register __typeof (*t) ret __asm (%eax);
+__asm __volatile (\n
+	1:	lock cmpxchgl %2, %1\n
+		jne  1b\n
+	: =a (ret), =m (*t)
+	: r (v), m (*t), 0 (*t)
+	: memory);
+ret;
+  });
 }
 
 extern __inline__ long
 ilockcmpexch (volatile long *t, long v, long c)
 {
-  register int __res;
-  __asm__ __volatile__ (\n\
-	lock cmpxchgl %3,(%1)\n\
-	: =a (__res), =q (t) : 1 (t), q (v), 0 (c): cc);
-  return __res;
+  return
+  ({
+register __typeof (*t) ret __asm (%eax);
+__asm __volatile (lock cmpxchgl %2, %1
+	: =a (ret), =m (*t)
+	: r (v), m (*t), 0 (c)
+	: memory);
+ret;
+  });
 }
 
 #undef InterlockedIncrement


Re: [PATCH?] Separate pthread patches, #2 take 2.

2009-06-05 Thread Corinna Vinschen
On Jun  5 15:04, Dave Korn wrote:
 Corinna Vinschen wrote:
  On Jun  4 02:52, Dave Korn wrote:
  Dave Korn wrote:
  Dave Korn wrote:
The attached patch implements ilockexch and ilockcmpexch, using the 
  inline
  asm definition from __arch_compare_and_exchange_val_32_acq in
  glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline 
  rather
  than in its original preprocessor macro form.
 
It generates incorrect code.
This much looks like it's probably a compiler bug.  
Let's see whether anyone else agrees:
 
  http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html
  
  When you checked in this change, I'll create a 1.7.0-49 test release.
 
   This is the final version I committed.  It is the glibc version, with the
 addition of the memory clobber, which that discussion revealed was absolutely
 required, and the use of a register asm var to feed the inline asm, which is
 in accordance with documented practice in the gcc manual.  (This now leaves
 only one difference between the glibc version and the version I posted, which
 is the use of a +a write-only output constraint paired with a numeric 0
 matching input constraint in glibc's version compared with a single output
 operand using the read-write constraing =a in my version.  These should in
 fact be exactly identical in terms of what they indicate to reload, in any 
 case.)
 
   I have also manually inspected the generated assembly from thread.cc and
 shared.cc in a cygwin DLL build and verified that it is correct and efficient,
 and have installed the resulting DLL and retested all Thomas Stalder's
 testcases and the previously intermittently failing pthread7-rope testcase
 from libstdc++ testsuite.  Committed with this ChangeLog:

Cool, thank you.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat


Re: [PATCH?] Separate pthread patches, #2 take 2.

2009-06-05 Thread Christopher Faylor
On Fri, Jun 05, 2009 at 03:04:59PM +0100, Dave Korn wrote:
winsup/cygwin/ChangeLog

   * winbase.h (ilockexch):  Fix asm constraints.
   (ilockcmpexch):  Likewise.

Thanks for seeing this through.  It was obviously a lot of work.

cgf


Re: [PATCH?] Separate pthread patches, #2 take 2.

2009-06-05 Thread Dave Korn
Christopher Faylor wrote:
 On Fri, Jun 05, 2009 at 03:04:59PM +0100, Dave Korn wrote:
 winsup/cygwin/ChangeLog

  * winbase.h (ilockexch):  Fix asm constraints.
  (ilockcmpexch):  Likewise.
 
 Thanks for seeing this through.  It was obviously a lot of work.
 
 cgf

  I appreciate the need to be diligent when working so deep in the bowels of
the fundaments, so no problem :)

cheers,
  DaveK


Re: [PATCH?] Separate pthread patches, #2 take 2.

2009-06-04 Thread Corinna Vinschen
On Jun  4 02:52, Dave Korn wrote:
 Dave Korn wrote:
  Dave Korn wrote:
The attached patch implements ilockexch and ilockcmpexch, using the 
  inline
  asm definition from __arch_compare_and_exchange_val_32_acq in
  glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline 
  rather
  than in its original preprocessor macro form.
 
It generates incorrect code.
  
This much looks like it's probably a compiler bug.  
 
   Let's see whether anyone else agrees:
 
 http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html

When you checked in this change, I'll create a 1.7.0-49 test release.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat


Re: [PATCH?] Separate pthread patches, #2 take 2.

2009-06-04 Thread Dave Korn
Corinna Vinschen wrote:
 On Jun  4 02:52, Dave Korn wrote:
 Dave Korn wrote:
 Dave Korn wrote:
   The attached patch implements ilockexch and ilockcmpexch, using the 
 inline
 asm definition from __arch_compare_and_exchange_val_32_acq in
 glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline 
 rather
 than in its original preprocessor macro form.

   It generates incorrect code.
   This much looks like it's probably a compiler bug.  
   Let's see whether anyone else agrees:

 http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html
 
 When you checked in this change, I'll create a 1.7.0-49 test release.

  Am off to a meeting now; won't get a chance until late tonight.

cheers,
  DaveK


[PATCH?] Separate pthread patches, #2 take 2.

2009-06-03 Thread Dave Korn

  The attached patch implements ilockexch and ilockcmpexch, using the inline
asm definition from __arch_compare_and_exchange_val_32_acq in
glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
than in its original preprocessor macro form.

  It generates incorrect code.  The sequence discussed before,

126   do
127 node-next = head;
128   while (InterlockedCompareExchangePointer (head, node, node-next) !=
node-next);

is now compiled down to this loop:

L186:
.loc 3 127 0
movl__ZN13pthread_mutex7mutexesE+8, %edx # mutexes.head, D.28599
.loc 2 58 0
movl%edx, %eax   # D.28599, tmp68
/APP
 # 58 /gnu/winsup/src/winsup/cygwin/winbase.h 1
lock cmpxchgl %ebx, __ZN13pthread_mutex7mutexesE+8   # this,
 # 0  2
/NO_APP
movl%eax, -12(%ebp)  # tmp68, ret
.loc 2 59 0
movl-12(%ebp), %eax  # ret, D.28596
.loc 3 126 0
cmpl%eax, %edx   # D.28596, D.28599
jne L186 #,
movl%edx, 36(%ebx)   # D.28599, variable.next


... which is in fact the equivalent of

126   do
127 ;
128   while (InterlockedCompareExchangePointer (head, node, node-next) !=
node-next);
(126) node-next = head;

  As it caches the values of head in %edx during the spin loop, and only
stores it to node-next after having overwritten *head with node, there is a
short window after the new node is linked to the front of the chain during
which its chain pointer is incorrect.

  Not OK for head?

cheers,
  DaveK


Index: winsup/cygwin/winbase.h
===
RCS file: /cvs/src/src/winsup/cygwin/winbase.h,v
retrieving revision 1.14
diff -p -u -r1.14 winbase.h
--- winsup/cygwin/winbase.h	12 Jul 2008 18:09:17 -	1.14
+++ winsup/cygwin/winbase.h	3 Jun 2009 22:54:38 -
@@ -34,27 +34,31 @@ ilockdecr (volatile long *m)
 	: =r (__res), =m (*m): m (*m): cc);
   return __res;
 }
-
-extern __inline__ long
-ilockexch (volatile long *t, long v)
-{
-  register int __res;
-  __asm__ __volatile__ (\n\
-1:	lock	cmpxchgl %3,(%1)\n\
-	jne 1b\n\
- 	: =a (__res), =q (t): 1 (t), q (v), 0 (*t): cc);
-  return __res;
-}
-
-extern __inline__ long
-ilockcmpexch (volatile long *t, long v, long c)
-{
-  register int __res;
-  __asm__ __volatile__ (\n\
-	lock cmpxchgl %3,(%1)\n\
-	: =a (__res), =q (t) : 1 (t), q (v), 0 (c): cc);
-  return __res;
-}
+
+extern __inline__ long
+ilockexch (volatile long *t, long v)
+{
+  return ({
+		__typeof (*t) ret;
+		__asm __volatile (1:	lock cmpxchgl %2, %1\n
+	jne 1b\n
+			: =a (ret), =m (*t)
+			: r (v), m (*t), 0 (*t));
+		ret;
+	});
+}
+
+extern __inline__ long
+ilockcmpexch (volatile long *t, long v, long c)
+{
+  return ({
+		__typeof (*t) ret;
+		__asm __volatile (lock cmpxchgl %2, %1
+			: =a (ret), =m (*t)
+			: r (v), m (*t), 0 (c));
+		ret;
+	});
+}
 
 #undef InterlockedIncrement
 #define InterlockedIncrement ilockincr


Re: [PATCH?] Separate pthread patches, #2 take 2.

2009-06-03 Thread Dave Korn
Dave Korn wrote:
   The attached patch implements ilockexch and ilockcmpexch, using the inline
 asm definition from __arch_compare_and_exchange_val_32_acq in
 glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
 than in its original preprocessor macro form.
 
   It generates incorrect code.

  This much looks like it's probably a compiler bug.  This version, compiled
with current GCC HEAD, generates the same results as in the take 3 version,
without needing the explicit memory clobber added:

L469:
.loc 2 127 0
movl__ZN13pthread_mutex7mutexesE+8, %eax # mutexes.head, D.30413
movl%eax, 36(%ebx)   # D.30413, variable.next
.loc 5 58 0
/APP
 # 58 /gnu/winsup/src/winsup/cygwin/winbase.h 1
lock cmpxchgl %ebx, __ZN13pthread_mutex7mutexesE+8   # this,
 # 0  2
/NO_APP
movl%eax, -12(%ebp)  # tmp76, ret
.loc 5 59 0
movl-12(%ebp), %eax  # ret, D.30414
.loc 2 126 0
cmpl%eax, 36(%ebx)   # D.30414, variable.next
jne L469 #,

... right down to the unoptimised register motion.  This is what we would have
hoped to see: the memory clobber ought to be superfluous, and the
write-output operand in *t should have told GCC not to move the store to
node-next after the loop.

  I checked 4.3.3; it behaves the same as 4.3.2, i.e. it incorrectly lowers
the store without a memory clobber present.

cheers,
  DaveK