[Bug sanitizer/48076] Unsafe double checked locking in __emutls_get_address

2012-11-28 Thread torvald at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48076



--- Comment #7 from torvald at gcc dot gnu.org 2012-11-28 14:29:47 UTC ---

(In reply to comment #6)
> There seems to be a similar bug in code generated for function static
> variables.
> The fast-path load is a plain load rather than an atomic acquire load.



Haven't looked at the details, but this indeed looks similar.


[Bug sanitizer/48076] Unsafe double checked locking in __emutls_get_address

2012-11-27 Thread rth at gcc dot gnu.org





--- Comment #2 from Richard Henderson rth at gcc dot gnu.org 2012-11-27 18:01:17 UTC ---

Are you sure this isn't a false-positive?



The way I read this code, it is certainly possible for the optimizer
(or the processor) to prefetch emutls_key before the load of offset:

  __gthread_key_t prefetch = emutls_key;
  pointer offset = obj->loc.offset;

  if (__builtin_expect (offset == 0, 0))
    {
      ...
    }

  struct __emutls_array *arr = __gthread_getspecific (prefetch);



But the compiler should see the memory barriers within the if path
and insert

  if (__builtin_expect (offset == 0, 0))
    {
      ...
      __gthread_mutex_unlock (emutls_mutex);
      prefetch = emutls_key;
    }



and the processor had better cancel any speculative prefetch when

it sees the explicit barriers.



I'm also assuming here that Android is using the same gthr-posix.h
that is used by desktop glibc Linux, and so there isn't a mistake
in the gthread macros causing a missing barrier...
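For reference, the double-checked-locking shape under discussion can be sketched in plain C. This is a hypothetical stand-in, not the actual libgcc code: the struct layout, field names, and `emutls_get_address_sketch` are invented for illustration, and a plain pthread mutex stands in for the gthread wrappers. The fast-path read of `offset` is an ordinary load, which is the read whose safety this bug is about.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the emutls control object. */
struct emutls_object {
    size_t offset;              /* 0 means "not yet initialized" */
    void  *value;               /* the per-object storage */
};

static pthread_mutex_t emutls_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Double-checked locking in the pre-patch shape: the fast-path read
   of obj->offset is a plain load, so nothing orders it against the
   initialization another thread performs under the mutex. */
void *emutls_get_address_sketch(struct emutls_object *obj)
{
    size_t offset = obj->offset;                /* plain load: the suspect read */
    if (__builtin_expect(offset == 0, 0)) {
        pthread_mutex_lock(&emutls_mutex);
        if (obj->offset == 0) {
            obj->value  = malloc(sizeof(int));  /* stands in for the real init */
            obj->offset = 1;                    /* plain store: also suspect */
        }
        pthread_mutex_unlock(&emutls_mutex);
    }
    return obj->value;
}
```

In a single thread the function behaves correctly; the race only matters when a second thread takes the fast path concurrently with the initialization.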


[Bug sanitizer/48076] Unsafe double checked locking in __emutls_get_address

2012-11-27 Thread rth at gcc dot gnu.org





Richard Henderson rth at gcc dot gnu.org changed:



           What    |Removed                 |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED             |ASSIGNED
   Last reconfirmed|                        |2012-11-27
         AssignedTo|unassigned at gcc dot   |rth at gcc dot gnu.org
                   |gnu.org                 |
     Ever confirmed|0                       |1



--- Comment #3 from Richard Henderson rth at gcc dot gnu.org 2012-11-27 18:18:23 UTC ---

Nevermind.  Talking this over with Torvald I see the error.

You are correct.


[Bug sanitizer/48076] Unsafe double checked locking in __emutls_get_address

2012-11-27 Thread rth at gcc dot gnu.org





--- Comment #4 from Richard Henderson rth at gcc dot gnu.org 2012-11-27 22:05:34 UTC ---

Created attachment 28798

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28798

proposed patch



This should fix the race that I eventually saw (read), as well as one that

Torvald pointed out that I didn't see (write; acq_rel to unrelated address?).
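The attachment itself is not reproduced in this thread. As a rough sketch of the fix direction only — using invented names, not the patched libgcc code — the fast-path load becomes a GCC `__atomic_load_n` acquire and the store that publishes the initialized offset becomes a release store, so a reader that observes a nonzero offset also observes the initialization that preceded it:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the emutls control object. */
struct emutls_object {
    size_t offset;
    void  *value;
};

static pthread_mutex_t emutls_mutex = PTHREAD_MUTEX_INITIALIZER;

void *emutls_get_address_fixed(struct emutls_object *obj)
{
    /* Acquire load: if we observe a nonzero offset, we also observe
       the writes made before the matching release store below. */
    size_t offset = __atomic_load_n(&obj->offset, __ATOMIC_ACQUIRE);
    if (__builtin_expect(offset == 0, 0)) {
        pthread_mutex_lock(&emutls_mutex);
        if (obj->offset == 0) {
            obj->value = malloc(sizeof(int));   /* stands in for the real init */
            /* Release store publishes the initialization to fast-path readers. */
            __atomic_store_n(&obj->offset, 1, __ATOMIC_RELEASE);
        }
        pthread_mutex_unlock(&emutls_mutex);
    }
    return obj->value;
}
```

The acquire/release pair is what the plain loads and stores lacked: the mutex alone only orders the slow paths against each other, not against the lock-free fast path.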


[Bug sanitizer/48076] Unsafe double checked locking in __emutls_get_address

2012-11-27 Thread dvyukov at google dot com





Dmitry Vyukov dvyukov at google dot com changed:



           What    |Removed                 |Added
----------------------------------------------------------------------------
                 CC|                        |dvyukov at google dot com



--- Comment #5 from Dmitry Vyukov dvyukov at google dot com 2012-11-28 03:51:50 UTC ---

Looks good to me


[Bug sanitizer/48076] Unsafe double checked locking in __emutls_get_address

2012-11-27 Thread dvyukov at google dot com





--- Comment #6 from Dmitry Vyukov dvyukov at google dot com 2012-11-28 03:56:50 UTC ---

There seems to be a similar bug in the code generated for function-static
variables.
The fast-path load is a plain load rather than an atomic acquire load.



On Sat, Nov 24, 2012 at 3:06 PM, Dmitry Vyukov dvyu...@google.com wrote:
> On Fri, Nov 23, 2012 at 8:39 PM, Jakub Jelinek ja...@redhat.com wrote:
>> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>>> That's what llvm does as well. But it inserts a fast path before
>>> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
>>> same?
>>
>> Yes, except it isn't __atomic_load_*, but a plain memory read.
>>
>>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>>   if (_3 == 0)
>>     goto <bb 3>;
>>   else
>>     goto <bb 8>;
>>
>>   <bb 8>:
>>   fast path, whatever;
>>
>>   <bb 3>:
>>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>>   ...
>>
>> So, right now tsan would just instrument it as __tsan_read1 from
>> _ZGVZ3foovE1a rather than any atomic load.
>
> Looks like a bug. That needs to be a load-acquire with proper compiler
> and hardware memory ordering.
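The guarded-static fast path Jakub describes can be sketched in C with GCC atomics. This is an illustration of what a correct fast path looks like, not the code GCC emits: `get_static` and the guard byte are invented names, and a pthread mutex stands in for `__cxa_guard_acquire`/`__cxa_guard_release`. The point is that the first check must be an acquire load of the guard, not a plain read.

```c
#include <pthread.h>

static pthread_mutex_t guard_mutex = PTHREAD_MUTEX_INITIALIZER;
static char guard;   /* stands in for the guard variable (_ZGVZ3foovE1a) */
static int  value;   /* stands in for the function-static being initialized */

int get_static(void)
{
    /* Fast path: acquire load of the guard byte, so that a thread
       that sees the guard set also sees the initialized value. */
    if (__atomic_load_n(&guard, __ATOMIC_ACQUIRE) == 0) {
        pthread_mutex_lock(&guard_mutex);     /* ~ __cxa_guard_acquire */
        if (guard == 0) {
            value = 42;                       /* the dynamic initializer */
            /* Release store publishes the initialization
               (~ __cxa_guard_release). */
            __atomic_store_n(&guard, 1, __ATOMIC_RELEASE);
        }
        pthread_mutex_unlock(&guard_mutex);
    }
    return value;
}
```

With a plain read in place of the acquire load, a thread could see the guard set yet read a stale, uninitialized `value`; this is exactly the race tsan flags when it instruments the read as `__tsan_read1`.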