Re: [Valgrind-users] helgrind and atomic operations

2012-08-28 Thread Julian Seward
On Tuesday, August 28, 2012, Patrick J. LoPresti wrote:
 On Sun, Aug 26, 2012 at 2:06 AM, David Faure fa...@kde.org wrote:
  Marc Mutz said 
  The standard says it's racy, but the implementation of
  std::atomic::load(memory_order_acquire) won't look different. Simple
  reads and writes on x86 are already sequentially consistent.
 
 I do not think this is true.  Loads on x86 do have acquire semantics,
 and stores do have release semantics, but that is not the same as
 sequential consistency.  If loads and stores on x86 were already
 sequentially consistent, there would be no need for the mfence
 instruction...

On re-reading this, I am wondering now if Marc meant x86 TSO (total 
store order) when he said sequential consistency.  IIRC from Hennessy
and Patterson, no real processor provides sequential consistency because
to do so precludes the use of per-processor store queues, and without
those a high performance memory system is more or less impossible.
x86 does provide TSO, a guarantee that stores by one processor appear
to other processors in the same order.  But that's a weaker guarantee
than sequential consistency.
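
The standard illustration is the store-buffering litmus test; here is a minimal
C++11 sketch of it (illustrative only, not from the Qt or Valgrind sources):

    // Store-buffering litmus test.  Under sequential consistency at least one
    // thread must observe the other's store, so r1 == 0 && r2 == 0 is
    // impossible.  Under x86 TSO both stores can sit in the per-core store
    // buffers while the loads run ahead, so r1 == 0 && r2 == 0 is allowed --
    // which is exactly why mfence (or a seq_cst atomic) is needed to get SC.
    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> x(0), y(0);
    int r1, r2;

    int main()
    {
        std::thread t1([] { x.store(1, std::memory_order_relaxed);
                            r1 = y.load(std::memory_order_relaxed); });
        std::thread t2([] { y.store(1, std::memory_order_relaxed);
                            r2 = x.load(std::memory_order_relaxed); });
        t1.join();
        t2.join();
        std::printf("r1=%d r2=%d\n", r1, r2);   // "r1=0 r2=0" can happen on x86
        return 0;
    }

On x86 the relaxed operations compile to plain MOVs, so this is exactly the
"bare load and store" case being discussed.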


 Out of curiosity, is Helgrind expected to generate errors for
 correctly-implemented lock-free algorithms?  (That is, algorithms with
 correct memory barriers and no mutexes?)

It will generate false errors in cases where it can't see all of the
happens-before relationships between threads.  By default it ships with
intercepts that correctly detect those relationships for POSIX pthreads
(by intercepting pthread_mutex_lock, unlock, etc).  However, if you
create your own locking primitives or do synchronisation some other way,
there's a flexible set of client requests for marking up code, so it
can indeed be checked.  I did this just the other day for home-rolled
spinlocks.
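
Roughly, a minimal sketch of that kind of markup (illustrative only, not the
exact code) using the client requests from <valgrind/helgrind.h>:

    // Home-rolled spinlock marked up for Helgrind: the ANNOTATE_* client
    // requests tell Helgrind about the happens-before edge created by the
    // lock hand-off, so accesses protected by it stop being reported.
    #include <atomic>
    #include <valgrind/helgrind.h>

    struct SpinLock {
        std::atomic<int> locked{0};

        void lock() {
            while (locked.exchange(1, std::memory_order_acquire))
                ;                            // spin until we flip 0 -> 1
            ANNOTATE_HAPPENS_AFTER(this);    // pairs with the previous unlock()
        }

        void unlock() {
            ANNOTATE_HAPPENS_BEFORE(this);   // publish our writes to the next locker
            locked.store(0, std::memory_order_release);
        }
    };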

J




[Valgrind-users] Fwd: Re: helgrind and atomic operations

2012-08-28 Thread David Faure

--  Forwarded Message  --

Subject: Re: [Valgrind-users] helgrind and atomic operations
Date: Monday 27 August 2012, 15:25:14
From: Marc Mutz marc.m...@kdab.com
To: David Faure fa...@kde.org
CC: valgrind-users@lists.sourceforge.net

[I can't post to valgrind-users, please fwd]

On Monday August 27 2012, David Faure wrote:
 On Sunday 26 August 2012 12:53:41 Marc Mutz wrote:
  On Sunday August 26 2012, David Faure wrote:
[...]
   static int onDemandNumber() {
       static QBasicAtomicInt sharedInt = { 0 };
       if (!sharedInt.loadAcquire()) {
           // some calculation goes here
           sharedInt.storeRelease(42);
       }
       return sharedInt.loadAcquire();
   }
 
  If sharedInt is a std::atomic<int>, then the above is not a C++11 data
  race,

 No, it's a bare int, which is what Qt5's QBasicAtomicInt uses by default on
 x86. You missed the initial email with the testcase; here it is
 (testcase_atomic_ops_helgrind.cpp).
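
For comparison, a sketch of what the testcase looks like with a real
std::atomic<int> (illustrative only, not the Qt code); under C++11 this
version is race-free by definition, whatever instructions the compiler emits:

    #include <atomic>

    // C++11 version of the testcase: sharedInt is a real std::atomic<int>,
    // so the concurrent load/store below is not a data race by definition.
    static int onDemandNumber()
    {
        static std::atomic<int> sharedInt(0);
        if (!sharedInt.load(std::memory_order_acquire)) {
            // some calculation goes here
            sharedInt.store(42, std::memory_order_release);
        }
        return sharedInt.load(std::memory_order_acquire);
    }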

The point I was trying to make was that unless Q*Atomic has the same semantics 
as std::atomic, we don't need to continue talking about this, so let's assume 
it does, and then check what's missing. And, according to the threads here:
  http://www.decadent.org.uk/pipermail/cpp-threads/2008-December/
the implementation of std::atomic would (except for the IA-64 load) look exactly 
the same as QAtomic, save that the compiler must magically know it's dealing 
with an atomic operation and not reorder it. AFAIU, this is ensured by the 
use of asm() in the QAtomic code, or alternatively by a volatile cast.
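
To illustrate (a sketch only, not the actual QAtomic implementation): on x86
the hardware already gives aligned loads acquire and aligned stores release
semantics, so all the asm()/volatile machinery has to do is stop the compiler
from reordering or eliding the accesses:

    // GCC/Clang syntax.  The empty asm with a "memory" clobber is a pure
    // compiler barrier; the volatile cast forces an actual memory access.
    static inline int loadAcquireSketch(const int *p)
    {
        int v = *const_cast<const volatile int *>(p);
        __asm__ __volatile__("" ::: "memory");   // nothing moves across this
        return v;
    }

    static inline void storeReleaseSketch(int *p, int v)
    {
        __asm__ __volatile__("" ::: "memory");   // keep earlier writes before the store
        *const_cast<volatile int *>(p) = v;
    }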

I'm still not sure that Helgrind will be able to detect atomic operations from 
the assembler instructions alone, because at least on x86, all loads and stores 
are already atomic (thanks to the E state in MESI), unless they're misaligned 
and/or cross cache-line boundaries.

What's worse, memory barriers on x86 are no-ops, unless you go for 
std::memory_order_seq_cst, which QAtomic doesn't implement.

So, IMO, QAtomic needs to mark up the code so that valgrind can see that 
*this* MOV comes from an atomic, and thus never races against another 
*atomic* MOV on the same memory location, but does race against a MOV that 
was not emitted from std::atomic or QAtomic: identical assembler code may or 
may not be a C++11 data race; the semantics have been lost at the compilation 
stage.
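
One way such markup could look (a sketch using the <valgrind/helgrind.h>
client requests, not something Qt ships; it assumes QBasicAtomicInt comes
from <QtCore/qbasicatomic.h>):

    #include <QtCore/qbasicatomic.h>   // QBasicAtomicInt (Qt 5)
    #include <valgrind/helgrind.h>

    // Wrappers that tell Helgrind about the release/acquire pairing, so the
    // MOVs these operations compile to are no longer reported as races.
    inline void markedStoreRelease(QBasicAtomicInt &a, int value)
    {
        ANNOTATE_HAPPENS_BEFORE(&a);   // everything written before this is published
        a.storeRelease(value);
    }

    inline int markedLoadAcquire(QBasicAtomicInt &a)
    {
        int v = a.loadAcquire();
        ANNOTATE_HAPPENS_AFTER(&a);    // pairs with the HAPPENS_BEFORE at the store
        return v;
    }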

  so Helgrind shouldn't warn.

 Helgrind warns, hence my mail here.

  If 'some calculation goes here' doesn't write
  to global memory, you could even drop the memory ordering. Atomics
  don't participate in data races, but they might also not establish the
  happens-before that you need elsewhere.

 This is obviously a reduced testcase ;)
 The calculation, in the example I took, is to register the metatype, which
 is global stuff, but which is somewhat safe if done twice (it will return
 the same number). Inside QMetaType::registerNormalizedType there's a mutex.
 The whole point, though, as I see it, is to avoid locking in the very
 common case where the metatype has been registered already.

It's the job of QMetaType to ensure visibility. All data that could be thought 
of as being associated with the ID is private to QMetaType, and both the 
current implementation and the new one proposed in 
https://codereview.qt-project.org/#change,30559 contain additional fences to 
order writes to and reads from the custom metatype storage. So the memory 
barriers in qt_metatype_id() are indeed unnecessary.

[...]
  If it writes to a shared memory location, the code contains a C++11 data
  race, and you need more complex synchronisation

 Yes, which is there, but that's not the point. From what you say, the code
 is fine, so helgrind shouldn't warn about sharedInt itself. However, Julian
 says the code is not fine: the compiler could generate code that doesn't
 work correctly. I don't know whom to trust at this point, I'm caught in the
 middle :-).

 If this is safe, helgrind should be fixed. If this is not safe, Qt should
 be fixed. But it sounds to me like different people have different
 interpretations of what is safe in such code :-)

There's only one interpretation: C++11. Under C++11, QAtomic has a data race, 
because it's not using the std atomic operations. OTOH, those very same std 
operations should produce identical assembler instruction sequences compared 
with QAtomic, so there's no actual data race. What's more, a C++11-data-racy 
access to a memory location is indistinguishable from an atomic (and therefore 
race-free) access. Hence the need for markup.

Damn x86 and its sequential consistency :)

 I just realized one thing though: I was wrong when I said it is racy
 because the value could be 0 or 42; that is not a race. Writing the same
 code with a mutex around the int leads to exactly this behavior (0 or 42),
 which is fine. See testcase_int_mutex_helgrind.cpp. No helgrind warning,
 and I think everyone agrees that this code is OK. So if ints are read and
 written atomically, then the