Re: thoughts on libatomic

2012-04-24 Thread Torvald Riegel
On Mon, 2012-04-23 at 12:29 -0700, Richard Henderson wrote:
 On 04/16/12 11:15, Torvald Riegel wrote:
  - for acq/rel memorders, we don't need seq_cst fences whenever the
atomicity is implemented with a lock
(hostconfig.c:pre_barrier/post_barrier)
 
 Err.. where?  This also seems to conflict with...

I was thinking of the fallback paths that use locks.  Locks typically
provide acq/rel semantics, so for the particular location we don't need
the pre/post barrier code to be run.
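
To illustrate what I mean, here is a minimal sketch of such a fallback
path (pthread-based, with invented names; not the actual libatomic
code):

  /* Hedged sketch.  pthread_mutex_lock has acquire semantics and
     pthread_mutex_unlock has release semantics, so the protected
     access already behaves acq/rel; only seq_cst would need
     anything stronger.  */
  #include <pthread.h>
  #include <stdint.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

  uint64_t
  locked_load_8 (const uint64_t *ptr)
  {
    pthread_mutex_lock (&lock);   /* acquire: load can't move above */
    uint64_t val = *ptr;
    pthread_mutex_unlock (&lock); /* release: load can't move below */
    return val;
  }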

Or maybe it didn't happen, and the one below is correct.  Sorry for the
potential mix-up.  I can look at this again after PTO.

 
  - protect_start|end:
- BUG: those are called without additional post|pre_barrier calls in
  cas_n.c, exch_n.c, fop_n.c, load_n.c, store_n.c
  The lock membar guarantees are too weak if memorder for the atomic
  access is seq_cst, and if you use more than one lock.
  (assumes that lock implementation uses typical acq/rel barriers).
 
 ... this.  I don't see what multiple locks has to do with seq vs acqrel.

If we don't run seq_cst membars when falling back to locks, then seq_cst
operations on two locations ordered by acq/rel locks will not be forced
to be in one total order (which seq_cst requires).  They will be ordered
per lock, but not globally.
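
Concretely, the kind of execution I have in mind looks something like
this (a hedged sketch with invented names, not libatomic code):

  /* x and y are atomic objects whose lock-based fallbacks hash to two
     different locks.  Under seq_cst, the outcome r1 == 0 && r2 == 0 is
     forbidden: all four accesses must fit into one total order.  The
     question is whether bracketing each access with an acquire/release
     lock, with no additional seq_cst fence, is enough to rule that
     outcome out on every target.  */
  #include <pthread.h>

  static pthread_mutex_t l0 = PTHREAD_MUTEX_INITIALIZER; /* guards x */
  static pthread_mutex_t l1 = PTHREAD_MUTEX_INITIALIZER; /* guards y */
  static int x, y, r1, r2;

  static void *
  thread1 (void *arg)
  {
    pthread_mutex_lock (&l0); x = 1;  pthread_mutex_unlock (&l0);
    pthread_mutex_lock (&l1); r1 = y; pthread_mutex_unlock (&l1);
    return arg;
  }

  static void *
  thread2 (void *arg)
  {
    pthread_mutex_lock (&l1); y = 1;  pthread_mutex_unlock (&l1);
    pthread_mutex_lock (&l0); r2 = x; pthread_mutex_unlock (&l0);
    return arg;
  }

  int
  main (void)
  {
    pthread_t t1, t2;
    pthread_create (&t1, 0, thread1, 0);
    pthread_create (&t2, 0, thread2, 0);
    pthread_join (t1, 0);
    pthread_join (t2, 0);
    return r1 == 0 && r2 == 0;  /* the seq_cst-forbidden outcome */
  }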

  - config/posix/lock.c:
- I'm not sure that WATCH_SIZE "should almost certainly be the same as
  the cacheline size".  Why?  What assumptions about the program does
  this make?
 
 See above.  Plus I assumed that if we have to take a lock, that's got to
 dominate over any cache ping-pong that would be played.

Perhaps the lock overhead dominates the cache ping-pong (and this
assumes that the lock is contended).  But perhaps the ping-pong
wouldn't be worse than the contention potentially introduced by a
too-coarse-grained lock?

- Do locks need to care about priority inversion?
 
 At the moment I'm leaving this with the pthread implementation.

pthreads offers PI-safe (priority-inheritance) locks too.  But I also
don't know whether we need this.



Re: thoughts on libatomic

2012-04-23 Thread Richard Henderson
On 04/16/12 11:15, Torvald Riegel wrote:
 Richard, Andrew,
 
 I had a look at libatomic yesterday, focusing primarily on
 synchronization issues in it.  Here are some comments.  And I think
 there is a bug in it too. Notes are in no particular order.  Let me know
 what you think.
 
 - seq_cst fences are always used whenever stuff is not relaxed.  I
   suppose this is a potential future TODO? (default host_config.c)

Fixed.  I think this only really affected powerpc.
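
Roughly, the shape of the fix is to fence according to the requested
memory order instead of promoting everything to seq_cst.  A hedged
sketch of the pre_barrier/post_barrier pair (the real host_config.c
may be structured differently):

  static inline void
  pre_barrier (int model)
  {
    /* Fence needed before the operation: release or stronger.  */
    if (model == __ATOMIC_SEQ_CST)
      __atomic_thread_fence (__ATOMIC_SEQ_CST);
    else if (model == __ATOMIC_RELEASE || model == __ATOMIC_ACQ_REL)
      __atomic_thread_fence (__ATOMIC_RELEASE);
  }

  static inline void
  post_barrier (int model)
  {
    /* Fence needed after the operation: acquire or stronger.  */
    if (model == __ATOMIC_SEQ_CST)
      __atomic_thread_fence (__ATOMIC_SEQ_CST);
    else if (model == __ATOMIC_ACQUIRE || model == __ATOMIC_ACQ_REL
             || model == __ATOMIC_CONSUME)
      __atomic_thread_fence (__ATOMIC_ACQUIRE);
  }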

 - for acq/rel memorders, we don't need seq_cst fences whenever the
   atomicity is implemented with a lock
   (hostconfig.c:pre_barrier/post_barrier)

Err.. where?  This also seems to conflict with...

 - protect_start|end:
   - BUG: those are called without additional post|pre_barrier calls in
 cas_n.c, exch_n.c, fop_n.c, load_n.c, store_n.c
 The lock membar guarantees are too weak if memorder for the atomic
 access is seq_cst, and if you use more than one lock.
 (assumes that lock implementation uses typical acq/rel barriers).

... this.  I don't see what multiple locks has to do with seq vs acqrel.

   - only ptr, no size used, and so atomic types must always be accessed
 by same-sized ops.  This does make sense of course, but should we
 document this explicitly somewhere?

Possibly.  It's also a side-effect of using a WATCH_SIZE that is at least
as large as the largest atomic type supported.

 - load_n.c:
   - I'm concerned about the CAS on read-only mprotected pages.
 Why again do we think this is safe?  Does the standard explicitly
 allow this?  Or should we just use a lock in this case?

Andrew, you had a bit of back-and-forth with someone about this.
Can you dig that up?
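
For reference, the technique in question is roughly the following (a
hedged sketch, not the actual load_n.c code):

  /* Perform an atomic 16-byte load by issuing a compare-and-swap
     whose desired value equals its expected value.  On success the
     CAS rewrites the same bytes; on failure it writes the current
     value into EXPECTED.  Either way we obtain the value, but at
     the cost of a store cycle -- which is exactly why a read-only
     mprotected page is a problem.  */
  #include <stdbool.h>

  __int128
  load_16_via_cas (__int128 *ptr)
  {
    __int128 expected = 0;
    __atomic_compare_exchange_n (ptr, &expected, expected, false,
                                 __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
  }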

 - cas_n.c:
   - I suppose this is a strong cmpxchg?

Of course.  Weak is restricted to the builtin interface; the library does
not support it.
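
The distinction, in short: a weak compare-exchange may fail spuriously,
so it only makes sense inside a retry loop the compiler generates for
the builtin; a library call that returns a single success/failure
answer has to be the strong form.  A hedged sketch (invented names):

  #include <stdbool.h>

  static int v;

  bool
  try_update (int expected, int desired)
  {
    /* weak == false: the call fails only if v really differed from
       expected; no spurious failures, no caller-side retry loop.  */
    return __atomic_compare_exchange_n (&v, &expected, desired,
                                        false, __ATOMIC_SEQ_CST,
                                        __ATOMIC_SEQ_CST);
  }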

 - config/posix/lock.c:
   - I'm not sure that WATCH_SIZE "should almost certainly be the same
 as the cacheline size".  Why?  What assumptions about the program
 does this make?

See above.  Plus I assumed that if we have to take a lock, that's got to
dominate over any cache ping-pong that would be played.

   - page_size optzn: AFAICT, according to the standard (29.4.3), only
 lock-free atomics are supposed to also be address-free.  So
 libatomic wouldn't need to be constrained by the page size, or we
 could also use a different hash function to try to avoid false
 sharing / conflicts.

Hmm.  Ok, I'll think about this.
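
Something along these lines, perhaps (a hedged sketch with invented
names and constants, not the real lock.c):

  /* Carve the address space into WATCH_SIZE-byte granules and hash
     each granule to one lock in a fixed table.  Folding in high
     address bits, instead of taking only the offset within a page,
     is one way to spread objects across the table and reduce false
     sharing / conflicts.  */
  #include <pthread.h>
  #include <stdint.h>

  #define WATCH_SIZE 64   /* assumed cacheline size */
  #define NLOCKS     64

  static pthread_mutex_t locks[NLOCKS] =
    { [0 ... NLOCKS - 1] = PTHREAD_MUTEX_INITIALIZER };  /* GNU C */

  static pthread_mutex_t *
  addr_lock (void *ptr)
  {
    uintptr_t granule = (uintptr_t) ptr / WATCH_SIZE;
    return &locks[(granule ^ (granule >> 6)) % NLOCKS];
  }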

   - Do locks need to care about priority inversion?

At the moment I'm leaving this with the pthread implementation.

 - other notes:
   - might it be better to prefer locks over a kernel cmpxchg, perhaps,
 on any archs/platforms that have a lock?

Almost certainly not.  When kernel cmpxchg is implemented, that's also
how locks get implemented in libc.  No point using a lock to avoid the
kernel in those cases.

I've updated the tree at

  git://repo.or.cz/gcc/rth.git rth/libatomic



r~


Re: thoughts on libatomic

2012-04-23 Thread Andrew MacLeod

On 04/23/2012 03:29 PM, Richard Henderson wrote:



- load_n.c:
   - I'm concerned about the CAS on read-only mprotected pages.
 Why again do we think this is safe?  Does the standard explicitly
 allow this?  Or should we just use a lock in this case?

Andrew, you had a bit of back-and-forth with someone about this.
Can you dig that up?


Yes, this keeps coming up again and again.  I think you should take 
it up with Lawrence Crowl... he was the one (along with some input from 
Jeffrey Yasskin) who concluded that it was OK to use it, even if it 
was just sort of shoehorned in so that we could get 16-byte lock-free
on the most common architecture...


It came down to something like this: the architecture manual entry for 
cmpxchg16 states that a store cycle may be added by the hardware under 
the covers, and as such it is part of the basic machine description;
therefore we might as well use the instruction, even though we may be 
adding the store cycle ourselves sometimes.


I haven't found the actual communication for reference yet; I'll look 
again tomorrow.


Andrew





Re: thoughts on libatomic

2012-04-23 Thread Lawrence Crowl
On 4/23/12, Andrew MacLeod amacl...@redhat.com wrote:
 On 04/23/2012 03:29 PM, Richard Henderson wrote:
   - load_n.c:
  - I'm concerned about the CAS on read-only mprotected pages.
Why again do we think this is safe?  Does the standard
explicitly allow this?  Or should we just use a lock in
this case?
  Andrew, you had a bit of back-and-forth with someone about this.
  Can you dig that up?

 Yes, this keeps coming up again and again.  I think you should
 take it up with Lawrence Crowl... he was the one (along with
 some input from Jeffrey Yasskin) who concluded that it was OK
 to use it, even if it was just sort of shoehorned in so that we
 could get 16-byte lock-free on the most common architecture...

The earlier discussion was about volatile.  Given the way the C++
standard works, I think the volatile issue really only affects
programmers who want to access device registers with atomic
operations.  Machines are lots more complicated than they were when
I accessed device registers, so I don't have a good feel for what
the current expectations are.  The standard is somewhat vague on
volatile, so I think the CAS-based read is likely to be permitted.
Even so, I have brought up the issue with the C++ standard folks,
and maybe we can get some more feedback here.

 It came down to something like this: the architecture manual
 entry for cmpxchg16 states that a store cycle may be added by
 the hardware under the covers, and as such it is part of the
 basic machine description; therefore we might as well use the
 instruction, even though we may be adding the store cycle
 ourselves sometimes.

 I haven't found the actual communication for reference yet;
 I'll look again tomorrow.

The standard says nothing about mprotect, so the implementation
gets to define all the semantics.  However, if the hardware gives
a fault for the CAS but the programmer specified a simple read,
the programmer may be a bit miffed.  Even so, I don't think we have
any other reasonable choice in the short term.  We will probably
need to wait for hardware to catch up.

-- 
Lawrence Crowl