Re: thoughts on libatomic
On Mon, 2012-04-23 at 12:29 -0700, Richard Henderson wrote:
> On 04/16/12 11:15, Torvald Riegel wrote:
> > - for acq/rel memorders, we don't need seq_cst fences whenever the
> >   atomicity is implemented with a lock
> >   (hostconfig.c:pre_barrier/post_barrier)
>
> Err.. where?  This also seems to conflict with...

I thought on the fallback paths that use locks.  Locks typically use
acq/rel semantics, so for the particular location, we don't need
pre/post barrier code to be run.  Or maybe it didn't happen, and the
one below is correct.  Sorry for the potential mix-up.  I can look at
this again after PTO.

> > - protect_start|end:
> >   - BUG: those are called without additional post|pre_barrier calls
> >     in cas_n.c, exch_n.c, fop_n.c, load_n.c, store_n.c
> >     The lock membar guarantees are too weak if memorder for the
> >     atomic access is seq_cst, and if you use more than one lock.
> >     (assumes that lock implementation uses typical acq/rel barriers).
>
> ... this.  I don't see what multiple locks has to do with seq vs
> acqrel.

If we don't run seq_cst membars when falling back to locks, then
seq_cst operations on two locations ordered by acq/rel locks will not
be forced to be in one total order (which seq_cst requires).  They
will be ordered per lock, but not globally.

> > - config/posix/lock.c:
> >   - I'm not sure that WATCH_SIZE "should almost certainly be the
> >     same as the cacheline size".  Why?  Which assumptions about the
> >     program?
>
> See above.  Plus I assumed that if we have to take a lock, that's got
> to dominate over any cache ping-pong that would be played.

Perhaps the cache ping-pong dominates (and this assumes that the lock
is contended).  But perhaps the cache ping-pong wouldn't be worse than
the contention potentially introduced from a too-coarse-grained lock?

> > - Do locks need to care about priority inversion?
>
> At the moment I'm leaving this with the pthread implementation.

pthreads offers PI-safe locks too.  But I also don't know whether we
need this.
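The total-order argument above can be sketched as follows.  This is a
hypothetical illustration, not libatomic's actual code: `fallback_store`,
`lock_for`, and `NLOCKS` are made-up names standing in for the
protect_start/protect_end scheme.  The mutex alone gives only
acquire/release ordering per lock, so seq_cst accesses that hash to
different locks still need full fences to land in one global total order.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lock table keyed by address (lock striping). */
#define NLOCKS 16
static pthread_mutex_t locks[NLOCKS] = {
    [0 ... NLOCKS - 1] = PTHREAD_MUTEX_INITIALIZER  /* GCC extension */
};

static pthread_mutex_t *lock_for(void *ptr)
{
    return &locks[((uintptr_t)ptr >> 4) % NLOCKS];
}

/* Lock-based store fallback.  The lock orders accesses that use the
   same lock (acq/rel), but two seq_cst operations protected by
   *different* locks would only be ordered per lock.  The extra
   seq_cst fences (the pre_barrier/post_barrier discussed above)
   restore the single total order that seq_cst requires. */
void fallback_store(void *dst, const void *src, size_t n, memory_order mo)
{
    pthread_mutex_t *l = lock_for(dst);
    if (mo == memory_order_seq_cst)
        atomic_thread_fence(memory_order_seq_cst);  /* pre_barrier */
    pthread_mutex_lock(l);
    memcpy(dst, src, n);
    pthread_mutex_unlock(l);
    if (mo == memory_order_seq_cst)
        atomic_thread_fence(memory_order_seq_cst);  /* post_barrier */
}
```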
Re: thoughts on libatomic
On 04/16/12 11:15, Torvald Riegel wrote:
> Richard, Andrew,
>
> I had a look at libatomic yesterday, focusing primarily on
> synchronization issues in it.  Here are some comments.  And I think
> there is a bug in it too.  Notes are in no particular order.  Let me
> know what you think.
>
> - seq_cst fences are always used whenever stuff is not relaxed.  I
>   suppose this is a potential future TODO?  (default host_config.c)

Fixed.  I think this only really affected powerpc.

> - for acq/rel memorders, we don't need seq_cst fences whenever the
>   atomicity is implemented with a lock
>   (hostconfig.c:pre_barrier/post_barrier)

Err.. where?  This also seems to conflict with...

> - protect_start|end:
>   - BUG: those are called without additional post|pre_barrier calls
>     in cas_n.c, exch_n.c, fop_n.c, load_n.c, store_n.c
>     The lock membar guarantees are too weak if memorder for the
>     atomic access is seq_cst, and if you use more than one lock.
>     (assumes that lock implementation uses typical acq/rel barriers).

... this.  I don't see what multiple locks has to do with seq vs
acqrel.

> - only ptr, no size used, and so atomic types must always be accessed
>   by same-sized ops.  This does make sense of course, but should we
>   document this explicitly somewhere?

Possibly.  It's also a side-effect of using a WATCH_SIZE that is at
least as large as the largest atomic type supported.

> - load_n.c:
>   - I'm concerned about the CAS on read-only mprotected pages?  Why
>     again do we think this is safe?  Does the standard explicitly
>     allow this?  Or should we just use a lock in this case?

Andrew, you had a bit of back-and-forth with someone about this.
Can you dig that up?

> - cas_n.c:
>   - I suppose this is a strong cmpxchg?

Of course.  Weak is restricted to the builtin interface; the library
does not support it.

> - config/posix/lock.c:
>   - I'm not sure that WATCH_SIZE "should almost certainly be the same
>     as the cacheline size".  Why?  Which assumptions about the
>     program?

See above.  Plus I assumed that if we have to take a lock, that's got
to dominate over any cache ping-pong that would be played.

>   - page_size optzn: AFAICT, according to the standard (29.4.3), only
>     lock-free atomics are supposed to also be address-free.  So
>     libatomic wouldn't need to be constrained by the page size, or we
>     can use a different hash function too to try to avoid false
>     sharing / conflicts.

Hmm.  Ok, I'll think about this.

> - Do locks need to care about priority inversion?

At the moment I'm leaving this with the pthread implementation.

> - other notes:
>   - might be better to prefer locks over a kernel cmpxchg perhaps?
>     any archs/platforms that have a lock

Almost certainly not.  When kernel cmpxchg is implemented, that's also
how locks get implemented in libc.  No point using a lock to avoid the
kernel in those cases.

I've updated the tree at

  git://repo.or.cz/gcc/rth.git rth/libatomic

r~
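The page_size point can be illustrated by the two hashing strategies
(a sketch with made-up names and assumed constants; `PAGE_SIZE`,
`WATCH_SIZE`, and `NLOCKS` values are placeholders): hashing only the
page offset keeps lock selection address-free (two mappings of the same
page pick the same lock), while hashing the full address spreads objects
over more locks but is not address-free, which per 29.4.3 is only
required of lock-free atomics anyway.

```c
#include <stdint.h>

#define PAGE_SIZE  4096   /* assumed */
#define WATCH_SIZE 64     /* assumed granularity, cf. the cacheline debate */
#define NLOCKS     1024   /* assumed table size */

/* Page-offset hash: address-free across aliased mappings of the same
   page, but every address with the same in-page offset shares a lock. */
static unsigned hash_page_offset(void *ptr)
{
    return ((uintptr_t)ptr & (PAGE_SIZE - 1)) / WATCH_SIZE % NLOCKS;
}

/* Full-address hash: fewer collisions / less false sharing, but two
   mappings of the same physical page can pick different locks, so it
   is not address-free. */
static unsigned hash_full_addr(void *ptr)
{
    return ((uintptr_t)ptr / WATCH_SIZE) % NLOCKS;
}
```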
Re: thoughts on libatomic
On 04/23/2012 03:29 PM, Richard Henderson wrote:
> > - load_n.c:
> >   - I'm concerned about the CAS on read-only mprotected pages?  Why
> >     again do we think this is safe?  Does the standard explicitly
> >     allow this?  Or should we just use a lock in this case?
>
> Andrew, you had a bit of back-and-forth with someone about this.
> Can you dig that up?

yes, this keeps coming up again and again.  I think you should take it
up with Lawrence Crowl...  He was the one (along with some input from
Jeffrey Yasskin) who concluded that it was OK to use it, even if it was
just sort of shoehorned in so that we could get 16-byte lock-free on
the most common architecture...

it came down to something like: the architecture manual entry for
cmpxchg16 states that a store cycle may be added by the hardware under
the covers, and as such, it is a part of the basic machine description,
and therefore we might as well use the instruction even though we may
be adding the store cycle ourselves sometimes.

I haven't found the actual communication for reference yet; I'll look
again tomorrow.

Andrew
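The CAS-as-load trick being debated looks roughly like this (a sketch
using the generic GCC `__atomic` builtin on a 64-bit value; the
contentious case is the 16-byte variant via cmpxchg16b, which needs
-mcx16 on x86-64):

```c
#include <stdint.h>

/* CAS-based atomic load: compare-exchange an arbitrary expected value
   against itself.  If *ptr happens to equal `expected`, the CAS
   succeeds and stores back the identical value; otherwise it fails and
   writes the current contents into `expected`.  Either way we obtain
   the value atomically -- but the instruction performs a store cycle,
   which is exactly why it can fault on read-only mprotect'ed pages. */
static uint64_t cas_load(uint64_t *ptr)
{
    uint64_t expected = 0;
    __atomic_compare_exchange_n(ptr, &expected, expected,
                                0 /* strong */,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```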
Re: thoughts on libatomic
On 4/23/12, Andrew MacLeod amacl...@redhat.com wrote:
> On 04/23/2012 03:29 PM, Richard Henderson wrote:
> > > - load_n.c:
> > >   - I'm concerned about the CAS on read-only mprotected pages?
> > >     Why again do we think this is safe?  Does the standard
> > >     explicitly allow this?  Or should we just use a lock in this
> > >     case?
> >
> > Andrew, you had a bit of back-and-forth with someone about this.
> > Can you dig that up?
>
> yes, this keeps coming up again and again.  I think you should take
> it up with Lawrence Crowl...  He was the one (along with some input
> from Jeffrey Yasskin) who concluded that it was OK to use it, even if
> it was just sort of shoehorned in so that we could get 16-byte
> lock-free on the most common architecture...

The earlier discussion was about volatile.  Given the way the C++
standard works, I think the volatile issue really only affects
programmers that want to access device registers with atomic
operations.  Machines are lots more complicated than they were when I
accessed device registers, so I don't have a good feel for what the
current expectations are.  The standard is somewhat vague on volatile,
so I think the CAS-based read is likely to be permitted.  Even so, I
have brought up the issue with the C++ standard folks, and maybe we
can get some more feedback here.

> it came down to something like: the architecture manual entry for
> cmpxchg16 states that a store cycle may be added by the hardware
> under the covers, and as such, it is a part of the basic machine
> description, and therefore we might as well use the instruction even
> though we may be adding the store cycle ourselves sometimes.
>
> I haven't found the actual communication for reference yet; I'll look
> again tomorrow.

The standard says nothing about mprotect, so the implementation gets
to define all the semantics.  However, if the hardware gives a fault
for the CAS but the programmer specified a simple read, the programmer
may be a bit miffed.  Even so, I don't think we have any other
reasonable choice in the short term.  We will probably need to wait
for hardware to catch up.

-- 
Lawrence Crowl