[Bug target/59767] __atomic_load_n with __ATOMIC_SEQ_CST should result in a memory barrier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #9 from Andrew Macleod <amacleod at redhat dot com> ---
GCC emits the fence on stores, and thus loads don't need a fence.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

--- Comment #5 from mikulas at artax dot karlin.mff.cuni.cz ---
So, please pinpoint the specific paragraph(s) in the standard that specify that this behavior is acceptable.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

--- Comment #7 from mikulas at artax dot karlin.mff.cuni.cz ---
See for example this:

#include <stdatomic.h>

atomic_int a = ATOMIC_VAR_INIT(0);
atomic_int b = ATOMIC_VAR_INIT(0);
atomic_int p = ATOMIC_VAR_INIT(0);

int thread_1(void)
{
	atomic_store_explicit(&b, 1, memory_order_relaxed);
	atomic_load_explicit(&p, memory_order_seq_cst);
	return atomic_load_explicit(&a, memory_order_relaxed);
}

int thread_2(void)
{
	atomic_store_explicit(&a, 1, memory_order_relaxed);
	atomic_load_explicit(&p, memory_order_seq_cst);
	return atomic_load_explicit(&b, memory_order_relaxed);
}

Suppose that thread_1 and thread_2 are executed concurrently. If memory_order_seq_cst were a proper full memory barrier, it would be impossible for both functions to return 0. Because you omit the barrier on the read of the variable p, it is possible that both functions return 0.

thread_1 is compiled into:

	movl	$1, b(%rip)
	movl	p(%rip), %eax
	movl	a(%rip), %eax
	ret

thread_2 is compiled into:

	movl	$1, a(%rip)
	movl	p(%rip), %eax
	movl	b(%rip), %eax
	ret

... and the processor is free to move the writes past the reads, resulting in both functions returning zero.

Does the standard allow this behavior? I don't really know; I don't understand the standard. Please tell me - how do you decide, by interpreting the claims in section 7.17.3 of the C11 standard, whether the above outcome is allowed or not?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

--- Comment #4 from Andrew Macleod <amacleod at redhat dot com> ---
I'm not sure where the problem is. We interacted quite a bit with the authors as the model was being developed. As I recall, it started from the standard, but they strengthened some of the problem spots to produce a complete, testable model. To the best of my knowledge, this is the most efficient implementation that we *know* to be safe, and that is why we implement it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

--- Comment #6 from Andrew Macleod <amacleod at redhat dot com> ---
The standard doesn't define what code machines should generate. It defines terms for observable effects that need to be adhered to. The machine model was created over a few years during the early stages of the memory-model work to help observe and test those effects. To the best of our knowledge, the results from this model are an efficient and correct implementation within the standard's definition.

If you can provide a test case which demonstrates that this is not a correct implementation based on observable effects (i.e., the load is observed out of order somewhere), then we'd look at fixing it in GCC. If you want to discuss whether parts are or aren't compliant without a failing test case, then you should contact the authors with your questions, since they can give you a far better answer than I ever could.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

torvald at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |torvald at gcc dot gnu.org

--- Comment #8 from torvald at gcc dot gnu.org ---
(In reply to mikulas from comment #7)
> [the thread_1 / thread_2 example, quoted in full in comment #7]
> Suppose that thread_1 and thread_2 are executed concurrently. If
> memory_order_seq_cst were a proper full memory barrier, it would be
> impossible for both functions to return 0.

memory_order_seq_cst is a memory *order* in the Standard's terminology. Fences are something else (i.e., atomic_thread_fence()), and are parametrized by a memory order. A memory_order_seq_cst *memory access* does not have the same effects as a memory_order_seq_cst fence. See C++14 29.3p4-7; those paragraphs talk about memory_order_seq_cst fences specifically, not about memory_order_seq_cst operations in general.

If you want to make this example of Dekker synchronization correct, you need to use fences instead of the accesses to p; alternatively, you need to use seq-cst accesses for all the stores and loads of a and b, in which case HW fences will be added via the stores (as Andrew already pointed out).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #2 from Andrew Macleod <amacleod at redhat dot com> ---
I do not believe the lock is needed. See https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

GCC emits the fence on all stores, which then synchronize with loads. If we emitted the fence on the loads instead, we wouldn't need it on the stores. It is likely there will be more loads than stores, so the current approach is usually more efficient.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

--- Comment #3 from mikulas at artax dot karlin.mff.cuni.cz ---
The problem here is that we don't really know what the standard specifies. People often suggest Batty's paper "Mathematizing C++ Concurrency" ( http://www.cl.cam.ac.uk/~pes20/cpp/popl085ap-sewell.pdf ) as the explanation, but the paper really describes a different memory model than the C++ standard. Citing section 4:

"Unfortunately N3092 allows the following nonsequentially consistent execution of the SB example with SC atomics (initialisation writes, such as (a) and (b), are non-atomic so that they need not be compiled with memory fences): [...] We devised a stronger restriction on the values that may be read by SC atomics, stated in §2.7, that does provide sequential consistency here."

So it doesn't describe the standard, but something stronger that the authors have devised.

You can look at this example: https://gcc.gnu.org/ml/gcc/2014-02/msg00053.html . The assert can fail - which implies that __ATOMIC_SEQ_CST is not a full barrier; it is somehow weaker. But how much weaker is it? Who knows?

I look at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html and I would really like to know why it is so. Where does the standard specify this mapping? I couldn't really find an answer to that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767

James Almer <jamrial at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamrial at gmail dot com

--- Comment #1 from James Almer <jamrial at gmail dot com> ---
This still happens with gcc 4.9 and gcc 5.

int foo(int *bar)
{
	return *bar;
}

and

int foo(int *bar)
{
	return __atomic_load_n(bar, __ATOMIC_SEQ_CST);
}

generate the exact same assembly. The latter should add an mfence before the mov instruction.