Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On Mon, 31 Jul 2017, Jeff Law wrote:
> > Please consider that expand_mem_thread_fence is used to place fences around
> > seq-cst atomic loads when the backend doesn't provide a direct pattern.
> > With compiler barriers on both sides of the machine barrier, the generated
> > sequence for a seq-cst atomic load will be 7 insns:
> >
> >   asm volatile ("":::"memory");
> >   machine_seq_cst_fence ();
> >   asm volatile ("":::"memory");
> >   dst = mem[src];
> >   asm volatile ("":::"memory");
> >   machine_seq_cst_fence ();
> >   asm volatile ("":::"memory");
> >
> > I can easily imagine people looking at RTL dumps with this overkill fencing
> > being unhappy about this.
> But the extra fences aren't actually going to impact anything except
> perhaps an unmeasurable compile-time hit.  ie, it may look bad, but I'd
> have a hard time believing it matters in practice.

I agree it doesn't matter much from a compile-time/memory standpoint.  I meant
it matters from the standpoint of a person working with the RTL dumps, i.e.
having to read through all that, especially if they need to work with non-slim
dumps.

> > I'd be more happy with detecting empty expansion via get_last_insn ().
> That seems like an unnecessary complication to me.

I think it's quite simple by GCC standards:

  {
    rtx_insn *last = get_last_insn ();
    emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
    if (last == get_last_insn () && !is_mm_relaxed (model))
      expand_asm_memory_barrier ();
  }

> I'd prefer instead to just emit the necessary fencing in the generic code
> and update the MD docs so that maintainers know they don't have to emit
> the RTL fences themselves.

I agree the docs need an update, but hopefully not in this way.  The legacy
__sync_synchronize barrier has always been required to be a compiler barrier
when expanded to RTL, and it's quite natural to use the same RTL structure
for new atomic fences as the old memory_barrier expansion.  The only problem
in current practice seen so far is with empty expansion for new patterns.

Thanks.
Alexander
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On 07/31/2017 11:02 AM, Alexander Monakov wrote:
> On Mon, 31 Jul 2017, Jeff Law wrote:
>>>> In the middle end patch, do we need a barrier before the fence as well?
>>>> The post-fence barrier prevents reordering the fence with anything which
>>>> follows the fence.  But do we have to also prevent reordering the fence
>>>> with prior instructions with any of the memory models?  This isn't my
>>>> area of expertise, so if it's a dumb question, don't hesitate to let me
>>>> know :-)
>>>
>>> That depends on how pessimistic we want to be with respect to the backend
>>> getting it wrong.  My expectation here is that if a backend emits non-empty
>>> RTL, the produced sequence for the fence itself acts as a compiler memory
>>> barrier.
>> Perhaps.  But do we really want to rely on that?  Emitting a scheduling
>> barrier prior to these atomics is virtually free.
>
> Please consider that expand_mem_thread_fence is used to place fences around
> seq-cst atomic loads when the backend doesn't provide a direct pattern.
> With compiler barriers on both sides of the machine barrier, the generated
> sequence for a seq-cst atomic load will be 7 insns:
>
>   asm volatile ("":::"memory");
>   machine_seq_cst_fence ();
>   asm volatile ("":::"memory");
>   dst = mem[src];
>   asm volatile ("":::"memory");
>   machine_seq_cst_fence ();
>   asm volatile ("":::"memory");
>
> I can easily imagine people looking at RTL dumps with this overkill fencing
> being unhappy about this.

But the extra fences aren't actually going to impact anything except
perhaps an unmeasurable compile-time hit.  ie, it may look bad, but I'd
have a hard time believing it matters in practice.

> I'd be more happy with detecting empty expansion via get_last_insn ().

That seems like an unnecessary complication to me.

I'd prefer instead to just emit the necessary fencing in the generic code
and update the MD docs so that maintainers know they don't have to emit
the RTL fences themselves.

Jeff
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On Mon, 31 Jul 2017, Jeff Law wrote:
>>> In the middle end patch, do we need a barrier before the fence as well?
>>> The post-fence barrier prevents reordering the fence with anything which
>>> follows the fence.  But do we have to also prevent reordering the fence
>>> with prior instructions with any of the memory models?  This isn't my
>>> area of expertise, so if it's a dumb question, don't hesitate to let me
>>> know :-)
>>
>> That depends on how pessimistic we want to be with respect to the backend
>> getting it wrong.  My expectation here is that if a backend emits non-empty
>> RTL, the produced sequence for the fence itself acts as a compiler memory
>> barrier.
> Perhaps.  But do we really want to rely on that?  Emitting a scheduling
> barrier prior to these atomics is virtually free.

Please consider that expand_mem_thread_fence is used to place fences around
seq-cst atomic loads when the backend doesn't provide a direct pattern.
With compiler barriers on both sides of the machine barrier, the generated
sequence for a seq-cst atomic load will be 7 insns:

  asm volatile ("":::"memory");
  machine_seq_cst_fence ();
  asm volatile ("":::"memory");
  dst = mem[src];
  asm volatile ("":::"memory");
  machine_seq_cst_fence ();
  asm volatile ("":::"memory");

I can easily imagine people looking at RTL dumps with this overkill fencing
being unhappy about this.

I'd be more happy with detecting empty expansion via get_last_insn ().

Thanks.
Alexander
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On 07/26/2017 12:13 PM, Alexander Monakov wrote:
> On Wed, 26 Jul 2017, Jeff Law wrote:
>> I'm not sure what you mean by extraneous compiler barriers -- isn't the
>> worst case scenario here that the target emits them as well?  So there
>> would be an extraneous one in that case, but that ought to be a "don't
>> care".
>
> Yes, exactly this.
>
>> In the middle end patch, do we need a barrier before the fence as well?
>> The post-fence barrier prevents reordering the fence with anything which
>> follows the fence.  But do we have to also prevent reordering the fence
>> with prior instructions with any of the memory models?  This isn't my
>> area of expertise, so if it's a dumb question, don't hesitate to let me
>> know :-)
>
> That depends on how pessimistic we want to be with respect to the backend
> getting it wrong.  My expectation here is that if a backend emits non-empty
> RTL, the produced sequence for the fence itself acts as a compiler memory
> barrier.

Perhaps.  But do we really want to rely on that?  Emitting a scheduling
barrier prior to these atomics is virtually free.

Jeff
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On 07/26/2017 02:02 PM, Alexander Monakov wrote:
> On Wed, 26 Jul 2017, Alexander Monakov wrote:
>> On Wed, 26 Jul 2017, Jeff Law wrote:
>>> I'm not sure what you mean by extraneous compiler barriers -- isn't the
>>> worst case scenario here that the target emits them as well?  So there
>>> would be an extraneous one in that case, but that ought to be a "don't
>>> care".
>>
>> Yes, exactly this.
>
> I've just realized that we can detect if the backend produced an empty RTL
> sequence by looking at get_last_insn () before/after gen_mem_thread_fence.
> This way we can emit a compiler barrier iff the backend didn't emit a
> machine barrier.

We could.  But I suspect just emitting the barriers in the generic code is
the way to go.  If we have a redundant barrier, the compile-time cost is
minimal and I don't think the additional code to avoid creating the extra
barrier is worth it.

Jeff
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On Wed, 26 Jul 2017, Alexander Monakov wrote: > On Wed, 26 Jul 2017, Jeff Law wrote: > > I'm not sure what you mean by extraneous compiler barriers -- isn't the > > worst case scenario here that the target emits them as well? So there > > would be an extraneous one in that case, but that ought to be a "don't > > care". > > Yes, exactly this. I've just realized that we can detect if the backend produced empty RTL sequence by looking at get_last_insn () before/after gen_mem_thread_fence. This way we can emit a compiler barrier iff the backend didn't emit a machine barrier. Alexander
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On Wed, 26 Jul 2017, Jeff Law wrote: > I'm not sure what you mean by extraneous compiler barriers -- isn't the > worst case scenario here that the target emits them as well? So there > would be an extraneous one in that case, but that ought to be a "don't > care". Yes, exactly this. > In the middle end patch, do we need a barrier before the fence as well? > The post-fence barrier prevents reordering the fence with anything which > follows the fence. But do we have to also prevent reordering the fence > with prior instructions with any of the memory models? This isn't my > area of expertise, so if it's dumb question, don't hesitate to let me > know :-) That depends on how pessimistic we want to be with respect to backend getting it wrong. My expectation here is that if a backend emits non-empty RTL, the produced sequence for the fence itself acts as a compiler memory barrier. Thanks! Alexander
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On 07/26/2017 11:19 AM, Alexander Monakov wrote:
> On Wed, 26 Jul 2017, Jeff Law wrote:
>> So I think this is up to the target maintainers.  I have no concerns
>> with enabling use of expand_asm_memory_barrier to be used outside of
>> optabs.  So if the s390/x86 maintainers want to go forward, the optabs
>> changes are pre-approved.
>
> Please see the alternative middle-end patch:
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00549.html

I almost made the comment that we should look at whether we could do this at
a higher level.  But then figured that the expander could be called from
multiple locations, thus requiring that all those points be fixed or at least
routed to a common function.  But it looks like all the fence stuff is routed
into expand_mem_thread_fence, so indeed that's a better place.

[ And if there were only a good way to enforce that we can't directly call
the backend expander from anywhere other than the blessed location. ]

> and the recently reported related bug:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316
>
> Since we got it wrong for both fences and loads/stores on two different
> targets, perhaps fixing it once and for all in the middle-end is more
> appropriate.  I hope extraneous compiler barriers can be tolerated.

Agreed on middle-end being more appropriate.

I'm not sure what you mean by extraneous compiler barriers -- isn't the
worst case scenario here that the target emits them as well?  So there
would be an extraneous one in that case, but that ought to be a "don't
care".

In the middle end patch, do we need a barrier before the fence as well?
The post-fence barrier prevents reordering the fence with anything which
follows the fence.  But do we have to also prevent reordering the fence
with prior instructions with any of the memory models?  This isn't my
area of expertise, so if it's a dumb question, don't hesitate to let me
know :-)

> Yep - the same applies to atomic loads/stores too.  They can be expanded
> as volatile accesses, but we need compiler barrier(s) anyway (except for
> those with relaxed memory model).

Right.

Jeff
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On Wed, 26 Jul 2017, Jeff Law wrote: > So I think this is up to the target maintainers. I have no concerns > with enabling use of expand_asm_memory_barrier to be used outside of > optabs. So if the s390/x86 maintainers want to go forward, the optabs > changes are pre-approved. Please see the alternative middle-end patch: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00549.html and the recently reported related bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81316 Since we got it wrong for both fences and loads/stores on two different targets, perhaps fixing it once and for all in the middle-end is more appropriate. I hope extraneous compiler barriers can be tolerated. > The reasoning seems sound to me -- we may not need fencing at the > hardware level because it gives a certain set of guarantees, but we may > still need them at the compiler level to prevent undesirable code > motions across the fence point in the compiler. Yep - the same applies to atomic loads/stores too. They can be expanded as volatile accesses, but we need compiler barrier(s) anyway (except for those with relaxed memory model). Thanks. Alexander
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On 05/26/2017 04:58 AM, Alexander Monakov wrote: > On Wed, 17 May 2017, Alexander Monakov wrote: > >> Ping. > > Ping^2? > >> (to be clear, patch 2/2 is my previous followup in this thread, I forgot to >> adjust the subject line; it should have said: >> "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load"). >> >> On Wed, 10 May 2017, Alexander Monakov wrote: >> >>> Hi, >>> >>> When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't >>> emit >>> any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence'). >>> This >>> is incorrect: although no machine barrier is needed, the compiler still must >>> emit a compiler barrier into the IR to prevent propagation and code motion >>> across the fence. The testcase added with the patch shows how it can lead >>> to a miscompilation. >>> >>> The proposed patch fixes it by handling non-seq-cst fences exactly like >>> __atomic_signal_fence is expanded, by emitting asm volatile("":::"memory"). >>> >>> The s390 backend uses the a similar mem_thread_fence expansion, so the patch >>> fixes both backends in the same manner. >>> >>> Bootstrapped and regtested on x86_64; also checked that s390-linux cc1 >>> successfully builds after the change. OK for trunk? >>> >>> (the original source code in the PR was misusing atomic fences by doing >>> something like >>> >>> void f(int *p) >>> { >>> while (*p) >>> __atomic_thread_fence(__ATOMIC_ACQUIRE); >>> } >>> >>> but since *p is not atomic, a concurrent write to *p would cause a data >>> race and >>> thus invoke undefined behavior; also, if *p is false prior to entering the >>> loop, >>> execution does not encounter the fence; new test here has code usable >>> without UB) >>> >>> Alexander >>> >>> * config/i386/sync.md (mem_thread_fence): Emit a compiler barrier for >>> non-seq-cst fences. Adjust comment. >>> * config/s390/s390.md (mem_thread_fence): Likewise. >>> * optabs.c (expand_asm_memory_barrier): Export. 
>>> * optabs.h (expand_asm_memory_barrier): Declare. >>> testsuite/ >>> * gcc.target/i386/pr80640-1.c: New testcase. So I think this is up to the target maintainers. I have no concerns with enabling use of expand_asm_memory_barrier to be used outside of optabs. So if the s390/x86 maintainers want to go forward, the optabs changes are pre-approved. The reasoning seems sound to me -- we may not need fencing at the hardware level because it gives a certain set of guarantees, but we may still need them at the compiler level to prevent undesirable code motions across the fence point in the compiler. Jeff
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On Thu, 8 Jun 2017, Alexander Monakov wrote:
> Ping^3.

Ping^4: https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00782.html

This is a wrong-code issue with C11 atomics: even if no machine barrier is
needed for a given fence type on this architecture, a compiler barrier must
be present in RTL.

Alternatively, it's possible to have a more complete and future-proof solution
by explicitly emitting a compiler barrier from the middle-end, leaving it up
to the backend to emit a machine barrier if needed.  The following patch could
achieve that (but at the cost of creating slightly redundant RTL on targets
that always emit some kind of memory barrier).

	* optabs.c (expand_mem_thread_fence): Always emit a compiler barrier
	if using mem_thread_fence expansion.

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 8fd5d91..92080c3 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6297,7 +6297,11 @@ void
 expand_mem_thread_fence (enum memmodel model)
 {
   if (targetm.have_mem_thread_fence ())
-    emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
+    {
+      emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
+      if (!is_mm_relaxed (model))
+	expand_asm_memory_barrier ();
+    }
   else if (!is_mm_relaxed (model))
     {
       if (targetm.have_memory_barrier ())
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On Fri, 26 May 2017, Alexander Monakov wrote: > > Ping. > > Ping^2? Ping^3. > > (to be clear, patch 2/2 is my previous followup in this thread, I forgot to > > adjust the subject line; it should have said: > > "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load"). > > > > On Wed, 10 May 2017, Alexander Monakov wrote: > > > > > Hi, > > > > > > When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't > > > emit > > > any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence'). > > > This > > > is incorrect: although no machine barrier is needed, the compiler still > > > must > > > emit a compiler barrier into the IR to prevent propagation and code motion > > > across the fence. The testcase added with the patch shows how it can lead > > > to a miscompilation. > > > > > > The proposed patch fixes it by handling non-seq-cst fences exactly like > > > __atomic_signal_fence is expanded, by emitting asm > > > volatile("":::"memory"). > > > > > > The s390 backend uses the a similar mem_thread_fence expansion, so the > > > patch > > > fixes both backends in the same manner. > > > > > > Bootstrapped and regtested on x86_64; also checked that s390-linux cc1 > > > successfully builds after the change. OK for trunk? > > > > > > (the original source code in the PR was misusing atomic fences by doing > > > something like > > > > > > void f(int *p) > > > { > > > while (*p) > > > __atomic_thread_fence(__ATOMIC_ACQUIRE); > > > } > > > > > > but since *p is not atomic, a concurrent write to *p would cause a data > > > race and > > > thus invoke undefined behavior; also, if *p is false prior to entering > > > the loop, > > > execution does not encounter the fence; new test here has code usable > > > without UB) > > > > > > Alexander > > > > > > * config/i386/sync.md (mem_thread_fence): Emit a compiler barrier for > > > non-seq-cst fences. Adjust comment. > > > * config/s390/s390.md (mem_thread_fence): Likewise. 
> > > * optabs.c (expand_asm_memory_barrier): Export. > > > * optabs.h (expand_asm_memory_barrier): Declare. > > > testsuite/ > > > * gcc.target/i386/pr80640-1.c: New testcase. > > > --- > > > gcc/config/i386/sync.md | 7 ++- > > > gcc/config/s390/s390.md | 11 +-- > > > gcc/optabs.c | 2 +- > > > gcc/optabs.h | 3 +++ > > > gcc/testsuite/gcc.target/i386/pr80640-1.c | 12 > > > 5 files changed, 31 insertions(+), 4 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr80640-1.c > > > > > > diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md > > > index 20d46fe..619d53b 100644 > > > --- a/gcc/config/i386/sync.md > > > +++ b/gcc/config/i386/sync.md > > > @@ -108,7 +108,7 @@ (define_expand "mem_thread_fence" > > >enum memmodel model = memmodel_from_int (INTVAL (operands[0])); > > > > > >/* Unless this is a SEQ_CST fence, the i386 memory model is strong > > > - enough not to require barriers of any kind. */ > > > + enough not to require a processor barrier of any kind. */ > > >if (is_mm_seq_cst (model)) > > > { > > >rtx (*mfence_insn)(rtx); > > > @@ -124,6 +124,11 @@ (define_expand "mem_thread_fence" > > > > > >emit_insn (mfence_insn (mem)); > > > } > > > + else if (!is_mm_relaxed (model)) > > > +{ > > > + /* However, a compiler barrier is still required. */ > > > + expand_asm_memory_barrier (); > > > +} > > >DONE; > > > }) > > > > > > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > > > index c9fd19a..65e54c4 100644 > > > --- a/gcc/config/s390/s390.md > > > +++ b/gcc/config/s390/s390.md > > > @@ -10109,14 +10109,21 @@ (define_expand "mem_thread_fence" > > >[(match_operand:SI 0 "const_int_operand")] ;; model > > >"" > > > { > > > + enum memmodel model = memmodel_from_int (INTVAL (operands[0])); > > > + > > >/* Unless this is a SEQ_CST fence, the s390 memory model is strong > > > - enough not to require barriers of any kind. 
*/ > > > - if (is_mm_seq_cst (memmodel_from_int (INTVAL (operands[0] > > > + enough not to require a processor barrier of any kind. */ > > > + if (is_mm_seq_cst (model)) > > > { > > >rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); > > >MEM_VOLATILE_P (mem) = 1; > > >emit_insn (gen_mem_thread_fence_1 (mem)); > > > } > > > + else if (!is_mm_relaxed (model)) > > > +{ > > > + /* However, a compiler barrier is still required. */ > > > + expand_asm_memory_barrier (); > > > +} > > >DONE; > > > }) > > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c > > > index 48e37f8..1f1fbc3 100644 > > > --- a/gcc/optabs.c > > > +++ b/gcc/optabs.c > > > @@ -6269,7 +6269,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, > > > rtx *ptarget_oval, > > > > > > /*
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
On Wed, 17 May 2017, Alexander Monakov wrote: > Ping. Ping^2? > (to be clear, patch 2/2 is my previous followup in this thread, I forgot to > adjust the subject line; it should have said: > "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load"). > > On Wed, 10 May 2017, Alexander Monakov wrote: > > > Hi, > > > > When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't > > emit > > any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence'). > > This > > is incorrect: although no machine barrier is needed, the compiler still must > > emit a compiler barrier into the IR to prevent propagation and code motion > > across the fence. The testcase added with the patch shows how it can lead > > to a miscompilation. > > > > The proposed patch fixes it by handling non-seq-cst fences exactly like > > __atomic_signal_fence is expanded, by emitting asm volatile("":::"memory"). > > > > The s390 backend uses the a similar mem_thread_fence expansion, so the patch > > fixes both backends in the same manner. > > > > Bootstrapped and regtested on x86_64; also checked that s390-linux cc1 > > successfully builds after the change. OK for trunk? > > > > (the original source code in the PR was misusing atomic fences by doing > > something like > > > > void f(int *p) > > { > > while (*p) > > __atomic_thread_fence(__ATOMIC_ACQUIRE); > > } > > > > but since *p is not atomic, a concurrent write to *p would cause a data > > race and > > thus invoke undefined behavior; also, if *p is false prior to entering the > > loop, > > execution does not encounter the fence; new test here has code usable > > without UB) > > > > Alexander > > > > * config/i386/sync.md (mem_thread_fence): Emit a compiler barrier for > > non-seq-cst fences. Adjust comment. > > * config/s390/s390.md (mem_thread_fence): Likewise. > > * optabs.c (expand_asm_memory_barrier): Export. > > * optabs.h (expand_asm_memory_barrier): Declare. 
> > testsuite/ > > * gcc.target/i386/pr80640-1.c: New testcase. > > --- > > gcc/config/i386/sync.md | 7 ++- > > gcc/config/s390/s390.md | 11 +-- > > gcc/optabs.c | 2 +- > > gcc/optabs.h | 3 +++ > > gcc/testsuite/gcc.target/i386/pr80640-1.c | 12 > > 5 files changed, 31 insertions(+), 4 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr80640-1.c > > > > diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md > > index 20d46fe..619d53b 100644 > > --- a/gcc/config/i386/sync.md > > +++ b/gcc/config/i386/sync.md > > @@ -108,7 +108,7 @@ (define_expand "mem_thread_fence" > >enum memmodel model = memmodel_from_int (INTVAL (operands[0])); > > > >/* Unless this is a SEQ_CST fence, the i386 memory model is strong > > - enough not to require barriers of any kind. */ > > + enough not to require a processor barrier of any kind. */ > >if (is_mm_seq_cst (model)) > > { > >rtx (*mfence_insn)(rtx); > > @@ -124,6 +124,11 @@ (define_expand "mem_thread_fence" > > > >emit_insn (mfence_insn (mem)); > > } > > + else if (!is_mm_relaxed (model)) > > +{ > > + /* However, a compiler barrier is still required. */ > > + expand_asm_memory_barrier (); > > +} > >DONE; > > }) > > > > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > > index c9fd19a..65e54c4 100644 > > --- a/gcc/config/s390/s390.md > > +++ b/gcc/config/s390/s390.md > > @@ -10109,14 +10109,21 @@ (define_expand "mem_thread_fence" > >[(match_operand:SI 0 "const_int_operand")] ;; model > >"" > > { > > + enum memmodel model = memmodel_from_int (INTVAL (operands[0])); > > + > >/* Unless this is a SEQ_CST fence, the s390 memory model is strong > > - enough not to require barriers of any kind. */ > > - if (is_mm_seq_cst (memmodel_from_int (INTVAL (operands[0] > > + enough not to require a processor barrier of any kind. 
*/ > > + if (is_mm_seq_cst (model)) > > { > >rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); > >MEM_VOLATILE_P (mem) = 1; > >emit_insn (gen_mem_thread_fence_1 (mem)); > > } > > + else if (!is_mm_relaxed (model)) > > +{ > > + /* However, a compiler barrier is still required. */ > > + expand_asm_memory_barrier (); > > +} > >DONE; > > }) > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c > > index 48e37f8..1f1fbc3 100644 > > --- a/gcc/optabs.c > > +++ b/gcc/optabs.c > > @@ -6269,7 +6269,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, > > rtx *ptarget_oval, > > > > /* Generate asm volatile("" : : : "memory") as the memory barrier. */ > > > > -static void > > +void > > expand_asm_memory_barrier (void) > > { > >rtx asm_op, clob; > > diff --git a/gcc/optabs.h b/gcc/optabs.h > > index b2dd31a..aca6755 100644 > > ---
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
Ping. (to be clear, patch 2/2 is my previous followup in this thread, I forgot to adjust the subject line; it should have said: "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load"). On Wed, 10 May 2017, Alexander Monakov wrote: > Hi, > > When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't emit > any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence'). This > is incorrect: although no machine barrier is needed, the compiler still must > emit a compiler barrier into the IR to prevent propagation and code motion > across the fence. The testcase added with the patch shows how it can lead > to a miscompilation. > > The proposed patch fixes it by handling non-seq-cst fences exactly like > __atomic_signal_fence is expanded, by emitting asm volatile("":::"memory"). > > The s390 backend uses the a similar mem_thread_fence expansion, so the patch > fixes both backends in the same manner. > > Bootstrapped and regtested on x86_64; also checked that s390-linux cc1 > successfully builds after the change. OK for trunk? > > (the original source code in the PR was misusing atomic fences by doing > something like > > void f(int *p) > { > while (*p) > __atomic_thread_fence(__ATOMIC_ACQUIRE); > } > > but since *p is not atomic, a concurrent write to *p would cause a data race > and > thus invoke undefined behavior; also, if *p is false prior to entering the > loop, > execution does not encounter the fence; new test here has code usable without > UB) > > Alexander > > * config/i386/sync.md (mem_thread_fence): Emit a compiler barrier for > non-seq-cst fences. Adjust comment. > * config/s390/s390.md (mem_thread_fence): Likewise. > * optabs.c (expand_asm_memory_barrier): Export. > * optabs.h (expand_asm_memory_barrier): Declare. > testsuite/ > * gcc.target/i386/pr80640-1.c: New testcase. 
> --- > gcc/config/i386/sync.md | 7 ++- > gcc/config/s390/s390.md | 11 +-- > gcc/optabs.c | 2 +- > gcc/optabs.h | 3 +++ > gcc/testsuite/gcc.target/i386/pr80640-1.c | 12 > 5 files changed, 31 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr80640-1.c > > diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md > index 20d46fe..619d53b 100644 > --- a/gcc/config/i386/sync.md > +++ b/gcc/config/i386/sync.md > @@ -108,7 +108,7 @@ (define_expand "mem_thread_fence" >enum memmodel model = memmodel_from_int (INTVAL (operands[0])); > >/* Unless this is a SEQ_CST fence, the i386 memory model is strong > - enough not to require barriers of any kind. */ > + enough not to require a processor barrier of any kind. */ >if (is_mm_seq_cst (model)) > { >rtx (*mfence_insn)(rtx); > @@ -124,6 +124,11 @@ (define_expand "mem_thread_fence" > >emit_insn (mfence_insn (mem)); > } > + else if (!is_mm_relaxed (model)) > +{ > + /* However, a compiler barrier is still required. */ > + expand_asm_memory_barrier (); > +} >DONE; > }) > > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > index c9fd19a..65e54c4 100644 > --- a/gcc/config/s390/s390.md > +++ b/gcc/config/s390/s390.md > @@ -10109,14 +10109,21 @@ (define_expand "mem_thread_fence" >[(match_operand:SI 0 "const_int_operand")] ;; model >"" > { > + enum memmodel model = memmodel_from_int (INTVAL (operands[0])); > + >/* Unless this is a SEQ_CST fence, the s390 memory model is strong > - enough not to require barriers of any kind. */ > - if (is_mm_seq_cst (memmodel_from_int (INTVAL (operands[0] > + enough not to require a processor barrier of any kind. */ > + if (is_mm_seq_cst (model)) > { >rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); >MEM_VOLATILE_P (mem) = 1; >emit_insn (gen_mem_thread_fence_1 (mem)); > } > + else if (!is_mm_relaxed (model)) > +{ > + /* However, a compiler barrier is still required. 
*/ > + expand_asm_memory_barrier (); > +} >DONE; > }) > > diff --git a/gcc/optabs.c b/gcc/optabs.c > index 48e37f8..1f1fbc3 100644 > --- a/gcc/optabs.c > +++ b/gcc/optabs.c > @@ -6269,7 +6269,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx > *ptarget_oval, > > /* Generate asm volatile("" : : : "memory") as the memory barrier. */ > > -static void > +void > expand_asm_memory_barrier (void) > { >rtx asm_op, clob; > diff --git a/gcc/optabs.h b/gcc/optabs.h > index b2dd31a..aca6755 100644 > --- a/gcc/optabs.h > +++ b/gcc/optabs.h > @@ -322,6 +322,9 @@ extern bool expand_atomic_compare_and_swap (rtx *, rtx *, > rtx, rtx, rtx, bool, > extern void expand_mem_thread_fence (enum memmodel); > extern void expand_mem_signal_fence (enum memmodel); > > +/* Generate a compile-time memory barrier. */ > +extern void
Re: [PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
While fixing the fences issue of PR80640 I've noticed that a similar oversight
is present in the expansion of atomic loads on x86: they become volatile loads,
but that is insufficient; a compiler memory barrier is still needed.  Volatility
prevents tearing the load (preserves non-divisibility of the atomic access), but
does not prevent propagation and code motion of non-volatile accesses across the
load.  The testcase below demonstrates that even SEQ_CST loads are mishandled.

It's less clear how to fix this one.  AFAICS there are two possible approaches:
either emit explicit compiler barriers on either side of the load (a barrier
before the load is needed for SEQ_CST loads), or change how atomic loads are
expressed in RTL: atomic stores already use UNSPEC_STA unspecs, so they don't
suffer from this issue.  The asymmetry seems odd to me.

The former approach is easier and exposes non-seq-cst loads upwards for
optimization, so that's what the proposed patch implements.

The s390 backend suffers from the same issue, except there the atomic stores
are affected too.

Alexander

	* config/i386/sync.md (atomic_load<mode>): Emit a compiler barrier
	before (only for SEQ_CST order) and after the load.

testsuite/
	* gcc.target/i386/pr80640-2.c: New testcase.
---
 gcc/config/i386/sync.md                   |  7 +++++++
 gcc/testsuite/gcc.target/i386/pr80640-2.c | 11 +++++++++++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80640-2.c

diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 619d53b..35a7b38 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -155,6 +155,11 @@ (define_expand "atomic_load<mode>"
 			      UNSPEC_LDA))]
   ""
 {
+  enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
+
+  if (is_mm_seq_cst (model))
+    expand_asm_memory_barrier ();
+
   /* For DImode on 32-bit, we can use the FPU to perform the load.  */
   if (<MODE>mode == DImode && !TARGET_64BIT)
     emit_insn (gen_atomic_loaddi_fpu
@@ -173,6 +178,8 @@ (define_expand "atomic_load<mode>"
       if (dst != operands[0])
 	emit_move_insn (operands[0], dst);
     }
+  if (!is_mm_relaxed (model))
+    expand_asm_memory_barrier ();
   DONE;
 })

diff --git a/gcc/testsuite/gcc.target/i386/pr80640-2.c b/gcc/testsuite/gcc.target/i386/pr80640-2.c
new file mode 100644
index 0000000..f0a050c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80640-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-ter" } */
+/* { dg-final { scan-assembler-not "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" } } */
+
+int f(int *p, volatile _Bool *p1, _Atomic _Bool *p2)
+{
+  int v = *p;
+  *p1 = !v;
+  while (__atomic_load_n (p2, __ATOMIC_SEQ_CST));
+  return v - *p;
+}
-- 
1.8.3.1
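As an aside for readers of the testcase above: the scan-assembler-not pattern
catches the folded form of `v - *p`.  The compiler-barrier property the fix
relies on can be shown in a standalone sketch (my illustration, not part of the
patch; `compiler_barrier` and `g` are invented names):

```c
#include <assert.h>

/* Hypothetical helper: an empty asm with a "memory" clobber is the
   C-level equivalent of what expand_asm_memory_barrier () emits at
   the RTL level.  It tells the compiler any memory may have changed.  */
static inline void compiler_barrier (void)
{
  __asm__ volatile ("" : : : "memory");
}

/* Without the barrier, the compiler may forward the first load of *p
   and fold v - *p to a constant 0 (the "xorl %eax, %eax" the testcase
   greps for); with it, *p must be reloaded after the barrier.  */
int g (int *p)
{
  int v = *p;
  compiler_barrier ();
  return v - *p;
}
```

Single-threaded the function still returns 0, of course; the barrier only
forbids the compiler from proving that at compile time.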
[PATCH 1/2] x86,s390: add compiler memory barriers when expanding atomic_thread_fence (PR 80640)
Hi,

When expanding __atomic_thread_fence (x) to RTL, the i386 backend doesn't emit
any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence').  This is
incorrect: although no machine barrier is needed, the compiler still must emit
a compiler barrier into the IR to prevent propagation and code motion across
the fence.  The testcase added with the patch shows how it can lead to a
miscompilation.

The proposed patch fixes it by handling non-seq-cst fences exactly like
__atomic_signal_fence is expanded, by emitting asm volatile("":::"memory").

The s390 backend uses a similar mem_thread_fence expansion, so the patch fixes
both backends in the same manner.

Bootstrapped and regtested on x86_64; also checked that s390-linux cc1
successfully builds after the change.  OK for trunk?

(the original source code in the PR was misusing atomic fences by doing
something like

  void f(int *p)
  {
    while (*p)
      __atomic_thread_fence(__ATOMIC_ACQUIRE);
  }

but since *p is not atomic, a concurrent write to *p would cause a data race
and thus invoke undefined behavior; also, if *p is false prior to entering the
loop, execution does not encounter the fence; the new test here has code usable
without UB)

Alexander

	* config/i386/sync.md (mem_thread_fence): Emit a compiler barrier for
	non-seq-cst fences.  Adjust comment.
	* config/s390/s390.md (mem_thread_fence): Likewise.
	* optabs.c (expand_asm_memory_barrier): Export.
	* optabs.h (expand_asm_memory_barrier): Declare.

testsuite/
	* gcc.target/i386/pr80640-1.c: New testcase.
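For contrast with the misuse quoted in the cover letter, here is a hedged
sketch (my code, not from the PR or the patch; names and the value 42 are
invented) of a race-free fence-based publication pattern.  It shows why even a
fence that emits no machine instruction on x86 must still act as a compiler
barrier:

```c
#include <stdatomic.h>

int data;
_Atomic int ready;

void produce (void)
{
  data = 42;
  /* Release fence: needs no machine instruction on x86, but must
     forbid the compiler from sinking the store to data below the
     flag store.  */
  atomic_thread_fence (memory_order_release);
  atomic_store_explicit (&ready, 1, memory_order_relaxed);
}

int consume (void)
{
  /* Relaxed atomic loads of the flag: no data race, unlike the
     plain *p loop from the PR.  */
  while (!atomic_load_explicit (&ready, memory_order_relaxed))
    ;
  /* Acquire fence: again a pure compiler barrier on x86, preventing
     the load of data from being hoisted above the spin loop.  */
  atomic_thread_fence (memory_order_acquire);
  return data;
}
```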
---
 gcc/config/i386/sync.md                   |  7 ++++++-
 gcc/config/s390/s390.md                   | 11 +++++++++--
 gcc/optabs.c                              |  2 +-
 gcc/optabs.h                              |  3 +++
 gcc/testsuite/gcc.target/i386/pr80640-1.c | 12 ++++++++++++
 5 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80640-1.c

diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 20d46fe..619d53b 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -108,7 +108,7 @@ (define_expand "mem_thread_fence"
   enum memmodel model = memmodel_from_int (INTVAL (operands[0]));
 
   /* Unless this is a SEQ_CST fence, the i386 memory model is strong
-     enough not to require barriers of any kind.  */
+     enough not to require a processor barrier of any kind.  */
   if (is_mm_seq_cst (model))
     {
       rtx (*mfence_insn)(rtx);
@@ -124,6 +124,11 @@ (define_expand "mem_thread_fence"
       emit_insn (mfence_insn (mem));
     }
+  else if (!is_mm_relaxed (model))
+    {
+      /* However, a compiler barrier is still required.  */
+      expand_asm_memory_barrier ();
+    }
   DONE;
 })

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index c9fd19a..65e54c4 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -10109,14 +10109,21 @@ (define_expand "mem_thread_fence"
   [(match_operand:SI 0 "const_int_operand")]	;; model
   ""
 {
+  enum memmodel model = memmodel_from_int (INTVAL (operands[0]));
+
   /* Unless this is a SEQ_CST fence, the s390 memory model is strong
-     enough not to require barriers of any kind.  */
-  if (is_mm_seq_cst (memmodel_from_int (INTVAL (operands[0]))))
+     enough not to require a processor barrier of any kind.  */
+  if (is_mm_seq_cst (model))
     {
       rtx mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
       MEM_VOLATILE_P (mem) = 1;
       emit_insn (gen_mem_thread_fence_1 (mem));
     }
+  else if (!is_mm_relaxed (model))
+    {
+      /* However, a compiler barrier is still required.  */
+      expand_asm_memory_barrier ();
+    }
   DONE;
 })

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 48e37f8..1f1fbc3 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6269,7 +6269,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
 
 /* Generate asm volatile("" : : : "memory") as the memory barrier.  */
 
-static void
+void
 expand_asm_memory_barrier (void)
 {
   rtx asm_op, clob;
diff --git a/gcc/optabs.h b/gcc/optabs.h
index b2dd31a..aca6755 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -322,6 +322,9 @@ extern bool expand_atomic_compare_and_swap (rtx *, rtx *, rtx, rtx, rtx, bool,
 extern void expand_mem_thread_fence (enum memmodel);
 extern void expand_mem_signal_fence (enum memmodel);
 
+/* Generate a compile-time memory barrier.  */
+extern void expand_asm_memory_barrier (void);
+
 rtx expand_atomic_load (rtx, rtx, enum memmodel);
 rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
 rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel,
diff --git a/gcc/testsuite/gcc.target/i386/pr80640-1.c b/gcc/testsuite/gcc.target/i386/pr80640-1.c
new file mode 100644
index 0000000..f1d1e55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80640-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-ter" }
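As a closing illustration of the __atomic_signal_fence analogy from the cover
letter: a signal fence is exactly this kind of compiler-only barrier, ordering
accesses against a handler on the same thread without emitting any machine
instruction.  A hedged sketch (my code, not part of the patch; `run` and the
value 42 are invented, and it assumes a POSIX target with SIGUSR1):

```c
#include <signal.h>
#include <stdatomic.h>

static volatile sig_atomic_t seen;
static int payload;

static void handler (int sig)
{
  (void) sig;
  /* Same-thread handler: only compiler-level ordering is needed;
     this acquire fence pairs with the release fence in run ().  */
  atomic_signal_fence (memory_order_acquire);
  seen = payload;
}

int run (void)
{
  signal (SIGUSR1, handler);
  payload = 42;
  /* Pure compiler barrier: keeps the store to payload from being
     sunk past raise (), yet compiles to no instruction on x86.  */
  atomic_signal_fence (memory_order_release);
  raise (SIGUSR1);  /* delivered synchronously on this thread */
  return seen;
}
```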