[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 Andrew Pinski changed: What|Removed |Added CC||davmac at davmac dot org --- Comment #18 from Andrew Pinski --- *** Bug 81516 has been marked as a duplicate of this bug. ***
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #17 from Uroš Bizjak --- Fixed.
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #16 from uros at gcc dot gnu.org --- Author: uros Date: Sun May 14 12:49:55 2017 New Revision: 248032 URL: https://gcc.gnu.org/viewcvs?rev=248032=gcc=rev Log: Backport from mainline 2017-05-11 Uros BizjakPR target/80706 * config/i386/sync.md (UNSPEC_LDX_ATOMIC): New unspec. (UNSPEC_STX_ATOMIC): Ditto. (loaddi_via_sse): New insn. (storedi_via_sse): Ditto. (atomic_loaddi_fpu): Emit loaddi_via_sse and storedi_via_sse. Update corresponding peephole2 patterns. (atomic_storedi_fpu): Ditto. testsuite/ChangeLog: Backport from mainline 2017-05-11 Uros Bizjak Jakub Jelinek PR target/80706 * gcc.target/i386/pr80706.c: New test. 2017-05-11 Uros Bizjak * gcc.target/i386/pr22152.c: Fix undefined testcase. Remove unnecessary loop. Run on 32-bit targets only. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr80706.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/i386/sync.md branches/gcc-7-branch/gcc/testsuite/ChangeLog branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr22152.c
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #15 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #13) > Created attachment 41340 [details] > gcc8-pr80706.patch > > I came up with this (in addition to #c7). This removes one of the two > useless stores in the original pr71245-1.c testcase and restores the above > pr71245-1.c modification to what it used to look before (well, even better, > as it reserves smaller stack). The committed patch fixes unwanted matching by introducing specialized SSE load/store patterns. I don't think it is worth complicating sync.md any further, the stack slot is shared, and the patched compiler generates exactly the same assembly as before.
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #14 from uros at gcc dot gnu.org --- Author: uros Date: Thu May 11 18:12:25 2017 New Revision: 247921 URL: https://gcc.gnu.org/viewcvs?rev=247921=gcc=rev Log: PR target/80706 * config/i386/sync.md (UNSPEC_LDX_ATOMIC): New unspec. (UNSPEC_STX_ATOMIC): Ditto. (loaddi_via_sse): New insn. (storedi_via_sse): Ditto. (atomic_loaddi_fpu): Emit loaddi_via_sse and storedi_via_sse. Update corresponding peephole2 patterns. (atomic_storedi_fpu): Ditto. testsuite/ChangeLog: PR target/80706 * gcc.target/i386/pr80706.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr80706.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sync.md trunk/gcc/testsuite/ChangeLog
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #13 from Jakub Jelinek --- Created attachment 41340 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41340=edit gcc8-pr80706.patch I came up with this (in addition to #c7). This removes one of the two useless stores in the original pr71245-1.c testcase and restores the above pr71245-1.c modification to what it used to look before (well, even better, as it reserves smaller stack).
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 Uroš Bizjak changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #12 from Uroš Bizjak --- Created attachment 41339 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41339=edit Patch to prevent unwanted peephole2 matching Instead of using generic SSE move patterns, attached patch implements and uses loaddi_via_sse and storedi_via_sse specialized patterns. This effectively prevents unwanted peephole2 matching.
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #11 from Jakub Jelinek --- (In reply to Jakub Jelinek from comment #10) > Perhaps if we had such a pattern that we'd split into a normal DFmode load > (perhaps with unspec before reload to guarantee it is atomic load), we > wouldn't need the temporary at all? --- gcc/config/i386/predicates.md.jj2017-01-01 12:45:42.0 +0100 +++ gcc/config/i386/predicates.md 2017-05-11 11:42:17.649136648 +0200 @@ -1657,3 +1657,14 @@ (define_predicate "register_or_constm1_o (ior (match_operand 0 "register_operand") (and (match_code "const_int") (match_test "op == constm1_rtx" + +;; Return true if OP is a memory_operand, including volatile MEM. +(define_predicate "volatile_memory_operand" + (match_code "mem,subreg") +{ + int save_volatile_ok = volatile_ok; + volatile_ok = 1; + bool ret = memory_operand (op, mode); + volatile_ok = save_volatile_ok; + return ret; +}) --- gcc/config/i386/sync.md.jj 2017-05-11 10:16:03.0 +0200 +++ gcc/config/i386/sync.md 2017-05-11 11:42:45.67179 +0200 @@ -210,6 +210,17 @@ (define_insn_and_split "atomic_loaddi_fp DONE; }) +(define_insn_and_split "*atomic_loaddf_fpu" + [(set (match_operand:DF 0 "nonimmediate_operand" "=x,f") + (subreg:DF (unspec:DI [(match_operand:DI 1 "volatile_memory_operand" + "m,m")] + UNSPEC_LDA) 0))] + "!TARGET_64BIT && (TARGET_80387 || TARGET_SSE)" + "#" + "&& 1" + [(set (match_dup 0) (match_dup 1))] + "operands[1] = gen_lowpart (DFmode, operands[1]);") + (define_peephole2 [(set (match_operand:DF 0 "fp_register_operand") (unspec:DF [(match_operand:DI 1 "memory_operand")] does that, unfortunately combine still fails, because the insn it wants to match afterwards is: (set (reg:DF 91) (plus:DF (reg:DF 92) (const_double:DF 1.0e+0 [0x0.8p+1]))) But the above patch at least helps a little bit on following testcase: typedef union { unsigned long long ll; double d; } u_t; u_t d = { .d = 5.0 }; void foo_d (double x) { u_t tmp; tmp.ll = __atomic_load_n (, __ATOMIC_SEQ_CST); tmp.d += x; __atomic_store_n (, tmp.ll, __ATOMIC_SEQ_CST); } Before the #c7 patch, we get: fldld faddl 24(%esp) fstpl d lock; orl $0, (%esp) with just the #c7 patch we get: fldld fstl(%esp) faddl 24(%esp) fstld fstpl (%esp) lock; orl $0, (%esp) so 2 useless stores. With #c7 and this patch we get: fldld faddl 24(%esp) fstld fstpl (%esp) lock; orl $0, (%esp) i.e. one useless store. So, either we need combine or some other pre-reload pass to figure out we have all uses of the atomic_loaddi_fpu pattern as (subreg:DF (reg:DI ...)) and optimize that into the atomic_loaddf_fpu pattern with uses changed into just the DFmode pseudo. Allowing =f in atomic_loaddi_fpu won't work, as DImode is not VALID_FP_MODE_P.
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #10 from Jakub Jelinek --- (In reply to Uroš Bizjak from comment #9) > (In reply to Jakub Jelinek from comment #8) > > The #c5 patch obviously doesn't help here, because the testcase triggers the > > last of these 4 peephole2s. But #c7 works. > > Thanks! It looks like we'll have to live with extra stores then... Can't we improve it in the combiner? For PR71245 testcase obviously, we have: (insn 5 2 6 2 (parallel [ (set (reg:DI 89 [ _4 ]) (unspec:DI [ (mem/v:DI (symbol_ref:SI ("d") [flags 0x2] ) [-1 S8 A64]) ] UNSPEC_LDA)) (clobber (mem/c:DI (plus:SI (reg/f:SI 20 frame) (const_int -8 [0xfff8])) [0 S8 A64])) (clobber (scratch:DF)) ]) "/usr/include/c++/6.3.1/atomic":235 4970 {atomic_loaddi_fpu} (nil)) ... (insn 8 7 9 2 (set (reg:DF 91) (plus:DF (subreg:DF (reg:DI 89 [ _4 ]) 0) (reg:DF 92))) "pr71245.C":5 805 {*fop_df_comm} (expr_list:REG_DEAD (reg:DF 92) (expr_list:REG_DEAD (reg:DI 89 [ _4 ]) (nil and apparently the combiner attempts to match: (set (reg:DF 92) (subreg:DF (unspec:DI [ (mem/v:DI (symbol_ref:SI ("d") [flags 0x2] ) [-1 S8 A64]) ] UNSPEC_LDA) 0)) Perhaps if we had such a pattern that we'd split into a normal DFmode load (perhaps with unspec before reload to guarantee it is atomic load), we wouldn't need the temporary at all?
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #9 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #8) > The #c5 patch obviously doesn't help here, because the testcase triggers the > last of these 4 peephole2s. But #c7 works. Thanks! It looks like we'll have to live with extra stores then...
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 Richard Biener changed: What|Removed |Added Target||x86_64-*-*, i?86-*-* Priority|P3 |P2
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #8 from Jakub Jelinek --- The #c5 patch obviously doesn't help here, because the testcase triggers the last of these 4 peephole2s. But #c7 works.
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #7 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #5) --cut here-- diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md index 20d46fe..895a1ea 100644 --- a/gcc/config/i386/sync.md +++ b/gcc/config/i386/sync.md @@ -222,7 +222,8 @@ "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0]) && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))" - [(set (match_dup 3) (match_dup 5))] + [(set (match_dup 3) (match_dup 5)) + (set (match_dup 4) (match_dup 3))] "operands[5] = gen_lowpart (DFmode, operands[1]);") (define_peephole2 @@ -235,7 +236,8 @@ "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0]) && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))" - [(set (match_dup 3) (match_dup 5))] + [(set (match_dup 3) (match_dup 5)) + (set (match_dup 4) (match_dup 3))] "operands[5] = gen_lowpart (DFmode, operands[1]);") (define_expand "atomic_store" @@ -338,7 +340,8 @@ "!TARGET_64BIT && peep2_reg_dead_p (3, operands[2]) && rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))" - [(set (match_dup 5) (match_dup 1))] + [(set (match_dup 5) (match_dup 1)) + (set (match_dup 0) (match_dup 1))] "operands[5] = gen_lowpart (DFmode, operands[4]);") (define_peephole2 @@ -351,7 +354,8 @@ "!TARGET_64BIT && peep2_reg_dead_p (3, operands[2]) && rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))" - [(set (match_dup 5) (match_dup 1))] + [(set (match_dup 5) (match_dup 1)) + (set (match_dup 0) (match_dup 1))] "operands[5] = gen_lowpart (DFmode, operands[4]);") ;; ??? You'd think that we'd be able to perform this via FLOAT + FIX_TRUNC --cut here-- > Jakub, does the above patch fix the failure? Bah, cut-n-pasto... the above should be correct patch.
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #6 from Jakub Jelinek --- Simplified testcase: /* PR target/80706 */ /* { dg-do run { target sse2_runtime } } */ /* { dg-options "-O2 -msse2" } */ union U { double value; struct S { int lsw; int msw; } parts; }; __attribute__((noinline, noclone)) double foo (void) { __asm volatile ("" : : : "memory"); return 2.0; } __attribute__((noinline, noclone)) double bar (void) { double s = foo (); union U z; z.value = s; z.parts.lsw = 0; return z.value * z.value + s * s; } int main () { if (bar () != 8.0) __builtin_abort (); return 0; }
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #5 from Uroš Bizjak --- --cut here-- diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md index 20d46fe..d509be5 100644 --- a/gcc/config/i386/sync.md +++ b/gcc/config/i386/sync.md @@ -222,7 +222,8 @@ "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0]) && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))" - [(set (match_dup 3) (match_dup 5))] + [(set (match_dup 3) (match_dup 5)) + (set (match_dup 4) (match_dup 3))] "operands[5] = gen_lowpart (DFmode, operands[1]);") (define_peephole2 @@ -235,7 +236,8 @@ "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0]) && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))" - [(set (match_dup 3) (match_dup 5))] + [(set (match_dup 3) (match_dup 5)) + (set (match_dup 4) (match_dup 3))] "operands[5] = gen_lowpart (DFmode, operands[1]);") (define_expand "atomic_store" --cut here-- Jakub, does the above patch fix the failure?
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 --- Comment #4 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #3) > So peephole2 does here: > - fstpl 24(%esp) > - movq24(%esp), %xmm0 > - movq%xmm0, 32(%esp) > + fstpl 32(%esp) > movl$0, 32(%esp) > That is a nice simplification, but has one extra requirement not checked > (and hard to check) in the peephole2 patterns - that the memory slot stored > in the first store is a scratch memory not used afterwards (or overwritten > first, i.e. dead). While we have peep2_reg_dead_p predicates, we don't have > peep2_mem_dead_p and implementing that would be hard, only DSE has > infrastructure to do that, but dse2 is run before peephole2 pass. > All we could do is simplify the mem[sp+24]=st; xmm0=[sp+24]; [sp+32]=xmm0; > into mem[sp+24]=st; mem[sp+32]=st; and let the regstack pass figure out > something with it - fstl 24(%esp); fstpl 32(%esp) ?). DSE isn't run > afterwards, so it would be nice to do that earlier though. Let's keep the dangling store to a temporary here. We already loaded the value from the memory, so one extra store won't hurt that much...
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 Jakub Jelinek changed: What|Removed |Added CC||uros at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- So peephole2 does here: - fstpl 24(%esp) - movq24(%esp), %xmm0 - movq%xmm0, 32(%esp) + fstpl 32(%esp) movl$0, 32(%esp) That is a nice simplification, but has one extra requirement not checked (and hard to check) in the peephole2 patterns - that the memory slot stored in the first store is a scratch memory not used afterwards (or overwritten first, i.e. dead). While we have peep2_reg_dead_p predicates, we don't have peep2_mem_dead_p and implementing that would be hard, only DSE has infrastructure to do that, but dse2 is run before peephole2 pass. All we could do is simplify the mem[sp+24]=st; xmm0=[sp+24]; [sp+32]=xmm0; into mem[sp+24]=st; mem[sp+32]=st; and let the regstack pass figure out something with it - fstl 24(%esp); fstpl 32(%esp) ?). DSE isn't run afterwards, so it would be nice to do that earlier though.
[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-05-11 CC||jakub at gcc dot gnu.org Target Milestone|--- |7.2 Summary|peephole2 uses |[7/8 Regression] peephole2 |uninitialized stack |uses uninitialized stack |variables on i686 |variables on i686 Ever confirmed|0 |1 --- Comment #2 from Jakub Jelinek --- Started with r236863.