[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-07-23 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

Andrew Pinski  changed:

   What|Removed |Added

 CC||davmac at davmac dot org

--- Comment #18 from Andrew Pinski  ---
*** Bug 81516 has been marked as a duplicate of this bug. ***

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-14 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

Uroš Bizjak  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from Uroš Bizjak  ---
Fixed.

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-14 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #16 from uros at gcc dot gnu.org ---
Author: uros
Date: Sun May 14 12:49:55 2017
New Revision: 248032

URL: https://gcc.gnu.org/viewcvs?rev=248032=gcc=rev
Log:
Backport from mainline
2017-05-11  Uros Bizjak  

PR target/80706
* config/i386/sync.md (UNSPEC_LDX_ATOMIC): New unspec.
(UNSPEC_STX_ATOMIC): Ditto.
(loaddi_via_sse): New insn.
(storedi_via_sse): Ditto.
(atomic_loaddi_fpu): Emit loaddi_via_sse and storedi_via_sse.
Update corresponding peephole2 patterns.
(atomic_storedi_fpu): Ditto.

testsuite/ChangeLog:

Backport from mainline
2017-05-11  Uros Bizjak  
Jakub Jelinek  

PR target/80706
* gcc.target/i386/pr80706.c: New test.

2017-05-11  Uros Bizjak  

* gcc.target/i386/pr22152.c: Fix undefined testcase.
Remove unnecessary loop.  Run on 32-bit targets only.


Added:
branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr80706.c
Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/i386/sync.md
branches/gcc-7-branch/gcc/testsuite/ChangeLog
branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr22152.c

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #15 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #13)
> Created attachment 41340 [details]
> gcc8-pr80706.patch
> 
> I came up with this (in addition to #c7).  This removes one of the two
> useless stores in the original pr71245-1.c testcase and restores the above
> pr71245-1.c modification to what it used to look before (well, even better,
> as it reserves smaller stack).

The committed patch fixes unwanted matching by introducing specialized SSE
load/store patterns. I don't think it is worth complicating sync.md any
further, the stack slot is shared, and the patched compiler generates exactly
the same assembly as before.

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #14 from uros at gcc dot gnu.org ---
Author: uros
Date: Thu May 11 18:12:25 2017
New Revision: 247921

URL: https://gcc.gnu.org/viewcvs?rev=247921=gcc=rev
Log:
PR target/80706
* config/i386/sync.md (UNSPEC_LDX_ATOMIC): New unspec.
(UNSPEC_STX_ATOMIC): Ditto.
(loaddi_via_sse): New insn.
(storedi_via_sse): Ditto.
(atomic_loaddi_fpu): Emit loaddi_via_sse and storedi_via_sse.
Update corresponding peephole2 patterns.
(atomic_storedi_fpu): Ditto.

testsuite/ChangeLog:

PR target/80706
* gcc.target/i386/pr80706.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr80706.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/sync.md
trunk/gcc/testsuite/ChangeLog

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #13 from Jakub Jelinek  ---
Created attachment 41340
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41340=edit
gcc8-pr80706.patch

I came up with this (in addition to #c7).  This removes one of the two useless
stores in the original pr71245-1.c testcase and restores the above pr71245-1.c
modification to what it used to look before (well, even better, as it reserves
smaller stack).

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

Uroš Bizjak  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com

--- Comment #12 from Uroš Bizjak  ---
Created attachment 41339
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41339=edit
Patch to prevent unwanted peephole2 matching

Instead of using generic SSE move patterns, attached patch implements and uses
loaddi_via_sse and storedi_via_sse specialized patterns.

This effectively prevents unwanted peephole2 matching.

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #11 from Jakub Jelinek  ---
(In reply to Jakub Jelinek from comment #10)
> Perhaps if we had such a pattern that we'd split into a normal DFmode load
> (perhaps with unspec before reload to guarantee it is atomic load), we
> wouldn't need the temporary at all?

--- gcc/config/i386/predicates.md.jj2017-01-01 12:45:42.0 +0100
+++ gcc/config/i386/predicates.md   2017-05-11 11:42:17.649136648 +0200
@@ -1657,3 +1657,14 @@ (define_predicate "register_or_constm1_o
   (ior (match_operand 0 "register_operand")
(and (match_code "const_int")
(match_test "op == constm1_rtx"
+
+;; Return true if OP is a memory_operand, including volatile MEM.
+(define_predicate "volatile_memory_operand"
+  (match_code "mem,subreg")
+{
+  int save_volatile_ok = volatile_ok;
+  volatile_ok = 1;
+  bool ret = memory_operand (op, mode);
+  volatile_ok = save_volatile_ok;
+  return ret;
+})
--- gcc/config/i386/sync.md.jj  2017-05-11 10:16:03.0 +0200
+++ gcc/config/i386/sync.md 2017-05-11 11:42:45.67179 +0200
@@ -210,6 +210,17 @@ (define_insn_and_split "atomic_loaddi_fp
   DONE;
 })

+(define_insn_and_split "*atomic_loaddf_fpu"
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=x,f")
+   (subreg:DF (unspec:DI [(match_operand:DI 1 "volatile_memory_operand"
+  "m,m")]
+ UNSPEC_LDA) 0))]
+  "!TARGET_64BIT && (TARGET_80387 || TARGET_SSE)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (match_dup 1))]
+  "operands[1] = gen_lowpart (DFmode, operands[1]);")
+
 (define_peephole2
   [(set (match_operand:DF 0 "fp_register_operand")
(unspec:DF [(match_operand:DI 1 "memory_operand")]

does that, unfortunately combine still fails, because the insn it wants to
match afterwards is:
(set (reg:DF 91)
(plus:DF (reg:DF 92)
(const_double:DF 1.0e+0 [0x0.8p+1])))
But the above patch at least helps a little bit on following testcase:
typedef union
{
  unsigned long long ll;
  double d;
} u_t;

u_t d = { .d = 5.0 };

void foo_d (double x)
{
  u_t tmp;

  tmp.ll = __atomic_load_n (, __ATOMIC_SEQ_CST);
  tmp.d += x;
  __atomic_store_n (, tmp.ll, __ATOMIC_SEQ_CST);
}
Before the #c7 patch, we get:
fldld
faddl   24(%esp)
fstpl   d
lock; orl   $0, (%esp)
with just the #c7 patch we get:
fldld
fstl(%esp)
faddl   24(%esp)
fstld
fstpl   (%esp)
lock; orl   $0, (%esp)
so 2 useless stores.  With #c7 and this patch we get:
fldld
faddl   24(%esp)
fstld
fstpl   (%esp)
lock; orl   $0, (%esp)
i.e. one useless store.  So, either we need combine or some other pre-reload
pass to figure out we have all uses of the atomic_loaddi_fpu pattern as
(subreg:DF (reg:DI ...)) and optimize that into the atomic_loaddf_fpu pattern
with uses changed into just the DFmode pseudo.  Allowing =f in
atomic_loaddi_fpu won't work, as DImode is not VALID_FP_MODE_P.

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #10 from Jakub Jelinek  ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Jakub Jelinek from comment #8)
> > The #c5 patch obviously doesn't help here, because the testcase triggers the
> > last of these 4 peephole2s.  But #c7 works.
> 
> Thanks! It looks like we'll have to live with extra stores then...

Can't we improve it in the combiner?
For PR71245 testcase obviously, we have:
(insn 5 2 6 2 (parallel [
(set (reg:DI 89 [ _4 ])
(unspec:DI [
(mem/v:DI (symbol_ref:SI ("d") [flags 0x2]  ) [-1  S8 A64])
] UNSPEC_LDA))
(clobber (mem/c:DI (plus:SI (reg/f:SI 20 frame)
(const_int -8 [0xfff8])) [0  S8 A64]))
(clobber (scratch:DF))
]) "/usr/include/c++/6.3.1/atomic":235 4970 {atomic_loaddi_fpu}
 (nil))
...
(insn 8 7 9 2 (set (reg:DF 91)
(plus:DF (subreg:DF (reg:DI 89 [ _4 ]) 0)
(reg:DF 92))) "pr71245.C":5 805 {*fop_df_comm}
 (expr_list:REG_DEAD (reg:DF 92)
(expr_list:REG_DEAD (reg:DI 89 [ _4 ])
(nil
and apparently the combiner attempts to match:
(set (reg:DF 92)
(subreg:DF (unspec:DI [
(mem/v:DI (symbol_ref:SI ("d") [flags 0x2]  ) [-1  S8 A64])
] UNSPEC_LDA) 0))
Perhaps if we had such a pattern that we'd split into a normal DFmode load
(perhaps with unspec before reload to guarantee it is atomic load), we wouldn't
need the temporary at all?

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #9 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #8)
> The #c5 patch obviously doesn't help here, because the testcase triggers the
> last of these 4 peephole2s.  But #c7 works.

Thanks! It looks like we'll have to live with extra stores then...

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*, i?86-*-*
   Priority|P3  |P2

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #8 from Jakub Jelinek  ---
The #c5 patch obviously doesn't help here, because the testcase triggers the
last of these 4 peephole2s.  But #c7 works.

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #7 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #5)

--cut here--
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 20d46fe..895a1ea 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -222,7 +222,8 @@
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
&& rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
-  [(set (match_dup 3) (match_dup 5))]
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 3))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")

 (define_peephole2
@@ -235,7 +236,8 @@
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
&& rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
-  [(set (match_dup 3) (match_dup 5))]
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 3))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")

 (define_expand "atomic_store"
@@ -338,7 +340,8 @@
   "!TARGET_64BIT
&& peep2_reg_dead_p (3, operands[2])
&& rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))"
-  [(set (match_dup 5) (match_dup 1))]
+  [(set (match_dup 5) (match_dup 1))
+   (set (match_dup 0) (match_dup 1))]
   "operands[5] = gen_lowpart (DFmode, operands[4]);")

 (define_peephole2
@@ -351,7 +354,8 @@
   "!TARGET_64BIT
&& peep2_reg_dead_p (3, operands[2])
&& rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))"
-  [(set (match_dup 5) (match_dup 1))]
+  [(set (match_dup 5) (match_dup 1))
+   (set (match_dup 0) (match_dup 1))]
   "operands[5] = gen_lowpart (DFmode, operands[4]);")

 ;; ??? You'd think that we'd be able to perform this via FLOAT + FIX_TRUNC
--cut here--

> Jakub, does the above patch fix the failure?

Bah, cut-n-pasto... the above should be correct patch.

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #6 from Jakub Jelinek  ---
Simplified testcase:
/* PR target/80706 */
/* { dg-do run { target sse2_runtime } } */
/* { dg-options "-O2 -msse2" } */

union U { double value; struct S { int lsw; int msw; } parts; };

__attribute__((noinline, noclone)) double
foo (void)
{
  __asm volatile ("" : : : "memory");
  return 2.0;
}

__attribute__((noinline, noclone)) double
bar (void)
{
  double s = foo ();
  union U z;
  z.value = s;
  z.parts.lsw = 0;
  return z.value * z.value + s * s;
}

int
main ()
{
  if (bar () != 8.0)
__builtin_abort ();
  return 0;
}

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #5 from Uroš Bizjak  ---
--cut here--
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 20d46fe..d509be5 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -222,7 +222,8 @@
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
&& rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
-  [(set (match_dup 3) (match_dup 5))]
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 3))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")

 (define_peephole2
@@ -235,7 +236,8 @@
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
&& rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
-  [(set (match_dup 3) (match_dup 5))]
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 3))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")

 (define_expand "atomic_store"
--cut here--

Jakub, does the above patch fix the failure?

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

--- Comment #4 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #3)
> So peephole2 does here:
> - fstpl   24(%esp)
> - movq24(%esp), %xmm0
> - movq%xmm0, 32(%esp)
> + fstpl   32(%esp)
>   movl$0, 32(%esp)
> That is a nice simplification, but has one extra requirement not checked
> (and hard to check) in the peephole2 patterns - that the memory slot stored
> in the first store is a scratch memory not used afterwards (or overwritten
> first, i.e. dead).  While we have peep2_reg_dead_p predicates, we don't have
> peep2_mem_dead_p and implementing that would be hard, only DSE has
> infrastructure to do that, but dse2 is run before peephole2 pass.
> All we could do is simplify the mem[sp+24]=st; xmm0=[sp+24]; [sp+32]=xmm0;
> into mem[sp+24]=st; mem[sp+32]=st; and let the regstack pass figure out
> something with it - fstl 24(%esp); fstpl 32(%esp) ?).  DSE isn't run
> afterwards, so it would be nice to do that earlier though.

Let's keep the dangling store to a temporary here. We already loaded the value
from the memory, so one extra store won't hurt that much...

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

Jakub Jelinek  changed:

   What|Removed |Added

 CC||uros at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
So peephole2 does here:
-   fstpl   24(%esp)
-   movq24(%esp), %xmm0
-   movq%xmm0, 32(%esp)
+   fstpl   32(%esp)
movl$0, 32(%esp)
That is a nice simplification, but has one extra requirement not checked (and
hard to check) in the peephole2 patterns - that the memory slot stored in the
first store is a scratch memory not used afterwards (or overwritten first, i.e.
dead).  While we have peep2_reg_dead_p predicates, we don't have
peep2_mem_dead_p and implementing that would be hard, only DSE has
infrastructure to do that, but dse2 is run before peephole2 pass.
All we could do is simplify the mem[sp+24]=st; xmm0=[sp+24]; [sp+32]=xmm0;
into mem[sp+24]=st; mem[sp+32]=st; and let the regstack pass figure out
something with it - fstl 24(%esp); fstpl 32(%esp) ?).  DSE isn't run
afterwards, so it would be nice to do that earlier though.

[Bug target/80706] [7/8 Regression] peephole2 uses uninitialized stack variables on i686

2017-05-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80706

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-05-11
 CC||jakub at gcc dot gnu.org
   Target Milestone|--- |7.2
Summary|peephole2 uses  |[7/8 Regression] peephole2
   |uninitialized stack |uses uninitialized stack
   |variables on i686   |variables on i686
 Ever confirmed|0   |1

--- Comment #2 from Jakub Jelinek  ---
Started with r236863.