[Bug tree-optimization/93946] Bogus redundant store removal

2020-04-08 Thread sandra at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #22 from sandra at gcc dot gnu.org ---
My nios2-elf test results look good now with this patch.  Thanks!

[Bug tree-optimization/93946] Bogus redundant store removal

2020-04-08 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #21 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:dd9ca9d770a18ce4b16d867f49fef3293b483ff5

commit r10-7635-gdd9ca9d770a18ce4b16d867f49fef3293b483ff5
Author: Richard Biener 
Date:   Wed Apr 8 14:04:35 2020 +0200

rtl-optimization/93946 - fix TBAA for redundant store removal in CSE

It turns out RTL CSE tries to remove redundant stores but fails to
do the usual validity check what such a change is TBAA neutral to
later loads.

This now triggers with the PR93946 testcases on nios2.

2020-04-08  Richard Biener  

PR rtl-optimization/93946
* cse.c (cse_insn): Record the tabled expression in
src_related.  Verify a redundant store removal is valid.

[Bug tree-optimization/93946] Bogus redundant store removal

2020-04-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #20 from Richard Biener  ---
So for the CSE issue we go through the equivalence chain and find

(gdb) p debug_rtx (p->exp)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4
A32])

5076  if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
5077src_related = dest;
...
5121  if (rtx_equal_p (src_related, dest))
5122src_related_cost = src_related_regcost = -1;

the first issue is that we're recording dest as src_related here losing the
opportunity to do a validity check later on.  If we fix that we can do the
usual validity check, copied from DSE:

diff --git a/gcc/cse.c b/gcc/cse.c
index 3e8724b3fed..f07bbdbebad 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -5074,7 +5074,7 @@ cse_insn (rtx_insn *insn)
 to prefer it.  Copy it to src_related.  The code below will
 then give it a negative cost.  */
  if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
-   src_related = dest;
+   src_related = p->exp;
}

   /* Find the cheapest valid equivalent, trying all the available
@@ -5332,7 +5332,16 @@ cse_insn (rtx_insn *insn)
   && rtx_equal_p (trial, dest)
   && !side_effects_p (dest)
   && (cfun->can_delete_dead_exceptions
-  || insn_nothrow_p (insn)))
+  || insn_nothrow_p (insn))
+  /* We can only remove the later store if the earlier aliases
+ at least all accesses the later one.  */
+  && (!MEM_P (trial)
+  || ((MEM_ALIAS_SET (dest) == MEM_ALIAS_SET (trial)
+   || alias_set_subset_of (MEM_ALIAS_SET (dest),
+   MEM_ALIAS_SET (trial)))
+   && (!MEM_EXPR (trial)
+   || refs_same_for_tbaa_p (MEM_EXPR (trial),
+MEM_EXPR (dest))
{
  SET_SRC (sets[i].rtl) = trial;
  noop_insn = true;

that gives us for the testcase

foo:
movir2, 1
stw r2, 0(r4)
stw zero, 0(r5)
ldw r2, 0(r4)
stw zero, 4(r5)
ret

which looks correct to me.  I'm going to test the above on x86_64-linux.

[Bug tree-optimization/93946] Bogus redundant store removal

2020-04-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #19 from Richard Biener  ---
(In reply to sandra from comment #18)
> Hmmm, it looks to me like things are going wrong in the tree fre1 pass too. 
> I commented out the redundant zero store in the test case to see what would
> happen, like
> 
> long __attribute__((noipa))
> foo (struct bb *bv, void *ptr)
> {
>   struct aa *a = ptr;
>   struct bb *b = ptr;
>   bv->b.u.f = 1;
>   a->a.u.i = 0;
>   /*  b->b.u.f = 0; */
>   return bv->b.u.f;
> }
> 
> fre1 gets this as input:
> 
> __attribute__((noipa, noinline, noclone, no_icf))
> foo (struct bb * bv, void * ptr)
> {
>   struct bb * b;
>   struct aa * a;
>   long int _8;
> 
>:
>   bv_5(D)->b.u.f = 1;
>   MEM[(struct aa *)ptr_1(D)].a.u.i = 0;
>   _8 = bv_5(D)->b.u.f;
>   return _8;
> 
> }
> 
> 
> and produces this output:
> 
> __attribute__((noipa, noinline, noclone, no_icf))
> foo (struct bb * bv, void * ptr)
> {
>   struct bb * b;
>   struct aa * a;
> 
>:
>   bv_5(D)->b.u.f = 1;
>   MEM[(struct aa *)ptr_1(D)].a.u.i = 0;
>   return 1;
> 
> }
> 
> This is with -O2.
> 
> Again, it seems like something is not realizing that pointer parameters can
> be aliased no matter what their types are.  Or perhaps that is just a
> symptom of some other underlying bug.  :-S

No, it's a feature and called type-based alias-analysis.  The "redundant"
store is what eventually aliases with the load, the other store to the
same location can be disambiguated using TBAA.  So the second store is
not really redudnant - while it has no effect on the bits in the memory
it alters the memory state by changing the effective type of the memory
location to one compatible with the following load (or since we're dealing
with unions, it changes the active union member).  This is why we cannot
elide the "redudnant" store.  We can elide the earlier store as dead
though (since there's no intermediate use).

[Bug tree-optimization/93946] Bogus redundant store removal

2020-04-07 Thread sandra at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #18 from sandra at gcc dot gnu.org ---
Hmmm, it looks to me like things are going wrong in the tree fre1 pass too.  I
commented out the redundant zero store in the test case to see what would
happen, like

long __attribute__((noipa))
foo (struct bb *bv, void *ptr)
{
  struct aa *a = ptr;
  struct bb *b = ptr;
  bv->b.u.f = 1;
  a->a.u.i = 0;
  /*  b->b.u.f = 0; */
  return bv->b.u.f;
}

fre1 gets this as input:

__attribute__((noipa, noinline, noclone, no_icf))
foo (struct bb * bv, void * ptr)
{
  struct bb * b;
  struct aa * a;
  long int _8;

   :
  bv_5(D)->b.u.f = 1;
  MEM[(struct aa *)ptr_1(D)].a.u.i = 0;
  _8 = bv_5(D)->b.u.f;
  return _8;

}


and produces this output:

__attribute__((noipa, noinline, noclone, no_icf))
foo (struct bb * bv, void * ptr)
{
  struct bb * b;
  struct aa * a;

   :
  bv_5(D)->b.u.f = 1;
  MEM[(struct aa *)ptr_1(D)].a.u.i = 0;
  return 1;

}

This is with -O2.

Again, it seems like something is not realizing that pointer parameters can be
aliased no matter what their types are.  Or perhaps that is just a symptom of
some other underlying bug.  :-S

[Bug tree-optimization/93946] Bogus redundant store removal

2020-04-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #17 from Richard Biener  ---
x86 splits large stores only after reload it seems.  Also on x86 GIMPLE
store-merging triggers, merging the two = 0 stores so I need -fno-store-merging
to
even get two stores there.  Still the issue in CSE is obviously there...

[Bug tree-optimization/93946] Bogus redundant store removal

2020-04-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #16 from Richard Biener  ---
OK, now looking myself.  RTL expansion creates

(insn 8 7 9 2 (set (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4
A32])
(reg:SI 49)) "t.c":12:13 -1
 (nil))
(insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa
*)ptr_1(D)].a.u.i+0 S4 A32])
(const_int 0 [0])) "t.c":13:12 -1
 (nil))
(insn 10 9 11 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 48 [ ptr ])
(const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4
A32])
(const_int 0 [0])) "t.c":13:12 -1
 (nil))
(insn 11 10 12 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb
*)ptr_1(D)].b.u.f+0 S4 A32])
(const_int 0 [0])) "t.c":14:12 -1
 (nil))
(insn 12 11 13 2 (set (reg:SI 51)
(mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32]))
"t.c":15:17 -1
 (nil))

where insn 11 is the important one.  Somehow on nios2 the CSE1 removes that
store.

deferring deletion of insn with uid = 11.

and we end up with

(insn 8 7 9 2 (set (mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4
A32])
(reg:SI 49)) "t.c":12:13 5 {movsi_internal}
 (expr_list:REG_DEAD (reg:SI 49)
(nil)))
(insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct aa
*)ptr_1(D)].a.u.i+0 S4 A32])
(const_int 0 [0])) "t.c":13:12 5 {movsi_internal}
 (nil))
(insn 10 9 12 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 48 [ ptr ])
(const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4
A32])
(const_int 0 [0])) "t.c":13:12 5 {movsi_internal}
 (nil))
(insn 12 10 13 2 (set (reg:SI 51 [ bv_3(D)->b.u.f ])
(mem/j:SI (reg/v/f:SI 47 [ bv ]) [1 bv_3(D)->b.u.f+0 S4 A32]))
"t.c":15:17 5 {movsi_internal}
 (expr_list:REG_DEAD (reg/v/f:SI 47 [ bv ])
(nil)))

where there indeed is no scheduling barrier anymore.

I didn't know CSE removes stores or why this only triggers on nios2, it looks
like some DF thing?  Backtrace of the "DSE":

#0  delete_insn (insn=0x76bc3400)
at /space/rguenther/src/gcc/gcc/cfgrtl.c:135
#1  0x00b0bfa5 in delete_insn_and_edges (insn=0x76bc3400)
at /space/rguenther/src/gcc/gcc/cfgrtl.c:237
#2  0x01a9d8eb in cse_insn (insn=0x76bc3400)
at /space/rguenther/src/gcc/gcc/cse.c:5571
#3  0x01aa0b76 in cse_extended_basic_block (ebb_data=0x7fffdc90)
at /space/rguenther/src/gcc/gcc/cse.c:6614
#4  0x01aa10a5 in cse_main (f=0x76cce310, nregs=52)
at /space/rguenther/src/gcc/gcc/cse.c:6793

that's

  /* Similarly for no-op moves.  */
  else if (noop_insn)
{
  if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
cse_cfg_altered = true;
  cse_cfg_altered |= delete_insn_and_edges (insn);
  /* No more processing for this set.  */
  sets[i].rtl = 0;

so appearantly it does redundant store removal as well...

  /* Similarly, lots of targets don't allow no-op
 (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
 might be impossible for certain registers (like CC registers).  */
  else if (n_sets == 1
   && !CALL_P (insn)
   && (MEM_P (trial) || REG_P (trial))
   && rtx_equal_p (trial, dest)
   && !side_effects_p (dest)
   && (cfun->can_delete_dead_exceptions
   || insn_nothrow_p (insn)))
{
  SET_SRC (sets[i].rtl) = trial;
  noop_insn = true;
  break;
}

where

(gdb) p debug_rtx (insn)
(insn 11 10 12 2 (set (mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb
*)ptr_1(D)].b.u.f+0 S4 A32])
(const_int 0 [0])) "t.c":14:12 5 {movsi_internal}
 (expr_list:REG_DEAD (reg/v/f:SI 48 [ ptr ])
(nil)))
(gdb) p debug_rtx (trial)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4
A32])
$4 = void
(gdb) p debug_rtx (dest)
(mem/j:SI (reg/v/f:SI 48 [ ptr ]) [1 MEM[(struct bb *)ptr_1(D)].b.u.f+0 S4
A32])
$6 = void

so it might be that the trigger is a target where sizeof(long long) = 2 *
sizeof(long) _and_ we split stores to the larger type
(I tried to pick a set of types where sizeof is the same but
alias-sets are different - otherwise I'd have to cater for big vs.
little-endian).

[Bug tree-optimization/93946] Bogus redundant store removal

2020-04-05 Thread sandra at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #15 from sandra at gcc dot gnu.org ---
Hmmm.  I've gone over this code 2 or 3 times now, and I'm still convinced the
problem is in the alias analysis, not the scheduler.

I've stepped deeper into the code in the debugger, and here is the backtrace
from where I see things going wrong:

#0  indirect_refs_may_alias_p (ref1=0x774196f0, base1=0x773fec08, 
offset1=..., max_size1=..., size1=..., ref1_alias_set=1, 
base1_alias_set=4, ref2=0x774195d0, base2=0x773febb8, offset2=..., 
max_size2=..., size2=..., ref2_alias_set=1, base2_alias_set=6, tbaa_p=true)
at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2122
#1  0x013f8266 in refs_may_alias_p_2 (ref1=0x7fffd7d0, 
ref2=0x7fffd790, tbaa_p=true)
at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2320
#2  0x013f82bb in refs_may_alias_p_1 (ref1=0x7fffd7d0, 
ref2=0x7fffd790, tbaa_p=true)
at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/tree-ssa-alias.c:2339
#3  0x00b90b06 in rtx_refs_may_alias_p (x=0x7742cac8, 
mem=0x7742c9a8, tbaa_p=true)
at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:365
#4  0x00b981de in true_dependence_1 (mem=0x7742c9a8, 
mem_mode=E_SImode, mem_addr=0x7742c7e0, x=0x7742cac8, 
x_addr=0x7742c798, mem_canonicalized=false)
at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3048

The code here says

  /* Do type-based disambiguation.  */
  if (base1_alias_set != base2_alias_set
  && !alias_sets_conflict_p (base1_alias_set, base2_alias_set))
return false;

and the "false" return status gets propagated all the way back up to
true_dependence.

It seems to me that what is going wrong is that it is failing to consider that
two pointer parameters can be aliased no matter what their declared type is,
and no matter what types they are cast to -- e.g. because they point to members
of the same union.  Should ao_ref_base_alias_set be putting everything based on
pointer parameters without restrict semantics into the same alias set, maybe? 
Or should there be some code in indirect_refs_may_alias_p to look for this
situation before it reaches the point of type-based disambiguation where it is
currently failing to DTRT?

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #14 from Richard Biener  ---
(In reply to sandra from comment #13)
> Well, no.  The problem is that the scheduler is moving 
> 
> ldw r2, 0(r4)
> 
> ahead of
> 
> stw zero, 0(r5)
> 
> which is incorrect because the pointers in r4 and r5 are aliases.

Ah, so the scheduler needs to call anti_dependence (WAR).

> So at the point of call to true_dependence, I see:
> 
> (gdb) frame 1
> #1  0x01d1a108 in sched_analyze_2 (deps=0x7fffdd50, 
> x=0x7742cac8, insn=0x77315600)
> at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/sched-deps.c:2663
> 2663if (true_dependence (pending_mem->element (),
> VOIDmode, t)
> (gdb) print debug_rtx(insn)
> (insn 17 10 18 2 (set (reg/i:SI 2 r2)
> (mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0
> S4 A32]))
> "/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/
> pr93946-1.c":18:1 5 {movsi_internal}
>  (expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:47 bv ] [47])
> (nil)))
> $3 = void
> (gdb) print debug_rtx(pending->insn())
> (insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1
> MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32])
> (const_int 0 [0]))
> "/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/
> pr93946-1.c":15:12 5 {movsi_internal}
>  (nil))
> $4 = void

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-28 Thread sandra at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #13 from sandra at gcc dot gnu.org ---
Well, no.  The problem is that the scheduler is moving 

ldw r2, 0(r4)

ahead of

stw zero, 0(r5)

which is incorrect because the pointers in r4 and r5 are aliases.

So at the point of call to true_dependence, I see:

(gdb) frame 1
#1  0x01d1a108 in sched_analyze_2 (deps=0x7fffdd50, 
x=0x7742cac8, insn=0x77315600)
at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/sched-deps.c:2663
2663if (true_dependence (pending_mem->element (), VOIDmode,
t)
(gdb) print debug_rtx(insn)
(insn 17 10 18 2 (set (reg/i:SI 2 r2)
(mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4
A32]))
"/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/pr93946-1.c":18:1
5 {movsi_internal}
 (expr_list:REG_DEAD (reg/v/f:SI 4 r4 [orig:47 bv ] [47])
(nil)))
$3 = void
(gdb) print debug_rtx(pending->insn())
(insn 9 8 10 2 (set (mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1
MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32])
(const_int 0 [0]))
"/scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12
5 {movsi_internal}
 (nil))
$4 = void

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-28 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #12 from rguenther at suse dot de  ---
On March 27, 2020 9:19:33 PM GMT+01:00, "sandra at gcc dot gnu.org"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946
>
>--- Comment #11 from sandra at gcc dot gnu.org ---
>RTL before sched2 does look sane and similar to that generated for x86
>with
>-m32.
>
>I've been trying to step through sched2.  I think that where things get
>interesting is the call to true_dependence at sched-deps.c:2663.
>
>Breakpoint 6, true_dependence (mem=0x7742c9a8, mem_mode=E_VOIDmode,
>
>x=0x7742cac8)
>at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3056
>3056  return true_dependence_1 (mem, mem_mode, NULL_RTX,
>(gdb) print debug_rtx(mem)
>(mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1 MEM[(struct aa
>*)ptr_1(D)].a.u.i+0 S4 A32])
>$19 = void
>(gdb) print debug_rtx(x)
>(mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4
>A32])
>$20 = void
>
>This is making it all the way to the end of true_dependence_1, into
>rtx_refs_may_alias_p, and into refs_may_alias_p_1, which is returning
>false,
>which gets propagated back up as the result of true_dependence.  IIUC,
>this is
>what is allowing sched2 to move the read from "x" ahead of the write to
>"mem".
>
>Before I spend more time on this, am I on the right track here?  And is
>this
>pointing at the problem being in refs_may_alias_p_1 rather than
>somewhere along
>the way e.g. in true_dependence_1?

Those are two stores, right? Then true_dependence is the wrong predicate to
look at it should be output_dependence instead.

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-27 Thread sandra at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #11 from sandra at gcc dot gnu.org ---
RTL before sched2 does look sane and similar to that generated for x86 with
-m32.

I've been trying to step through sched2.  I think that where things get
interesting is the call to true_dependence at sched-deps.c:2663.

Breakpoint 6, true_dependence (mem=0x7742c9a8, mem_mode=E_VOIDmode, 
x=0x7742cac8)
at /scratch/sandra/nios2-elf-fsf/src/gcc-mainline/gcc/alias.c:3056
3056  return true_dependence_1 (mem, mem_mode, NULL_RTX,
(gdb) print debug_rtx(mem)
(mem/j:SI (reg/v/f:SI 5 r5 [orig:48 ptr ] [48]) [1 MEM[(struct aa
*)ptr_1(D)].a.u.i+0 S4 A32])
$19 = void
(gdb) print debug_rtx(x)
(mem/j:SI (reg/v/f:SI 4 r4 [orig:47 bv ] [47]) [1 bv_3(D)->b.u.f+0 S4 A32])
$20 = void

This is making it all the way to the end of true_dependence_1, into
rtx_refs_may_alias_p, and into refs_may_alias_p_1, which is returning false,
which gets propagated back up as the result of true_dependence.  IIUC, this is
what is allowing sched2 to move the read from "x" ahead of the write to "mem".

Before I spend more time on this, am I on the right track here?  And is this
pointing at the problem being in refs_may_alias_p_1 rather than somewhere along
the way e.g. in true_dependence_1?

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-25 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #10 from Richard Biener  ---
(In reply to sandra from comment #9)
> Both the new test cases are failing on nios2 at -Os, -O2, and -O3.  I've
> done some analysis, but I'm not sure exactly where the problem lies, and
> whether this is a problem in the nios2 back end or somewhere else.
> 
> long __attribute__((noipa))
> foo (struct bb *bv, void *ptr)
> {
>   struct aa *a = ptr;
>   struct bb *b = ptr;
>   bv->b.u.f = 1;
>   a->a.u.i = 0;
>   b->b.u.f = 0;
>   return bv->b.u.f;
> }
> 
> is compiling to
> 
> foo:
> movir2, 1
> stw r2, 0(r4)
> ldw r2, 0(r4)
> stw zero, 0(r5)
> stw zero, 4(r5)
> ret
> 
> What's going on here is that load instructions have 3-cycle latency on
> nios2, so the sched2 pass is moving the "ldw r2, 0(r4)" to load the return
> value 2 instructions earlier  ahead of the store instruction to the same
> location via the aliased pointer.  :-(
> 
> I'm not an expert on the instruction scheduler, and it seems like the target
> hooks and machine description syntax are all focused on modelling pipeline
> costs in order to minimize stalls, not telling the scheduler that certain
> instructions cannot be correctly reordered at all.  Should some other pass
> be inserting optimization barriers, or something like that?  I feel like I'm
> missing some big-picture expertise of where this needs to be fixed, so any
> suggestions to point me in the right direction would be appreciated.

The instruction scheduler needs to check dependences which should correctly
model the constraints here already.  So you need to start looking at the
RTL before sched2 to see if that's sane (compare to say x86 RTL and look
for obvious signs of errors - some target expanders made errors here in the
past).  Then debug sched2 as to what true/anti/output_dependence tests it
performs and why those tell sched2 moving is OK.

As you can see the fix involved touching many passes so no doubt there can
be more issues elsewhere ...

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-24 Thread sandra at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

sandra at gcc dot gnu.org changed:

   What|Removed |Added

 CC||sandra at gcc dot gnu.org

--- Comment #9 from sandra at gcc dot gnu.org ---
Both the new test cases are failing on nios2 at -Os, -O2, and -O3.  I've done
some analysis, but I'm not sure exactly where the problem lies, and whether
this is a problem in the nios2 back end or somewhere else.

long __attribute__((noipa))
foo (struct bb *bv, void *ptr)
{
  struct aa *a = ptr;
  struct bb *b = ptr;
  bv->b.u.f = 1;
  a->a.u.i = 0;
  b->b.u.f = 0;
  return bv->b.u.f;
}

is compiling to

foo:
movir2, 1
stw r2, 0(r4)
ldw r2, 0(r4)
stw zero, 0(r5)
stw zero, 4(r5)
ret

What's going on here is that load instructions have 3-cycle latency on nios2,
so the sched2 pass is moving the "ldw r2, 0(r4)" to load the return value 2
instructions earlier  ahead of the store instruction to the same location
via the aliased pointer.  :-(

I'm not an expert on the instruction scheduler, and it seems like the target
hooks and machine description syntax are all focused on modelling pipeline
costs in order to minimize stalls, not telling the scheduler that certain
instructions cannot be correctly reordered at all.  Should some other pass be
inserting optimization barriers, or something like that?  I feel like I'm
missing some big-picture expertise of where this needs to be fixed, so any
suggestions to point me in the right direction would be appreciated.

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-code
 Status|ASSIGNED|RESOLVED
  Known to work||10.0
 Resolution|--- |FIXED
  Known to fail|10.0|

--- Comment #8 from Richard Biener  ---
Fixed on trunk.

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-03 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Richard Guenther :

https://gcc.gnu.org/g:3d6fd7ce6dc4b6baa11920387d62dc001980aa70

commit r10-6993-g3d6fd7ce6dc4b6baa11920387d62dc001980aa70
Author: Richard Biener 
Date:   Tue Mar 3 11:01:09 2020 +0100

tree-optimization/93946 - fix bogus redundant store removal in FRE, DSE and
DOM

This fixes a common mistake in removing a store that looks redudnant but
is not because it changes the dynamic type of the memory and thus makes
a difference for following loads with TBAA.

2020-03-03  Richard Biener  

PR tree-optimization/93946
* alias.h (refs_same_for_tbaa_p): Declare.
* alias.c (refs_same_for_tbaa_p): New function.
* tree-ssa-alias.c (ao_ref_alias_set): For a NULL ref return
zero.
* tree-ssa-scopedtables.h
(avail_exprs_stack::lookup_avail_expr): Add output argument
giving access to the hashtable entry.
* tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr):
Likewise.
* tree-ssa-dom.c: Include alias.h.
(dom_opt_dom_walker::optimize_stmt): Validate TBAA state before
removing redundant store.
* tree-ssa-sccvn.h (vn_reference_s::base_set): New member.
(ao_ref_init_from_vn_reference): Adjust prototype.
(vn_reference_lookup_pieces): Likewise.
(vn_reference_insert_pieces): Likewise.
* tree-ssa-sccvn.c: Track base alias set in addition to alias
set everywhere.
(eliminate_dom_walker::eliminate_stmt): Also check base alias
set when removing redundant stores.
(visit_reference_op_store): Likewise.
* dse.c (record_store): Adjust valdity check for redundant
store removal.

* gcc.dg/torture/pr93946-1.c: New testcase.
* gcc.dg/torture/pr93946-2.c: Likewise.

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #6 from Richard Biener  ---
Created attachment 47948
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47948=edit
DSE part

I'm now testing the combined (and will squash if that succeeds).

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

Richard Biener  changed:

   What|Removed |Added

  Attachment #47922|0   |1
is obsolete||

--- Comment #5 from Richard Biener  ---
Created attachment 47947
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47947=edit
patch w/o RTL DSE

[Bug tree-optimization/93946] Bogus redundant store removal

2020-03-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #4 from Richard Biener  ---
OK, so the fix works but on x86_64 with -m32 RTL opts (sched2) end up wrecking
things...

foo:
.LFB0:
.cfi_startproc
movl4(%esp), %eax
movl8(%esp), %edx
movl$1, (%eax)
movl(%eax), %eax
movl$0, (%edx)
movl$0, 4(%edx)
ret

(insn:TI 7 18 10 2 (set (mem/j:SI (reg/v/f:SI 0 ax [orig:83 bv ] [83]) [1
bv_3(D)->b.u.f+0 S4 A32])
(const_int 1 [0x1]))
"/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":14:13 67
{*movsi_internal}
 (nil))
(insn 18 7 20 2 (set (reg/v/f:SI 1 dx [orig:84 ptr ] [84])
(mem/f/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 8 [0x8])) [9 ptr+0 S4 A32]))
"/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12 67
{*movsi_internal}
 (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 16 argp)
(const_int 4 [0x4])) [9 ptr+0 S4 A32])
(nil)))
(insn:TI 20 10 21 2 (set (mem/j:SI (reg/v/f:SI 1 dx [orig:84 ptr ] [84]) [1
MEM[(struct aa *)ptr_1(D)].a.u.i+0 S4 A32])
(const_int 0 [0]))
"/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12 67
{*movsi_internal}
 (nil))
(insn:TI 21 20 16 2 (set (mem/j:SI (plus:SI (reg/v/f:SI 1 dx [orig:84 ptr ]
[84])
(const_int 4 [0x4])) [1 MEM[(struct aa *)ptr_1(D)].a.u.i+4 S4
A32])
(const_int 0 [0]))
"/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":15:12 67
{*movsi_internal}
 (expr_list:REG_DEAD (reg/v/f:SI 1 dx [orig:84 ptr ] [84])
(nil)))
(insn 10 7 20 2 (set (reg:SI 0 ax [orig:86 bv_3(D)->b.u.f ] [86])
(mem/j:SI (reg/v/f:SI 0 ax [orig:83 bv ] [83]) [1 bv_3(D)->b.u.f+0 S4
A32]))
"/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/torture/pr93946-1.c":17:17 67
{*movsi_internal}
 (nil))

Ah, something dropped the store to b.u.f and split the store to i.  DSE.
Obviously.

pr93946-1.c.263r.dse1: processing cselib load against insn 9
pr93946-1.c.263r.dse1:Locally deleting insn 9 because insn 8 stores the same
value and couldn't be eliminated
pr93946-1.c.263r.dse1:Locally deleting insn 9

indeed I remember touching DSE back in time.  It still only does

  /* We can only remove the later store if the earlier aliases
 at least all accesses the later one.  */
  && (MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
  || alias_set_subset_of (MEM_ALIAS_SET (mem),
  MEM_ALIAS_SET (s_info->mem

not considering the base set of the other store.  Now, we could allow
the DSE by clearing MEM_EXPR here or adjusting MEM_ALIAS_SET.  But not
sure if we want that?

[Bug tree-optimization/93946] Bogus redundant store removal

2020-02-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #3 from Richard Biener  ---
Created attachment 47922
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47922=edit
patch I am testing

This is what I have now.  Got to factor out that alias set.  In VN consider
passing ao_ref everywhere, the order of set, base-set is easy to get wrong
since there's no type safety.

[Bug tree-optimization/93946] Bogus redundant store removal

2020-02-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

Richard Biener  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
Ick.  And DSEs "new" redundant store removal code is completely buggy in this
regard as well...  adding noipa to foo makes the testcase still fail when FRE
is fixed.  So fixing that as well... (and then there's still DOM as well,
with -fno-tree-dse which would delete the dead store).

[Bug tree-optimization/93946] Bogus redundant store removal

2020-02-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2020-02-26
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Ever confirmed|0   |1
  Known to fail||10.0, 4.8.5, 7.5.0, 9.2.0

--- Comment #1 from Richard Biener  ---
Mine.