[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2020-05-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859
Bug 84859 depends on bug 33315, which changed state.

Bug 33315 Summary: stores not commoned by sinking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Richard Biener  ---
Fixed.

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

--- Comment #12 from Richard Biener  ---
Author: rguenth
Date: Mon Mar 19 14:08:58 2018
New Revision: 258645

URL: https://gcc.gnu.org/viewcvs?rev=258645=gcc=rev
Log:
2018-03-19  Richard Biener  

PR tree-optimization/84859
* tree-ssa-phiopt.c (single_trailing_store_in_bb): New function.
(cond_if_else_store_replacement): Perform sinking operation on
single-store BBs regardless of MAX_STORES_TO_SINK setting.
Generalize what a BB with a single eligible store is.

* gcc.dg/tree-ssa/pr84859.c: New testcase.
* gcc.dg/tree-ssa/pr35286.c: Disable cselim.
* gcc.dg/tree-ssa/split-path-6.c: Likewise.
* gcc.dg/tree-ssa/split-path-7.c: Likewise.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr84859.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c
trunk/gcc/tree-ssa-phiopt.c

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

--- Comment #10 from Richard Biener  ---
Created attachment 43674
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43674=edit
patch

Final patch I am testing.

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

--- Comment #9 from Richard Biener  ---
Like moving over a const call after the stores might cause us to spill across
the call.  Moving over any stmt could enlarge lifetimes enough to do that. 
Register
lifetime could be so that we cannot coalesce the copies in the PHI (but that
probably is offset by removing one of the stores -- but maybe the fast path
is slowed down then).

value_replacement has /* Allow up to 2 cheap preparation statements that
prepare argument ... */ for example.  But I guess preparation stmts are never
of a concern?

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #8 from Richard Biener  ---
Mine.  We can handle the case of a single store in the BBs more generally and
bypass the limit.  The limit was added in r171381 where it was imposed on
all transforms when the function was changed to handle the case of more
than one stmt (store) in each BB.  Previously the transform for the single
stores case was always performed.

Thus, for example like the following.  Could be simplified somewhat iff
the stores need to be the last stmt in the BBs (like it was before said
re-org).  last_and_only_stmt isn't enough here as we have a feeding
scalar stmt around in one BB.

Opinions?

Index: gcc/tree-ssa-phiopt.c
===
--- gcc/tree-ssa-phiopt.c   (revision 258552)
+++ gcc/tree-ssa-phiopt.c   (working copy)
@@ -2061,8 +2061,6 @@ static bool
 cond_if_else_store_replacement (basic_block then_bb, basic_block else_bb,
 basic_block join_bb)
 {
-  gimple *then_assign = last_and_only_stmt (then_bb);
-  gimple *else_assign = last_and_only_stmt (else_bb);
   vec then_datarefs, else_datarefs;
   vec then_ddrs, else_ddrs;
   gimple *then_store, *else_store;
@@ -2073,14 +2071,49 @@ cond_if_else_store_replacement (basic_bl
   tree then_lhs, else_lhs;
   basic_block blocks[3];

-  if (MAX_STORES_TO_SINK == 0)
+  /* Handle the case with single store in THEN_BB and ELSE_BB.  That is
+ cheap enough to always handle.  */
+  gphi *vphi = NULL;
+  for (gphi_iterator si = gsi_start_phis (join_bb); !gsi_end_p (si);
+   gsi_next ())
+if (virtual_operand_p (gimple_phi_result (si.phi (
+  {
+   vphi = si.phi ();
+   break;
+  }
+  if (!vphi)
 return false;
+  gimple *then_assign = SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE
+   (vphi, single_succ_edge
(then_bb)));
+  gimple *else_assign = SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE
+   (vphi, single_succ_edge
(else_bb)));
+
+  /* Verify there are no other stores or loads in the BBs.  That allows us
+ to elide dependence checking.  */
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (then_assign))
+if (USE_STMT (use_p) != then_assign
+   && gimple_bb (USE_STMT (use_p)) == then_bb)
+  {
+   then_assign = NULL;
+   break;
+  }
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (else_assign))
+if (USE_STMT (use_p) != else_assign
+   && gimple_bb (USE_STMT (use_p)) == else_bb)
+  {
+   else_assign = NULL;
+   break;
+  }

-  /* Handle the case with single statement in THEN_BB and ELSE_BB.  */
   if (then_assign && else_assign)
 return cond_if_else_store_replacement_1 (then_bb, else_bb, join_bb,
  then_assign, else_assign);

+  if (MAX_STORES_TO_SINK == 0)
+return false;
+
   /* Find data references.  */
   then_datarefs.create (1);
   else_datarefs.create (1);

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

Richard Biener  changed:

   What|Removed |Added

 Depends on||33315

--- Comment #7 from Richard Biener  ---
The patch for PR33315 manages to transform

  if (n_21 > 255)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 477815112]:
  a[0] = 255;
  goto ; [100.00%]

   [local count: 477815112]:
  _1 = (unsigned char) n_21;
  a[0] = _1;

   [local count: 955630225]:
  _2 = a[0];

to

  if (n_21 > 255)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 477815112]:
  _1 = (unsigned char) n_21;

   [local count: 955630225]:
  # _23 = PHI <255(5), _1(6)>
  a[0] = _23;
  _2 = a[0];

but as proposed that would happen too late (we've already threaded the jump
and PRE already would have done sth similar via the load from a[0]).  As
given it looks like sinking should be performed before the first phiopt pass
(which also runs quite late - there was talk of an early phiopt pass).
Possibly cselim could be enhanced to catch the above as well.  In fact its
comments say it would handle the above.  Ah, we do

  /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
 is disabled.  */
  if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
   || !opts->x_flag_tree_loop_if_convert)
maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
   opts->x_param_values, opts_set->x_param_values);

for whatever reason :/  --param max-stores-to-sink=2 fixes the testcase.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315
[Bug 33315] stores not commoned by sinking

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-14 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

--- Comment #6 from Martin Sebor  ---
We have seen this sort of thing come up a number of times in the past.  Jeff
and I have discussed changing jump threading to either avoid introducing paths
involving invalid calls, or inserting __builtin_unreachable/__builtin_trap. 
That could not only avoid false positives but also make it possible to emit
more efficient code.

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

--- Comment #5 from Jakub Jelinek  ---
Seems the kernel testcase is actually more like:
void f (void*);

void __attribute__ ((noinline, noclone))
g (const void *p, unsigned n)
{
  unsigned char a[8];
  if (n > sizeof a - 1)
return;

  for (; n > 0; n -= *a)
  {
 *a = n > 255 ? 255 : n;

__builtin_memcpy (a + 1, p, *a);   // no warning (good)
f (a);
  }
}

void __attribute__ ((noinline, noclone))
h (const void *p, unsigned n)
{
  unsigned char a[8];
  if (n > sizeof a - 1)
return;

  for (; n > 0; n -= *a)
  {
if (n > 255)
  *a = 255;
else
  *a = n;

__builtin_memcpy (a + 1, p, *a);   // bogus -Warray-bounds
f (a);
  }
}

where the memcpy calls don't clobber *a, but there is another call that makes a
escape and could modify the value, so I'm afraid there is absolutely nothing
VRP can do here and the only fix is not to warn on statements affected by jump
threading.

What the kernel should have used (if it couldn't use get rid of the loop
altogether like it did) would be something like:
void f (void*);

void __attribute__ ((noinline, noclone))
g (const void *p, unsigned n)
{
  unsigned char a[8], t;
  if (n > sizeof a - 1)
return;

  for (; n > 0; n -= t)
  {
 t = n > 255 ? 255 : n;
 *a = t;

__builtin_memcpy (a + 1, p, t);   // no warning (good)
f (a);
  }
}

void __attribute__ ((noinline, noclone))
h (const void *p, unsigned n)
{
  unsigned char a[8], t;
  if (n > sizeof a - 1)
return;

  for (; n > 0; n -= t)
  {
if (n > 255)
  t = 255;
else
  t = n;
*a = t;

__builtin_memcpy (a + 1, p, t);   // no warning (good)
f (a);
  }
}

i.e. introduce a temporary for the size in the current iteration and use that
across the calls that might have but don't really change that.

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-14 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

--- Comment #3 from Martin Sebor  ---
The warning is in response to the call to memcpy(d, s, 255) in h():

   [local count: 477815112]:
  a[0] = 255;
  __builtin_memcpy (, p_15(D), 255);
  _26 = a[0];

The wording seems clear:  The function is forming an offset that is out of the
bounds of the referenced object.

The range of the out-of-bounds offset is [9, 255], while the bounds of the
object are [0, 8].

If you think you have of a better/clearer way to phrase it then please propose
it.

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

--- Comment #4 from Jakub Jelinek  ---
Note, even teaching VRP to look through stores and loads from memory wouldn't
help here, not even for the first function, because it uses *a as the length,
but also overwrites it (potentially or for real) with the memcpy, so the
compiler can't really know anything about *a in the second and following
iterations, except that it is unsigned char and thus [0, 255].  n can wrap
around.

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
That is just something gimple-ssa-warn-restrict.c mades up from the call with
255 constant size.  Figuring out from the warning wording that it complains
about a call with constant length 255 is impossible.

[Bug tree-optimization/84859] [8 Regression] bogus -Warray-bounds on a memcpy in a loop

2018-03-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84859

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-03-14
  Known to work||7.3.1
   Target Milestone|--- |8.0
Summary|bogus -Warray-bounds on a   |[8 Regression] bogus
   |memcpy in a loop|-Warray-bounds on a memcpy
   ||in a loop
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
The reason is a missed phiopt to MIN_EXPR for h and VRP not propagating through
memory.

we have

   [local count: 955630223]:
  if (n_6 > 255)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 477815112]:
  a[0] = 255;
  goto ; [100.00%]

   [local count: 477815112]:
  _1 = (unsigned char) n_6;
  a[0] = _1;

   [local count: 955630223]:
  _2 = a[0];

where we lack a pass replacing that with

   
   # tem = PHI <255(5), _1(6)>
   a[0] = tem;
   _2 = a[0];

and then a MIN_EXPR.

Not sure why thw warn_restrict pass ends up with this kind of value-range
here though.  In the IL we have threaded the n > 255 check though and
ended up with an explicit memcpy (..., 255);  but that's not [9, 255].