[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code

2021-04-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:c01ae2ab6b227e21835d128c90e974dce4604be9

commit r11-8027-gc01ae2ab6b227e21835d128c90e974dce4604be9
Author: Richard Biener 
Date:   Wed Apr 7 13:17:05 2021 +0200

tree-optimization/99954 - fix loop distribution memcpy classification

This fixes bogus classification of a copy as memcpy.  We cannot use
plain dependence analysis to decide between memcpy and memmove when
it computes no dependence.  Instead we have to try harder later which
the patch does for the gcc.dg/tree-ssa/ldist-24.c testcase by resorting
to tree-affine to compute the difference between src and dest and
compare against the copy size.

2021-04-07  Richard Biener  

PR tree-optimization/99954
* tree-loop-distribution.c: Include tree-affine.h.
(generate_memcpy_builtin): Try using tree-affine to prove
non-overlap.
(loop_distribution::classify_builtin_ldst): Always classify
as PKIND_MEMMOVE.

* gcc.dg/torture/pr99954.c: New testcase.

[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code

2021-04-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954

--- Comment #4 from Richard Biener  ---
OK, so

(compute_affine_dependence
  stmt_a: _1 = MEM[(union container *)p_11 + -4B].value;
  stmt_b: p_11->value = _1;
) -> no dependence

where there is indeed no depedence but it looks like this is the wrong
question to be asked.  So the cited rev. in question was "wrong" to
honor the assertion from classify_builtin_ldst based on dependence analysis.

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 583bb062b76..2ec7d999e21 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1759,11 +1759,11 @@ loop_distribution::classify_builtin_ldst (loop_p loop,
struct graph *rdg,
   /* Now check that if there is a dependence.  */
   ddr_p ddr = get_data_dependence (rdg, src_dr, dst_dr);

-  /* Classify as memcpy if no dependence between load and store.  */
+  /* Classify as memmove if no dependence between load and store.  */
   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
 {
   partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base,
size);
-  partition->kind = PKIND_MEMCPY;
+  partition->kind = PKIND_MEMMOVE;
   return;
 }


is the correct thing to do here.  That leaves classifciation involving
(constant) sizes on the plate we could check the dependence distance
against or adjust the later alias check by creating ao_refs with size.
In particular this FAILs the added testcase by said ref.

Adjusting the alias check for the testcase isn't easy, the dest/src
pointers look like

(gdb) p debug_generic_expr (src)
(double *) par.0_1 + ((sizetype) i_19 * 320 + 4160)
$5 = void
(gdb) p debug_generic_expr (dest)
(double *) par.0_1 + ((sizetype) i_19 * 320 + 1600)
$6 = void

and there's no nice API do use here.  We can use tree-affine to subtract
the pointers and compare the difference with the computed size though.

[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code

2021-04-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
   Target Milestone|--- |8.5
 Status|NEW |ASSIGNED

--- Comment #3 from Richard Biener  ---
I will have a look.

[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code

2021-04-07 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954

--- Comment #2 from Jonathan Wakely  ---
> using -O2 "fixes" the issue on GCC 9 and below but not on 10.

And that changed with r271553

PR tree-optimization/88440
* opts.c (default_options_table): Enable
-ftree-loop-distribute-patterns
at -O[2s]+.
* tree-loop-distribution.c (generate_memset_builtin): Fold the
generated call.
(generate_memcpy_builtin): Likewise.
(distribute_loop): Pass in whether to only distribute patterns.
(prepare_perfect_loop_nest): Also allow size optimization.
(pass_loop_distribution::execute): When optimizing a loop
nest for size allow pattern replacement.

[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code

2021-04-07 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-04-07
  Known to work||6.5.0
Summary|Copy loop over array of |[8/9/10/11 Regression] Copy
   |unions at -O3 generates |loop over array of unions
   |memcpy instead of memmove,  |at -O3 generates memcpy
   |resulting in incorrect code |instead of memmove,
   ||resulting in incorrect code
  Known to fail||10.2.0, 11.0, 7.5.0, 8.4.0,
   ||9.3.0
 CC||rguenther at suse dot de
 Ever confirmed|0   |1

--- Comment #1 from Jonathan Wakely  ---
(In reply to Sergey Zakharchenko from comment #0)
> As checked on godbolt.org, issue applies to GCC 7+ (6 and below correctly
> generate memmove);

Started with r242470

re PR tree-optimization/78348 ([7 REGRESSION] 15% performance drop for
coremark-pro/nnet-test after r242038)

2016-11-16  Richard Biener

PR tree-optimization/78348
* tree-loop-distribution.c (enum partition_kind): Add PKIND_MEMMOVE.
(generate_memcpy_builtin): Honor PKIND_MEMCPY on the partition.
(classify_partition): Set PKIND_MEMCPY if dependence analysis
revealed no dependency, PKIND_MEMMOVE otherwise.

* gcc.dg/tree-ssa/ldist-24.c: New testcase.