[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954 --- Comment #5 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:c01ae2ab6b227e21835d128c90e974dce4604be9 commit r11-8027-gc01ae2ab6b227e21835d128c90e974dce4604be9 Author: Richard Biener Date: Wed Apr 7 13:17:05 2021 +0200 tree-optimization/99954 - fix loop distribution memcpy classification This fixes bogus classification of a copy as memcpy. We cannot use plain dependence analysis to decide between memcpy and memmove when it computes no dependence. Instead we have to try harder later which the patch does for the gcc.dg/tree-ssa/ldist-24.c testcase by resorting to tree-affine to compute the difference between src and dest and compare against the copy size. 2021-04-07 Richard Biener PR tree-optimization/99954 * tree-loop-distribution.c: Include tree-affine.h. (generate_memcpy_builtin): Try using tree-affine to prove non-overlap. (loop_distribution::classify_builtin_ldst): Always classify as PKIND_MEMMOVE. * gcc.dg/torture/pr99954.c: New testcase.
[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954 --- Comment #4 from Richard Biener --- OK, so (compute_affine_dependence stmt_a: _1 = MEM[(union container *)p_11 + -4B].value; stmt_b: p_11->value = _1; ) -> no dependence where there is indeed no depedence but it looks like this is the wrong question to be asked. So the cited rev. in question was "wrong" to honor the assertion from classify_builtin_ldst based on dependence analysis. diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index 583bb062b76..2ec7d999e21 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -1759,11 +1759,11 @@ loop_distribution::classify_builtin_ldst (loop_p loop, struct graph *rdg, /* Now check that if there is a dependence. */ ddr_p ddr = get_data_dependence (rdg, src_dr, dst_dr); - /* Classify as memcpy if no dependence between load and store. */ + /* Classify as memmove if no dependence between load and store. */ if (DDR_ARE_DEPENDENT (ddr) == chrec_known) { partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, size); - partition->kind = PKIND_MEMCPY; + partition->kind = PKIND_MEMMOVE; return; } is the correct thing to do here. That leaves classifciation involving (constant) sizes on the plate we could check the dependence distance against or adjust the later alias check by creating ao_refs with size. In particular this FAILs the added testcase by said ref. Adjusting the alias check for the testcase isn't easy, the dest/src pointers look like (gdb) p debug_generic_expr (src) (double *) par.0_1 + ((sizetype) i_19 * 320 + 4160) $5 = void (gdb) p debug_generic_expr (dest) (double *) par.0_1 + ((sizetype) i_19 * 320 + 1600) $6 = void and there's no nice API do use here. We can use tree-affine to subtract the pointers and compare the difference with the computed size though.
[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Target Milestone|--- |8.5 Status|NEW |ASSIGNED --- Comment #3 from Richard Biener --- I will have a look.
[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954 --- Comment #2 from Jonathan Wakely --- > using -O2 "fixes" the issue on GCC 9 and below but not on 10. And that changed with r271553 PR tree-optimization/88440 * opts.c (default_options_table): Enable -ftree-loop-distribute-patterns at -O[2s]+. * tree-loop-distribution.c (generate_memset_builtin): Fold the generated call. (generate_memcpy_builtin): Likewise. (distribute_loop): Pass in whether to only distribute patterns. (prepare_perfect_loop_nest): Also allow size optimization. (pass_loop_distribution::execute): When optimizing a loop nest for size allow pattern replacement.
[Bug tree-optimization/99954] [8/9/10/11 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2021-04-07 Known to work||6.5.0 Summary|Copy loop over array of |[8/9/10/11 Regression] Copy |unions at -O3 generates |loop over array of unions |memcpy instead of memmove, |at -O3 generates memcpy |resulting in incorrect code |instead of memmove, ||resulting in incorrect code Known to fail||10.2.0, 11.0, 7.5.0, 8.4.0, ||9.3.0 CC||rguenther at suse dot de Ever confirmed|0 |1 --- Comment #1 from Jonathan Wakely --- (In reply to Sergey Zakharchenko from comment #0) > As checked on godbolt.org, issue applies to GCC 7+ (6 and below correctly > generate memmove); Started with r242470 re PR tree-optimization/78348 ([7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038) 2016-11-16 Richard Biener PR tree-optimization/78348 * tree-loop-distribution.c (enum partition_kind): Add PKIND_MEMMOVE. (generate_memcpy_builtin): Honor PKIND_MEMCPY on the partition. (classify_partition): Set PKIND_MEMCPY if dependence analysis revealed no dependency, PKIND_MEMMOVE otherwise. * gcc.dg/tree-ssa/ldist-24.c: New testcase.