[Bug tree-optimization/85720] bad codegen for looped assignment of primitives at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement --- Comment #8 from Andrew Pinski --- (In reply to Mathias Stearn from comment #2) > Hmm. Taking the example from the -ftree-loop-distribute-patterns > documentation, it still seems to generate poor code, this time at both -O2 > and -O3: https://godbolt.org/g/EsQDj8 > > Why isn't that transformed to memset(A, 0, N); memset(B, 1, N); ? This feels > similar to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85721. Should I make > a new ticket with this example? For this case (with the mentioned B[i] = A[i] + 1 fix), in GCC 10+ we started to produce memset for one of the alias conditions. -ftree-loop-distribute-patterns is turned on at -O2 for starting in GCC 10 also. I should note clang, ICC nor MSVC is able to do this second loop to convert it to two memset even with an alias check; others just vectorize the loop with an aliasing check. clang and MSVC are able to detect the one in comment #0 and convert it to memset though so I know they have this kind of optimization for sure.
[Bug tree-optimization/85720] bad codegen for looped assignment of primitives at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720 --- Comment #7 from bin cheng --- Author: amker Date: Fri May 25 11:09:42 2018 New Revision: 260753 URL: https://gcc.gnu.org/viewcvs?rev=260753=gcc=rev Log: PR tree-optimization/85720 * tree-loop-distribution.c (break_alias_scc_partitions): Don't merge SCC if all partitions are builtins. (version_loop_by_alias_check): New parameter. Generate cancelable runtime alias check if all partitions are builtins. (distribute_loop): Update call to above function. gcc/testsuite * gcc.dg/tree-ssa/pr85720.c: New test. * gcc.target/i386/avx256-unaligned-store-2.c: Disable loop pattern distribution. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr85720.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-2.c trunk/gcc/tree-loop-distribution.c
[Bug tree-optimization/85720] bad codegen for looped assignment of primitives at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720 --- Comment #6 from amker at gcc dot gnu.org --- (In reply to Richard Biener from comment #5) > (In reply to Mathias Stearn from comment #4) > > (In reply to Marc Glisse from comment #3) > > > Again, you are ignoring aliasing issues (just like in your other PR the > > > function copy isn't equivalent to memmove). Does adding __restrict change > > > the result? Also, B[i]=B[i]+1 doesn't look like a memset... > > > > Sorry, I typoed. It was supposed to be B[i] = A[i] + 1. That still does > > basically the same thing though: https://godbolt.org/g/dtmU5t. Good point > > about aliasing though. I guess the right code gen in that case would > > actually be something that detected the overlap and did the right calls to > > memset to only set each byte once. Or just do the simple thing: > > > > if (b > a && b < a + n) { > > memset(b, 1, n); > > memset(a, 0, n); > > } else { > > memset(a, 0, n); > > memset(b, 1, n); > > } > > > > Yes, __restrict helps, but that isn't part of standard c++, and it seems > > like it never will be. > > GCC supports it since forever... but yes, currently loop distribution > doesn't consider runtime aliasing in this case (it can do it now but This should be easy to improve for builtin distribution cases. > is keyed to vectorization). It also doesn't have a way to distinguish > forward vs. backward dependence and split the case like you propose. IMHO, this isn't optimizer's responsibility. If break-conditions for forward/backward dependence need to be supported, it seems more reasonable to be supported in dependence analysis.
[Bug tree-optimization/85720] bad codegen for looped assignment of primitives at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720 Richard Biener changed: What|Removed |Added Keywords||missed-optimization CC||amker at gcc dot gnu.org Component|middle-end |tree-optimization --- Comment #5 from Richard Biener --- (In reply to Mathias Stearn from comment #4) > (In reply to Marc Glisse from comment #3) > > Again, you are ignoring aliasing issues (just like in your other PR the > > function copy isn't equivalent to memmove). Does adding __restrict change > > the result? Also, B[i]=B[i]+1 doesn't look like a memset... > > Sorry, I typoed. It was supposed to be B[i] = A[i] + 1. That still does > basically the same thing though: https://godbolt.org/g/dtmU5t. Good point > about aliasing though. I guess the right code gen in that case would > actually be something that detected the overlap and did the right calls to > memset to only set each byte once. Or just do the simple thing: > > if (b > a && b < a + n) { > memset(b, 1, n); > memset(a, 0, n); > } else { > memset(a, 0, n); > memset(b, 1, n); > } > > Yes, __restrict helps, but that isn't part of standard c++, and it seems > like it never will be. GCC supports it since forever... but yes, currently loop distribution doesn't consider runtime aliasing in this case (it can do it now but is keyed to vectorization). It also doesn't have a way to distinguish forward vs. backward dependence and split the case like you propose.