[Bug tree-optimization/85720] bad codegen for looped assignment of primitives at -O2

2021-08-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

--- Comment #8 from Andrew Pinski  ---
(In reply to Mathias Stearn from comment #2)
> Hmm. Taking the example from the -ftree-loop-distribute-patterns
> documentation, it still seems to generate poor code, this time at both -O2
> and -O3: https://godbolt.org/g/EsQDj8
> 
> Why isn't that transformed to memset(A, 0, N); memset(B, 1, N); ? This feels
> similar to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85721. Should I make
> a new ticket with this example?

For this case (with the mentioned B[i] = A[i] + 1 fix), in GCC 10+ we started
to produce memset for one of the alias conditions.
-ftree-loop-distribute-patterns is turned on at -O2 for starting in GCC 10
also.

I should note clang, ICC nor MSVC is able to do this second loop to convert it
to two memset even with an alias check; others just vectorize the loop with an
aliasing check. clang and MSVC are able to detect the one in comment #0 and
convert it to memset though so I know they have this kind of optimization for
sure.

[Bug tree-optimization/85720] bad codegen for looped assignment of primitives at -O2

2018-05-25 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720

--- Comment #7 from bin cheng  ---
Author: amker
Date: Fri May 25 11:09:42 2018
New Revision: 260753

URL: https://gcc.gnu.org/viewcvs?rev=260753=gcc=rev
Log:
PR tree-optimization/85720

* tree-loop-distribution.c (break_alias_scc_partitions): Don't merge
SCC if all partitions are builtins.
(version_loop_by_alias_check): New parameter.  Generate cancelable
runtime alias check if all partitions are builtins.
(distribute_loop): Update call to above function.

gcc/testsuite
* gcc.dg/tree-ssa/pr85720.c: New test.
* gcc.target/i386/avx256-unaligned-store-2.c: Disable loop pattern
distribution.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr85720.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-2.c
trunk/gcc/tree-loop-distribution.c

[Bug tree-optimization/85720] bad codegen for looped assignment of primitives at -O2

2018-05-15 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720

--- Comment #6 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> (In reply to Mathias Stearn from comment #4)
> > (In reply to Marc Glisse from comment #3)
> > > Again, you are ignoring aliasing issues (just like in your other PR the
> > > function copy isn't equivalent to memmove). Does adding __restrict change
> > > the result? Also, B[i]=B[i]+1 doesn't look like a memset...
> > 
> > Sorry, I typoed. It was supposed to be B[i] = A[i] + 1. That still does
> > basically the same thing though: https://godbolt.org/g/dtmU5t. Good point
> > about aliasing though. I guess the right code gen in that case would
> > actually be something that detected the overlap and did the right calls to
> > memset to only set each byte once. Or just do the simple thing:
> > 
> > if (b > a && b < a + n) {
> >   memset(b, 1, n);
> >   memset(a, 0, n);
> > } else {
> >   memset(a, 0, n);
> >   memset(b, 1, n);
> > }
> > 
> > Yes, __restrict helps, but that isn't part of standard c++, and it seems
> > like it never will be.
> 
> GCC supports it since forever...  but yes, currently loop distribution
> doesn't consider runtime aliasing in this case (it can do it now but
This should be easy to improve for builtin distribution cases.
> is keyed to vectorization).  It also doesn't have a way to distinguish
> forward vs. backward dependence and split the case like you propose.
IMHO, this isn't optimizer's responsibility.  If break-conditions for
forward/backward dependence need to be supported, it seems more reasonable to
be supported in dependence analysis.

[Bug tree-optimization/85720] bad codegen for looped assignment of primitives at -O2

2018-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85720

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 CC||amker at gcc dot gnu.org
  Component|middle-end  |tree-optimization

--- Comment #5 from Richard Biener  ---
(In reply to Mathias Stearn from comment #4)
> (In reply to Marc Glisse from comment #3)
> > Again, you are ignoring aliasing issues (just like in your other PR the
> > function copy isn't equivalent to memmove). Does adding __restrict change
> > the result? Also, B[i]=B[i]+1 doesn't look like a memset...
> 
> Sorry, I typoed. It was supposed to be B[i] = A[i] + 1. That still does
> basically the same thing though: https://godbolt.org/g/dtmU5t. Good point
> about aliasing though. I guess the right code gen in that case would
> actually be something that detected the overlap and did the right calls to
> memset to only set each byte once. Or just do the simple thing:
> 
> if (b > a && b < a + n) {
>   memset(b, 1, n);
>   memset(a, 0, n);
> } else {
>   memset(a, 0, n);
>   memset(b, 1, n);
> }
> 
> Yes, __restrict helps, but that isn't part of standard c++, and it seems
> like it never will be.

GCC supports it since forever...  but yes, currently loop distribution
doesn't consider runtime aliasing in this case (it can do it now but
is keyed to vectorization).  It also doesn't have a way to distinguish
forward vs. backward dependence and split the case like you propose.