[Bug tree-optimization/93334] -O3 generates useless code checking for overlapping memset ?

2021-12-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

[Bug tree-optimization/93334] -O3 generates useless code checking for overlapping memset ?

2020-01-21 Thread nathanael.schaeffer at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

--- Comment #5 from N Schaeffer  ---
Elaborating a bit on this:

I can eliminate this problem by using:
   -O3 -fno-tree-loop-distribute-patterns -fno-tree-loop-vectorize

I wonder why -fno-tree-loop-distribute-patterns alone is not enough?
With only that flag, I get no calls to memset, but the write-after-write
dependency check is still there.

Also, decorating the loop with
   #pragma omp simd
AND compiling with
   -O3  -march=core-avx2  -fopenmp-simd  -fno-tree-loop-distribute-patterns
finally generates sensible code.

Note that without -fno-tree-loop-distribute-patterns, I still get calls to
memset instead of a simd-vectorized loop...
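
A minimal sketch of the decorated loop, for reference. The loop body is not
shown in this comment, so the version below assumes it is the test case from
this report with both stores writing 0.0. The pragma only takes effect when
compiling with -fopenmp or -fopenmp-simd; without either, GCC ignores it.

```c
/* Assumed reconstruction of the reported loop, decorated with omp simd.
   Both stores write the same value (0.0); ofs2 may make them overlap. */
void test_simple_code(long l, double *mem, long ofs2)
{
    /* Tells the compiler the iterations may be executed in SIMD fashion,
       so it does not need a runtime overlap check to vectorize. */
    #pragma omp simd
    for (long k = 0; k < l; k++) {
        mem[k] = 0.0;
        mem[ofs2 + k] = 0.0;
    }
}
```

Compiling this with the flags listed above (-O3 -march=core-avx2
-fopenmp-simd -fno-tree-loop-distribute-patterns) is the configuration
reported here as producing sensible code.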

[Bug tree-optimization/93334] -O3 generates useless code checking for overlapping memset ?

2020-01-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
(In reply to N Schaeffer from comment #3)
> So I think only zeros and NaNs can be optimized to memset anyway
> (and some other very special cases that are probably not worth
> considering).

No, all values whose representation consists of the same byte repeated
through the whole size, e.g. double 0x1.010101010101p-766, 32.502 and many
others.  Note, it doesn't have to be all floating point etc. if aliasing lets
it through for whatever reason.  I guess if we have a write-after-write
dependency, whether in tree-loop-distribution or perhaps in the vectorizer
too, we could check whether all those writes have constant sources whose
representation is one repeated byte all over and whether both stores in the
dependency write the same bytes, and if so ignore the write-after-write
dependency.
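
The "same byte repeated" property can be checked directly on the object
representation. The sketch below is only an illustration, not GCC code: the
helper names is_single_byte_pattern and double_from_byte are made up for this
example, and it assumes double is a 64-bit IEEE 754 type.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if every byte of d's object representation
   equals the first one, i.e. memset could produce this value. */
bool is_single_byte_pattern(double d)
{
    unsigned char bytes[sizeof d];
    memcpy(bytes, &d, sizeof d);
    for (size_t i = 1; i < sizeof d; i++)
        if (bytes[i] != bytes[0])
            return false;
    return true;
}

/* Hypothetical helper: the double whose bytes are all equal to `byte`. */
double double_from_byte(unsigned char byte)
{
    double d;
    memset(&d, byte, sizeof d);
    return d;
}
```

Under that assumption, is_single_byte_pattern returns true for 0.0 and for
0x1.010101010101p-766 (every byte is 0x10) and false for 1.0, while
double_from_byte(0x40) yields a value close to 32.502.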

[Bug tree-optimization/93334] -O3 generates useless code checking for overlapping memset ?

2020-01-21 Thread nathanael.schaeffer at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

--- Comment #3 from N Schaeffer  ---
Hi,

Thanks for pointing out the issue about writing different values. This makes
sense.
However, since memset deals with bytes, whenever the array type is floating
point (or anything wider than a byte), it will not be possible to use
memset to set different values.
Indeed, the code snippet you propose is not compiled to memset for 1.0.

So I think only zeros and NaNs can be optimized to memset anyway (and
some other very special cases that are probably not worth considering).
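
A small sketch of what a byte-wise fill does to an array of doubles, assuming
64-bit IEEE 754 doubles. The helper name fill_with_byte is made up for this
example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n doubles by writing the same byte everywhere,
   which is all that memset can do. */
void fill_with_byte(double *mem, size_t n, unsigned char byte)
{
    memset(mem, byte, n * sizeof *mem);
}
```

With byte 0x00 every element becomes 0.0, and with byte 0xFF every element
becomes a NaN (all exponent and mantissa bits set). No single byte value
produces 1.0, whose representation 0x3FF0000000000000 mixes different bytes.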

[Bug tree-optimization/93334] -O3 generates useless code checking for overlapping memset ?

2020-01-21 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

--- Comment #2 from bin cheng  ---
(In reply to Richard Biener from comment #1)
> Confirmed.  The issue is that the overlap would be an issue if the stores
> were using different values like
> 
> void test_simple_code(long l, double* mem, long ofs2) {
> for (long k=0; k<l; k++) {
>   mem[k] = 0.0;
>   mem[ofs2 +k] = 1.0;
> }
> }
> 
> and we're simply not optimizing the case where the write-after-write
> dependence can be ignored because the stored value is always the same.
> I'm also not sure whether that's easy to do ... Bin?

I will check if it can be handled as a special case.  Thanks.
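
To make the quoted point concrete, here is a sketch comparing the interleaved
loop with a distributed form in which each store gets its own loop (the shape
that would map to two memset-like fills). The function names
store_interleaved and store_distributed are made up for this illustration.

```c
/* The loop as quoted above: two interleaved stores with different values. */
void store_interleaved(long l, double *mem, long ofs2)
{
    for (long k = 0; k < l; k++) {
        mem[k] = 0.0;
        mem[ofs2 + k] = 1.0;
    }
}

/* Hypothetical distributed form: all 0.0 stores first, then all 1.0 stores.
   Equivalent to the loop above only if the two regions do not overlap
   or both stores write the same value. */
void store_distributed(long l, double *mem, long ofs2)
{
    for (long k = 0; k < l; k++)
        mem[k] = 0.0;
    for (long k = 0; k < l; k++)
        mem[ofs2 + k] = 1.0;
}
```

With l = 4 and ofs2 = 1 on a five-element array, the interleaved loop leaves
0, 0, 0, 0, 1 while the distributed form leaves 0, 1, 1, 1, 1, so the
write-after-write dependence matters. If both stores wrote 0.0, the two forms
would give identical results for any ofs2.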

[Bug tree-optimization/93334] -O3 generates useless code checking for overlapping memset ?

2020-01-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-01-21
                 CC|                            |amker at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
Confirmed.  The issue is that the overlap would be an issue if the stores
were using different values like

void test_simple_code(long l, double* mem, long ofs2) {
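for (long k=0; k<l; k++) {
  mem[k] = 0.0;
  mem[ofs2 +k] = 1.0;
}
}

and we're simply not optimizing the case where the write-after-write
dependence can be ignored because the stored value is always the same.
I'm also not sure whether that's easy to do ... Bin?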