[Bug tree-optimization/93334] -O3 generates useless code checking for overlapping memset ?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

Andrew Pinski changed:

           What    |Removed                     |Added
           Severity|normal                      |enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

--- Comment #5 from N Schaeffer ---
Elaborating a bit on this: I can eliminate this problem by using:

    -O3 -fno-tree-loop-distribute-patterns -fno-tree-loop-vectorize

I wonder why -fno-tree-loop-distribute-patterns alone is not enough? In that case, I get no calls to memset, but still the write-after-write dependency check.

Also, decorating the loop with #pragma omp simd AND compiling with

    -O3 -march=core-avx2 -fopenmp-simd -fno-tree-loop-distribute-patterns

finally generates sensible code. Note that without -fno-tree-loop-distribute-patterns, I still get calls to memset instead of a simd-vectorized loop...
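For readers following along, the annotated loop described in comment #5 might look like the sketch below. It assumes the original test case stores 0.0 through both `mem[k]` and `mem[ofs2 + k]` (the thread only quotes the variant with different values), so treat the body as an illustration rather than the reporter's exact code. The pragma only takes effect when the file is compiled with `-fopenmp-simd` (or `-fopenmp`); otherwise GCC ignores it.

```c
/* Sketch of the loop under discussion. Assumption: both stores write the
   same value, 0.0, which is the case the bug title describes. */
void test_simple_code(long l, double *mem, long ofs2)
{
    /* Tells the compiler the iterations may be executed with SIMD
       instructions; honored only with -fopenmp-simd or -fopenmp. */
    #pragma omp simd
    for (long k = 0; k < l; k++) {
        mem[k] = 0.0;
        mem[ofs2 + k] = 0.0;
    }
}
```

Because both stores write identical bytes, the result is the same whether or not the two ranges overlap, which is why the runtime overlap check looks unnecessary for this particular loop.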
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

Jakub Jelinek changed:

           What    |Removed                     |Added
           CC      |                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
(In reply to N Schaeffer from comment #3)
> So I think only zeros and NaNs are possible to optimize to memset anyway
> (and some other very special cases, that are probably not worth considering
> anyway).

No, all values which in their representation have the same bytes repeated through the whole size, e.g. double 0x1.010101010101p-766, 32.502 and many others. Note, it doesn't have to be all floating point etc. if aliasing lets it through for whatever reason.

I guess if we have a write-after-write dependency, whether in tree-loop-distribution or perhaps the vectorizer too, we could look at whether all those writes have constant sources whose representation is one repeated byte all over, and whether both stores in the dependency write the same bytes; if so, ignore the write-after-write dependency.
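The "same byte repeated through the whole size" condition from comment #4 can be checked with a few lines of C. The helper name `is_single_byte_pattern` is hypothetical (it is not a GCC function), and the example values assume the usual IEEE 754 binary64 layout for `double`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if every byte of v's object
   representation is identical, and stores that byte in *byte. */
bool is_single_byte_pattern(double v, unsigned char *byte)
{
    unsigned char bytes[sizeof v];

    /* memcpy is the portable way to inspect the representation. */
    memcpy(bytes, &v, sizeof v);
    for (size_t i = 1; i < sizeof v; i++)
        if (bytes[i] != bytes[0])
            return false;
    *byte = bytes[0];
    return true;
}
```

Under the binary64 assumption, 0.0 qualifies with byte 0x00, 0x1.010101010101p-766 qualifies with byte 0x10 (its bit pattern is 0x1010101010101010), and 1.0 does not qualify, which matches the observation in comment #3 that the 1.0 store is not turned into memset.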
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

--- Comment #3 from N Schaeffer ---
Hi,

Thanks for pointing out the issue about writing different values. This makes sense.

However, since memset deals with bytes, whenever the array type is floating-point data (or anything longer than a byte), it will not be possible to use memset to set different values. Indeed, the code snippet you propose is not compiled with memset for 1.0.

So I think only zeros and NaNs are possible to optimize to memset anyway (and some other very special cases, that are probably not worth considering anyway).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

--- Comment #2 from bin cheng ---
(In reply to Richard Biener from comment #1)
> Confirmed.  The issue is that the overlap would be an issue if the stores
> were using different values like
>
> void test_simple_code(long l, double* mem, long ofs2) {
>         for (long k=0; k<l; k++) {
>                 mem[k] = 0.0;
>                 mem[ofs2 +k] = 1.0;
>         }
> }
>
> and we're simply not optimizing the case where the write-after-write
> dependence can be ignored because the stored value is always the same.
> I'm also not sure whether that's easy to do ...  Bin?

I will check if it can be handled as a special case. Thanks.
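To see why the overlap matters once the two stores write different values, the sketch below compares the loop as written with a hand-distributed version (all 0.0 stores first, then all 1.0 stores). The function names `interleaved` and `distributed` are illustrative only; the second function is a model of what splitting the loop into two fills would do, not GCC's actual output.

```c
/* The loop as written: the two stores alternate within each iteration. */
void interleaved(long l, double *mem, long ofs2)
{
    for (long k = 0; k < l; k++) {
        mem[k] = 0.0;
        mem[ofs2 + k] = 1.0;
    }
}

/* Hypothetical distributed form: one fill per store.
   Only equivalent to the loop above when the two ranges do not
   overlap in a way that changes which store lands last. */
void distributed(long l, double *mem, long ofs2)
{
    for (long k = 0; k < l; k++)
        mem[k] = 0.0;
    for (long k = 0; k < l; k++)
        mem[ofs2 + k] = 1.0;
}
```

With `l = 4` and `ofs2 = 1` on a five-element array, the interleaved loop leaves `{0, 0, 0, 0, 1}` while the distributed form leaves `{0, 1, 1, 1, 1}`, so the compiler has to guard the transformation with a runtime overlap check. When both stores write the same value the two forms always agree, which is the special case the thread proposes to recognize.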
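https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334

Richard Biener changed:

           What    |Removed                     |Added
           Status  |UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-01-21
           CC      |                            |amker at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
Confirmed.  The issue is that the overlap would be an issue if the stores
were using different values like

void test_simple_code(long l, double* mem, long ofs2) {
        for (long k=0; k<l; k++) {
                mem[k] = 0.0;
                mem[ofs2 +k] = 1.0;
        }
}

and we're simply not optimizing the case where the write-after-write
dependence can be ignored because the stored value is always the same.
I'm also not sure whether that's easy to do ...  Bin?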