[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations

2021-09-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406

Andrew Pinski  changed:

           What    |Removed     |Added
--------------------------------------------------
           Severity|normal      |enhancement

--- Comment #7 from Andrew Pinski  ---
I noticed that clang/LLVM doesn't do this either, nor does ICC.

2018-04-20 Thread linux at carewolf dot com

--- Comment #6 from Allan Jensen  ---
Yeah, the a==255 case was actually not one I would expect the compiler to solve,
which is why I changed the example to the a==0 case, which should be solvable
using existing constant propagation.

Note that you can put both shortcuts in, though as it stands only GCC 7 and 8
can vectorize it with two conditions, so we can't use that in general code, since
we need it to be fast elsewhere too.

2018-04-20 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

           What    |Removed     |Added
--------------------------------------------------
           Keywords|            |missed-optimization
             Status|WAITING     |NEW
             Blocks|            |53947

--- Comment #5 from Richard Biener  ---
I think it's asking too much of GCC to prove that, if a == 255, the computation
in the else path yields x. The simplest fix would be to remove the
conditional from the source.

It's true that GCC doesn't evaluate the cost of the vectorization properly,
as it looks at the if-converted copy when calculating the cost of the
scalar loop. OTOH, any estimate of how often the shortcut triggers
compared to the full computation might be off, and a conservative estimate may
cause us not to vectorize and thus slow down the code.
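To see why this is a genuine proof obligation rather than simple folding, assume the else path rounds each 8-bit channel c the way Qt's qPremultiply() does (an assumption: the formula is not quoted in this thread). Showing that the result equals c at a = 255 needs a floor-division argument that holds for every c, sketched here:

```latex
f_a(c) = \left\lfloor \frac{ca + \lfloor ca/256 \rfloor + 128}{256} \right\rfloor,
\qquad 0 \le c \le 255

% a = 255 and 1 <= c <= 255: since 0 < c/256 < 1,
\left\lfloor \frac{255c}{256} \right\rfloor
  = \left\lfloor c - \frac{c}{256} \right\rfloor = c - 1
\quad\Longrightarrow\quad
f_{255}(c) = \left\lfloor \frac{255c + (c - 1) + 128}{256} \right\rfloor
  = \left\lfloor \frac{256c + 127}{256} \right\rfloor = c

% a = 255 and c = 0:
f_{255}(0) = \left\lfloor \frac{0 + 0 + 128}{256} \right\rfloor = 0
```

With the alpha byte put back unchanged, the pixel comes out equal to x. In the packed 32-bit form the compiler would additionally have to see that no carry crosses from one channel's 16-bit lane into the next.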


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

2018-04-20 Thread linux at carewolf dot com

--- Comment #4 from Allan Jensen  ---
Created attachment 43995
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43995&action=edit
gccbug85406.cpp

This version compiles with a pcmpeqd and pandn instead of a blend, but the
principle is the same.

The lack of a ptest at the beginning is worse, though, as that risks a
performance regression compared to the non-vectorized code.
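A portable, lane-by-lane model of the select idioms mentioned above may help. After if-conversion the vector loop computes the general result in every lane, then picks per lane between it and the shortcut value. In this sketch plain loops stand in for pcmpeqd, the blend, pandn and ptest. All function names are hypothetical, and this is not GCC's generated code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four 32-bit ARGB pixels, standing in for one 128-bit SSE register.
using Vec4 = std::array<std::uint32_t, 4>;

// pcmpeqd-style mask: all ones in lanes whose alpha equals `value`, zero elsewhere.
Vec4 alphaEqualsMask(const Vec4 &x, std::uint32_t value)
{
    Vec4 m{};
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = ((x[i] >> 24) == value) ? 0xffffffffu : 0u;
    return m;
}

// a == 255 shortcut after if-conversion: take the input where the mask is
// set and the computed value elsewhere (the blend).
Vec4 blendSelect(const Vec4 &mask, const Vec4 &x, const Vec4 &computed)
{
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = (mask[i] & x[i]) | (~mask[i] & computed[i]);
    return r;
}

// a == 0 shortcut: the shortcut value is 0, so the select reduces to pandn.
Vec4 andNotSelect(const Vec4 &mask, const Vec4 &computed)
{
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = ~mask[i] & computed[i];
    return r;
}

// ptest-style check: true when every lane takes the shortcut, which would let
// the vector loop skip the general computation for this group of pixels.
bool allLanesSet(const Vec4 &mask)
{
    for (std::uint32_t m : mask)
        if (m != 0xffffffffu)
            return false;
    return true;
}
```

If the general path already returns x for a == 255 (and 0 for a == 0), then blendSelect(mask, x, computed) and andNotSelect(mask, computed) both hand back computed unchanged. That is why the extra select is the unnecessary part, while an all-lanes check like allLanesSet() is what would let the vector code skip work.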

2018-04-20 Thread linux at carewolf dot com

--- Comment #3 from Allan Jensen  ---
You need to add the loop around it:

void test(unsigned *buffer, int count)
{
    // qPremultiply() is the per-pixel function from the original testcase.
    for (int i = 0; i < count; ++i)
        buffer[i] = qPremultiply(buffer[i]);
}
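For a self-contained reproduction, the per-pixel function has to be supplied as well. The original testcase is not quoted in this thread, so the qPremultiply() below is a hypothetical reconstruction. Its rounding follows the style of Qt's qPremultiply(), plus the a == 255 shortcut discussed in this report. It assumes 32-bit unsigned ARGB pixels.

```cpp
// Hypothetical reconstruction; not the attached testcase.
static_assert(sizeof(unsigned) == 4, "pixels are assumed to be 32-bit ARGB");

inline unsigned qPremultiply(unsigned x)
{
    const unsigned a = x >> 24;            // alpha lives in the top byte
    if (a == 255)                          // shortcut: an opaque pixel is unchanged
        return x;

    // Red and blue are scaled together in two 16-bit lanes.
    unsigned t = (x & 0xff00ffu) * a;
    t = (t + ((t >> 8) & 0xff00ffu) + 0x800080u) >> 8;   // approximates round(c * a / 255)
    t &= 0xff00ffu;

    // Green is scaled on its own and left in bits 8..15.
    unsigned g = ((x >> 8) & 0xffu) * a;
    g = g + ((g >> 8) & 0xffu) + 0x80u;
    g &= 0xff00u;

    return g | t | (a << 24);
}

// The loop from comment #3.
void test(unsigned *buffer, int count)
{
    for (int i = 0; i < count; ++i)
        buffer[i] = qPremultiply(buffer[i]);
}
```

Building this with -O3 (which enables -ftree-vectorize in GCC) and -fopt-info-vec should show whether the loop was vectorized. The exact version and flags used in the report are not given in this thread.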

2018-04-18 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

           What    |Removed     |Added
--------------------------------------------------
             Status|UNCONFIRMED |WAITING
   Last reconfirmed|            |2018-04-18
                 CC|            |rguenth at gcc dot gnu.org
     Ever confirmed|0           |1

--- Comment #2 from Richard Biener  ---
I don't see any vectorization being done on the testcase. Please specify the
GCC version you tested as well as the command-line flags and, if necessary,
complete the testcase (there's no loop here?).

2018-04-15 Thread linux at carewolf dot com

--- Comment #1 from Allan Jensen  ---
Note that it might be hard for the compiler to figure out that for a==255 the
result leaves the input unchanged, but you can observe the same thing if you
instead test for a == 0 (and return 0). In that case the compiler should have
enough mathematical deduction to tell that the result for a==0 is always 0.
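Assuming the general path rounds each 8-bit channel c as Qt's qPremultiply() does (an assumption: the formula is not quoted in this thread), the a == 0 case needs only c·0 = 0 followed by constant folding:

```latex
f_a(c) = \left\lfloor \frac{ca + \lfloor ca/256 \rfloor + 128}{256} \right\rfloor
\quad\Longrightarrow\quad
f_0(c) = \left\lfloor \frac{0 + 0 + 128}{256} \right\rfloor = 0
\qquad \text{for every } 0 \le c \le 255
```

Since the alpha byte is also 0, the whole pixel is 0, matching the value the shortcut returns.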