[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #7 from Andrew Pinski ---
I noticed that clang/LLVM does not do this either, and neither does ICC.
--- Comment #6 from Allan Jensen ---
Yeah, the a == 255 case was actually not one I would expect the compiler to solve, which is why I changed the example to the a == 0 case, which should be solvable using existing constant propagation.

Note that you can put both shortcuts in, though as it stands only GCC 7 and 8 can vectorize it with two conditions, so we can't use that in general code, as we need it to be fast elsewhere too.
Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|WAITING                     |NEW
             Blocks|                            |53947

--- Comment #5 from Richard Biener ---
I think it's too much to ask of GCC to prove that if a == 255 the computation in the else path would result in x. The simplest fix would be to remove the conditional in the source.

It's true that GCC doesn't evaluate the cost of the vectorization properly, as it looks at the if-converted copy when calculating the cost of the scalar loop. OTOH, any estimate of how often the shortcut triggers compared to the computation might be off, and a conservative estimate may cause us not to vectorize and thus slow down the code.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
--- Comment #4 from Allan Jensen ---
Created attachment 43995
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43995&action=edit
gccbug85406.cpp

This version compiles with a pcmpeqd and pandn instead of a blend, but the principle is the same. The lack of a ptest at the beginning is worse, though, as that risks a performance regression compared to the non-vectorized code.
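To illustrate why a pcmpeqd followed by a pandn is the same select as a blend, here is a scalar model of a single 32-bit lane. It is only a sketch: the function names are hypothetical, it does not use real SSE intrinsics, and it assumes the select in question is "0 where alpha == 0, otherwise the computed value", as in the a == 0 variant of the testcase.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit lane of pcmpeqd: all ones if the operands are equal, else zero.
inline std::uint32_t lane_cmpeq(std::uint32_t a, std::uint32_t b)
{
    return a == b ? 0xffffffffu : 0u;
}

// One lane of a mask-driven blend: if_true where the mask is set,
// if_false where it is clear.
inline std::uint32_t lane_blend(std::uint32_t mask,
                                std::uint32_t if_true,
                                std::uint32_t if_false)
{
    return (mask & if_true) | (~mask & if_false);
}

// One lane of pandn: (NOT mask) AND value.
inline std::uint32_t lane_andn(std::uint32_t mask, std::uint32_t value)
{
    return ~mask & value;
}

// Both forms pick 0 when alpha == 0 and the computed value otherwise.
inline bool select_forms_agree(std::uint32_t alpha, std::uint32_t computed)
{
    const std::uint32_t mask = lane_cmpeq(alpha, 0u);
    return lane_blend(mask, 0u, computed) == lane_andn(mask, computed);
}

// Small self-check over a few alpha values.
inline void check_select_forms()
{
    for (std::uint32_t alpha = 0; alpha <= 255; ++alpha)
        assert(select_forms_agree(alpha, 0x12345678u));
}
```

When one arm of the select is the constant 0, the blend degenerates into an and-not, which is why the instruction sequence changes while the extra select work stays.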
--- Comment #3 from Allan Jensen ---
You need to add the loop around it:

void test(unsigned *buffer, int count)
{
    for (int i = 0; i < count; ++i)
        buffer[i] = qPremultiply(buffer[i]);
}
Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2018-04-18
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener ---
I don't see any vectorization being done on the testcase. Please specify the GCC version you tested as well as the command-line flags, and if necessary complete the testcase (there's no loop here?).
--- Comment #1 from Allan Jensen ---
Note that it might be hard for the compiler to figure out that the computation for a == 255 leaves the input unchanged, but you can observe the same thing if you instead test for a == 0 (and return 0). In that case the compiler should be able to deduce that the result for a == 0 is always 0.
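The redundancy of both shortcuts can be checked empirically. The body of qPremultiply does not appear in this thread, so the sketch below assumes the usual packed ARGB32 premultiply arithmetic (red and blue multiplied together, green separately, with a rounding division by 255). The names premultiply_plain, premultiply_shortcut and verify_shortcuts are hypothetical. The check only shows that the shortcuts return what the general computation would return anyway; it says nothing about what GCC is able to prove.

```cpp
#include <cassert>
#include <cstdint>

// Assumed kernel: premultiply the colour channels of a packed ARGB32 pixel
// by its alpha, with no early exits.
inline std::uint32_t premultiply_plain(std::uint32_t x)
{
    const std::uint32_t a = x >> 24;
    // Red and blue share one 32-bit multiply.
    std::uint32_t t = (x & 0xff00ffu) * a;
    t = (t + ((t >> 8) & 0xff00ffu) + 0x800080u) >> 8;
    t &= 0xff00ffu;
    // Green is handled on its own and stays in bits 8..15.
    std::uint32_t g = ((x >> 8) & 0xffu) * a;
    g = g + ((g >> 8) & 0xffu) + 0x80u;
    g &= 0xff00u;
    return g | t | (a << 24);
}

// The same kernel with the two shortcuts discussed in this report.
inline std::uint32_t premultiply_shortcut(std::uint32_t x)
{
    const std::uint32_t a = x >> 24;
    if (a == 255)
        return x;   // opaque: input unchanged
    if (a == 0)
        return 0;   // fully transparent: result is 0
    return premultiply_plain(x);
}

// Compare both variants for every alpha and a strided sample of colours.
inline void verify_shortcuts()
{
    for (std::uint32_t a = 0; a <= 255; ++a) {
        for (std::uint32_t rgb = 0; rgb <= 0xffffffu; rgb += 4099u) {
            const std::uint32_t x = (a << 24) | rgb;
            assert(premultiply_shortcut(x) == premultiply_plain(x));
        }
    }
}
```

Calling verify_shortcuts() from a test program exercises both early exits against the branch-free path. Under the assumed arithmetic, dropping the conditionals from the source therefore does not change results.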