[Bug target/63599] "wrong" branch optimization with Ofast in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |5.1.0, 6.1.0
           Keywords|wrong-code                  |missed-optimization
             Status|UNCONFIRMED                 |RESOLVED
      Known to work|                            |7.1.0
         Resolution|---                         |FIXED
   Target Milestone|---                         |7.0

--- Comment #5 from Andrew Pinski ---
Fixed for GCC 7.
[Bug target/63599] wrong branch optimization with Ofast in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The tree level looks like this:

  t_13 = VEC_COND_EXPR <t_4 <= { 4.142135679721832275390625e-1, 4.142135679721832275390625e-1, 4.142135679721832275390625e-1, 4.142135679721832275390625e-1 }, t_4, _12>;
  ret_14 = VEC_COND_EXPR <t_4 > { 4.142135679721832275390625e-1, 4.142135679721832275390625e-1, 4.142135679721832275390625e-1, 4.142135679721832275390625e-1 }, { 7.85398185253143310546875e-1, 7.85398185253143310546875e-1, 7.85398185253143310546875e-1, 7.85398185253143310546875e-1 }, { 0.0, 0.0, 0.0, 0.0 }>;
  t_16 = _9 != 0 ? t_13 : t_4;
  ret_15 = _9 != 0 ? ret_14 : { 0.0, 0.0, 0.0, 0.0 };

> movmskps %xmm8, %edx does not protect the code in the if block...

Yes it does, just not the way you think it does. Notice that the last two statements are conditional expressions. They get translated into the following:

        testl   %edx, %edx
        jne     .L9
        movaps  %xmm3, %xmm1
        pxor    %xmm2, %xmm2
.L9:

So if anything it is a missed optimization dealing with conditional moves with vectors without a vector comparison.
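To make the point about the two conditional expressions concrete, here is a minimal scalar sketch in C of the two shapes being compared. It is not the reporter's test case: the four lanes are modeled with plain arrays, `any_high` stands in for the movmskps/test pair, and `reduced` is a hypothetical placeholder for `_12`, whose definition the dump does not show. The function names are likewise made up for illustration.

```c
#define LANES 4

static const float THRESH = 0.4142135679721832275390625f;
static const float PIO4 = 0.785398185253143310546875f;

/* Hypothetical stand-in for _12; the dump does not show how it is computed. */
static float reduced(float t) { return (t - 1.0f) / (t + 1.0f); }

/* Plays the role of movmskps + testl: nonzero if any lane exceeds THRESH. */
static int any_high(const float t[LANES])
{
    int mask = 0;
    for (int i = 0; i < LANES; ++i)
        mask |= (t[i] > THRESH) << i;
    return mask;
}

/* Branchy shape: the blend work only runs when the mask is nonzero. */
static void blend_branchy(const float t[LANES], float z[LANES], float ret[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        z[i] = t[i];
        ret[i] = 0.0f;
    }
    if (any_high(t) != 0) {
        for (int i = 0; i < LANES; ++i) {
            z[i] = (t[i] <= THRESH) ? t[i] : reduced(t[i]);
            ret[i] = (t[i] > THRESH) ? PIO4 : 0.0f;
        }
    }
}

/* If-converted shape: the blend work always runs, then the result is
 * selected on the guard, mirroring t_16 and ret_15 in the dump. */
static void blend_if_converted(const float t[LANES], float z[LANES], float ret[LANES])
{
    int guard = any_high(t);                                   /* _9 */
    for (int i = 0; i < LANES; ++i) {
        float t13 = (t[i] <= THRESH) ? t[i] : reduced(t[i]);   /* t_13 */
        float ret14 = (t[i] > THRESH) ? PIO4 : 0.0f;           /* ret_14 */
        z[i] = (guard != 0) ? t13 : t[i];                      /* t_16 */
        ret[i] = (guard != 0) ? ret14 : 0.0f;                  /* ret_15 */
    }
}
```

Both functions return the same values for any input, which is why the generated code is correct; the difference is only that the second shape pays for the blend computation even when no lane is above the threshold.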
[Bug target/63599] wrong branch optimization with Ofast in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
I agree that the code produces correct results. It looks sub-optimal to me.
I understand that with Ofast the sequence below will always be executed

        andps     %xmm5, %xmm8
        rcpps     %xmm3, %xmm0
        mulps     %xmm0, %xmm3
        mulps     %xmm0, %xmm3
        addps     %xmm0, %xmm0
        subps     %xmm3, %xmm0
        mulps     %xmm0, %xmm1
        movaps    %xmm2, %xmm0
        cmpleps   %xmm4, %xmm0
        blendvps  %xmm0, %xmm2, %xmm1

while with O2 it will not, and this generates a performance penalty for samples where the test is often false.
(I tried to add __builtin_expect(x, false) with no effect.)
[Bug target/63599] wrong branch optimization with Ofast in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> ---
ifcvt making a transformation that doesn't help vectorization and ends up pessimizing the code... not really the first time this happens. I believe Jakub had a big patch for that, but it never got in. Maybe vectors could be special-cased if we never vectorize them anyway.
[Bug target/63599] wrong branch optimization with Ofast in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The big patch did get committed, but generally turning off tree if-conversion didn't turn out to be a win. What ended up being committed is narrower: only if there are any masked loads/stores does if-conversion apply to the vectorized loop alone and nothing else.
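For readers unfamiliar with the masked loads/stores case, the following is a minimal sketch in C of the kind of loop where it comes up. The function name and parameters are made up for illustration, and whether a given GCC version actually vectorizes it depends on the target and flags.

```c
/* A store that happens only in some iterations. To vectorize this loop the
 * compiler has to if-convert the body, and the conditional store can then
 * only be expressed as a masked store: lanes whose flag is zero must leave
 * dst untouched. */
void cond_copy(float *dst, const float *src, const int *flag, int n)
{
    for (int i = 0; i < n; ++i) {
        if (flag[i])
            dst[i] = src[i];
    }
}
```

If such a loop ends up not being vectorized, the scalar version is better off keeping the branch, which is the situation the restriction described above is meant to cover.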