[Bug target/63599] "wrong" branch optimization with Ofast in a loop

2024-03-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

Andrew Pinski changed:

           What    |Removed     |Added

      Known to fail|            |5.1.0, 6.1.0
           Keywords|wrong-code  |missed-optimization
             Status|UNCONFIRMED |RESOLVED
      Known to work|            |7.1.0
         Resolution|---         |FIXED
   Target Milestone|---         |7.0

--- Comment #5 from Andrew Pinski ---
Fixed for GCC 7.

[Bug target/63599] wrong branch optimization with Ofast in a loop

2014-10-20 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The tree level looks like this:
  t_13 = VEC_COND_EXPR t_4 = { 4.142135679721832275390625e-1,
4.142135679721832275390625e-1, 4.142135679721832275390625e-1,
4.142135679721832275390625e-1 }, t_4, _12;
  ret_14 = VEC_COND_EXPR t_4  { 4.142135679721832275390625e-1,
4.142135679721832275390625e-1, 4.142135679721832275390625e-1,
4.142135679721832275390625e-1 }, { 7.85398185253143310546875e-1,
7.85398185253143310546875e-1, 7.85398185253143310546875e-1,
7.85398185253143310546875e-1 }, { 0.0, 0.0, 0.0, 0.0 };
  t_16 = _9 != 0 ? t_13 : t_4;
  ret_15 = _9 != 0 ? ret_14 : { 0.0, 0.0, 0.0, 0.0 };


> movmskps  %xmm8, %edx
> does not protect the code in the if block...
Yes it does, just not the way you think it does.

Notice the last two statements are conditional expressions.

And that gets translated into the following:
        testl   %edx, %edx
        jne     .L9
        movaps  %xmm3, %xmm1
        pxor    %xmm2, %xmm2
.L9:

So if anything it is a missed optimization dealing with conditional moves with
vectors without a vector comparison.
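
The role of the two conditional expressions can be sketched in C. This is an illustrative model only: `vec4`, `guard_select`, and `guard_select_demo` are hypothetical names, not anything from GCC or from the reporter's test case, and `t_13` and `ret_14` are passed in as already-computed values rather than reconstructed from the dump above.

```c
#include <assert.h>

/* Hypothetical stand-in for a 4 x float vector register. */
typedef struct { float v[4]; } vec4;

/* Models only the last two statements of the dump:
 *   t_16   = _9 != 0 ? t_13   : t_4;
 *   ret_15 = _9 != 0 ? ret_14 : { 0.0, 0.0, 0.0, 0.0 };
 * t_13 and ret_14 are inputs here because the VEC_COND_EXPR
 * statements compute them before the scalar test is consulted. */
static void guard_select(int mask9, vec4 t_4, vec4 t_13, vec4 ret_14,
                         vec4 *t_16, vec4 *ret_15)
{
    vec4 zero = {{0.0f, 0.0f, 0.0f, 0.0f}};
    *t_16   = (mask9 != 0) ? t_13   : t_4;   /* scalar select on the mask */
    *ret_15 = (mask9 != 0) ? ret_14 : zero;  /* zero vector when mask is 0 */
}

/* Self-check: a zero mask discards the speculated values. */
static void guard_select_demo(void)
{
    vec4 t_4    = {{1.0f, 2.0f, 3.0f, 4.0f}};
    vec4 t_13   = {{9.0f, 9.0f, 9.0f, 9.0f}};
    vec4 ret_14 = {{7.0f, 7.0f, 7.0f, 7.0f}};
    vec4 t_16, ret_15;

    guard_select(0, t_4, t_13, ret_14, &t_16, &ret_15);
    assert(t_16.v[0] == 1.0f && ret_15.v[0] == 0.0f);

    guard_select(5, t_4, t_13, ret_14, &t_16, &ret_15);
    assert(t_16.v[0] == 9.0f && ret_15.v[0] == 7.0f);
}
```

In this model the guarded computation still runs, but its results only become visible when the mask is non-zero, which is the sense in which the test protects the block.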

[Bug target/63599] wrong branch optimization with Ofast in a loop

2014-10-20 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
I agree that the code produces correct results. It looks sub-optimal to me.
I understand that with Ofast the sequence below will always be executed

        andps     %xmm5, %xmm8
        rcpps     %xmm3, %xmm0
        mulps     %xmm0, %xmm3
        mulps     %xmm0, %xmm3
        addps     %xmm0, %xmm0
        subps     %xmm3, %xmm0
        mulps     %xmm0, %xmm1
        movaps    %xmm2, %xmm0
        cmpleps   %xmm4, %xmm0
        blendvps  %xmm0, %xmm2, %xmm1

while with O2 it will not. This generates a performance penalty for samples
where the test is often false.
(I tried adding __builtin_expect(x, false), with no effect.)
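
A rough C sketch of the cost difference described here, under stated assumptions: `expensive`, `sum_branchy`, `sum_if_converted`, and the call counter are hypothetical, and the arithmetic is a placeholder, not the reporter's kernel. It only illustrates that the if-converted form pays for the guarded work on every element while the branchy form pays only when the test is true.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long expensive_calls;  /* hypothetical counter for the demo */

/* Placeholder for the reciprocal/multiply/blend sequence. */
static float expensive(float x)
{
    ++expensive_calls;
    return x / (1.0f + x);
}

/* Branchy form: the work is done only for elements passing the test. */
static float sum_branchy(const float *a, size_t n, float limit)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > limit)
            s += expensive(a[i]);
    }
    return s;
}

/* If-converted form: the work is always done, then a select picks
 * either the computed value or zero. */
static float sum_if_converted(const float *a, size_t n, float limit)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float r = expensive(a[i]);        /* computed unconditionally */
        s += (a[i] > limit) ? r : 0.0f;   /* select instead of branch */
    }
    return s;
}

/* Self-check: same result, different amount of work. */
static void compare_forms_demo(void)
{
    const float a[4] = {0.1f, 0.2f, 0.3f, 2.0f};

    expensive_calls = 0;
    (void)sum_branchy(a, 4, 1.0f);
    assert(expensive_calls == 1);

    expensive_calls = 0;
    (void)sum_if_converted(a, 4, 1.0f);
    assert(expensive_calls == 4);
}
```

When the test is rarely true, the second form does mostly wasted work, which is the penalty reported for samples where the condition is often false.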


[Bug target/63599] wrong branch optimization with Ofast in a loop

2014-10-20 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> ---
ifcvt making a transformation that doesn't help vectorization and ends up
pessimizing the code... not really the first time this happens. I believe Jakub
had a big patch for that, but it never got in. Maybe vectors could be
special-cased if we never vectorize them anyway.


[Bug target/63599] wrong branch optimization with Ofast in a loop

2014-10-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63599

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The big patch did get committed, but generally turning off tree if-conversion
didn't turn out to be a win. So what ended up being committed is narrower: only
if there are any masked loads/stores does if-conversion apply solely to the
vectorized loop and nothing else.