[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2021-08-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47989
           Keywords|                            |documentation

--- Comment #7 from Andrew Pinski  ---
See PR 47989 for the reason why this option is not enabled for scalar code and
why it was only enabled for vectorized code.

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2013-01-08 Thread vincenzo.innocente at cern dot ch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #5 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2013-01-08 15:29:18 UTC ---

We just got hit by this great kind of code (copysign is unknown to
scientists).

Most probably GCC could optimize it under -Ofast to return copysignf(1.f, x)
(x/x is already optimized to 1).





cat one.cc; c++ -Ofast -mrecip -S one.cc; cat one.s
#include <cmath>
int one(float x) {
  return x/std::abs(x);
}



        .text
        .align 4,0x90
        .globl __Z3onef
__Z3onef:
LFB86:
        movss   LC0(%rip), %xmm2
        andps   %xmm0, %xmm2
        rcpss   %xmm2, %xmm1
        mulss   %xmm1, %xmm2
        mulss   %xmm1, %xmm2
        addss   %xmm1, %xmm1
        subss   %xmm2, %xmm1
        mulss   %xmm0, %xmm1
        cvttss2si       %xmm1, %eax
        ret
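
The listing above can be read as a reciprocal estimate followed by one Newton-Raphson step. Here is a minimal C++ sketch of that arithmetic, not GCC's implementation. The helper names (recip_estimate, recip_refined, one_via_recip) are hypothetical, and the non-SSE fallback is an assumption added only so the sketch runs on other targets.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Low-precision reciprocal estimate. With SSE this is the rcpss instruction;
// otherwise a hypothetical stand-in truncates 1/a to 12 mantissa bits.
float recip_estimate(float a) {
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(a)));
#else
    float r = 1.0f / a;
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF800u;  // keep sign, exponent and the top 12 mantissa bits
    std::memcpy(&r, &bits, sizeof r);
    return r;
#endif
}

// One Newton-Raphson step in the same order as the listing:
// r' = (r + r) - (a * r) * r
float recip_refined(float a) {
    assert(a != 0.0f);
    float r = recip_estimate(a);  // rcpss
    float t = a * r;              // mulss
    t = t * r;                    // mulss
    return (r + r) - t;           // addss, subss
}

// Mirrors one(): x * (1 / |x|), truncated toward zero as cvttss2si does.
int one_via_recip(float x) {
    return static_cast<int>(x * recip_refined(std::fabs(x)));
}
```

Because the refined reciprocal is only approximate, x * recip_refined(|x|) may come out just below 1.0, in which case the truncation to int gives 0 rather than 1.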


[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2013-01-08 Thread glisse at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org



--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2013-01-08 23:55:18 UTC ---

(In reply to comment #5)
> we just got hit by this great type of code (copysign is unknown to
> scientists)
>
> most probably gcc could optimize it for -Ofast to return copysignf(1.f,x);
> (x/x is optimized in 1)
>
> cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
> #include <cmath>
> int one(float x) {
>   return x/std::abs(x);
> }



That looks like a completely different issue from this PR. I think you should
open a separate PR if you don't want it to get lost. It seems easy to add a
few lines about it to fold_binary_loc (not the best place, but that's where the
others are), near the place that optimizes A / A to 1.0. You could try writing
the patch; I don't foresee any trap.
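
To make the proposed fold concrete, here is a small sketch comparing the pattern from comment #5 with its suggested replacement. The function names are hypothetical and this is only an illustration of the two expressions, not the GCC patch itself.

```cpp
#include <cmath>

// The pattern from comment #5: sign of x obtained by dividing by |x|.
float sign_by_division(float x) {
    return x / std::abs(x);
}

// The suggested replacement: copy the sign bit of x onto 1.0f.
float sign_by_copysign(float x) {
    return std::copysign(1.0f, x);
}
```

The two agree for finite, nonzero x. They differ at x = 0, where the division yields NaN under IEEE semantics while copysign yields 1.0f, and likewise for infinities and NaN, which is presumably why the fold is only suggested for -Ofast.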


[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-20 15:52:31 UTC ---

Use -mrecip.  It's otherwise not safe for SPEC CPU 2006, which is why it is not
enabled by default for -ffast-math.


[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread vincenzo.innocente at cern dot ch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-12-20 15:55:03 UTC ---

Thanks.
Does "not safe" mean producing incorrect results?
Is it documented?


[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-20 15:58:55 UTC ---

(In reply to comment #2)
> Thanks.
> not safe meaning producing incorrect results?

Yes.

> Is it documented?

See the documentation for -mrecip:



...



Note that while the throughput of the sequence is higher than the throughput
of the non-reciprocal instruction, the precision of the sequence can be
decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).



...
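
One way to check a precision claim like this on your own inputs is to count how many representable floats separate an approximate result from the exact one. The sketch below does that by comparing bit patterns. The helper names are hypothetical, and counting representable steps is an assumption about how to measure the error, not necessarily the definition of ulp the manual has in mind.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an integer that increases with the float's
// value, so that adjacent floats map to adjacent integers.
std::int64_t ordered_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    // Negative floats sort in reverse bit order, so mirror them below zero.
    return (u & 0x80000000u) ? -static_cast<std::int64_t>(u & 0x7FFFFFFFu)
                             : static_cast<std::int64_t>(u);
}

// Number of representable steps between a and b (0 if they are identical).
// Intended for finite inputs only.
std::int64_t float_steps(float a, float b) {
    std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

For example, float_steps(1.0f, std::nextafter(1.0f, 0.0f)) is 1, since the second argument is the float immediately below 1.0.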


[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread dominiq at lps dot ens.fr


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #4 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-12-20 16:07:11 UTC ---

> is there any reason why rsqrtss and rcpss are not used for scalar code while
> rsqrtps and rcpps are used for loops?

Yep! I don't have the patience to dig through the bugzilla archive right now,
but the main reason is a loss of accuracy (especially 1/2.0 != 0.5) leading
to problems in some codes (see gas_dyn.f90 in the polyhedron tests). You can
pass options to force the use of rsqrtss and rcpss for scalars:



-mrecip
This option enables use of RCPSS and RSQRTSS instructions (and their vectorized
variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase
precision instead of DIVSS and SQRTSS (and their vectorized variants) for
single-precision floating-point arguments. These instructions are generated
only when -funsafe-math-optimizations is enabled together with
-ffinite-math-only and -fno-trapping-math. Note that while the throughput of the
sequence is higher than the throughput of the non-reciprocal instruction, the
precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of
1.0 equals 0.99999994).

Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already
with -ffast-math (or the above option combination), and doesn't need -mrecip.

Also note that GCC emits the above sequence with additional Newton-Raphson step
for vectorized single-float division and vectorized sqrtf(x) already with
-ffast-math (or the above option combination), and doesn't need -mrecip.
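
As an illustration of "RSQRTSS with an additional Newton-Raphson step", here is a minimal sketch of the arithmetic for 1/sqrt(x) and sqrt(x). The helper names are hypothetical, the non-SSE fallback is an assumption added so the sketch runs elsewhere, and the exact instruction sequence GCC emits may be arranged differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Low-precision estimate of 1/sqrt(x). With SSE this is the rsqrtss
// instruction; otherwise a hypothetical stand-in truncates the exact
// value to 12 mantissa bits.
float rsqrt_estimate(float x) {
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    float y = 1.0f / std::sqrt(x);
    std::uint32_t bits;
    std::memcpy(&bits, &y, sizeof bits);
    bits &= 0xFFFFF800u;  // keep sign, exponent and the top 12 mantissa bits
    std::memcpy(&y, &bits, sizeof y);
    return y;
#endif
}

// One Newton-Raphson step for f(y) = 1/y^2 - x:
// y' = y * (1.5 - 0.5 * x * y * y)
float rsqrt_refined(float x) {
    assert(x > 0.0f);
    float y = rsqrt_estimate(x);
    return y * (1.5f - 0.5f * x * y * y);
}

// sqrt(x) recovered from the reciprocal square root: sqrt(x) = x * (1/sqrt(x)).
float sqrt_via_rsqrt(float x) {
    return x * rsqrt_refined(x);
}
```

The refined value is close to, but not guaranteed equal to, the correctly rounded result that SQRTSS or DIVSS would produce, which is the accuracy trade-off the option text describes.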



-mrecip=opt
This option controls which reciprocal estimate instructions may be used. opt is
a comma-separated list of options, which may be preceded by a `!' to invert the
option:

`all'
    Enable all estimate instructions.
`default'
    Enable the default instructions, equivalent to -mrecip.
`none'
    Disable all estimate instructions, equivalent to -mno-recip.
`div'
    Enable the approximation for scalar division.
`vec-div'
    Enable the approximation for vectorized division.
`sqrt'
    Enable the approximation for scalar square root.
`vec-sqrt'
    Enable the approximation for vectorized square root.

So, for example, -mrecip=all,!sqrt enables all of the reciprocal
approximations, except for square root.
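
To see which source patterns each sub-option is aimed at, a small test file along the following lines may help. The function names and the file are hypothetical, and whether a given loop is actually vectorized depends on the compiler version and the other flags in use.

```cpp
#include <cmath>
#include <cstddef>

// Scalar single-precision division: the case the `div' sub-option covers.
float scalar_div(float a, float b) {
    return a / b;
}

// Scalar single-precision square root: the case `sqrt' covers.
float scalar_sqrt(float x) {
    return std::sqrt(x);
}

// Element-wise division over arrays: if the compiler vectorizes this loop,
// `vec-div' is the sub-option that applies.
void vec_div(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] / b[i];
}

// Element-wise square root: if vectorized, `vec-sqrt' applies.
void vec_sqrt(const float* x, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = std::sqrt(x[i]);
}
```

Compiling such a file with, say, c++ -Ofast '-mrecip=all,!sqrt' -S and reading the assembly should show which functions use the estimate instructions. The single quotes keep an interactive shell from treating `!' as history expansion.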