[Bug tree-optimization/70291] muldc3 code generation could be smarter

2018-05-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70291

--- Comment #6 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to ktkachov from comment #4)
> Implemented for GCC 9.

Since multiple people seem interested in the improvement, would it be useful to
backport?

2018-05-09 Thread rguenth at gcc dot gnu.org

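Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smcallis at gmail dot com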

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
*** Bug 81478 has been marked as a duplicate of this bug. ***

2018-05-03 Thread ktkachov at gcc dot gnu.org

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |9.0
         Resolution|---                         |FIXED
   Target Milestone|---                         |9.0

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Implemented for GCC 9.

2018-05-03 Thread ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Author: ktkachov
Date: Thu May  3 12:59:43 2018
New Revision: 259889

URL: https://gcc.gnu.org/viewcvs?rev=259889&root=gcc&view=rev
Log:
[tree-complex.c] PR tree-optimization/70291: Inline floating-point complex
multiplication more aggressively

We can improve the performance of complex floating-point multiplications by
inlining the expansion a bit more aggressively.
We can inline complex x = a * b as:
x = (ar*br - ai*bi) + i(ar*bi + br*ai);
if (isunordered (__real__ x, __imag__ x))
  x = __muldc3 (a, b); //Or __mulsc3 for single-precision

That way, in the common case where no NaNs are produced, we avoid the libgcc
call and fall back to the NaN handling in libgcc only if either component of
the expansion is NaN.

The implementation is done in expand_complex_multiplication in tree-complex.c,
and the above expansion is performed when optimising at -O1 and above and not
optimising for size.
At -O0 and -Os the single call to libgcc will be emitted.

For the code:
__complex double
foo (__complex double a, __complex double b)
{
  return a * b;
}

We will now emit at -O2 for aarch64:
foo:
        fmul    d16, d1, d3
        fmul    d6, d1, d2
        fnmsub  d5, d0, d2, d16
        fmadd   d4, d0, d3, d6
        fcmp    d5, d4
        bvs     .L8
        fmov    d1, d4
        fmov    d0, d5
        ret
.L8:
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      __muldc3
        ldp     x29, x30, [sp], 16
        ret

Instead of just a branch to __muldc3.
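
The expansion described in the log above can be approximated in plain C. This
is only a sketch of the idea, not the code in tree-complex.c: the names
make_complex and mul_inline are hypothetical, the __real__/__imag__ lvalues are
a GNU extension, and the slow path uses an ordinary a * b (assuming the
compiler's default Annex G semantics, i.e. no -ffast-math or
-fcx-limited-range) rather than calling __muldc3 directly.

```c
#include <complex.h>
#include <math.h>

/* Hypothetical helper: build a complex value from its parts without going
   through "re + im * I", which can turn an infinite part into NaN.  */
static double complex
make_complex (double re, double im)
{
  double complex z;
  __real__ z = re;   /* GNU extension, as used elsewhere in this thread.  */
  __imag__ z = im;
  return z;
}

/* Hypothetical sketch of the inlined multiplication: compute the naive
   product first and only take the slow path when a NaN shows up.  */
static double complex
mul_inline (double complex a, double complex b)
{
  double ar = creal (a), ai = cimag (a);
  double br = creal (b), bi = cimag (b);

  double xr = ar * br - ai * bi;
  double xi = ar * bi + br * ai;

  /* True if either component is NaN.  */
  if (isunordered (xr, xi))
    return a * b;   /* Slow path: full multiplication with NaN/Inf recovery.  */

  return make_complex (xr, xi);
}
```

For finite inputs such as (1 + 2i) * (3 + 4i) only the fast path runs. An input
like (Inf + Inf*i) * (1 + 0i) makes both naive components NaN, so the fallback
is taken and is expected to recover an infinite result.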

PR tree-optimization/70291
* tree-complex.c (expand_complex_libcall): Add type, inplace_p
arguments.  Change return type to tree.  Emit libcall as a new
statement rather than replacing existing one when inplace_p is true.
(expand_complex_multiplication_components): New function.
(expand_complex_multiplication): Expand floating-point complex
multiplication using the above.
(expand_complex_division): Rename inner_type parameter to type.
Update expand_complex_libcall call-site.
(expand_complex_operations_1): Update expand_complex_multiplication
and expand_complex_division call-sites.

* gcc.dg/complex-6.c: New test.
* gcc.dg/complex-7.c: Likewise.

Added:
trunk/gcc/testsuite/gcc.dg/complex-6.c
trunk/gcc/testsuite/gcc.dg/complex-7.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-complex.c

2018-03-15 Thread wilco at gcc dot gnu.org

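Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #2 from Wilco <wilco at gcc dot gnu.org> ---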
(In reply to Ramana Radhakrishnan from comment #0)
> __complex double x;
> __complex double y;
> __complex double z;
> double a, b, c, d;
> 
> int main (void)
> {
>   x = y * z;
>   return 0; 
> }
> 
> Could well be implemented as:
> 
> 
> int main (void)
> {
>   x = y * z;
>   if (isnan (__real x) && isnan (__imag__ x))
>     x = __muldc3 (y, z);
> 
>   return 0;
> }
> 
> essentially opencoding this as the standard suggests in G.5.1 => 6 for C99.
> 
> spotted while looking at profiles that were reported on aarch64 with code
> compiled at O2 / O3.
> 
> I note that lowering this in this form in tree-complex.c will need a bit of
> book-keeping given that it's sort of bounded on the phi nodes and ssa_names
> before lowering begins but this could well be another math-optimization done
> later rather than munging it with the existing lowering.

Note that isnan (x) || isnan (y) is optimized to isunordered (x, y), so that
would be a faster check.
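
As a small illustration of that equivalence (a sketch only; the function names
below are hypothetical and not part of GCC), the two forms of the check can be
written side by side:

```c
#include <math.h>
#include <stdbool.h>

/* Two separate NaN tests, one per component.  */
static bool
either_nan_two_checks (double re, double im)
{
  return isnan (re) || isnan (im);
}

/* A single unordered comparison: true exactly when at least one
   argument is NaN.  */
static bool
either_nan_one_check (double re, double im)
{
  return isunordered (re, im);
}
```

Both functions return the same result for every pair of inputs; the second
form needs only one floating-point comparison.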

Also it may be beneficial to vectorize this despite the fallback.

2016-07-30 Thread pinskia at gcc dot gnu.org

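Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |pinskia at gcc dot gnu.org
           Severity|normal                      |enhancement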

2016-03-19 Thread rguenth at gcc dot gnu.org

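Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-03-18
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---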
Confirmed.