[Bug c/89774] Add flag to force single precision

2019-04-22 Thread JunMa at linux dot alibaba.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

JunMa  changed:

   What|Removed |Added

 CC||JunMa at linux dot alibaba.com

--- Comment #10 from JunMa  ---
(In reply to Segher Boessenkool from comment #9)
> We currently only do it for trivial cases, as the example in comment 6 shows
> as well.  This is done during expand, which is the wrong place for it.
> 
> PR90070 is asking for better optimisation of this: do the operation in single
> precision, and use single-precision constants, if this does not change the
> result (or there is some -ffast-math option).
> 
> PR22326 is also closely related.  I don't think we can close any of these PRs
> as a dup of another, they are all asking for slightly different things :-)

Clang can do this optimization in its instcombine pass. See this case:

  float f4( float x ) {double t = x + 2.0; return  t; }
  float f5( float x ) {return  x + 2.0;  }

compiled with -O2 -march=native, GCC gives:

f4:
vcvtss2sd %xmm0, %xmm0, %xmm0
vaddsd .LC1(%rip), %xmm0, %xmm0
vcvtsd2ss %xmm0, %xmm0, %xmm0
ret

f5:
vaddss .LC3(%rip), %xmm0, %xmm0
ret

while clang emits a single vaddss instruction for both functions.
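
Why narrowing is safe for f4 can be checked empirically. The following is a minimal sketch, not GCC's or LLVM's implementation. It assumes IEEE 754 float and double with round-to-nearest, and the helper names (add_wide, add_narrow, same_bits, count_mismatches) are hypothetical. Since 2.0 is exactly representable as a float and double carries more than twice as many significand bits as float, rounding the double sum back to float should give the same value as a single float addition.

```c
#include <stdint.h>
#include <string.h>

/* Same shape as f4: widen, add in double, narrow on return. */
static float add_wide(float x)
{
    double t = x + 2.0;
    return (float)t;
}

/* What a narrowing optimization would compute instead. */
static float add_narrow(float x)
{
    return x + 2.0f;
}

/* Compare bit patterns so that a sign-of-zero difference also counts. */
static int same_bits(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* Count inputs for which the wide and narrow forms disagree. */
static int count_mismatches(float start, float step, int n)
{
    int bad = 0;
    float x = start;
    for (int i = 0; i < n; i++, x += step)
        if (!same_bits(add_wide(x), add_narrow(x)))
            bad++;
    return bad;
}
```

Calling count_mismatches(-1000.0f, 0.37f, 10000) is expected to return 0 under the stated assumptions. A sampled check like this is only evidence, not a proof that the rewrite is valid for every input.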

[Bug c/89774] Add flag to force single precision

2019-04-22 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Segher Boessenkool  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2019-04-22
 CC||segher at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90070,
   ||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326
 Resolution|INVALID |---
 Ever confirmed|0   |1

--- Comment #9 from Segher Boessenkool  ---
We currently only do it for trivial cases, as the example in comment 6 shows
as well.  This is done during expand, which is the wrong place for it.

PR90070 is asking for better optimisation of this: do the operation in single
precision, and use single-precision constants, if this does not change the
result (or there is some -ffast-math option).

PR22326 is also closely related.  I don't think we can close any of these PRs
as a dup of another, they are all asking for slightly different things :-)
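
The condition "if this does not change the result" matters as soon as the constant is not exactly representable in single precision. A minimal sketch, assuming IEEE 754 float and double with round-to-nearest. The function names are hypothetical, and the constant 0.1 is chosen only for illustration (it does not appear in the bug report):

```c
/* Multiply in double with a double constant, then round once to float. */
static float scale_wide(float x)
{
    return (float)(x * 0.1);
}

/* Multiply in float with the constant already rounded to float. */
static float scale_narrow(float x)
{
    return x * 0.1f;
}

/* Returns 1 if narrowing the constant leaves the result unchanged for x. */
static int narrowing_is_exact(float x)
{
    return scale_wide(x) == scale_narrow(x);
}
```

Under these assumptions narrowing_is_exact(3.0f) returns 1 but narrowing_is_exact(9.0f) returns 0, because the two results for 9.0f differ by one unit in the last place. A rewrite of this kind would therefore only be valid under an option such as -ffast-math.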

[Bug c/89774] Add flag to force single precision

2019-03-20 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

--- Comment #8 from Andrew Pinski  ---
(In reply to Marius Messerschmidt from comment #7)
> Looks good, which options did you use?

-O2 -march=native (the last part was done as I wanted to get the fused
multiply-add but it is not needed).

[Bug c/89774] Add flag to force single precision

2019-03-20 Thread marius.messerschmidt at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

--- Comment #7 from Marius Messerschmidt  ---
Looks good, which options did you use?

[Bug c/89774] Add flag to force single precision

2019-03-20 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #6 from Andrew Pinski  ---
Take your example and convert it into different functions:
float a(void)
{
  return 2.0;
}

float b(float a)
{
  return 2.0 * a;
}

double d(void)
{
  return 3.0;
}

double e(double d)
{
  return 3.0 * d;
}

double z(float a, double d)
{
  return 2.0 * a + 3.0 * d;
}

---- CUT ----
a:
.LFB0:
.cfi_startproc
vmovss  .LC0(%rip), %xmm0 ;;; load single
ret
b:
.LFB1:
.cfi_startproc
vaddss  %xmm0, %xmm0, %xmm0 ;;; add single (same as a*2.0)
ret
d:
.LFB2:
.cfi_startproc
vmovsd  .LC1(%rip), %xmm0 ;;; load double
ret
e:
.LFB3:
.cfi_startproc
vmulsd  .LC1(%rip), %xmm0, %xmm0 ;;; multiply double
ret
z:
.LFB4:
.cfi_startproc
vmulsd  .LC1(%rip), %xmm1, %xmm1  ;; multiply double
vcvtss2sd   %xmm0, %xmm0, %xmm0 ;; convert single to double
vfmadd132sd .LC2(%rip), %xmm1, %xmm0 ;; fused multiply-add double
ret

[Bug c/89774] Add flag to force single precision

2019-03-20 Thread marius.messerschmidt at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

--- Comment #5 from Marius Messerschmidt  ---
I did check the output without -fsingle-precision-constant.

Is this only enabled via -fsingle-precision-constant or at any optimization
level?

[Bug c/89774] Add flag to force single precision

2019-03-20 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #4 from Andrew Pinski  ---
(In reply to Marius Messerschmidt from comment #2)
> Is something like that already implemented or if not, do you think that this
> is useful and could be implemented?

This optimization is mostly done already.  Did you check before posting that
comment?

[Bug c/89774] Add flag to force single precision

2019-03-20 Thread marius.messerschmidt at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Marius Messerschmidt  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|FIXED   |---

--- Comment #3 from Marius Messerschmidt  ---
Reopening issue

[Bug c/89774] Add flag to force single precision

2019-03-20 Thread marius.messerschmidt at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Marius Messerschmidt  changed:

   What|Removed |Added

 Resolution|WORKSFORME  |FIXED

--- Comment #2 from Marius Messerschmidt  ---
This will cause issues the other way around as well.

I think that I did not state clearly what I meant...

Right now you can only use 'all double' or 'all float'. What I am looking for
is some kind of performance-oriented solution that will pick the 'best' option
for each literal.

Quick example:

void f() {
  float a = 2.0; // 2.0 -> single
  float b = 2.0 * a; // 2.0 -> single

  double d = 3.0;// 3.0 -> double
  double e = 3.0 * d;// 3.0 -> double

  double z = 2.0 * a + 3.0 * d; // both 2.0 and 3.0 -> double (only cast a)
}

The basic idea is to increase performance by reducing casting instructions.

Is something like that already implemented or if not, do you think that this is
useful and could be implemented?
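
For reference, standard C already lets the precision of each literal be chosen per use with the f suffix: an unsuffixed literal such as 2.0 has type double, while 2.0f has type float. A minimal sketch of how that changes the type of the surrounding expression follows. The macro and function names are hypothetical, and _Generic requires C11.

```c
/* Map an expression's static type to a string (C11 _Generic). */
#define TYPE_NAME(expr) _Generic((expr), \
    float: "float",                      \
    double: "double",                    \
    default: "other")

/* 2.0 is a double literal, so 'a' is converted and the product is double. */
static const char *type_of_double_literal_product(float a)
{
    return TYPE_NAME(2.0 * a);
}

/* 2.0f is a float literal, so the product stays in float. */
static const char *type_of_float_literal_product(float a)
{
    return TYPE_NAME(2.0f * a);
}
```

This only reports the type the language assigns to the expression. Whether the compiler then emits single- or double-precision instructions is a separate optimization question, which is what this report asks about.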

[Bug c/89774] Add flag to force single precision

2019-03-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #1 from Richard Biener  ---
You can use -fsingle-precision-constant