[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

JunMa changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |JunMa at linux dot alibaba.com

--- Comment #10 from JunMa ---
(In reply to Segher Boessenkool from comment #9)
> We currently only do it for trivial cases, as the example in comment 6 shows
> as well.  This is done during expand, which is the wrong place for it.
>
> PR90070 is asking for better optimisation of this: do the operation in single
> precision, and use single-precision constants, if this does not change the
> result (or there is some -ffast-math option).
>
> PR22326 is also closely related.  I don't think we can close any of these PRs
> as a dup of another, they are all asking for slightly different things :-)

clang can do this optimization in the instcombine pass. See this case:

    float f4( float x ) { double t = x + 2.0; return t; }
    float f5( float x ) { return x + 2.0; }

Compiled with -O2 -march=native, GCC gives:

    f4:
            vcvtss2sd  %xmm0, %xmm0, %xmm0
            vaddsd     .LC1(%rip), %xmm0, %xmm0
            vcvtsd2ss  %xmm0, %xmm0, %xmm0
            ret
    f5:
            vaddss     .LC3(%rip), %xmm0, %xmm0
            ret

while clang always emits the vaddss instruction.
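As a rough illustration of why narrowing f4 to a single vaddss does not change the result: 2.0 is exactly representable in single precision, and for one addition, rounding the double sum back to float yields the same value as adding directly in float. The sketch below is not from the thread. The function names and sample values are made up for illustration, and it assumes IEEE 754 float and double evaluated in their own formats (FLT_EVAL_METHOD == 0, as on x86-64 with SSE).

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as f4: widen to double, add, narrow back to float. */
float add_via_double(float x)
{
    double t = x + 2.0;
    return (float)t;
}

/* The same arithmetic done directly in single precision. */
float add_in_single(float x)
{
    return x + 2.0f;
}

/* Hypothetical helper: assert both forms agree on a few sample inputs. */
void check_add_samples(void)
{
    static const float samples[] = {
        0.0f, 0.1f, -2.0f, 1.0e-30f, 16777215.0f, 16777216.0f, 3.0e38f
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(add_via_double(samples[i]) == add_in_single(samples[i]));
}
```

A handful of samples is only a spot check, not a proof. The equivalence would not carry over to a constant such as 0.1, which has different values as a float and as a double.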
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
   Last reconfirmed|                            |2019-04-22
                 CC|                            |segher at gcc dot gnu.org
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90070,
                   |                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326
         Resolution|INVALID                     |---
     Ever confirmed|0                           |1

--- Comment #9 from Segher Boessenkool ---
We currently only do it for trivial cases, as the example in comment 6 shows
as well.  This is done during expand, which is the wrong place for it.

PR90070 is asking for better optimisation of this: do the operation in single
precision, and use single-precision constants, if this does not change the
result (or there is some -ffast-math option).

PR22326 is also closely related.  I don't think we can close any of these PRs
as a dup of another, they are all asking for slightly different things :-)
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

--- Comment #8 from Andrew Pinski ---
(In reply to Marius Messerschmidt from comment #7)
> Looks good, which options did you use?

-O2 -march=native (the last part was done as I wanted to get the fused
multiply-add, but it is not needed).
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

--- Comment #7 from Marius Messerschmidt ---
Looks good, which options did you use?
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from Andrew Pinski ---
Take your example and convert it into different functions:

    float a(void) { return 2.0; }
    float b(float a) { return 2.0 * a; }
    double d(void) { return 3.0; }
    double e(double d) { return 3.0 * d; }
    double z(float a, double d) { return 2.0 * a + 3.0 * d; }

CUT

    a:
    .LFB0:
            .cfi_startproc
            vmovss  .LC0(%rip), %xmm0       ;;; load single
            ret
    b:
    .LFB1:
            .cfi_startproc
            vaddss  %xmm0, %xmm0, %xmm0     ;;; add single (same as a*2.0)
            ret
    d:
    .LFB2:
            .cfi_startproc
            vmovsd  .LC1(%rip), %xmm0       ;;; load double
            ret
    e:
    .LFB3:
            .cfi_startproc
            vmulsd  .LC1(%rip), %xmm0, %xmm0        ;;; multiply double
            ret
    z:
    .LFB4:
            .cfi_startproc
            vmulsd  .LC1(%rip), %xmm1, %xmm1        ;; multiply double
            vcvtss2sd       %xmm0, %xmm0, %xmm0     ;; convert single to double
            vfmadd132sd     .LC2(%rip), %xmm1, %xmm0 ;; multiply-add double
            ret
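The "add single (same as a*2.0)" remark on b can be spot-checked in plain C. The sketch below is an illustration only, not code from the thread: the function names and sample values are invented, and it assumes IEEE 754 arithmetic with results that stay within float range.

```c
#include <assert.h>
#include <stddef.h>

/* As written in b(): the double constant promotes a, the result narrows. */
float twice_via_double(float a)
{
    return (float)(2.0 * a);
}

/* What the vaddss in the listing computes: one single-precision add. */
float twice_via_add(float a)
{
    return a + a;
}

/* Hypothetical helper: assert both forms agree on a few sample inputs. */
void check_twice_samples(void)
{
    static const float samples[] = { 0.0f, 0.1f, -1.5f, 1.0e-40f, 1.0e38f };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(twice_via_double(samples[i]) == twice_via_add(samples[i]));
}
```

Doubling only changes the exponent, so no rounding happens in either form for these inputs, which is presumably why the compiler is free to pick the cheaper instruction here.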
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

--- Comment #5 from Marius Messerschmidt ---
I did check the output without -fsingle-precision-constant.

Is this only enabled via -fsingle-precision-constant, or at any optimization
level?
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #4 from Andrew Pinski ---
(In reply to Marius Messerschmidt from comment #2)
> Is something like that already implemented or if not, do you think that this
> is useful and could be implemented?

This optimization is mostly done already.  Did you check before posting that
comment?
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Marius Messerschmidt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|FIXED                       |---

--- Comment #3 from Marius Messerschmidt ---
Reopening issue
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Marius Messerschmidt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|WORKSFORME                  |FIXED

--- Comment #2 from Marius Messerschmidt ---
This will cause issues the other way around as well. I think that I did not
state clearly what I meant...

Right now you can only use 'all double' or 'all float'. What I am looking for
is some kind of performance-oriented solution that will pick the 'best' option
for each literal.

Quick example:

    void f()
    {
        float a = 2.0;                  // 2.0 -> single
        float b = 2.0 * a;              // 2.0 -> single
        double d = 3.0;                 // 3.0 -> double
        double e = 3.0 * d;             // 3.0 -> double
        double z = 2.0 * a + 3.0 * d;   // both 2.0 and 3.0 -> double (only cast a)
    }

The basic idea is to increase performance by reducing casting instructions.

Is something like that already implemented or if not, do you think that this
is useful and could be implemented?
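For comparison, standard C already lets the programmer make this choice per literal by hand: an unsuffixed constant such as 2.0 has type double, while 2.0f has type float, and the usual arithmetic conversions follow from that. The sketch below only illustrates that language rule. The function names are made up, and it is not the automatic selection being requested.

```c
#include <assert.h>
#include <stddef.h>

/* Unsuffixed 2.0 is a double, so the product has type double. */
size_t product_size_unsuffixed(float a)
{
    return sizeof(2.0 * a);   /* operand of sizeof is not evaluated */
}

/* The f suffix makes the literal a float, so the product stays float. */
size_t product_size_suffixed(float a)
{
    return sizeof(2.0f * a);
}

/* Hypothetical helper: assert the resulting expression types. */
void check_literal_types(void)
{
    assert(product_size_unsuffixed(1.0f) == sizeof(double));
    assert(product_size_suffixed(1.0f) == sizeof(float));
}
```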
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #1 from Richard Biener ---
You can use -fsingle-precision-constant
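GCC documents -fsingle-precision-constant as treating floating-point constants as single precision rather than implicitly converting them to double. A minimal sketch of what that means for a double variable follows. It emulates the flag with an explicit f suffix instead of actually passing the option, and the function names are invented for illustration.

```c
#include <assert.h>

/* What `double d = 0.1;` holds by default. */
double constant_as_double(void)
{
    return 0.1;
}

/* Roughly what it would hold if 0.1 were read as single precision
   (emulated here with the f suffix, not with the compiler flag). */
double constant_as_single(void)
{
    return 0.1f;
}

/* 3.0 is exact in both formats, so nothing is lost for it. */
double exact_as_double(void) { return 3.0; }
double exact_as_single(void) { return 3.0f; }

/* Hypothetical helper: assert which constants survive the narrowing. */
void check_constants(void)
{
    assert(constant_as_double() != constant_as_single());
    assert(exact_as_double() == exact_as_single());
}
```

Constants like 2.0 and 3.0 survive unchanged, but a constant like 0.1 assigned to a double would lose precision, since the flag applies to every literal in the translation unit.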