[Bug c++/69564] lto and/or C++ make scimark2 LU slower

2016-02-01 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564

--- Comment #5 from vincenzo Innocente  ---
it is a regression 
gcc version 4.9.3 (GCC) 
c++ -Ofast *.c; ./a.out
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
gcc -Ofast *.c; ./a.out
c++ -v
Composite Score: 2449.06
FFT Mflops:  2046.03(N=1024)
SOR Mflops:  1654.04(100 x 100)
MonteCarlo: Mflops:   813.44
Sparse matmult  Mflops:  2962.08(N=1000, nz=5000)
LU  Mflops:  4769.72(M=100, N=100)
---
gcc -Ofast *.c -lm; ./a.out
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 2475.22
FFT Mflops:  2064.19(N=1024)
SOR Mflops:  1633.01(100 x 100)
MonteCarlo: Mflops:   810.37
Sparse matmult  Mflops:  2970.47(N=1000, nz=5000)
LU  Mflops:  4898.06(M=100, N=100)

[Bug c++/69564] lto and/or C++ make scimark2 LU slower

2016-02-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
(In reply to vincenzo Innocente from comment #3)
> > Any reason you are using the c++ driver here?
> Because I am interested in C++ performance
> never imagined that the c++ front-end could make a difference on such a
> code...
> From my point of view it is even a more severe regression than just "lto"

Yeah, didn't try to figure out whether the C vs. C++ thing is a 
regression.  But I suspect the change to the C++ loop lowering.

Certainly needs closer investigation.

[Bug c++/69564] lto and/or C++ make scimark2 LU slower

2016-02-01 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564

--- Comment #3 from vincenzo Innocente  ---
> Any reason you are using the c++ driver here?
Because I am interested in C++ performance
never imagined that the c++ front-end could make a difference on such a code...
From my point of view it is even a more severe regression than just "lto"

[Bug c++/69564] lto and/or C++ make scimark2 LU slower

2016-02-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564

--- Comment #2 from Richard Biener  ---
It looks like we get different BB order out of C++ than C but otherwise no real
code-differences as far as I can see.

[Bug c++/69564] lto and/or C++ make scimark2 LU slower

2016-02-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |lto, missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-02-01
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
          Component|lto                         |c++
            Summary|lto makes scimark2 LU       |lto and/or C++ make
                   |slower                      |scimark2 LU slower
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
Any reason you are using the c++ driver here?

I get

> gcc-6 -Ofast -flto *.c -lm -B /abuild/rguenther/trunk3-g/gcc
> ./a.out 
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 1729.02
FFT Mflops:  1247.04(N=1024)
SOR Mflops:  1537.70(100 x 100)
MonteCarlo: Mflops:   842.21
Sparse matmult  Mflops:  1657.86(N=1000, nz=5000)
LU  Mflops:  3360.29(M=100, N=100)

> gcc-6 -Ofast  *.c -lm -B /abuild/rguenther/trunk3-g/gcc
> ./a.out 
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 1645.94
FFT Mflops:  1288.61(N=1024)
SOR Mflops:  1471.29(100 x 100)
MonteCarlo: Mflops:   459.90
Sparse matmult  Mflops:  1665.91(N=1000, nz=5000)
LU  Mflops:  3343.98(M=100, N=100)


Ok, when using g++ to compile things I _do_ get

> g++-6 -Ofast -flto *.c -lm -B /abuild/rguenther/trunk3-g/gcc
> ./a.out 
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 1321.43
FFT Mflops:  1261.86(N=1024)
SOR Mflops:  1533.77(100 x 100)
MonteCarlo: Mflops:   850.69
Sparse matmult  Mflops:  1669.90(N=1000, nz=5000)
LU  Mflops:  1290.93(M=100, N=100)

> g++-6 -Ofast  *.c -lm -B /abuild/rguenther/trunk3-g/gcc
> ./a.out 
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 1492.12
FFT Mflops:  1279.12(N=1024)
SOR Mflops:  1479.86(100 x 100)
MonteCarlo: Mflops:   433.83
Sparse matmult  Mflops:  1637.11(N=1000, nz=5000)
LU  Mflops:  2630.71(M=100, N=100)

So even without LTO I get a hit in using C++ to compile LU.  Interesting.
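
To put a number on that hit, the LU figures from the four runs above can be compared directly. The following is only a rough sketch: the Mflops values are copied from the output in this comment, while the names `lu_mflops`, `relative_drop` and `drops` are made up for illustration and are not part of SciMark2 or GCC.

```python
# LU Mflops copied from the four runs in comment #1 (driver, LTO mode).
lu_mflops = {
    ("gcc", "no-lto"): 3343.98,
    ("gcc", "lto"): 3360.29,
    ("g++", "no-lto"): 2630.71,
    ("g++", "lto"): 1290.93,
}


def relative_drop(baseline, measured):
    """Fractional slowdown of `measured` relative to `baseline`."""
    return 1.0 - measured / baseline


# For each LTO mode, compare the g++ build against the gcc build.
drops = {
    mode: relative_drop(lu_mflops[("gcc", mode)], lu_mflops[("g++", mode)])
    for mode in ("no-lto", "lto")
}

for mode, drop in drops.items():
    print(f"{mode}: g++ LU is {drop:.1%} slower than gcc")
```

On these single runs the g++ build loses roughly a fifth of the LU throughput without LTO and roughly three fifths with LTO. Run-to-run noise is not accounted for, so the percentages are indicative only.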