https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80232

            Bug ID: 80232
           Summary: Ofast pessimizes Sparse matmult in scimark2 benchmark
                    on avx platforms
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

On my machine, after the usual setup:
mkdir scimark2TMP
cd scimark2TMP
wget http://math.nist.gov/scimark2/scimark2_1c.zip
unzip scimark2_1c.zip
gcc -v

I get 
Using built-in specs.
COLLECT_GCC=c++
COLLECT_LTO_WRAPPER=/afs/cern.ch/work/i/innocent/public/w5/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk//configure
--prefix=/afs/cern.ch/user/i/innocent/w5 -enable-languages=c,c++,lto,fortran
--enable-lto -enable-libitm -disable-multilib
Thread model: posix
gcc version 7.0.1 20170326 (experimental) [trunk revision 246485] (GCC) 

[innocent@vinavx3 scimark2TMP]$ gcc -O2 -march=haswell *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult  Mflops:  3271.69    (N=1000, nz=5000)
[innocent@vinavx3 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult  Mflops:  2946.76    (N=100000, nz=1000000)
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=nehalem *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult  Mflops:  3281.93    (N=1000, nz=5000)
[innocent@vinavx3 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult  Mflops:  2859.34    (N=100000, nz=1000000)
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=corei7-avx *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult  Mflops:  2987.40    (N=1000, nz=5000)
[innocent@vinavx3 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult  Mflops:  2869.35    (N=100000, nz=1000000)
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=haswell *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult  Mflops:  2579.52    (N=1000, nz=5000)
[innocent@vinavx3 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult  Mflops:  2381.40    (N=100000, nz=1000000)

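So -O2 and -Ofast with SSE4.2 (nehalem) are the fastest, AVX (corei7-avx) is
already slower on the small problem, and AVX2 (haswell) is dramatically slower,
roughly 20% below -O2 on both problem sizes.
Part of the difference may be due to the gather operation, as in #57796; I am
not sure what explains the difference with respect to -O2.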
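
For reference, the loop being timed is a compressed-sparse-row (CSR)
matrix-vector product. Below is a minimal sketch of such a kernel, not the
benchmark's exact source: the name csr_matvec and the argument order are
assumptions made for illustration. The indirect load x[col[i]] is the access a
vectorizer has to turn into a gather (or into scalar loads plus inserts). As
far as I know, -O2 in this GCC version does not enable the loop vectorizer,
while -Ofast does and also allows the sum reduction to be reassociated.

```c
/* Hypothetical helper, not copied from scimark2: y = A*x with A in CSR form.
 * val[] holds the nonzeros, col[] their column indices, and
 * row[r] .. row[r+1]-1 are the positions in val[]/col[] belonging to row r. */
void csr_matvec(int nrows, double *y, const double *val,
                const int *row, const int *col, const double *x)
{
    for (int r = 0; r < nrows; r++) {
        double sum = 0.0;
        for (int i = row[r]; i < row[r + 1]; i++)
            sum += x[col[i]] * val[i];  /* indirect load of x: gather candidate */
        y[r] = sum;
    }
}
```

Compiling only this function with -Ofast -march=haswell -S and again with
-march=nehalem should be enough to see whether gather instructions are
emitted for the inner loop.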


It is interesting to note that on KNL it makes almost no difference (not sure
whether this is positive or negative...), with a hint of a speedup for the
large problem with -Ofast -march=knl:

[innocent@vinknl0 scimark2TMP]$ gcc -Ofast -march=knl *.c -lm
[innocent@vinknl0 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult  Mflops:   348.13    (N=1000, nz=5000)
[innocent@vinknl0 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult  Mflops:   358.67    (N=100000, nz=1000000)
[innocent@vinknl0 scimark2TMP]$ gcc -O2 -march=knl *.c -lm
[innocent@vinknl0 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult  Mflops:   329.33    (N=1000, nz=5000)
[innocent@vinknl0 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult  Mflops:   321.51    (N=100000, nz=1000000)
[innocent@vinknl0 scimark2TMP]$  gcc -Ofast -march=corei7-avx *.c -lm
[innocent@vinknl0 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult  Mflops:   343.12    (N=1000, nz=5000)
[innocent@vinknl0 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult  Mflops:   323.03    (N=100000, nz=1000000)
[innocent@vinknl0 scimark2TMP]$ gcc -Ofast -march=nehalem *.c -lm
[innocent@vinknl0 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult  Mflops:   343.57    (N=1000, nz=5000)
[innocent@vinknl0 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult  Mflops:   321.00    (N=100000, nz=1000000)
