https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80232
Bug ID: 80232
Summary: Ofast pessimizes Sparse matmult in scimark2 benchmark on avx platforms
Product: gcc
Version: 7.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
Target Milestone: ---

On my machine, after the usual

mkdir scimark2TMP
cd scimark2TMP
wget http://math.nist.gov/scimark2/scimark2_1c.zip .
unzip scimark2_1c.zip

gcc -v gives:

Using built-in specs.
COLLECT_GCC=c++
COLLECT_LTO_WRAPPER=/afs/cern.ch/work/i/innocent/public/w5/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk//configure --prefix=/afs/cern.ch/user/i/innocent/w5 -enable-languages=c,c++,lto,fortran --enable-lto -enable-libitm -disable-multilib
Thread model: posix
gcc version 7.0.1 20170326 (experimental) [trunk revision 246485] (GCC)

and I get:

[innocent@vinavx3 scimark2TMP]$ gcc -O2 -march=haswell *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult Mflops: 3271.69 (N=1000, nz=5000)
[innocent@vinavx3 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult Mflops: 2946.76 (N=100000, nz=1000000)
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=nehalem *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult Mflops: 3281.93 (N=1000, nz=5000)
[innocent@vinavx3 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult Mflops: 2859.34 (N=100000, nz=1000000)
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=corei7-avx *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult Mflops: 2987.40 (N=1000, nz=5000)
[innocent@vinavx3 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult Mflops: 2869.35 (N=100000, nz=1000000)
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=haswell *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult Mflops: 2579.52 (N=1000, nz=5000)
[innocent@vinavx3 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult Mflops: 2381.40 (N=100000, nz=1000000)

So -O2 and SSE4.2 (-march=nehalem) are the fastest, AVX (-march=corei7-avx) is already slower, and AVX2 (-march=haswell) is dramatically slower. Part of the difference could be due to gather operations, as in bug 57796, though I am not sure that accounts for the difference with respect to -O2.

Interestingly, on KNL it makes almost no difference (not sure whether this is good or bad...), with a hint of a speedup for the large problem:

[innocent@vinknl0 scimark2TMP]$ gcc -Ofast -march=knl *.c -lm
[innocent@vinknl0 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult Mflops: 348.13 (N=1000, nz=5000)
[innocent@vinknl0 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult Mflops: 358.67 (N=100000, nz=1000000)
[innocent@vinknl0 scimark2TMP]$ gcc -O2 -march=knl *.c -lm
[innocent@vinknl0 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult Mflops: 329.33 (N=1000, nz=5000)
[innocent@vinknl0 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult Mflops: 321.51 (N=100000, nz=1000000)
[innocent@vinknl0 scimark2TMP]$ gcc -Ofast -march=corei7-avx *.c -lm
[innocent@vinknl0 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult Mflops: 343.12 (N=1000, nz=5000)
[innocent@vinknl0 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult Mflops: 323.03 (N=100000, nz=1000000)
[innocent@vinknl0 scimark2TMP]$ gcc -Ofast -march=nehalem *.c -lm
[innocent@vinknl0 scimark2TMP]$ ./a.out 5 | grep "Sparse matmult"
Sparse matmult Mflops: 343.57 (N=1000, nz=5000)
[innocent@vinknl0 scimark2TMP]$ ./a.out -large 5 | grep "Sparse matmult"
Sparse matmult Mflops: 321.00 (N=100000, nz=1000000)