https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88533

            Bug ID: 88533
           Summary: [9 Regression] Higher performance penalty of
                    array-bounds checking for sparse-matrix vector
                    multiply
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: anlauf at gmx dot de
  Target Milestone: ---

Created attachment 45249
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45249&action=edit
Fortran code

I am seeing an increased performance penalty due to array-bounds checking,
in particular for sparse-matrix (CSC) vector multiplication.

The attached, semi-reduced test case, which only needs the provided meta-data
but otherwise uses random elements, should be sufficient for demonstration.
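For reference, below is a rough sketch of a CSC (compressed sparse column) matrix-vector product y = A*x with explicit range checks, similar in spirit to what -fcheck=bounds adds. It is written in C, not taken from the attached Fortran test case, and only approximates what gfortran generates. Indices are 0-based, and the names csc_matvec_checked, bounds_fail and CHECK are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime error -fcheck=bounds reports. */
static void bounds_fail(const char *name, int idx, int lo, int hi)
{
    fprintf(stderr, "index %d of '%s' outside [%d, %d]\n", idx, name, lo, hi);
    exit(EXIT_FAILURE);
}

/* Explicit range check, similar in spirit to a compiler-inserted one. */
#define CHECK(name, idx, lo, hi)                            \
    do {                                                    \
        if ((idx) < (lo) || (idx) > (hi))                   \
            bounds_fail((name), (idx), (lo), (hi));         \
    } while (0)

/* y = A*x, A is m x n in CSC form (0-based): column j holds the entries
   val[k] at row rowind[k] for colptr[j] <= k < colptr[j+1]; nnz = colptr[n]. */
void csc_matvec_checked(int m, int n, int nnz,
                        const int *colptr, const int *rowind,
                        const double *val, const double *x, double *y)
{
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int j = 0; j < n; j++) {
        CHECK("colptr", j + 1, 0, n);   /* colptr has n+1 entries */
        CHECK("x", j, 0, n - 1);
        double xj = x[j];
        for (int k = colptr[j]; k < colptr[j + 1]; k++) {
            CHECK("rowind", k, 0, nnz - 1);
            CHECK("val", k, 0, nnz - 1);
            int i = rowind[k];
            /* Indirect index: depends on loaded data, so it is checked
               once per nonzero. */
            CHECK("y", i, 0, m - 1);
            y[i] += val[k] * xj;
        }
    }
}
```

For the 3x3 matrix with columns (1, 0, 4), (0, 3, 0) and (2, 0, 5), stored as colptr = {0, 2, 3, 5}, rowind = {0, 2, 1, 0, 2}, val = {1, 4, 3, 2, 5}, and x = {1, 1, 1}, the call yields y = {3, 3, 9}. The check on y[rowind[k]] has to run for every nonzero, while the checks on k could in principle be replaced by a single check of colptr[j] and colptr[j+1] per column.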

I have tested on an i5-8250U and tuned the "outer loop" so that the testcase
runs in 1-2 seconds on that machine.  For that purpose, I have used some
feedback provided to my initial posting on gcc-help, see
https://gcc.gnu.org/ml/gcc-help/2018-12/msg00041.html

Tested compilers:

gcc-7.3.1 20180323 [gcc-7-branch revision 258812]
gcc-8.2.1 20181202
gcc-9.0.0 20181214

baseline options: -O2 -ftree-vectorize -g -march=skylake -mfpmath=sse

Run times in seconds:

additional options                          gcc-7  gcc-8  gcc-9
(baseline only)                              1.12   1.12   1.12
-funroll-loops                               1.00   1.00   0.99
-funroll-loops -fcheck=bounds                1.56   1.56   1.93
-funroll-loops -fcheck=bounds -fno-tree-ch   1.78   1.80   1.83
-funroll-loops -fno-tree-ch                  1.05   1.09   1.09

Preliminary conclusions:

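- -funroll-loops is helpful here.
- -fcheck=bounds is quite expensive with current 9.0: on top of
  -funroll-loops it raises the run time from 0.99 to 1.93 s, compared
  with 1.00 to 1.56 s for 7 and 8.
- With bounds checking, -fno-tree-ch brings the different versions in
  line (1.78-1.83 s): it benefits 9, but is worse for 7 and 8.
- None of the options above brings 9 to the level of 7 and 8
  as long as bounds checking is desired.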
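For context, -ftree-ch enables GCC's loop header copying, which the GCC manual describes as making code motion optimizations more effective and saving one jump. The C sketch below is only a source-level illustration: GCC performs the transformation on its internal representation, not on source code, and the function names are hypothetical. It shows a loop before and after the exit test at the top is copied in front of the loop as a guard.

```c
/* Loop in "while" form: the exit test sits at the loop header. */
double sum_while(const double *a, int lo, int hi)
{
    double s = 0.0;
    int k = lo;
    while (k < hi) {
        s += a[k];
        k++;
    }
    return s;
}

/* Same loop after header copying: the test is duplicated as a guard for
   the zero-trip case, and the loop becomes a do-while with the test at
   the bottom. */
double sum_header_copied(const double *a, int lo, int hi)
{
    double s = 0.0;
    int k = lo;
    if (k < hi) {               /* copied header */
        do {
            s += a[k];
            k++;
        } while (k < hi);       /* original test, now at the loop latch */
    }
    return s;
}
```

Both functions return the same sum for any lo and hi, including lo >= hi; -fno-tree-ch suppresses this transformation at the tree level.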
