https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88533
Bug ID: 88533
Summary: [9 Regression] Higher performance penalty of array-bounds checking for sparse-matrix vector multiply
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: anlauf at gmx dot de
Target Milestone: ---

Created attachment 45249
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45249&action=edit
Fortran code

I am seeing an increased performance penalty due to array-bounds checking,
in particular for sparse-matrix (CSC) vector multiplication. The attached,
semi-reduced test case, which only needs the provided meta-data but otherwise
uses random elements, should be sufficient for demonstration.

I have tested on an i5-8250U and tuned the "outer loop" so that the testcase
runs in 1-2 seconds on that machine. For that purpose, I have used some
feedback provided to my initial posting on gcc-help, see

https://gcc.gnu.org/ml/gcc-help/2018-12/msg00041.html

Tested compilers:

gcc-7.3.1 20180323 [gcc-7-branch revision 258812]
gcc-8.2.1 20181202
gcc-9.0.0 20181214

baseline options: -O2 -ftree-vectorize -g -march=skylake -mfpmath=sse
7: 1.12
8: 1.12
9: 1.12

baseline + -funroll-loops :
7: 1.00
8: 1.00
9: 0.99

baseline + -funroll-loops -fcheck=bounds :
7: 1.56
8: 1.56
9: 1.93

baseline + -funroll-loops -fcheck=bounds -fno-tree-ch :
7: 1.78
8: 1.80
9: 1.83

baseline + -funroll-loops -fno-tree-ch :
7: 1.05
8: 1.09
9: 1.09

Preliminary conclusions:
- -funroll-loops is helpful here
- -fcheck=bounds is quite expensive with current 9.0
- -fno-tree-ch brings the different versions in line: it benefits 9,
  but is worse for 7 and 8
- none of the options above bring 9 to the level of 7 and 8
  as long as bounds-checking is desired.
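
For readers without the attachment at hand, the kind of kernel in question can be sketched as follows. This is not the attached test case, only a minimal illustration of a CSC matrix-vector product; the module name csc_sketch, the routine csc_matvec, and the array names colptr, rowind and val are hypothetical and chosen for this sketch.

```fortran
module csc_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! y = A*x for a matrix A stored in compressed sparse column (CSC) form.
  ! (Illustrative names, not taken from the attachment.)
  !   colptr(j) .. colptr(j+1)-1 : positions of the entries of column j
  !   rowind(k)                  : row index of the k-th stored entry
  !   val(k)                     : value of the k-th stored entry
  subroutine csc_matvec(colptr, rowind, val, x, y)
    integer,  intent(in)  :: colptr(:), rowind(:)
    real(dp), intent(in)  :: val(:), x(:)
    real(dp), intent(out) :: y(:)
    integer :: j, k

    y = 0.0_dp
    do j = 1, size(x)
       do k = colptr(j), colptr(j+1) - 1
          ! rowind(k) is only known at run time, so with -fcheck=bounds
          ! the indirect accesses to y are expected to be checked in
          ! every iteration of this inner loop.
          y(rowind(k)) = y(rowind(k)) + val(k) * x(j)
       end do
    end do
  end subroutine csc_matvec
end module csc_sketch
```

The inner loop is short and its trip count varies per column, which is presumably why unrolling and loop-header copying (-ftree-ch) interact so visibly with the inserted checks. Compiling a driver around such a routine with the option sets listed above should show whether the same trend appears outside the attached test case.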