https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88405

            Bug ID: 88405
           Summary: Missed DSE opportunity
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

For the following code:

#define MATRIX_SIZE 512

static double a[MATRIX_SIZE][MATRIX_SIZE];
static double b[MATRIX_SIZE][MATRIX_SIZE];
static double c[MATRIX_SIZE][MATRIX_SIZE];

double foo (void)
{
  double s;
  int i, j, k;

  /* Section A */
  for (i = 0; i < MATRIX_SIZE; i++)
    {
      for (j = 0; j < MATRIX_SIZE; j++)
        {
          a[i][j] = (double)i * (double)j;
          b[i][j] = (double)i / (double)(j+5);
        }
    }

  /* Section B */
  for (j = 0; j < MATRIX_SIZE; j++)
    {
      for (i = 0; i < MATRIX_SIZE; i++)
        {
          s = 0;
          for (k = 0; k < MATRIX_SIZE; k++)
            {
              s += a[i][k] * b[k][j];
            }
          c[i][j] = s;
        }
    }

  s = 0.0; // (1)

#if 0
  /* Section C */
  for (i = 0; i < MATRIX_SIZE; i++)
    {
      for (j = 0; j < MATRIX_SIZE; j++)
        {
          s += c[i][j];
        }
    }
#endif

  return s;
}

GCC does not manage to eliminate the code up to (1) and retains the expensive
Section A. Clang manages to eliminate much more and produces, on aarch64:

foo:                                    // @foo
// %bb.0:                               // %entry
        orr     w8, wzr, #0x200
.LBB0_1:                                // %vector.ph
                                        // =>This Inner Loop Header: Depth=1
        subs    x8, x8, #1              // =1
        b.ne    .LBB0_1
// %bb.2:                               // %for.cond20.preheader.preheader
        fmov    d0, xzr
        ret

This happens at -O2 as well as -O3, and on other targets too (it also occurs
on x86).
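
For illustration, here is a sketch of what the function would reduce to if every dead store were removed. This is a hand-written reading of the test case, not compiler output. It assumes the code above is the whole translation unit, so the static arrays cannot be observed from anywhere else. The names foo_reduced and check_equivalence are hypothetical and do not appear in the report. With Section C disabled, nothing reads c, so the stores in Section B are dead. That leaves the loads of a and b unused, which in turn makes the stores in Section A dead.

```c
#include <assert.h>

#define MATRIX_SIZE 512

static double a[MATRIX_SIZE][MATRIX_SIZE];
static double b[MATRIX_SIZE][MATRIX_SIZE];
static double c[MATRIX_SIZE][MATRIX_SIZE];

/* The function from the report; Section C is omitted because it is
   under #if 0 in the original.  */
double foo (void)
{
  double s;
  int i, j, k;

  /* Section A: a and b are only read by Section B.  */
  for (i = 0; i < MATRIX_SIZE; i++)
    for (j = 0; j < MATRIX_SIZE; j++)
      {
        a[i][j] = (double)i * (double)j;
        b[i][j] = (double)i / (double)(j+5);
      }

  /* Section B: c is written here and never read afterwards.  */
  for (j = 0; j < MATRIX_SIZE; j++)
    for (i = 0; i < MATRIX_SIZE; i++)
      {
        s = 0;
        for (k = 0; k < MATRIX_SIZE; k++)
          s += a[i][k] * b[k][j];
        c[i][j] = s;
      }

  s = 0.0; /* (1): overwrites whatever Section B left in s.  */
  return s;
}

/* Hypothetical ideal result (an assumption, not GCC or Clang output):
   with all stores to a, b and c removed as dead, only the constant
   return value remains.  */
double foo_reduced (void)
{
  return 0.0;
}

/* Hypothetical helper: both versions return the same value.  */
void check_equivalence (void)
{
  assert (foo () == foo_reduced ());
}
```

Note that the Clang output quoted above still keeps an empty 512-iteration loop, so it does not reach this fully reduced form either.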