https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067
--- Comment #1 from Jim Wilson <wilson at gcc dot gnu.org> ---
This looks similar to an ivopts problem I looked at regarding coremark. Given this testcase

void matrix_add_const_unsigned(unsigned int N, short *A, short val)
{
  unsigned int i, j;
  for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
      A[i*N+j] += val;
    }
  }
}

void matrix_add_const_signed(signed int N, short *A, short val)
{
  signed int i, j;
  for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
      A[i*N+j] += val;
    }
  }
}

and compiling for 64-bit targets with -O2, we get much better code for the second function than for the first. For riscv64, the first function has 8 instructions in the inner loop and the second has 5. I reproduced this problem on multiple 64-bit targets, including mips64, ppc64, and arm64.

The problem I saw is that with a signed iterator, ivopts decides that it can ignore overflow (signed overflow is undefined behavior) and that the induction variable is safe to eliminate. With an unsigned iterator, it decides that unsigned overflow can't be ignored (unsigned arithmetic is defined to wrap), so it then looks at the loop bounds. If the loop bound is unknown, e.g. because it is a function parameter as in this case, it decides that the induction variable isn't safe to eliminate, and we get poor optimization.

Brian's testcase appears to be another instance of this. With the original code, ivopts turns a[i] into an unsigned iterator, then sees that the loop bound is a global variable and apparently decides it can't eliminate it. With the modified code using int16_t *p, gcc decides that it can eliminate it, and we get better code. This issue shows up on 32-bit targets, but it appears related to the above. I don't know if the ivopts issue can be fixed.
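To make the overflow argument concrete, here is a minimal C sketch. It is an illustration only, not a proposed GCC change. It repeats the unsigned testcase and adds a hypothetical pointer-based rewrite, matrix_add_const_unsigned_ptr, similar in spirit to the int16_t *p change in Brian's testcase. Two more hypothetical helpers, index_u32 and index_u64, show why ivopts cannot make that rewrite on its own when N is unknown: the 32-bit unsigned index i*N+j wraps modulo 2^32, while a 64-bit pointer increment does not.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned testcase from this comment: 32-bit unsigned induction variables. */
void matrix_add_const_unsigned(unsigned int N, short *A, short val)
{
  unsigned int i, j;
  for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
      A[i*N+j] += val;
    }
  }
}

/* Hypothetical hand-rewritten variant that walks A with a pointer.
   It matches the original only while i*N+j never wraps, i.e. when
   N*N <= 2^32; the assert documents that precondition.  */
void matrix_add_const_unsigned_ptr(unsigned int N, short *A, short val)
{
  assert((uint64_t)N * N <= (UINT64_C(1) << 32));
  short *p = A;
  for (unsigned int i = 0; i < N; i++) {
    for (unsigned int j = 0; j < N; j++) {
      *p++ += val;
    }
  }
}

/* Hypothetical helper: the element index the original code uses.  The
   arithmetic is done in 32-bit unsigned, so it wraps modulo 2^32 before
   being widened to a 64-bit address offset.  */
uint64_t index_u32(uint32_t i, uint32_t N, uint32_t j)
{
  return (uint32_t)(i * N + j);
}

/* Hypothetical helper: the element index a 64-bit pointer increment
   would reach, computed without any wraparound.  */
uint64_t index_u64(uint32_t i, uint32_t N, uint32_t j)
{
  return (uint64_t)i * N + j;
}
```

For small N both functions touch the same elements. For N = 65537, however, index_u32(65536, 65537, 0) is 65536 while index_u64(65536, 65537, 0) is 4295032832, so the two versions would update different elements. Since the compiler cannot rule out such an N when it is a function parameter or a global, it has to keep the unsigned index; the pointer form is only a valid hand-optimization when the caller knows N*N <= 2^32.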