http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58296
            Bug ID: 58296
           Summary: ivopts is unable to handle some loops altered by the loop header copying pass
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: uranus at tinlans dot org

$ cat test.c
void bne_loop(unsigned int val,unsigned int N)
{
    int i;
    for (i=0;i<N;++i)
        printf("%d\n",val+i);
}

Please note that the loop condition, 'i < N', compares a signed int
variable with an unsigned int variable. If the type of i is changed from
'int' to 'unsigned int', the issue does not occur.

$ arm-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-eabi-gcc
COLLECT_LTO_WRAPPER=/home1/lhtseng/arm/4.9/libexec/gcc/arm-eabi/4.9.0/lto-wrapper
Target: arm-eabi
Configured with: ../../../../work/4.9/src/gcc-4.9.0/configure --target=arm-eabi --prefix=/home1/lhtseng/arm/4.9 --disable-nls --disable-shared --enable-languages=c --enable-__cxa_atexit --enable-c99 --enable-long-long --enable-threads=single --with-newlib --disable-multilib --disable-libssp --disable-libgomp --disable-decimal-float --disable-libffi --disable-libmudflap --disable-lto --with-gmp=/home1/lhtseng/work/general --with-mpfr=/home1/lhtseng/work/general --with-mpc=/home1/lhtseng/work/general --with-isl=/home1/lhtseng/work/general --with-cloog=/home1/lhtseng/work/general
Thread model: single
gcc version 4.9.0 20130802 (experimental) (GCC)

$ arm-eabi-gcc -O3 -fdump-tree-all -O3 -da -S test.c
$ cat -n test.s
    ...
    27  .L3:
    28          add     r1, r1, r5
    29          add     r4, r4, #1
    30          ldr     r0, .L9
    31          bl      printf
    32          cmp     r4, r6
    33          mov     r1, r4
    34          bne     .L3
    ...

The instruction 'mov r1, r4' is redundant. The dump of the RTL expansion
pass shows where it comes from:

$ cat test.c.166r.expand
...
;; i.0_4 = (unsigned int) i_9;

(insn 20 19 0 (set (reg:SI 110 [ i.0 ])
        (reg/v:SI 112 [ i ])) ../test.c:6 -1
     (nil))
...

$ cat test.c.165t.optimized
...
  <bb 4>:
  # i_13 = PHI <i_9(5), 0(3)>
  # i.0_16 = PHI <i.0_4(5), 0(3)>
  _7 = i.0_16 + val_6(D);
  printf ("%d\n", _7);
  i_9 = i_13 + 1;
  i.0_4 = (unsigned int) i_9;
  if (i_9 != _15)
    goto <bb 5>;
  else
    goto <bb 6>;
...

It is surprising that the statement 'i.0_4 = (unsigned int) i_9;' is not
removed by any tree-level or RTL-level optimization pass. After some
investigation, we found that using '-Os' instead of '-O3', or adding
'-fno-tree-ch', produces the optimized code, with the conversion
eliminated by ivopts:

$ arm-eabi-gcc -O3 -fdump-tree-all -O3 -fno-tree-ch -da -S test.c
$ cat test.c.119t.ivopts
  <bb 3>:
  _7 = ivtmp.9_11;
  printf ("%d\n", _7);
  ivtmp.9_10 = ivtmp.9_11 + 1;

  <bb 4>:
  # ivtmp.9_11 = PHI <val_6(D)(2), ivtmp.9_10(3)>
  if (ivtmp.9_11 != _12)
    goto <bb 3>;
  else
    goto <bb 5>;

$ cat test.s
        ...
.L3:
        mov     r1, r4
        bl      printf
        add     r4, r4, #1
.L2:
        cmp     r4, r5
        ldr     r0, .L6
        bne     .L3
        ldmfd   sp!, {r3, r4, r5, lr}
        bx      lr
        ...

Therefore, it appears that something is wrong with ivopts: it is unable
to handle the loop as altered by the tree-ch pass when the condition of
a for statement compares an int with an unsigned int.
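
For reference, the two source-level variants discussed above can be put
side by side. This is only a sketch: the function names are hypothetical,
and the printf call from test.c is replaced by an accumulated sum so the
results can be compared. It illustrates that both loops compute the same
values (the usual arithmetic conversions turn 'i < N' into
'(unsigned int) i < N'), so the difference reported here is purely one of
generated code. It does not by itself reproduce the redundant 'mov'.

```c
#include <assert.h>

/* Variant from the report: signed counter, unsigned bound.
   The sum stands in for the printf call (an assumption made for testing). */
unsigned int sum_signed_counter(unsigned int val, unsigned int N)
{
    unsigned int sum = 0;
    int i;
    for (i = 0; i < N; ++i)   /* i is converted to unsigned int here */
        sum += val + i;       /* and here */
    return sum;
}

/* Variant the report says avoids the issue: unsigned counter. */
unsigned int sum_unsigned_counter(unsigned int val, unsigned int N)
{
    unsigned int sum = 0;
    unsigned int i;
    for (i = 0; i < N; ++i)
        sum += val + i;
    return sum;
}

/* Hypothetical helper: both variants must agree for the same inputs. */
void check_variants_agree(unsigned int val, unsigned int N)
{
    assert(sum_signed_counter(val, N) == sum_unsigned_counter(val, N));
}
```

Compiling both functions with the same options and comparing the loop
bodies in the resulting assembly is one way to check whether a given
compiler build still shows the extra move.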