http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60412
Bug ID: 60412
Summary: superfluous arithmetic generated for uneven tail handling
Product: gcc
Version: 4.8.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.maurer at gmx dot net

Created attachment 32262
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32262&action=edit
Testcase showing superfluous arithmetic in assembly

The attached function processes a (possibly long) memory range. If more
than 7 bytes are available, those are processed using g1(), which is able
to process 8 bytes in one call. The tail end is then handled by g2(), one
byte at a time. (In my application, the g1 and g2 functions are the Intel
built-in CRC-32C operations.)

On Intel x86-64, the following superfluous assembly code is produced
between the calls to g1() and g2():

$ gcc -v -O3 -Wall -Wextra -c opt-tail.c
...
Target: x86_64-unknown-linux-gnu
...
GNU C (GCC) version 4.8.2 (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.8.2, GMP version 5.1.2, MPFR version
3.1.1-p2, MPC version 1.0.1

$ objdump -C -D opt-tail.o
...
  44:   4c 89 e2                mov    %r12,%rdx
  47:   48 29 da                sub    %rbx,%rdx
  4a:   48 83 ea 08             sub    $0x8,%rdx
  4e:   48 c1 ea 03             shr    $0x3,%rdx
  52:   48 8d 5c d3 08          lea    0x8(%rbx,%rdx,8),%rbx

This seems to be completely redundant.
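The attachment is not reproduced in this message, so the following is only a
sketch of the loop shape described above, not the actual testcase. The g1()
and g2() bodies are hypothetical stand-ins (the real ones are the CRC-32C
built-ins), and the helper names are invented for illustration. The register
roles are also an assumption: if %rbx holds the pointer on entry to the
8-byte loop and %r12 holds the end pointer, the five quoted instructions
appear to recompute the pointer the loop has already reached when it exits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the reporter's g1()/g2(); not CRC-32C. */
static uint32_t g1(uint32_t acc, uint64_t v)
{
    return acc * 31u + (uint32_t)(v ^ (v >> 32));
}

static uint32_t g2(uint32_t acc, uint8_t v)
{
    return acc * 31u + v;
}

/* Loop shape as described: 8 bytes per step, then a byte-wise tail. */
uint32_t process(const unsigned char *p, const unsigned char *end)
{
    uint32_t acc = 0;
    while (end - p >= 8) {
        uint64_t v;
        memcpy(&v, p, sizeof v);   /* alignment-safe 8-byte load */
        acc = g1(acc, v);
        p += 8;
    }
    while (p != end)
        acc = g2(acc, *p++);
    return acc;
}

/* Where the tail starts, found by stepping like the 8-byte loop does. */
const unsigned char *tail_start_by_loop(const unsigned char *p,
                                        const unsigned char *end)
{
    while (end - p >= 8)
        p += 8;
    return p;
}

/* The same address via the arithmetic in the quoted assembly,
 * under the assumed mapping %rbx = p, %r12 = end. */
const unsigned char *tail_start_closed_form(const unsigned char *p,
                                            const unsigned char *end)
{
    assert(end - p >= 8);               /* the 8-byte loop ran at least once */
    size_t rdx = (size_t)(end - p);     /* mov %r12,%rdx ; sub %rbx,%rdx */
    rdx -= 8;                           /* sub $0x8,%rdx */
    rdx >>= 3;                          /* shr $0x3,%rdx */
    return p + 8 + rdx * 8;             /* lea 0x8(%rbx,%rdx,8),%rbx */
}
```

Under those assumptions, both helpers return the same address for every range
of at least 8 bytes, which would make the recomputation unnecessary whenever
the loop's own pointer is still live at the exit.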