[Bug c/54231] LTO generates code for the wrong CPU if different options used
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231

--- Comment #1 from Thiago Macieira thiago at kde dot org 2012-08-11 22:30:50 UTC ---
Created attachment 27993
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27993
main.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231

--- Comment #2 from Thiago Macieira thiago at kde dot org 2012-08-11 22:33:31 UTC ---
When adding the following source file to the library build:

    #include <stdlib.h>
    void bzero_sse2(char *, size_t);
    void bzero_avx(char *, size_t);
    extern int avx_supported;
    void my_bzero(char *ptr, size_t n)
    {
        if (avx_supported)
            bzero_avx(ptr, n);
        else
            bzero_sse2(ptr, n);
    }

and compiling everything with -O2 -flto, GCC produces the following function:

    02e0 my_bzero:
     2e0:  mov    0x200171(%rip),%rax   # 200458 my_bzero+0x200178
     2e7:  mov    (%rax),%eax
     2e9:  test   %eax,%eax
     2eb:  jne    310 my_bzero+0x30
     2ed:  test   %rsi,%rsi
     2f0:  vpxor  %xmm0,%xmm0,%xmm0
     2f4:  je     30e my_bzero+0x2e
     2f6:  nopw   %cs:0x0(%rax,%rax,1)
     300:  vmovntdq %xmm0,(%rdi)
     304:  add    $0x10,%rdi
     308:  sub    $0x1,%rsi
     30c:  jne    300 my_bzero+0x20
     30e:  repz retq
     310:  test   %rsi,%rsi
     313:  je     30e my_bzero+0x2e
     315:  vpxor  %xmm0,%xmm0,%xmm0
     319:  nopl   0x0(%rax)
     320:  vmovntdq %xmm0,(%rdi)
     324:  add    $0x10,%rdi
     328:  sub    $0x1,%rsi
     32c:  jne    320 my_bzero+0x40
     32e:  repz retq

As can be seen, VEX-prefixed instructions were used in both cases.
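For comparison, here is a minimal single-file sketch of the behaviour the dispatch above is meant to have: VEX-encoded stores only on the AVX path. It is not the attached test case. The loop bodies are an assumption reconstructed from the disassembly (one 16-byte non-temporal store per iteration, so n counts 16-byte blocks and ptr must be 16-byte aligned). The per-file -m options of the report are swapped for GCC's target function attribute, and the avx_supported variable for __builtin_cpu_supports (GCC 4.8 and later, so newer than the compiler in this report).

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_setzero_si128, _mm_stream_si128 */

/* Assumed body: zero n 16-byte blocks with non-temporal stores.
 * Built with the file's default options (SSE2 on x86-64), so the
 * store should be encoded as a legacy MOVNTDQ. */
static void bzero_sse2(char *ptr, size_t n)
{
    __m128i zero = _mm_setzero_si128();
    for (size_t i = 0; i < n; ++i)
        _mm_stream_si128((__m128i *)ptr + i, zero);
}

/* Same assumed body, but the target attribute enables AVX for this
 * function only, so the compiler may use VEX encodings (VMOVNTDQ) here. */
static __attribute__((target("avx"))) void bzero_avx(char *ptr, size_t n)
{
    __m128i zero = _mm_setzero_si128();
    for (size_t i = 0; i < n; ++i)
        _mm_stream_si128((__m128i *)ptr + i, zero);
}

/* Runtime dispatch; __builtin_cpu_supports stands in for the
 * avx_supported variable used in the report. */
void my_bzero(char *ptr, size_t n)
{
    if (__builtin_cpu_supports("avx"))
        bzero_avx(ptr, n);
    else
        bzero_sse2(ptr, n);
}
```

Calling my_bzero(buf, 3) on a 16-byte-aligned buffer zeroes its first 48 bytes. Disassembling the result (for example with objdump -d) should show whether the VEX-prefixed store is confined to bzero_avx.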
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231

--- Comment #3 from Thiago Macieira thiago at kde dot org 2012-08-11 22:36:20 UTC ---
Another note: it appears the Intel compiler has the same bug. It produces the following code when compiling with -O2 -ipo:

    0340 my_bzero:
     340:  dec    %rsi
     343:  mov    0x2001ae(%rip),%rax   # 2004f8 _DYNAMIC+0xe0
     34a:  vpxor  %xmm0,%xmm0,%xmm0
     34e:  cmpl   $0x0,(%rax)
     351:  je     36c my_bzero+0x2c
     353:  cmp    $0x,%rsi
     357:  je     383 my_bzero+0x43
     359:  dec    %rsi
     35c:  vmovntdq %xmm0,(%rdi)
     360:  add    $0x10,%rdi
     364:  cmp    $0x,%rsi
     368:  jne    359 my_bzero+0x19
     36a:  jmp    383 my_bzero+0x43
     36c:  cmp    $0x,%rsi
     370:  je     383 my_bzero+0x43
     372:  dec    %rsi
     375:  vmovntdq %xmm0,(%rdi)
     379:  add    $0x10,%rdi
     37d:  cmp    $0x,%rsi
     381:  jne    372 my_bzero+0x32
     383:  retq
     384:  nopl   0x0(%rax,%rax,1)
     389:  nopl   0x0(%rax)

Note, additionally, that there is an instruction-scheduling issue: the VPXOR instruction was scheduled before the test of the CPU features, so it executes even when AVX is not supported.