[Bug c/54231] LTO generates code for the wrong CPU if different options used

2012-08-11 Thread thiago at kde dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231

--- Comment #1 from Thiago Macieira <thiago at kde dot org> 2012-08-11 22:30:50 UTC ---
Created attachment 27993
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27993
main.c


--- Comment #2 from Thiago Macieira <thiago at kde dot org> 2012-08-11 22:33:31 UTC ---
When adding the following source file to the library build:

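#include <stdlib.h>   /* for size_t */

/* Implementations compiled separately with different options (SSE2 vs. AVX). */
void bzero_sse2(char *, size_t);
void bzero_avx(char *, size_t);

/* Nonzero when the CPU supports AVX; defined elsewhere. */
extern int avx_supported;

/* Runtime dispatch: take the AVX path only when the CPU supports it. */
void my_bzero(char *ptr, size_t n)
{
    if (avx_supported)
        bzero_avx(ptr, n);
    else
        bzero_sse2(ptr, n);
}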


and compiling everything with -O2 -flto, GCC produces the following function:

02e0 <my_bzero>:
 2e0:   mov    0x200171(%rip),%rax        # 200458 <my_bzero+0x200178>
 2e7:   mov    (%rax),%eax
 2e9:   test   %eax,%eax
 2eb:   jne    310 <my_bzero+0x30>
 2ed:   test   %rsi,%rsi
 2f0:   vpxor  %xmm0,%xmm0,%xmm0
 2f4:   je     30e <my_bzero+0x2e>
 2f6:   nopw   %cs:0x0(%rax,%rax,1)
 300:   vmovntdq %xmm0,(%rdi)
 304:   add    $0x10,%rdi
 308:   sub    $0x1,%rsi
 30c:   jne    300 <my_bzero+0x20>
 30e:   repz retq
 310:   test   %rsi,%rsi
 313:   je     30e <my_bzero+0x2e>
 315:   vpxor  %xmm0,%xmm0,%xmm0
 319:   nopl   0x0(%rax)
 320:   vmovntdq %xmm0,(%rdi)
 324:   add    $0x10,%rdi
 328:   sub    $0x1,%rsi
 32c:   jne    320 <my_bzero+0x40>
 32e:   repz retq

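As can be seen, both callees were inlined, and VEX-prefixed (AVX) instructions (vpxor, vmovntdq) were used in both branches, including the SSE2 path taken when avx_supported is zero. On a CPU without AVX, that path would fault with an invalid-opcode exception (SIGILL on Linux).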
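For experimenting with the report, here is a minimal self-contained sketch of the library side. The definition of avx_supported, the init_avx_supported initializer and the clear_blocks helper are hypothetical, and bzero_sse2/bzero_avx are plain-C stand-ins because their real bodies are not shown. Judging from the disassembly (16 bytes stored per iteration, n decremented by one), n is treated as a count of 16-byte blocks. The feature check assumes GCC or Clang on x86 and their __builtin_cpu_supports builtin; elsewhere it simply reports "no AVX".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical definition: the report only declares it extern. */
int avx_supported;

/* Hypothetical helper. Judging from the disassembly, each loop iteration
   clears one 16-byte block and decrements n, so n counts 16-byte blocks. */
static void clear_blocks(char *ptr, size_t n)
{
    assert(ptr != NULL || n == 0);
    for (; n != 0; --n, ptr += 16)
        memset(ptr, 0, 16);
}

/* Hypothetical stand-ins for the two implementations. In the real test
   case each would live in its own file built with different options,
   for example -msse2 and -mavx; here both are plain C. */
void bzero_sse2(char *ptr, size_t n) { clear_blocks(ptr, n); }
void bzero_avx(char *ptr, size_t n) { clear_blocks(ptr, n); }

/* Hypothetical initializer: queries the CPU with the GCC/Clang x86
   builtins where available, and reports "no AVX" elsewhere. */
void init_avx_supported(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    avx_supported = __builtin_cpu_supports("avx") != 0;
#else
    avx_supported = 0;
#endif
}

/* The dispatcher from the report, unchanged. */
void my_bzero(char *ptr, size_t n)
{
    if (avx_supported)
        bzero_avx(ptr, n);
    else
        bzero_sse2(ptr, n);
}
```

Placing the two stand-ins in separate files with different -m options and linking everything with -O2 -flto approximates the setup described above; running objdump -d on the result then shows whether VEX-encoded instructions appear on the path taken when avx_supported is zero.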

--- Comment #3 from Thiago Macieira <thiago at kde dot org> 2012-08-11 22:36:20 UTC ---
Another note: it appears the Intel compiler has the same bug. It produces the
following code when compiling with -O2 -ipo:


0340 <my_bzero>:
 340:   dec    %rsi
 343:   mov    0x2001ae(%rip),%rax        # 2004f8 <_DYNAMIC+0xe0>
 34a:   vpxor  %xmm0,%xmm0,%xmm0
 34e:   cmpl   $0x0,(%rax)
 351:   je     36c <my_bzero+0x2c>
 353:   cmp    $0x,%rsi
 357:   je     383 <my_bzero+0x43>
 359:   dec    %rsi
 35c:   vmovntdq %xmm0,(%rdi)
 360:   add    $0x10,%rdi
 364:   cmp    $0x,%rsi
 368:   jne    359 <my_bzero+0x19>
 36a:   jmp    383 <my_bzero+0x43>
 36c:   cmp    $0x,%rsi
 370:   je     383 <my_bzero+0x43>
 372:   dec    %rsi
 375:   vmovntdq %xmm0,(%rdi)
 379:   add    $0x10,%rdi
 37d:   cmp    $0x,%rsi
 381:   jne    372 <my_bzero+0x32>
 383:   retq
 384:   nopl   0x0(%rax,%rax,1)
 389:   nopl   0x0(%rax)

Note, additionally, that the Intel output has an instruction-scheduling issue: the VPXOR at 34a is scheduled before the avx_supported check at 34e, so it executes on every call, even on CPUs without AVX.
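A different way to express the same dispatch, not taken from the report, is GCC's per-function target attribute: only the AVX variant carries __attribute__((target("avx"))), while the dispatcher and the baseline are compiled with the default options, and the CPU check uses __builtin_cpu_supports. GCC normally does not inline a callee whose target options are not a subset of the caller's, so VEX-encoded code should stay inside the AVX variant instead of being hoisted into the dispatcher ahead of the check. This is a sketch assuming GCC or Clang on x86 (other targets fall back to the baseline); the names bzero_baseline, bzero_avx_variant and my_bzero_dispatch are hypothetical, and whether this sidesteps bug 54231 in a particular GCC release is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical baseline path, compiled with the file's default (non-AVX)
   options. n counts 16-byte blocks, mirroring the report's routines. */
static void bzero_baseline(char *ptr, size_t n)
{
    assert(ptr != NULL || n == 0);
    if (n != 0)
        memset(ptr, 0, n * 16);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX_VARIANT 1
/* Hypothetical AVX path: the target attribute permits AVX (VEX-encoded)
   instructions in this function only, whatever the -m options are. */
static __attribute__((target("avx"))) void bzero_avx_variant(char *ptr, size_t n)
{
    assert(ptr != NULL || n == 0);
    for (size_t i = 0; i < n * 16; i++)
        ptr[i] = 0;
}
#else
#define HAVE_AVX_VARIANT 0
#endif

/* Hypothetical dispatcher: performs the CPU check and carries no AVX
   target attribute itself. */
void my_bzero_dispatch(char *ptr, size_t n)
{
#if HAVE_AVX_VARIANT
    if (__builtin_cpu_supports("avx")) {
        bzero_avx_variant(ptr, n);
        return;
    }
#endif
    bzero_baseline(ptr, n);
}
```

The design choice here is to attach the instruction-set requirement to the function rather than to a per-file command line, so the information travels with the function's own IR into the LTO stage.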