http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46198
           Summary: movd xmm, r (xmm -> GPR) may hit the stack
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: tbp...@gmail.com
              Host: x86_64-pc-linux-gnu
            Target: x86_64-pc-linux-gnu

Under somewhat elusive (target) conditions...

$ cat movd.c
#include <emmintrin.h>
int foo(__m128i x) { return _mm_cvtsi128_si32(x); }

$ gcc -O3 movd.c -S -o -
foo:
.LFB521:
    .cfi_startproc
    movd    %xmm0, -12(%rsp)
    movl    -12(%rsp), %eax
    ret

$ gcc -Os movd.c -S -o -
foo:
.LFB514:
    .cfi_startproc
    movd    %xmm0, -12(%rsp)
    movl    -12(%rsp), %eax
    ret

$ gcc -O3 -march=native movd.c -S -o -
foo:
.LFB521:
    .cfi_startproc
    movd    %xmm0, %eax
    ret

... movd may or may not pay a trip to the stack, apparently depending on some
target cpu+SSE level condition (here native is a corei7).
I can't see any good reason for that (and certainly not for size).

Known to happen from gcc 4.4.4 to

$ /usr/local/gcc-4.6-20101026/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-4.6-20101026/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-4.6-20101026/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr/local/gcc-4.6.0
--enable-languages=c,c++ --enable-threads=posix --disable-nls
--with-system-zlib --disable-bootstrap --enable-mpfr --enable-gold --enable-lto
--with-ppl --with-cloog --with-arch=native --enable-checking=release
Thread model: posix
gcc version 4.6.0 20101026 (experimental) (GCC)
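
For comparison, here is a slightly extended sketch of the test case. It is only an illustration, not part of the original reproducer: foo_via_memory and self_check are hypothetical names added here. foo_via_memory spells out in C the store-then-reload pattern seen in the -O3 and -Os output, and self_check confirms that both forms return the low 32 bits of the vector, so the stack round trip is purely a code generation matter.

```c
#include <assert.h>
#include <emmintrin.h>

/* Original test case: extract the low 32 bits of an SSE register. */
int foo(__m128i x) { return _mm_cvtsi128_si32(x); }

/* Hypothetical helper (not in the report): the same result obtained
   through an explicit store to memory followed by an integer load. */
int foo_via_memory(__m128i x)
{
    int lanes[4];
    _mm_storeu_si128((__m128i *)lanes, x);  /* spill all four lanes */
    return lanes[0];                        /* reload the low lane */
}

/* Hypothetical self-check: both forms must agree on the low lane. */
void self_check(void)
{
    /* _mm_set_epi32 takes lanes from highest to lowest, so lane 0 is 1. */
    __m128i v = _mm_set_epi32(4, 3, 2, 1);
    assert(foo(v) == 1);
    assert(foo_via_memory(v) == foo(v));
}
```

Compiling this with -S, with and without -march=native as above, should make it easy to compare what each function turns into on a given target.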