[Bug target/37437] New: [4.4 regression] speed regression

tim at klingt dot org Mon, 08 Sep 2008 17:41:41 -0700

While benchmarking the attached file (preprocessed source), I found that the
code generated by gcc-4.4 is about 4% slower (in my case) than the code
generated by gcc-4.3 on x86_64, tuned for core2:


code compiled with -O3 -march=core2

versions:
g++-4.3 -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.3.2-0ubuntu3'
--with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3
--program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug
--enable-objc-gc --enable-mpfr --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.3.2 (Ubuntu 4.3.2-0ubuntu3)

g++-4.4 -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../gcc-4.4-20080815/configure
--enable-languages=c,c++,fortran,objc,obj-c++ --enable-shared
--with-system-zlib --enable-mpfr --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/local/include/c++/4.4 --program-suffix=-4.4
--enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
Thread model: posix
gcc version 4.4.0 20080815 (experimental) (GCC)

gcc-4.3 produces:
0000000000401040 <loop(nova::biquad<float, float, false, true>&, float*, float*, int)>:
  401040:       ff c9                   dec    %ecx
  401042:       f3 0f 10 6f 18          movss  0x18(%rdi),%xmm5
  401047:       f3 0f 10 67 14          movss  0x14(%rdi),%xmm4
  40104c:       31 c0                   xor    %eax,%eax
  40104e:       f3 0f 10 35 6a 0d 00    movss  0xd6a(%rip),%xmm6        # 401dc0 <boost::array<float, 3ul>::operator[](unsigned long)::__PRETTY_FUNCTION__+0x60>
  401055:       00 
  401056:       48 8d 0c 8d 04 00 00    lea    0x4(,%rcx,4),%rcx
  40105d:       00 
  40105e:       66 90                   xchg   %ax,%ax

while gcc-4.4 produces:
0000000000400fe0 <loop(nova::biquad<float, float, false, true>&, float*, float*, int)>:
  400fe0:       49 89 d0                mov    %rdx,%r8
  400fe3:       f3 0f 10 67 14          movss  0x14(%rdi),%xmm4
  400fe8:       8b 47 18                mov    0x18(%rdi),%eax
  400feb:       66 0f 7e e2             movd   %xmm4,%edx
  400fef:       48 c1 e0 20             shl    $0x20,%rax
  400ff3:       89 d2                   mov    %edx,%edx
  400ff5:       ff c9                   dec    %ecx
  400ff7:       48 09 d0                or     %rdx,%rax
  400ffa:       f3 0f 10 35 3e 0d 00    movss  0xd3e(%rip),%xmm6        # 401d40 <boost::array<float, 3ul>::operator[](unsigned long)::__PRETTY_FUNCTION__+0x60>
  401001:       00 
  401002:       48 c1 e8 20             shr    $0x20,%rax
  401006:       48 8d 14 8d 04 00 00    lea    0x4(,%rcx,4),%rdx
  40100d:       00 
  40100e:       66 0f 6e e8             movd   %eax,%xmm5
  401012:       31 c0                   xor    %eax,%eax

the rest of the code is equivalent ...

I am not really familiar with x86_64 assembly, but the gcc-4.4 sequence
mov    0x18(%rdi),%eax
movd   %eax,%xmm5
was emitted by gcc-4.3 as the single instruction
movss  0x18(%rdi),%xmm5

and in the remaining code, gcc-4.3 seems to allocate and reuse registers in a
more efficient way ...
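
For readers without the attachment, here is a rough sketch of the kind of loop
involved. It is hypothetical: the struct name, its members, and the filter form
are assumptions, not taken from the attached preprocessed source or from
nova::biquad. The members are only arranged so that the two state floats sit at
offsets 0x14 and 0x18, like the loads in the disassembly above. Whether this
reduced form triggers the same code generation is untested.

```cpp
#include <cstddef>

// Hypothetical stand-in for the filter object; NOT the real nova::biquad layout.
struct biquad_sketch {
    float a1, a2;      // feedback coefficients
    float b0, b1, b2;  // feedforward coefficients
    float z1, z2;      // filter state, kept in memory between calls
};

// The two state members land at the offsets seen in the disassembly.
static_assert(offsetof(biquad_sketch, z1) == 0x14, "z1 expected at 0x14");
static_assert(offsetof(biquad_sketch, z2) == 0x18, "z2 expected at 0x18");

// Kept out of line (GCC-specific attribute) so the loop shows up as its own
// symbol in the disassembly.
__attribute__((noinline))
void loop(biquad_sketch& f, const float* in, float* out, int n)
{
    // Two adjacent scalar float loads before the loop body.
    float z1 = f.z1;
    float z2 = f.z2;

    for (int i = 0; i < n; ++i) {
        // Direct form II: update the internal state, then compute the output.
        float w = in[i] - f.a1 * z1 - f.a2 * z2;
        out[i] = f.b0 * w + f.b1 * z1 + f.b2 * z2;
        z2 = z1;
        z1 = w;
    }

    // Write the state back for the next call.
    f.z1 = z1;
    f.z2 = z2;
}
```

Compiling such a file with each compiler using -O3 -march=core2 -c and running
objdump -d -C on the object files would allow the two versions of loop() to be
compared side by side.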


-- 
           Summary: [4.4 regression] speed regression
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tim at klingt dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37437
