https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54000
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What           |Removed                    |Added
----------------------------------------------------------------------------
             Target       |                           |x86_64-*-*, i?86-*-*
      Known to work       |                           |5.0
            Summary       |[4.8/4.9/5 Regression]     |[4.8/4.9 Regression]
                          |Performance breakdown for  |Performance breakdown for
                          |gcc-4.{6,7} vs. gcc-4.5    |gcc-4.{6,7} vs. gcc-4.5
                          |using std::vector in matrix|using std::vector in matrix
                          |vector multiplication      |vector multiplication
                          |(IVopts / inliner)         |(IVopts / inliner)

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, with trunk (gcc 5) I see

.L13:
        movsd   (%rdx), %xmm1
        xorl    %eax, %eax
.L12:
        movsd   -8(%rcx,%rax), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, (%rdx)
        jne     .L12
        addq    $8, %rdx
        addq    $8, %rcx
        addq    $24, %rsi
        cmpq    %rdi, %rdx
        jne     .L13

thus maybe even better than 4.5.  GCC 4.9 produces

.L17:
        leaq    (%r8,%rdx), %rcx
        movsd   8(%rdi,%rdx), %xmm1
        xorl    %eax, %eax
        addq    %r9, %rcx
.L14:
        movsd   -8(%rcx,%rax), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, 8(%rdi,%rdx)
        jne     .L14
        addq    $8, %rdx
        addq    $24, %rsi
        cmpq    $1016, %rdx
        jne     .L17

It might again be inliner changes that trigger the better behavior, of course.

So - fixed in GCC 5.  Not sure how to produce a testcase that reliably tracks
good behavior here.  IVOPTs dumping should be improved somewhat.