https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54000

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
      Known to work|                            |5.0
            Summary|[4.8/4.9/5 Regression]      |[4.8/4.9 Regression]
                   |Performance breakdown for   |Performance breakdown for
                   |gcc-4.{6,7} vs. gcc-4.5     |gcc-4.{6,7} vs. gcc-4.5
                   |using std::vector in matrix |using std::vector in matrix
                   |vector multiplication       |vector multiplication
                   |(IVopts / inliner)          |(IVopts / inliner)

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, with trunk (gcc 5) I see

.L13:
        movsd   (%rdx), %xmm1
        xorl    %eax, %eax
.L12:
        movsd   -8(%rcx,%rax), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, (%rdx)
        jne     .L12
        addq    $8, %rdx
        addq    $8, %rcx
        addq    $24, %rsi
        cmpq    %rdi, %rdx
        jne     .L13

thus maybe even better than 4.5.
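For context, the PR's testcase is a matrix-vector multiplication over std::vector. A minimal sketch of the kind of loop nest that produces inner loops like the one above (the function name and dimensions here are illustrative, not the PR's actual testcase; the inner trip count of 3 would match the `cmpq $24`, i.e. 3 doubles of 8 bytes):

```cpp
#include <vector>
#include <cstddef>

// Hypothetical reconstruction: dense rows x cols matrix, stored
// row-major in a flat std::vector, multiplied by a vector x,
// accumulating into y.  The per-iteration store of the accumulator
// visible in the assembly corresponds to updating y[i] through a
// reference that the compiler did not fully sink out of the loop.
void matvec(const std::vector<double>& A,
            const std::vector<double>& x,
            std::vector<double>& y,
            std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i < rows; ++i) {
        double sum = y[i];
        for (std::size_t j = 0; j < cols; ++j)
            sum += A[i * cols + j] * x[j];  // inner dot-product loop
        y[i] = sum;
    }
}
```

The interesting codegen question is how the induction variables for `A[i * cols + j]` and `y[i]` are materialized: GCC 5 keeps simple pointer increments across the outer loop, while 4.9 recomputes addresses with extra `leaq`/`addq` arithmetic per outer iteration.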

GCC 4.9 produces

.L17:
        leaq    (%r8,%rdx), %rcx
        movsd   8(%rdi,%rdx), %xmm1
        xorl    %eax, %eax
        addq    %r9, %rcx
.L14:
        movsd   -8(%rcx,%rax), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, 8(%rdi,%rdx)
        jne     .L14
        addq    $8, %rdx
        addq    $24, %rsi
        cmpq    $1016, %rdx
        jne     .L17

It might, of course, again be inliner changes that trigger the better behavior.

So - fixed in GCC 5.  Not sure how to produce a testcase that reliably
tracks the good behavior here.  IVOPTs dumping should be improved somewhat.