https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64464

            Bug ID: 64464
           Summary: Optimization for reusing values in loops
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org

This is against gcc version 5.0.0 20141222 (experimental) (GCC), on
x86_64-unknown-linux-gnu.

Consider the following two program snippets:

ig25@linux-fd1f:~/Krempel/Mandelbrot/Bug> cat m1.f90
module foo
  implicit none
  integer, parameter :: prec = selected_real_kind(15)
  integer, parameter :: n_iter = 10000

contains
  integer function iter_p(cx, cy)
    real(kind=prec), value :: cx, cy
    real(kind=prec) :: x, y, xn, yn
    integer :: k
    x = cx
    y = cy
    do k=1, n_iter
       xn = x*x - y*y + cx
       yn = 2*x*y + cy
       if (xn*xn + yn*yn > 4._prec) exit
       x = xn
       y = yn
    end do
    iter_p = k
  end function iter_p
end module foo
ig25@linux-fd1f:~/Krempel/Mandelbrot/Bug> cat m2.f90
module foo
  implicit none
  integer, parameter :: prec = selected_real_kind(15)
  integer, parameter :: n_iter = 10000

contains
  integer function iter_p(cx, cy)
    real(kind=prec), value :: cx, cy
    real(kind=prec) :: x, y, xn, yn, x2, y2
    integer :: k
    x = cx
    y = cy
    x2 = x*x
    y2 = y*y
    do k=1, n_iter
       xn = x2 - y2 + cx
       yn = 2*x*y + cy
       x2 = xn * xn
       y2 = yn * yn
       if (x2 + y2> 4._prec) exit
       x = xn
       y = yn
    end do
    iter_p = k
  end function iter_p
end module foo

With -O3, the tight loop for m1.f90 is translated into

.L6:
        addl    $1, %eax
        movapd  %xmm2, %xmm3
        cmpl    $10001, %eax
        je      .L2
.L3:
        movapd  %xmm3, %xmm2
        mulsd   %xmm3, %xmm2
        addsd   %xmm3, %xmm3
        mulsd   %xmm4, %xmm3
        subsd   %xmm5, %xmm2
        movapd  %xmm3, %xmm4
        addsd   %xmm0, %xmm2
        addsd   %xmm1, %xmm4
        movapd  %xmm2, %xmm3
        movapd  %xmm4, %xmm5
        mulsd   %xmm2, %xmm3
        mulsd   %xmm4, %xmm5
        addsd   %xmm5, %xmm3
        ucomisd %xmm6, %xmm3
        jbe     .L6

and for m2.f90 into

.L6:
        addl    $1, %eax
        movapd  %xmm5, %xmm4
        cmpl    $10001, %eax
        je      .L2
.L3:
        subsd   %xmm6, %xmm3
        addsd   %xmm4, %xmm4
        movapd  %xmm3, %xmm5
        mulsd   %xmm4, %xmm2
        addsd   %xmm0, %xmm5
        addsd   %xmm1, %xmm2
        movapd  %xmm5, %xmm3
        mulsd   %xmm5, %xmm3
        movapd  %xmm2, %xmm6
        mulsd   %xmm2, %xmm6
        movapd  %xmm3, %xmm4
        addsd   %xmm6, %xmm4
        ucomisd %xmm7, %xmm4
        jbe     .L6

For m1.f90, this is 5 moves, 5 adds, 1 sub and 4 muls.

For m2.f90, this is 5 moves, 5 adds, 1 sub and 3 muls.

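I would expect the same number of operations for m1.f90 as for m2.f90:
the products xn*xn and yn*yn computed for the exit test are exactly the
x*x and y*y needed at the start of the next iteration (because x = xn
and y = yn), so they could be carried over instead of being recomputed.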
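
As a sanity check that the two loop forms really compute the same thing,
here is a minimal sketch that puts both in one module and compares their
iteration counts on a coarse grid. The names mandel_check, iter_m1,
iter_m2 and the grid bounds are made up for this illustration and are
not part of the test case above. Since both forms multiply the same
operands, I would expect no differences as long as the compiler does not
contract floating-point operations differently in the two loops, but
this is a sketch under that assumption, not a proof.

```fortran
! Hypothetical check program, not part of the original test case.
! iter_m1 and iter_m2 are the loops from m1.f90 and m2.f90 under
! made-up names so that both fit into one module.
module mandel_check
  implicit none
  integer, parameter :: prec = selected_real_kind(15)
  integer, parameter :: n_iter = 10000

contains
  ! Loop as in m1.f90: x*x and y*y are recomputed in every iteration.
  integer function iter_m1(cx, cy)
    real(kind=prec), value :: cx, cy
    real(kind=prec) :: x, y, xn, yn
    integer :: k
    x = cx
    y = cy
    do k = 1, n_iter
       xn = x*x - y*y + cx
       yn = 2*x*y + cy
       if (xn*xn + yn*yn > 4._prec) exit
       x = xn
       y = yn
    end do
    iter_m1 = k
  end function iter_m1

  ! Loop as in m2.f90: the squares from the exit test are kept in
  ! x2 and y2 and reused in the next iteration.
  integer function iter_m2(cx, cy)
    real(kind=prec), value :: cx, cy
    real(kind=prec) :: x, y, xn, yn, x2, y2
    integer :: k
    x = cx
    y = cy
    x2 = x*x
    y2 = y*y
    do k = 1, n_iter
       xn = x2 - y2 + cx
       yn = 2*x*y + cy
       x2 = xn*xn
       y2 = yn*yn
       if (x2 + y2 > 4._prec) exit
       x = xn
       y = yn
    end do
    iter_m2 = k
  end function iter_m2
end module mandel_check

program check
  use mandel_check
  implicit none
  integer :: i, j, n_diff
  real(kind=prec) :: cx, cy

  ! Count grid points where the two forms disagree (arbitrary grid
  ! covering roughly [-2, 1] x [-1, 1] in steps of 0.05).
  n_diff = 0
  do j = -20, 20
     do i = -40, 20
        cx = i * 0.05_prec
        cy = j * 0.05_prec
        if (iter_m1(cx, cy) /= iter_m2(cx, cy)) n_diff = n_diff + 1
     end do
  end do
  print *, 'points with differing iteration counts:', n_diff
  if (n_diff /= 0) stop 1
end program check
```

Both functions return n_iter + 1 for points that never escape, which is
the value of k after the do loop runs to completion.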

Same result for 4.8, so this is (very probably) not a regression.
