https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64464
Bug ID: 64464
Summary: Optimization for reusing values in loops
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: tkoenig at gcc dot gnu.org

This is against gcc version 5.0.0 20141222 (experimental) (GCC), on
x86_64-unknown-linux-gnu.

Consider the following two program snippets:

ig25@linux-fd1f:~/Krempel/Mandelbrot/Bug> cat m1.f90
module foo
  implicit none
  integer, parameter :: prec = selected_real_kind(15)
  integer, parameter :: n_iter = 10000
contains
  integer function iter_p(cx, cy)
    real(kind=prec), value :: cx, cy
    real(kind=prec) :: x, y, xn, yn
    integer :: k
    x = cx
    y = cy
    do k=1, n_iter
       xn = x*x - y*y + cx
       yn = 2*x*y + cy
       if (xn*xn + yn*yn > 4._prec) exit
       x = xn
       y = yn
    end do
    iter_p = k
  end function iter_p
end module foo

ig25@linux-fd1f:~/Krempel/Mandelbrot/Bug> cat m2.f90
module foo
  implicit none
  integer, parameter :: prec = selected_real_kind(15)
  integer, parameter :: n_iter = 10000
contains
  integer function iter_p(cx, cy)
    real(kind=prec), value :: cx, cy
    real(kind=prec) :: x, y, xn, yn, x2, y2
    integer :: k
    x = cx
    y = cy
    x2 = x*x
    y2 = y*y
    do k=1, n_iter
       xn = x2 - y2 + cx
       yn = 2*x*y + cy
       x2 = xn * xn
       y2 = yn * yn
       if (x2 + y2> 4._prec) exit
       x = xn
       y = yn
    end do
    iter_p = k
  end function iter_p
end module foo

With -O3, the tight loop for m1.f90 is translated into

.L6:
        addl    $1, %eax
        movapd  %xmm2, %xmm3
        cmpl    $10001, %eax
        je      .L2
.L3:
        movapd  %xmm3, %xmm2
        mulsd   %xmm3, %xmm2
        addsd   %xmm3, %xmm3
        mulsd   %xmm4, %xmm3
        subsd   %xmm5, %xmm2
        movapd  %xmm3, %xmm4
        addsd   %xmm0, %xmm2
        addsd   %xmm1, %xmm4
        movapd  %xmm2, %xmm3
        movapd  %xmm4, %xmm5
        mulsd   %xmm2, %xmm3
        mulsd   %xmm4, %xmm5
        addsd   %xmm5, %xmm3
        ucomisd %xmm6, %xmm3
        jbe     .L6

and for m2.f90 into

.L6:
        addl    $1, %eax
        movapd  %xmm5, %xmm4
        cmpl    $10001, %eax
        je      .L2
.L3:
        subsd   %xmm6, %xmm3
        addsd   %xmm4, %xmm4
        movapd  %xmm3, %xmm5
        mulsd   %xmm4, %xmm2
        addsd   %xmm0, %xmm5
        addsd   %xmm1, %xmm2
        movapd  %xmm5, %xmm3
        mulsd   %xmm5, %xmm3
        movapd  %xmm2, %xmm6
        mulsd   %xmm2, %xmm6
        movapd  %xmm3, %xmm4
        addsd   %xmm6, %xmm4
        ucomisd %xmm7, %xmm4
        jbe     .L6

For m1.f90, this is 5 moves, 5 adds, 1 sub and 4 muls.
For m2.f90, this is 5 moves, 5 adds, 1 sub and 3 muls.

I would expect the same number of operations for m2.f90 as for m1.f90.

Same result for 4.8, so this is (very probably) not a regression.