[Bug fortran/45159] Unneccessary temporaries
--- Comment #5 from dominiq at lps dot ens dot fr 2010-08-02 11:41 --- As pointed out by Dominique, one needs to be careful. I think one can optimize: c = matmul(b, transpose(c)) c = matmul(c,b) where c is a rank-2 matrix. I think one cannot optimize it if c is just a vector. (Add conj as you wish.) And check whether I have not confused columns and rows So far I have never seen a product matrix*matrix done in place and I seriously doubt it can be done. The best I can see is to have a temporary of rank 1 only instead of the full matrix. In addition roughly speaking the matrix product is O(n**3) while the cost of the temporary is O(n**2), probably negligible for large matrices (for small matrices calling the present matmul is much slower than any other possible algorithm). Note also that the polyhedron test nf.f90 is compiled with unnecessary temporaries: [macbook] lin/test% gfc -Warray-temporaries nf.f90 nf.f90:293.30: if ( ii1 ) x(i:i+nxy-1) = x(i:i+nxy-1) - au3(i-nxy:i-1)*x(i-nxy:i-1) 1 Warning: Creating array temporary at (1) nf.f90:280.21: g(i:i+nxy-1) = g(i:i+nxy-1) - au3(i-nxy:i-1)*g(i-nxy:i-1) 1 Warning: Creating array temporary at (1) nf.f90:261.29: if ( ii1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1) 1 Warning: Creating array temporary at (1) nf.f90:248.20: g(i:i+nx-1) = g(i:i+nx-1) - au2(i-nx:i-1)*g(i-nx:i-1) 1 Warning: Creating array temporary at (1) i.e., disjoint sections are not spotted. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159
[Bug fortran/45159] Unneccessary temporaries
--- Comment #6 from burnus at gcc dot gnu dot org 2010-08-02 12:41 --- (In reply to comment #5) As pointed out by Dominique, one needs to be careful. I think one can optimize: c = matmul(b, transpose(c)) c = matmul(c,b) where c is a rank-2 matrix. I think one cannot optimize it if c is just a vector. (Add conj as you wish.) And check whether I have not confused columns and rows So far I have never seen a product matrix*matrix done in place and I seriously doubt it can be done. The best I can see is to have a temporary of rank 1 I fail to see why a scalar temporary is not enough: do j = 1, N do i = 1, N tmp = 0.0 do k = 1, N tmp = a(i,k)*b(k,j) end do a(i,j) = tmp end do end do Note: it won't work with a scalar if one reverses the i and the j loop. [It would work with a rank-1 B argument, but then A = matmul(B,A) won't work as the result is a rank-1 array ...] Note also that the polyhedron test nf.f90 is compiled with unnecessary temporaries: My feeling is that ONE, THREE (of comment 0), FIVE (of comment 1), and your example (comment 5) point all to the same dependency-analysis problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159
[Bug fortran/45159] Unneccessary temporaries
--- Comment #7 from dominiq at lps dot ens dot fr 2010-08-02 13:39 --- I fail to see why a scalar temporary is not enough: do j = 1, N do i = 1, N tmp = 0.0 do k = 1, N tmp = a(i,k)*b(k,j) end do a(i,j) = tmp end do end do Note: it won't work with a scalar if one reverses the i and the j loop. Well, try: real :: a(3,3), b(3,3), c(3,3), tmp integer :: i, j, k, N=3 a = reshape([(i,i=1,9)],[3,3]) b = reshape([(10-i,i=1,9)],[3,3]) c = matmul(a, b) do j = 1, N do i = 1, N tmp = 0.0 do k = 1, N tmp = tmp + a(i,k)*b(k,j) end do a(i,j) = tmp end do end do print *, a print *, c if(any(a/=c)) call abort() print *, 'OK' end The problem is that in order to compute a(1,2) you need a(1,1)*b(1,2), so you cannot have already written a(1,1). Note that the effective way to compute the matrix product on modern CPUs is to exchange the i and k loops and use a rank 1 tmp: real :: a(3,3), b(3,3), c(3,3), tmp(3) integer :: i, j, k, N=3 a = reshape([(i,i=1,9)],[3,3]) b = reshape([(10-i,i=1,9)],[3,3]) c = matmul(a, b) do j = 1, N tmp = 0.0 do k = 1, N do i = 1, N tmp(i) = tmp(i) + a(i,k)*b(k,j) end do end do b(:,j) = tmp end do if(any(b/=c)) call abort() print *, 'OK' end and this work with a rank 1 temporary. If one want to enter this kind of reduced temporaries, another candidate is a=transpose(a) which requires a scalar temporary only. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159
[Bug fortran/45159] Unneccessary temporaries
--- Comment #1 from burnus at gcc dot gnu dot org 2010-08-01 16:52 --- ! --- FOUR (also occurs for Octopus) subroutine t2(b,c) implicit none REAL :: b(3,3),c(3) c = matmul(b, c) end subroutine That's a pattern I see quite often: LHS variable on the RHS as second (or first) argument to MATMUL. However, no temporary is needed - but one probably needs to handle this carefully and it depends how one generates the MATMUL internally. While most implementation ways should work, one probably can program it such that avoiding a temporary will lead to wrong results ... Additionally, one should confirm that it works with-fexternal-blas As a variant: The same but with one or two items TRANSPOSEd - or CONJugated. ! --- FIVE --- (Found in octopus, www.tddft.org [GPL]) if(any(shape(a).ne.shape(b))) then That generates two temporaries - one for each SHAPE. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159
[Bug fortran/45159] Unneccessary temporaries
--- Comment #2 from burnus at gcc dot gnu dot org 2010-08-01 17:28 --- (In reply to comment #1) ! --- FOUR (also occurs for Octopus) c = matmul(b, c) As pointed out by Dominique, one needs to be careful. I think one can optimize: c = matmul(b, transpose(c)) c = matmul(c,b) where c is a rank-2 matrix. I think one cannot optimize it if c is just a vector. (Add conj as you wish.) And check whether I have not confused columns and rows Side remark: sum(array*scalar) = sum(array)*scalar -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159
[Bug fortran/45159] Unneccessary temporaries
--- Comment #3 from tkoenig at gcc dot gnu dot org 2010-08-01 17:37 --- Confirmed for the test cases in comment #0. You were right that my patch at http://gcc.gnu.org/ml/fortran/2010-08/msg3.html doesn't fix this. I'll have a look... -- tkoenig at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2010-08-01 17:37:25 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159
[Bug fortran/45159] Unneccessary temporaries
--- Comment #4 from tkoenig at gcc dot gnu dot org 2010-08-01 18:12 --- This piece of code /* If no intention of reversing or reversing is explicitly inhibited, convert backward dependence to overlap. */ if ((reverse == NULL this_dep == GFC_DEP_BACKWARD) || (reverse reverse[n] == GFC_CANNOT_REVERSE)) this_dep = GFC_DEP_OVERLAP; looks more fishy. It sets this_dep to GFC_DEP_OVERLAP even if we don't want to reverse at all. Shouldn't this all be conditional on if (this_dep == GFC_DEP_BACKWARD) Paul, do you have any ideas? -- tkoenig at gcc dot gnu dot org changed: What|Removed |Added CC||pault at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159