[Bug fortran/45159] Unneccessary temporaries

2010-08-02 Thread dominiq at lps dot ens dot fr


--- Comment #5 from dominiq at lps dot ens dot fr  2010-08-02 11:41 ---
 As pointed out by Dominique, one needs to be careful. I think one can 
 optimize:

   c = matmul(b, transpose(c))
   c = matmul(c,b)
 where c is a rank-2 matrix. I think one cannot optimize it if c is just a
 vector. (Add conj as you wish.) And check whether I have not confused 
 columns
 and rows

So far I have never seen a product matrix*matrix done in place and I
seriously doubt it can be done.
The best I can see is to have a temporary of rank 1 only instead of the full
matrix. In addition roughly speaking the matrix product is O(n**3) while the
cost of the temporary is O(n**2), probably negligible for large matrices (for
small matrices calling the present matmul is much slower than any other
possible algorithm).

Note also that the polyhedron test nf.f90 is compiled with unnecessary
temporaries:

[macbook] lin/test% gfc -Warray-temporaries nf.f90
nf.f90:293.30:

   if ( ii1 ) x(i:i+nxy-1) = x(i:i+nxy-1) - au3(i-nxy:i-1)*x(i-nxy:i-1)
  1
Warning: Creating array temporary at (1)
nf.f90:280.21:

  g(i:i+nxy-1) = g(i:i+nxy-1) - au3(i-nxy:i-1)*g(i-nxy:i-1)
 1
Warning: Creating array temporary at (1)
nf.f90:261.29:

   if ( ii1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
 1
Warning: Creating array temporary at (1)
nf.f90:248.20:

  g(i:i+nx-1) = g(i:i+nx-1) - au2(i-nx:i-1)*g(i-nx:i-1)
1
Warning: Creating array temporary at (1)

i.e., disjoint sections are not spotted.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159



[Bug fortran/45159] Unneccessary temporaries

2010-08-02 Thread burnus at gcc dot gnu dot org


--- Comment #6 from burnus at gcc dot gnu dot org  2010-08-02 12:41 ---
(In reply to comment #5)
  As pointed out by Dominique, one needs to be careful. I think one can 
  optimize:
c = matmul(b, transpose(c))
c = matmul(c,b)
  where c is a rank-2 matrix. I think one cannot optimize it if c is just a
  vector. (Add conj as you wish.) And check whether I have not confused 
  columns
  and rows
 
 So far I have never seen a product matrix*matrix done in place and I
 seriously doubt it can be done.
 The best I can see is to have a temporary of rank 1

I fail to see why a scalar temporary is not enough:

do j = 1, N
  do i = 1, N
tmp = 0.0
do k = 1, N
  tmp = a(i,k)*b(k,j)
end do
a(i,j) = tmp
  end do
end do

Note: it won't work with a scalar if one reverses the i and the j loop.
[It would work with a rank-1 B argument, but then A = matmul(B,A) won't
work as the result is a rank-1 array ...]

 Note also that the polyhedron test nf.f90 is compiled with unnecessary
 temporaries:

My feeling is that ONE, THREE (of comment 0), FIVE (of comment 1), and your
example (comment 5) point all to the same dependency-analysis problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159



[Bug fortran/45159] Unneccessary temporaries

2010-08-02 Thread dominiq at lps dot ens dot fr


--- Comment #7 from dominiq at lps dot ens dot fr  2010-08-02 13:39 ---
 I fail to see why a scalar temporary is not enough:

 do j = 1, N
   do i = 1, N
 tmp = 0.0
 do k = 1, N
   tmp = a(i,k)*b(k,j)
 end do
 a(i,j) = tmp
   end do
 end do

 Note: it won't work with a scalar if one reverses the i and the j loop.

Well, try:


real :: a(3,3), b(3,3), c(3,3), tmp
integer :: i, j, k, N=3
a = reshape([(i,i=1,9)],[3,3])
b = reshape([(10-i,i=1,9)],[3,3])
c = matmul(a, b)
do j = 1, N
  do i = 1, N
tmp = 0.0
do k = 1, N
  tmp = tmp + a(i,k)*b(k,j)
end do
a(i,j) = tmp
  end do
end do
print *, a
print *, c
if(any(a/=c)) call abort()
print *, 'OK'
end

The problem is that in order to compute a(1,2) you need a(1,1)*b(1,2), so you
cannot have already written a(1,1). Note that the effective way to compute the
matrix product on modern CPUs is to exchange the i and k loops and use a rank 1
tmp:

real :: a(3,3), b(3,3), c(3,3), tmp(3)
integer :: i, j, k, N=3
a = reshape([(i,i=1,9)],[3,3])
b = reshape([(10-i,i=1,9)],[3,3])
c = matmul(a, b)
do j = 1, N
  tmp = 0.0
  do k = 1, N
do i = 1, N
  tmp(i) = tmp(i) + a(i,k)*b(k,j)
end do
  end do
  b(:,j) = tmp
end do
if(any(b/=c)) call abort()
print *, 'OK'
end

and this work with a rank 1 temporary.

If one want to enter this kind of reduced temporaries, another candidate is
a=transpose(a) which requires a scalar temporary only.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159



[Bug fortran/45159] Unneccessary temporaries

2010-08-01 Thread burnus at gcc dot gnu dot org


--- Comment #1 from burnus at gcc dot gnu dot org  2010-08-01 16:52 ---
! --- FOUR   (also occurs for Octopus)

  subroutine t2(b,c)
implicit none
REAL :: b(3,3),c(3)
c = matmul(b, c)
  end subroutine

That's a pattern I see quite often: LHS variable on the RHS as second (or
first) argument to MATMUL. However, no temporary is needed - but one probably
needs to handle this carefully and it depends how one generates the MATMUL
internally. While most implementation ways should work, one probably can
program it such that avoiding a temporary will lead to wrong results ...
Additionally, one should confirm that it works with-fexternal-blas

As a variant: The same but with one or two items TRANSPOSEd - or CONJugated.


! --- FIVE --- (Found in octopus, www.tddft.org [GPL])

  if(any(shape(a).ne.shape(b))) then

That generates two temporaries - one for each SHAPE.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159



[Bug fortran/45159] Unneccessary temporaries

2010-08-01 Thread burnus at gcc dot gnu dot org


--- Comment #2 from burnus at gcc dot gnu dot org  2010-08-01 17:28 ---
(In reply to comment #1)
 ! --- FOUR   (also occurs for Octopus)
 c = matmul(b, c)

As pointed out by Dominique, one needs to be careful. I think one can optimize:

  c = matmul(b, transpose(c))
  c = matmul(c,b)
where c is a rank-2 matrix. I think one cannot optimize it if c is just a
vector. (Add conj as you wish.) And check whether I have not confused columns
and rows

Side remark:
  sum(array*scalar) = sum(array)*scalar


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159



[Bug fortran/45159] Unneccessary temporaries

2010-08-01 Thread tkoenig at gcc dot gnu dot org


--- Comment #3 from tkoenig at gcc dot gnu dot org  2010-08-01 17:37 ---
Confirmed for the test cases in comment #0.

You were right that my patch at

http://gcc.gnu.org/ml/fortran/2010-08/msg3.html

doesn't fix this.  I'll have a look...


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2010-08-01 17:37:25
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159



[Bug fortran/45159] Unneccessary temporaries

2010-08-01 Thread tkoenig at gcc dot gnu dot org


--- Comment #4 from tkoenig at gcc dot gnu dot org  2010-08-01 18:12 ---
This piece of code

 /* If no intention of reversing or reversing is explicitly
 inhibited, convert backward dependence to overlap.  */
  if ((reverse == NULL  this_dep == GFC_DEP_BACKWARD)
|| (reverse  reverse[n] == GFC_CANNOT_REVERSE))
this_dep = GFC_DEP_OVERLAP;

looks more fishy.  It sets this_dep to GFC_DEP_OVERLAP even if we don't want to
reverse at all.

Shouldn't this all be conditional on

if (this_dep == GFC_DEP_BACKWARD)

Paul, do you have any ideas?


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||pault at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159