[Bug fortran/42118] Slow forall

2019-11-24 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
                 CC|                            |tkoenig at gcc dot gnu.org
         Resolution|---                         |WONTFIX

--- Comment #11 from Thomas Koenig  ---
(In reply to Lionel GUEZ from comment #10)
> (In reply to kargl from comment #9)
> > Fortran 2018 has declared FORALL to be an obsolescent feature.
> > I doubt that anyone will ever try to improve the performance
> > of FORALL, because the next standard is likely to delete it.
> > 
> > I think that this bug can be closed with WONTFIX or WORKSFORME.
> 
> OK for me.

We have had forall loop interchange for quite some time now, and that
is all the effort that people are likely to put into this.

So, closing as WONTFIX.

[Bug fortran/42118] Slow forall

2019-10-07 Thread guez at lmd dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

--- Comment #10 from Lionel GUEZ  ---
(In reply to kargl from comment #9)
> Fortran 2018 has declared FORALL to be an obsolescent feature.
> I doubt that anyone will ever try to improve the performance
> of FORALL, because the next standard is likely to delete it.
> 
> I think that this bug can be closed with WONTFIX or WORKSFORME.

OK for me.

[Bug fortran/42118] Slow forall

2019-10-04 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #9 from kargl at gcc dot gnu.org ---
Fortran 2018 has declared FORALL to be an obsolescent feature.
I doubt that anyone will ever try to improve the performance
of FORALL, because the next standard is likely to delete it.

I think that this bug can be closed with WONTFIX or WORKSFORME.

[Bug fortran/42118] Slow forall

2019-02-02 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Dominique d'Humieres  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4

[Bug fortran/42118] Slow forall

2013-10-08 Thread ebay.20.tedlap at spamgourmet dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Lionel GUEZ ebay.20.tedlap at spamgourmet dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebay.20.tedlap@spamgourmet.com

--- Comment #6 from Lionel GUEZ ebay.20.tedlap at spamgourmet dot com ---
There is also the problem of the order of indices in a forall. I guess this is
closely related to the comparison of do and forall. Consider the following
test program:

program test_forall
  implicit none
  integer, parameter:: n = 1000
  integer i, j, k
  double precision S(n, n, n)
  forall (i = 1: n, j = 1: n, k = 1: n) S(i, j, k) = i * j * k
  print *, "ijk, sum(s) = ", sum(s)
end program test_forall

According to the Fortran standard, the order of indices in the forall header is
of no consequence. So, in the above program, we should be able to write
equivalently:

  forall (k = 1: n, j = 1: n, i = 1: n) S(i, j, k) = i * j * k

There is no way for the writer of the program to predict which of the two
versions will be faster. It is interesting to note that, with gfortran, the
forall with kji is much slower, while the reverse is true with the NAG compiler
(version 5.3). I think the two versions should have the same run time. I have
actually tested the two versions of the program with four compilers:

-- gfortran 4.4.6 with -O3

 kji, sum(s) =   1.253753751250046E+017

real    1m32.511s
user    1m22.342s
sys     0m8.368s

 ijk, sum(s) =   1.253753751250046E+017

real    0m12.962s
user    0m7.416s
sys     0m5.427s


-- nagfor 5.3 with -O4

 kji, sum(s) =   1.2537537512500458E+17

real    0m13.396s
user    0m6.833s
sys     0m6.054s

 ijk, sum(s) =   1.2537537512500458E+17

real    2m37.943s
user    2m27.723s
sys     0m7.873s


-- pgf95 11.10 with -fast

 kji, sum(s) =   1.253753751248E+017

real    0m12.119s
user    0m6.051s
sys     0m5.910s

 ijk, sum(s) =   1.253753751248E+017

real    0m11.979s
user    0m5.854s
sys     0m5.939s


-- ifort 12.1 with -O3

 kji, sum(s) =   1.25375375125E+017

real    0m5.210s
user    0m3.028s
sys     0m2.150s

 ijk, sum(s) =   1.25375375125E+017

real    0m5.114s
user    0m2.981s
sys     0m2.115s

So we see that PG Fortran and Intel Fortran behave well: the two versions take
about the same time. Also, Intel Fortran is much faster than the other compilers
on this test.
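
Both index orders can also be timed within a single run instead of running
two programs under the shell's time command. The following is only a minimal
sketch under stated assumptions: the program name, the reduced size n = 200
(chosen to keep memory use modest) and the use of system_clock are
illustrative choices, not the setup used for the measurements above.

```fortran
program time_forall_orders
  implicit none
  integer, parameter :: n = 200   ! assumed size: 200**3 doubles, about 64 MB
  integer :: i, j, k
  integer :: t0, t1, t2, t3, rate
  double precision, allocatable :: s(:, :, :)
  double precision :: sum_ijk, sum_kji

  allocate (s(n, n, n))
  s = 0.0d0   ! touch the memory first so page faults are not timed

  ! Header order i, j, k
  call system_clock(t0, rate)
  forall (i = 1:n, j = 1:n, k = 1:n) s(i, j, k) = i * j * k
  call system_clock(t1)
  sum_ijk = sum(s)

  ! Header order k, j, i (same assignment, same result)
  call system_clock(t2)
  forall (k = 1:n, j = 1:n, i = 1:n) s(i, j, k) = i * j * k
  call system_clock(t3)
  sum_kji = sum(s)

  ! The header order must not change the values, only (possibly) the speed.
  if (sum_ijk /= sum_kji) error stop "sums differ"

  print *, "ijk, sum(s) = ", sum_ijk, " seconds: ", real(t1 - t0) / real(rate)
  print *, "kji, sum(s) = ", sum_kji, " seconds: ", real(t3 - t2) / real(rate)
end program time_forall_orders
```

With an array this small the timings are short and noisy, so the numbers are
only indicative. A larger n, as in the original test, gives clearer results.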

I would also like to comment on the use of the forall. Tobias Burnus says that
improving the forall in gfortran is not worth the effort. I think the forall is
useful. It is an elegant way to write some assignments. There is no notion of
time sequence in a forall, and a forall can only contain assignments, while,
as you know, the do construct can contain calls to subroutines, input-output,
recursive computations, anything. So when one reads a program and sees a
forall, it is much quicker to understand what is going on than when one reads
a do loop. Also, the fact that the assignments are independent (comment of
Harald Anlauf) should make it easier for the compiler to produce fast code.


[Bug fortran/42118] Slow forall

2013-10-08 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Tobias Burnus burnus at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #7 from Tobias Burnus burnus at gcc dot gnu.org ---
(In reply to Harald Anlauf from comment #5)
> Do not forget that there are constraints for FORALL statements that are
> not required for DO loops so that all assignments are independent.
> This guarantees vectorization

Not quite: The Fortran standard requires that the RHS is evaluated before the
assignment to the LHS is done. This might even imply the generation of a
temporary variable. By contrast, DO CONCURRENT is much better: The user
guarantees that there is no order dependence, while constraints ensure that
many violations of this are detected at compile time. In addition, DO
CONCURRENT permits more things within its body.
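
The "RHS before LHS" rule can be seen in a small case where the two sides
overlap. The following is a minimal illustrative sketch (program and variable
names are made up for the example): the FORALL has to behave as if all
right-hand sides were read before any element is stored, which is what may
force a temporary, whereas the DO loop carries each new value forward.

```fortran
program forall_rhs_first
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), i

  a = [(i, i = 1, n)]   ! 1 2 3 4 5
  b = a

  ! FORALL: every a(i - 1) is taken from the old contents of a.
  forall (i = 2:n) a(i) = a(i - 1)

  ! DO: b(i - 1) has already been overwritten by the previous iteration.
  do i = 2, n
     b(i) = b(i - 1)
  end do

  if (any(a /= [1, 1, 2, 3, 4])) error stop "unexpected FORALL result"
  if (any(b /= [1, 1, 1, 1, 1])) error stop "unexpected DO result"

  print *, "forall: ", a   ! values 1 1 2 3 4
  print *, "do:     ", b   ! values 1 1 1 1 1
end program forall_rhs_first
```

Writing the same shifted assignment with DO CONCURRENT would not be valid,
because the iterations depend on each other. That restriction is what the
user's guarantee of no order dependence amounts to.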

By the way, the Fortran committee is considering to deprecate FORALL in the
next standard (Fortran 2015) because it considers FORALL superior in nearly all
aspects.

For DO CONCURRENT, I have a pending patch which sets the vectorization safelen
to infinity (well, INT_MAX). I wonder whether one could do likewise for FORALL;
that probably needs some dependency fine tuning to ensure that there is no
dependency at all between the LHS and RHS. To avoid temporaries, it is
sufficient to be free of either forward or backward dependencies.

(Setting the safelen for whole-array operations probably also makes sense.
There, the same applies.)



(In reply to Lionel GUEZ from comment #6)
> There is also the problem of the order of indices in a forall. I guess this
> is in close relation to the comparison of do and forall.

Try compiling with -floop-interchange (requires a GCC built with Graphite).

Deciding which order is best is not a trivial task, although in simple cases as
yours, it shouldn't be that difficult. Maybe someone finds the time to do it.

[Presumably the same issue comes up with DO CONCURRENT, if one places multiple
iteration variables into that statement (as opposed to using multiple DO
CONCURRENT statements with one iteration variable each).]

> According to the Fortran standard, the order of indices in the forall header
> is of no consequence.

Well, it is of no consequence with any of the compilers: The resulting value is
always the same. The standard doesn't say anything about performance (nor about
the index walking order).


[Bug fortran/42118] Slow forall

2013-10-08 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

--- Comment #8 from Tobias Burnus burnus at gcc dot gnu.org ---
(In reply to Tobias Burnus from comment #7)
> By the way, the Fortran committee is considering to deprecate FORALL in the
> next standard (Fortran 2015) because it considers FORALL superior in nearly
> all aspects.

Correction: in the last line, read DO CONCURRENT instead of FORALL, and read
"declare obsolescent" instead of "deprecate". See
http://j3-fortran.org/doc/year/13/13-323.txt

The proposal has not been accepted yet, but I also didn't see much opposition
to it. Quoting the reasoning (proposed for Appendix B of the next Fortran
standard):

The FORALL construct and statement were added to the language
in the expectation that they would enable highly efficient
execution, especially on parallel processors.  However,
the experience with them indicates that they are too complex
and have too many restrictions for compilers to take
advantage of them.  They are redundant with the DO CONCURRENT
loop, and many of the manipulations for which they might be used
may be done more efficiently by use of pointers, especially
using pointer rank remapping.
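
As an illustration of the pointer rank remapping mentioned in the quoted text,
here is a minimal sketch, not taken from the proposal: setting the diagonal of
a matrix once with FORALL and once through a strided rank-1 pointer. The
program and variable names are made up for the example, and a compiler
supporting Fortran 2008 rank remapping of a contiguous rank-2 target is
assumed.

```fortran
program diag_by_remapping
  implicit none
  integer, parameter :: n = 4
  integer :: i
  double precision, target :: a(n, n), b(n, n)
  double precision, pointer :: flat(:), diag(:)

  a = 0.0d0
  b = 0.0d0

  ! FORALL version: set the diagonal of a.
  forall (i = 1:n) a(i, i) = 1.0d0

  ! Pointer version: view b as a rank-1 array (rank remapping) ...
  flat(1:n*n) => b
  ! ... in which element (i, i) sits at position (i - 1) * (n + 1) + 1,
  ! so the diagonal is a section with stride n + 1.
  diag => flat(1::n + 1)
  diag = 1.0d0

  if (any(a /= b)) error stop "diagonals differ"
  print *, "trace = ", sum(diag)
end program diag_by_remapping
```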


[Bug fortran/42118] Slow forall

2012-03-01 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Tobias Burnus burnus at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #4 from Tobias Burnus burnus at gcc dot gnu.org 2012-03-01 08:06:01 UTC ---
(In reply to comment #3)
> Also exist in the gcc4.7 trunk. Can we mark it a Regression?

Only if it worked better in some previous GCC version, which does not seem to
be the case.


Additionally, as written before (comment 2), a reasonably well written DO loop
should be always as fast or faster than a FORALL. The definition of FORALL does
not allow for a good optimization in the general case. You should also consider
using Fortran 2008's DO CONCURRENT, which allows for more optimizations than a
normal DO loop. (Though, currently gfortran handles DO CONCURRENT as a normal
DO loop.)
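
For comparison, here is a minimal sketch of the two alternatives to FORALL
for an assignment of this kind, using the S(i, j, k) = i * j * k example that
appears elsewhere in this thread. The program name and the small n are
assumptions for illustration only. The DO nest puts the first index in the
innermost loop, which matches Fortran's column-major storage; the DO
CONCURRENT form is valid here because no iteration depends on another.

```fortran
program do_vs_do_concurrent
  implicit none
  integer, parameter :: n = 50   ! assumed small size for a quick check
  integer :: i, j, k
  double precision, allocatable :: s_do(:, :, :), s_dc(:, :, :)

  allocate (s_do(n, n, n), s_dc(n, n, n))

  ! Plain DO nest: i (the first index) varies fastest.
  do k = 1, n
     do j = 1, n
        do i = 1, n
           s_do(i, j, k) = i * j * k
        end do
     end do
  end do

  ! Fortran 2008 DO CONCURRENT: the user asserts that the iterations
  ! are independent, so the compiler may execute them in any order.
  do concurrent (k = 1:n, j = 1:n, i = 1:n)
     s_dc(i, j, k) = i * j * k
  end do

  if (any(s_do /= s_dc)) error stop "results differ"
  print *, "sum(s) = ", sum(s_do)
end program do_vs_do_concurrent
```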

As FORALL is rather complicated and not widely used, some possible
optimizations aren't implemented. (I have not checked whether that's the case
for the program in question.)

I did a quick run with six compilers. Result: The FORALL construct was between
3.2 and 5.25 times slower than the DO loop. Thus, other compilers do not handle
it better, either.


[Bug fortran/42118] Slow forall

2012-03-01 Thread anlauf at gmx dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Harald Anlauf anlauf at gmx dot de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |anlauf at gmx dot de

--- Comment #5 from Harald Anlauf anlauf at gmx dot de 2012-03-01 19:54:08 UTC ---
(In reply to comment #4)
> Additionally, as written before (comment 2), a reasonably well written DO loop
> should be always as fast or faster than a FORALL. The definition of FORALL does
> not allow for a good optimization in the general case.

Do not forget that there are constraints for FORALL statements that are
not required for DO loops so that all assignments are independent.
This guarantees vectorization

> I did a quick run with six compilers. Result: The FORALL construct was between
> 3.2 and 5.25 times slower than the DO loop. Thus, other compilers do not handle
> it better, either.

I tried the SunStudio 12 on i686

 Time of operation was  11.831321  seconds
 Time of operation was  12.235342  seconds

and on x86_64 (AMD barcelona)

 Time of operation was  8.715117  seconds
 Time of operation was  10.525522  seconds

So a small slowdown.

Then I tried NEC's sxf90 rev.441 for SX-9 at -Chopt:

 Time of operation was   4.187261   seconds
 Time of operation was   1.259775   seconds

Whoops!  After looking into the transformation listing and instrumenting
the code, it looks like the do loop is poorly optimized, giving lots
of so-called bank conflicts.

Reducing optimization to -Cvopt, I get:

 Time of operation was   1.185673   seconds
 Time of operation was   1.271729   seconds

Looks reasonable.

So yes, FORALL is in practice slightly slower (almost always... ;-)


[Bug fortran/42118] Slow forall

2012-02-29 Thread xunxun1982 at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

xunxun xunxun1982 at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xunxun1982 at gmail dot com

--- Comment #3 from xunxun xunxun1982 at gmail dot com 2012-03-01 07:37:08 UTC ---
Also exist in the gcc4.7 trunk.
Can we mark it a Regression?


[Bug fortran/42118] Slow forall

2009-11-20 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2009-11-20 14:03 ---
Confirmed.  GFortran seems to split the loops differently and uses a larger
temporary for the forall case.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2009-11-20 14:03:56
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118



[Bug fortran/42118] Slow forall

2009-11-20 Thread burnus at gcc dot gnu dot org


--- Comment #2 from burnus at gcc dot gnu dot org  2009-11-20 14:20 ---
(In reply to comment #0)
> I think that ‘forall’ statement must be at least as fast as equivalent
> ‘do-…-end do’ construction.

The Fortran standardization committee thought likewise, however, as it turned
out in practice, it is sometimes not trivial for the compiler to see whether
there is any dependence on the RHS (right-hand side) with regards to the LHS
and thus it might use a temporary array even if none is needed - and temporary
arrays are slow (and memory hungry).

Thus, a DO loop should be always faster or as fast as a FORALL (assignment)
statement (unless, one does something really stupid in the DO loop).

[At least that is what I gathered from the comments at comp.lang.fortran and
which matches my knowledge regarding how it is done in gfortran.]

Having said that, gfortran still should try to make your program as fast for
FORALL as it is for the DO loop.

> But the next program (variant of LU-decomposition) shows that fragment
> containing ‘forall’ statement is approximately at 2.5(!) times slower then
> fragment with ‘do-end do’.

You could check using  -fdump-tree-original  how the two versions are handled;
my guess is that the FORALL version uses a temporary array.
(-fdump-tree-original  creates a file.f90.004* which contains a dump of the
internal representation of your code, which looks similar to C.)

Seemingly, Richard already looked at the dump and confirmed my suspicion.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118