[Bug fortran/40766] this fortran program is too slow

2012-04-24 Thread joseph at codesourcery dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40766

--- Comment #23 from joseph at codesourcery dot com 2012-04-24 13:13:13 UTC ---
The glibc libm work has mainly been oriented at correctness rather than 
performance, and postdates the 2.15 release so will be new in 2.16 (the 
2.15 announcement came some time after the actual tag and branching).


[Bug fortran/40766] this fortran program is too slow

2012-04-19 Thread jb at gcc dot gnu.org

Janne Blomqvist <jb at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jb at gcc dot gnu.org

--- Comment #22 from Janne Blomqvist <jb at gcc dot gnu.org> 2012-04-19 14:34:35 UTC ---
AFAIK the recently released Glibc 2.15 incorporates quite a lot of work in
libm. Whether it fixes any of these performance issues I don't know.


[Bug fortran/40766] this fortran program is too slow

2011-07-24 Thread dfranke at gcc dot gnu.org

--- Comment #21 from Daniel Franke <dfranke at gcc dot gnu.org> 2011-07-24 18:49:19 UTC ---
One year down. Did anything happen here?


[Bug fortran/40766] this fortran program is too slow

2010-05-10 Thread maxim at codesourcery dot com


--- Comment #20 from mkuvyrkov at gcc dot gnu dot org  2010-05-10 10:46 ---
Subject: Re:  this fortran program is too slow

On 5/7/10 1:38 AM, steven at gcc dot gnu dot org wrote:
> --- Comment #19 from steven at gcc dot gnu dot org  2010-05-06 21:38 ---
> One possibility is to see if the glibc patches for this issue can be merged
> into eglibc... Maxim what do you think?

I'll look into this when I have a minute.

I'm hesitant to merge patches into EGLIBC that were not submitted to 
either the GLIBC or EGLIBC mailing lists.  There are copyright assignment 
issues with extracting patches from (open)SUSE's GLIBC and committing 
them to EGLIBC.  Copyright assignment is not an absolutely blocking 
issue, but it is one of the concerns.

The plan of action is to find out who the author of the patch is and ask 
him or her to submit the patch to EGLIBC.



[Bug fortran/40766] this fortran program is too slow

2010-05-06 Thread dfranke at gcc dot gnu dot org


--- Comment #18 from dfranke at gcc dot gnu dot org  2010-05-06 19:23 ---
(In reply to comment #16)
> This is a glibc issue with software sin function.

Is there anything that we can do about this?
If not, this PR should be closed.


-- 

dfranke at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dfranke at gcc dot gnu dot
                   |                            |org
             Status|NEW                         |WAITING



[Bug fortran/40766] this fortran program is too slow

2010-05-06 Thread steven at gcc dot gnu dot org


--- Comment #19 from steven at gcc dot gnu dot org  2010-05-06 21:38 ---
One possibility is to see if the glibc patches for this issue can be merged
into eglibc... Maxim what do you think?


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mkuvyrkov at gcc dot gnu dot
                   |                            |org



[Bug fortran/40766] this fortran program is too slow

2009-12-05 Thread burnus at gcc dot gnu dot org


--- Comment #17 from burnus at gcc dot gnu dot org  2009-12-05 19:01 ---
(In reply to comment #16)
> This is a glibc issue with software sin function.

AMD has some patches for this, which are seemingly only used by (open)SUSE's
glibc. Try http://developer.amd.com/CPU/LIBRARIES/LIBM/Pages/default.aspx
(The source can be found in the repository of Open64.)



[Bug fortran/40766] this fortran program is too slow

2009-12-04 Thread jvdelisle at gcc dot gnu dot org


--- Comment #16 from jvdelisle at gcc dot gnu dot org  2009-12-05 06:29 ---
This is a glibc issue with software sin function.  It does not use the FPU.

Just try with -m32. Changing n=5
$ gfc -m64 untitled.f90 
$ time ./a.out
  -1781878.9

real    0m3.060s
user    0m3.050s
sys     0m0.003s
$ gfc -m32 untitled.f90 
$ time ./a.out
  -1781888.9

real    0m0.234s
user    0m0.231s
sys     0m0.004s
$ 

The situation is absolutely absurd.  I opened a PR for this so long ago, I
don't remember the number. 
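
To see how the run time splits between the three intrinsics, a small timing
harness can help. This is only a minimal sketch and not part of the original
report: the program name time_trig, the kind parameters, and the accumulator
names are made up for illustration, and the loop sizes simply follow the test
case quoted in this PR. Building it once with -m32 and once with -m64 should
show which calls account for the gap.

```fortran
program time_trig
  implicit none
  ! Loop sizes taken from the test case in this PR; everything else is illustrative.
  integer, parameter :: n = 5000, reps = 20
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: dp = kind(1.0d0)
  integer :: i, j
  integer(i8) :: t0, t1, t2, t3, rate
  real :: s_acc, c_acc, t_acc

  s_acc = 0.0
  c_acc = 0.0
  t_acc = 0.0
  call system_clock(count_rate=rate)

  ! Time single-precision sin on its own.
  call system_clock(t0)
  do j = 1, reps
    do i = 1, n
      s_acc = s_acc + sin(real(i))
    end do
  end do
  call system_clock(t1)

  ! Time single-precision cos on its own.
  do j = 1, reps
    do i = 1, n
      c_acc = c_acc + cos(real(i))
    end do
  end do
  call system_clock(t2)

  ! Time single-precision tan on its own.
  do j = 1, reps
    do i = 1, n
      t_acc = t_acc + tan(real(i))
    end do
  end do
  call system_clock(t3)

  print '(a,f12.6,a)', 'sin: ', real(t1 - t0, dp) / real(rate, dp), ' s'
  print '(a,f12.6,a)', 'cos: ', real(t2 - t1, dp) / real(rate, dp), ' s'
  print '(a,f12.6,a)', 'tan: ', real(t3 - t2, dp) / real(rate, dp), ' s'
  ! Printing the sums keeps the compiler from discarding the loops.
  print *, 'checksums: ', s_acc, c_acc, t_acc
end program time_trig
```

The accumulators are printed at the end so that an optimizing build cannot
remove the loops as dead code.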



[Bug fortran/40766] this fortran program is too slow

2009-07-25 Thread linuxl4 at sohu dot com


--- Comment #15 from linuxl4 at sohu dot com  2009-07-25 07:40 ---
No, I wrote this source myself.



[Bug fortran/40766] this fortran program is too slow

2009-07-22 Thread eres at il dot ibm dot com


--- Comment #14 from eres at il dot ibm dot com  2009-07-22 11:15 ---
(In reply to comment #0)
> program main
>   implicit none
>   integer :: i,j
>   integer,parameter :: N=5000
>   real :: x(N)=0.0
>   do j=1,20
>   do i=1,N
>     x(i)=x(i)+sin(real(i))+cos(real(i))-tan(real(i))
>   enddo
>   enddo
>   print *, sum(x)
> end program main

Is this example taken from a specific benchmark?



[Bug fortran/40766] this fortran program is too slow

2009-07-16 Thread ubizjak at gmail dot com


--- Comment #10 from ubizjak at gmail dot com  2009-07-16 06:56 ---
(In reply to comment #6)

> Thus with the GLIBC (with AMD patches) or with the ACML, one gets only a
> slowdown of 25%, which is still acceptable. Why the Intel routines are so slow
> on my AMD, I do not know.

See [1], section 12.1, CPU dispatching in Intel compiler, on how to hack around
this issue.

[1] http://www.agner.org/optimize/optimizing_cpp.pdf



[Bug fortran/40766] this fortran program is too slow

2009-07-16 Thread ubizjak at gmail dot com


--- Comment #11 from ubizjak at gmail dot com  2009-07-16 07:16 ---
(In reply to comment #6)

> Thus the question is really: Why are neither vmlsSinCos4 nor vmlsTan4 - nor for
> ACML vrs4_sincosf/vrsa_sincosf (vrs*_tan* does not exist) called?

Because sincos returns _TWO_ values and the vectorizer does not yet support
this. As soon as the middle-end infrastructure is in place, we can add
vectorized sincos to the ix86_veclib* functions. See also [1] and [2], sincos part.

Perhaps you could motivate Richi to extend the vectorizer infrastructure ;)

[1]
http://software.intel.com/en-us/articles/implement-the-short-vector-math-library/
[2]
http://developer.amd.com/cpu/Libraries/acml/onlinehelp/Documents/Vector.html#Vector



[Bug fortran/40766] this fortran program is too slow

2009-07-16 Thread rguenth at gcc dot gnu dot org


--- Comment #12 from rguenth at gcc dot gnu dot org  2009-07-16 09:06 ---
Actually the middle-end presents the vectorizer with a call to a complex
function and REAL/IMAGPART exprs.  I don't remember exactly which part
confuses it, but certainly the mixed complex / real types do.
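
As a rough picture of that shape: a combined sine/cosine evaluation is
equivalent to one complex exponential whose real and imaginary parts are then
picked apart. The sketch below only illustrates the identity
exp(i*t) = cos(t) + i*sin(t) at the source level; it is an assumption-laden
illustration, not the middle-end representation itself, and the program and
variable names are made up.

```fortran
program sincos_shape
  implicit none
  real :: t, s, c
  complex :: z

  t = 1.0

  ! Two separate real-valued calls, as written in the test case.
  s = sin(t)
  c = cos(t)

  ! One complex-valued call; its parts carry the same two results.
  z = exp(cmplx(0.0, t))

  print *, 'sin/cos separately  :', s, c
  print *, 'imag/real of exp(it):', aimag(z), real(z)
  print *, 'max abs difference  :', max(abs(s - aimag(z)), abs(c - real(z)))
end program sincos_shape
```

The second form mixes a complex result with real-typed uses of its parts,
which is the kind of mixed complex/real typing mentioned above.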



[Bug fortran/40766] this fortran program is too slow

2009-07-16 Thread ubizjak at gmail dot com


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |40770
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-07-16 10:06:11
               date|                            |



[Bug fortran/40766] this fortran program is too slow

2009-07-16 Thread burnus at gcc dot gnu dot org


--- Comment #13 from burnus at gcc dot gnu dot org  2009-07-16 09:43 ---
See PR 40770 for Vectorization of complex types, vectorization of sincos
missing



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread linuxl4 at sohu dot com


--- Comment #1 from linuxl4 at sohu dot com  2009-07-15 15:49 ---
My server is an Atom 330 running Gentoo.

gfortran -v
GNU Fortran (GCC) 4.5.0 20090715 (experimental)
Copyright (C) 2009 Free Software Foundation, Inc.

gfortran 1.f90; time ./a.out
  4.28173363E+09

real    120m30.599s
user    120m29.164s
sys     0m0.464s


ifort 1.f90; time ./a.out
  4.3692155E+09

real    2m56.217s
user    2m55.871s
sys     0m0.352s

If I call the functions (sin, cos, tan) from Intel's libimf.so, then:
gfortran 1.f90 -limf
  4.31716608E+09

real    6m39.177s
user    6m38.289s
sys     0m0.512s


-- 

linuxl4 at sohu dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|this fortran program is too |this fortran program is too
                   |slow                        |slow



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread pinskia at gcc dot gnu dot org


--- Comment #2 from pinskia at gcc dot gnu dot org  2009-07-15 15:55 ---
What is the timing when adding -O3 to the command line?  GCC defaults to no
optimization, unlike ifort, which enables optimization by default.



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread ubizjak at gmail dot com


--- Comment #3 from ubizjak at gmail dot com  2009-07-15 17:58 ---
(In reply to comment #1)

> if I call the functions(sin,cos,tan) from intel's libimf.so, then
> gfortran 1.f90 -limf
>   4.31716608E+09
> 
> real    6m39.177s
> user    6m38.289s
> sys     0m0.512s


This is probably a library issue.

You can try to benchmark with -O3 -mfpmath=sse,387 -ffast-math.
(Alternatively, you can link the svml vector library with -O3 -mveclibabi=svml
-ffast-math, although IIRC, vectorized sincos is not yet supported.)



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread linuxl4 at sohu dot com


--- Comment #4 from linuxl4 at sohu dot com  2009-07-15 18:35 ---
-O3 is also very slow.

  4.28173363E+09

real    81m50.845s
user    81m50.587s
sys     0m0.444s

Can anybody confirm?



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread dominiq at lps dot ens dot fr


--- Comment #5 from dominiq at lps dot ens dot fr  2009-07-15 18:50 ---
> can anybody confirm?

On a 2.1 GHz Core 2 Duo, i686-apple-darwin, I get:

[ibook-dhum] bug/timing% gfc -m64 -O3 -ffast-math pr40766_db.f90
[ibook-dhum] bug/timing% time a.out
  4.36921651E+09
157.568u 0.454s 2:38.39 99.7%   0+0k 0+0io 27pf+0w

[ibook-dhum] bug/timing% gfc -m64 -O3 -mfpmath=sse,387 -ffast-math pr40766_db.f90
[ibook-dhum] bug/timing% time a.out
  6.78342144E+08
127.528u 0.411s 2:08.08 99.8%   0+0k 0+0io 0pf+0w

[ibook-dhum] bug/timing% time a.out
  4.3692155E+09
31.441u 0.288s 0:31.79 99.7%    0+0k 0+0io 1pf+0w

So depending on the options, it is only a factor of 4 to 5.



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread burnus at gcc dot gnu dot org


--- Comment #6 from burnus at gcc dot gnu dot org  2009-07-15 20:27 ---
You should also add -march=native to the command line; it probably does not
help much, but it should help a bit. I also recall that the standard GLIBC lacks
some optimized math routines on x86-64, while AMD provides patches for those
(applied by default on SUSE Linux). Though, I am not sure whether this is
still an issue.

With openSUSE Factory (x86_64, glibc 2.10.1, GCC 4.5.0) I get on an AMD Athlon
64 x2 4800+ the following timings, which do not look too bad:

$ ifort -O3 -xHost aa.f90; time ./a.out
real  1m59.997s   user  1m59.651s   sys   0m0.252s

$ gfortran -O3 -ffast-math -march=native aa.f90; time ./a.out
real  2m29.711s   user  2m28.841s   sys   0m0.236s

$ gfortran -O3 -ffast-math  -mveclibabi=acml -march=native aa.f90 \
  -L /opt/acml4.2.0/gfortran64_mp/lib/ -lacml_mv   #(Note: current is ACML 4.3)
real  2m29.693s   user  2m29.373s   sys   0m0.192s

$ gfortran -O3 -ffast-math  -mveclibabi=svml -march=native aa.f90 \
  -L /opt/intel/Compiler/11.1/038/lib/intel64 -lsvml -limf -lintlc; \
  time ./a.out
real  3m56.189s   user  3m55.839s   sys   0m0.200s

Thus with the GLIBC (with AMD patches) or with the ACML, one gets only a
slowdown of 25%, which is still acceptable. Why the Intel routines are so slow
on my AMD, I do not know.

With -mveclibabi=svml sincosf and tanf are linked; for -mveclibabi=acml and no
-mvec* option, sincosf and tanf@@GLIBC_2.2.5. ifort by contrast calls:
vmlsSinCos4 vmlsTan4

Thus the question is really: Why are neither vmlsSinCos4 nor vmlsTan4 - nor for
ACML vrs4_sincosf/vrsa_sincosf (vrs*_tan* does not exist) called?



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread rguenth at gcc dot gnu dot org


--- Comment #7 from rguenth at gcc dot gnu dot org  2009-07-15 21:00 ---
icc can vectorize the function, gcc cannot.  Use an operating system which
has sincos available and you'll get at least that bit.

You definitely want -O3 -ffast-math.  That we can't vectorize sin/cos/tan
is RMS's fault.



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread linuxl4 at sohu dot com


--- Comment #8 from linuxl4 at sohu dot com  2009-07-16 04:37 ---
Compilation is also very slow, isn't it?

Can anybody confirm my results, comparing only with and without the -O3 option?

I think the difference between SSE and x87 is a factor of 4 at most.



[Bug fortran/40766] this fortran program is too slow

2009-07-15 Thread kargl at gcc dot gnu dot org


--- Comment #9 from kargl at gcc dot gnu dot org  2009-07-16 05:06 ---
(In reply to comment #8)
> compilation is also very slow, isn't it?

It's due to the initialization expression.
How much memory do you have?  You're most likely swapping.
Your code when compiled with 4.5.0 shows 

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2092 kargl         1  98    0  1040M   807M CPU1    0   0:07 37.98% f951

in top(1).  

Changing your code to something a little more sane like

  integer,parameter :: N=5000
  real :: x(N)
  x = 0.0

uses no swap and compiles in less than a second.

If you reduce 5000 to something sane like 50 and use
the -fdump-tree-original option you might get a clue to
the problem.
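
Put together, the test case with that change applied might look like the
following. This is a minimal sketch assembled from the program quoted in this
PR and the three-line fragment above; the only difference from the original is
that x is zeroed by an executable assignment rather than by an initialization
expression in its declaration.

```fortran
program main
  implicit none
  integer :: i, j
  integer, parameter :: N = 5000
  real :: x(N)

  ! Executable assignment at run time, replacing "real :: x(N)=0.0".
  x = 0.0

  do j = 1, 20
    do i = 1, N
      x(i) = x(i) + sin(real(i)) + cos(real(i)) - tan(real(i))
    end do
  end do

  print *, sum(x)
end program main
```

Because the array is no longer initialized in its declaration, the compiler
does not have to build the whole initializer at compile time, which is where
the memory use shown in top(1) came from according to the comment above.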




