[Bug fortran/40766] this fortran program is too slow

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40766

--- Comment #23 from joseph at codesourcery dot com 2012-04-24 13:13:13 UTC ---
The glibc libm work has mainly been oriented at correctness rather than performance, and postdates the 2.15 release so will be new in 2.16 (the 2.15 announcement came some time after the actual tag and branching).

Janne Blomqvist jb at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jb at gcc dot gnu.org

--- Comment #22 from Janne Blomqvist jb at gcc dot gnu.org 2012-04-19 14:34:35 UTC ---
AFAIK the recently released Glibc 2.15 incorporates quite a lot of work in libm. Whether it fixes any of these performance issues I don't know.

--- Comment #21 from Daniel Franke dfranke at gcc dot gnu.org 2011-07-24 18:49:19 UTC ---
One year down. Did anything happen here?

--- Comment #20 from mkuvyrkov at gcc dot gnu dot org 2010-05-10 10:46 ---
Subject: Re: this fortran program is too slow

On 5/7/10 1:38 AM, steven at gcc dot gnu dot org wrote:
> --- Comment #19 from steven at gcc dot gnu dot org 2010-05-06 21:38 ---
> One possibility is to see if the glibc patches for this issue can be merged
> into eglibc... Maxim what do you think?

I'll look into this when I have a minute.

I'm hesitant to merge patches into EGLIBC that were not submitted to either the GLIBC or EGLIBC mailing lists. There are copyright assignment issues with extracting patches from (open)SUSE's GLIBC and committing them to EGLIBC. Copyright assignment is not an absolutely blocking issue, but it is one of the concerns.

The plan of action is to find out who the author of the patch is and ask him or her to submit the patch to EGLIBC.

--- Comment #18 from dfranke at gcc dot gnu dot org 2010-05-06 19:23 ---
(In reply to comment #16)
> This is a glibc issue with software sin function.

Is there anything that we can do about this? If not, this PR should be closed.

dfranke at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dfranke at gcc dot gnu dot
                   |                            |org
             Status|NEW                         |WAITING

--- Comment #19 from steven at gcc dot gnu dot org 2010-05-06 21:38 ---
One possibility is to see if the glibc patches for this issue can be merged into eglibc... Maxim what do you think?

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mkuvyrkov at gcc dot gnu dot
                   |                            |org

--- Comment #17 from burnus at gcc dot gnu dot org 2009-12-05 19:01 ---
(In reply to comment #16)
> This is a glibc issue with software sin function.

AMD has some patches for this, which are seemingly only used by (open)SUSE's glibc. Try http://developer.amd.com/CPU/LIBRARIES/LIBM/Pages/default.aspx
(The source can be found in the repository of Open64.)

--- Comment #16 from jvdelisle at gcc dot gnu dot org 2009-12-05 06:29 ---
This is a glibc issue with software sin function. It does not use the FPU. Just try with -m32. Changing n=5

    $ gfc -m64 untitled.f90
    $ time ./a.out
      -1781878.9

    real    0m3.060s
    user    0m3.050s
    sys     0m0.003s
    $ gfc -m32 untitled.f90
    $ time ./a.out
      -1781888.9

    real    0m0.234s
    user    0m0.231s
    sys     0m0.004s
    $

The situation is absolutely absurd. I opened a PR for this so long ago, I don't remember the number.
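
To see which intrinsic dominates the run time, each call can be timed on its own. The following is a minimal sketch, not code from the report: the module name trig_timing and the helper time_intrinsic are hypothetical, and n is an arbitrary loop count. It relies only on the standard cpu_time intrinsic.

```fortran
module trig_timing
  implicit none
contains
  ! Hypothetical helper: run n calls of one intrinsic, selected by name.
  ! Returns the elapsed CPU time in seconds and a checksum, so that the
  ! result of the loop is actually used.
  subroutine time_intrinsic(name, n, seconds, checksum)
    character(len=*), intent(in) :: name
    integer, intent(in) :: n
    real, intent(out) :: seconds, checksum
    integer :: i
    real :: t0, t1

    checksum = 0.0
    call cpu_time(t0)
    select case (name)
    case ('sin')
      do i = 1, n
        checksum = checksum + sin(real(i))
      end do
    case ('cos')
      do i = 1, n
        checksum = checksum + cos(real(i))
      end do
    case ('tan')
      do i = 1, n
        checksum = checksum + tan(real(i))
      end do
    end select
    call cpu_time(t1)
    seconds = t1 - t0
  end subroutine time_intrinsic
end module trig_timing

program main
  use trig_timing
  implicit none
  integer, parameter :: n = 1000000   ! arbitrary loop count for illustration
  character(len=3), parameter :: names(3) = ['sin', 'cos', 'tan']
  real :: seconds, checksum
  integer :: k

  do k = 1, size(names)
    call time_intrinsic(names(k), n, seconds, checksum)
    print '(a, f10.4, a, es14.6)', names(k), seconds, ' s, checksum ', checksum
  end do
end program main
```

Building this once with -m64 and once with -m32, as in the comment above, should show whether the gap comes from one function or from all three. Note that -m32 needs the 32-bit runtime libraries to be installed.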

--- Comment #15 from linuxl4 at sohu dot com 2009-07-25 07:40 ---
No, I wrote this source myself.

--- Comment #14 from eres at il dot ibm dot com 2009-07-22 11:15 ---
(In reply to comment #0)
> program main
>   implicit none
>   integer :: i,j
>   integer,parameter :: N=5000
>   real :: x(N)=0.0
>   do j=1,20
>     do i=1,N
>       x(i)=x(i)+sin(real(i))+cos(real(i))-tan(real(i))
>     enddo
>   enddo
>   print *, sum(x)
> end program main

Is this example taken from a specific benchmark?

--- Comment #10 from ubizjak at gmail dot com 2009-07-16 06:56 ---
(In reply to comment #6)
> Thus with the GLIBC (with AMD patches) or with the ACML, one gets only a
> slowdown of 25%, which is still acceptable. Why the Intel routines are so slow
> on my AMD, I do not know.

See [1], section 12.1, "CPU dispatching in Intel compiler", on how to hack around this issue.

[1] http://www.agner.org/optimize/optimizing_cpp.pdf

--- Comment #11 from ubizjak at gmail dot com 2009-07-16 07:16 ---
(In reply to comment #6)
> Thus the question is really: Why are neither vmlsSinCos4 nor vmlsTan4 - nor for
> ACML vrs4_sincosf/vrsa_sincosf (vrs*_tan* does not exist) called?

Because sincos returns _TWO_ values and the vectorizer does not yet support this. As soon as the middle-end infrastructure is in place, we can stick vectorized sincos in ix86_veclib* functions. See also [1] and [2], sincos part.

Perhaps you could motivate Richi to extend the vectorizer infrastructure ;)

[1] http://software.intel.com/en-us/articles/implement-the-short-vector-math-library/
[2] http://developer.amd.com/cpu/Libraries/acml/onlinehelp/Documents/Vector.html#Vector
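
The shape of the problem can be sketched in Fortran. This is an illustration only: sincos_pair is a hypothetical helper, not a GCC, glibc, SVML or ACML interface. It shows one call that produces two results for a single argument, which is the pattern a vectorized sincos would have to support.

```fortran
module sincos_demo
  implicit none
contains
  ! Hypothetical helper: one call, two results, mirroring the shape of a
  ! sincos-style routine (sine and cosine of the same argument).
  elemental subroutine sincos_pair(x, s, c)
    real, intent(in)  :: x
    real, intent(out) :: s, c
    s = sin(x)
    c = cos(x)
  end subroutine sincos_pair
end module sincos_demo

program main
  use sincos_demo
  implicit none
  integer, parameter :: n = 8
  integer :: i
  real :: x(n), s(n), c(n)

  x = [(real(i), i = 1, n)]
  ! Elemental call: applied to every element, each yielding two outputs.
  call sincos_pair(x, s, c)
  ! Sanity check: sin**2 + cos**2 should stay close to 1 for every element.
  print *, maxval(abs(s*s + c*c - 1.0))
end program main
```

A scalar math function maps one input to one output, so a vector variant is a drop-in replacement. Here each input lane yields two output lanes, which is why the thread treats it as needing extra vectorizer infrastructure.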

--- Comment #12 from rguenth at gcc dot gnu dot org 2009-07-16 09:06 ---
Actually the middle-end presents the vectorizer with a call to a complex function and REAL/IMAGPART exprs. I don't remember exactly which part confuses it, but certainly the mixed complex / real types do.
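
As an illustration of the complex-valued form described here, the sketch below writes the sine/cosine pair as exp(i*x) and then takes the real and imaginary parts. The helper name cexpi_like is hypothetical, and it is an assumption that the "complex function" in question behaves like exp(i*x); the comment does not spell that out.

```fortran
module cexpi_demo
  implicit none
contains
  ! Hypothetical helper: returns exp(i*x) = cos(x) + i*sin(x) as one
  ! complex value, so a single call carries both results.
  elemental function cexpi_like(x) result(z)
    real, intent(in) :: x
    complex :: z
    z = exp(cmplx(0.0, x))
  end function cexpi_like
end module cexpi_demo

program main
  use cexpi_demo
  implicit none
  real :: x, c, s
  complex :: z

  x = 0.5
  z = cexpi_like(x)
  c = real(z)     ! plays the role of the REALPART expression
  s = aimag(z)    ! plays the role of the IMAGPART expression
  ! Differences from the direct intrinsics should be at rounding level.
  print *, abs(c - cos(x)), abs(s - sin(x))
end program main
```

The loop body then mixes a complex temporary with real-typed array elements, which matches the "mixed complex / real types" that the comment identifies as the obstacle.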

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |40770
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-07-16 10:06:11
               date|                            |

--- Comment #13 from burnus at gcc dot gnu dot org 2009-07-16 09:43 ---
See PR 40770, "Vectorization of complex types, vectorization of sincos missing".

--- Comment #1 from linuxl4 at sohu dot com 2009-07-15 15:49 ---
My server is an atom330/gentoo

    gfortran -v
    GNU Fortran (GCC) 4.5.0 20090715 (experimental)
    Copyright (C) 2009 Free Software Foundation, Inc.

    gfortran 1.f90; time ./a.out
      4.28173363E+09

    real    120m30.599s
    user    120m29.164s
    sys     0m0.464s

    ifort 1.f90; time ./a.out
      4.3692155E+09

    real    2m56.217s
    user    2m55.871s
    sys     0m0.352s

If I call the functions (sin, cos, tan) from Intel's libimf.so, then

    gfortran 1.f90 -limf
      4.31716608E+09

    real    6m39.177s
    user    6m38.289s
    sys     0m0.512s

linuxl4 at sohu dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|this fortran program is too |this fortran program is too
                   |slow                        |slow

--- Comment #2 from pinskia at gcc dot gnu dot org 2009-07-15 15:55 ---
What is the timing when adding -O3 to the command line? GCC defaults to no optimizations turned on. This is unlike ifort, which defaults to having optimizations turned on.

--- Comment #3 from ubizjak at gmail dot com 2009-07-15 17:58 ---
(In reply to comment #1)
> if I call the functions(sin,cos,tan) from intel's libimf.so, then
>
> gfortran 1.f90 -limf
>   4.31716608E+09
>
> real    6m39.177s
> user    6m38.289s
> sys     0m0.512s

This is probably a library issue. You can try to benchmark with -O3 -mfpmath=sse,387 -ffast-math.

(Alternatively, you can link the svml vector library with -O3 -mveclibabi=svml -ffast-math, although IIRC vectorized sincos is not yet supported.)

--- Comment #4 from linuxl4 at sohu dot com 2009-07-15 18:35 ---
-O3 is also very slow.

      4.28173363E+09

    real    81m50.845s
    user    81m50.587s
    sys     0m0.444s

Can anybody confirm?

--- Comment #5 from dominiq at lps dot ens dot fr 2009-07-15 18:50 ---
> can anybody confirm?

On a 2.1 GHz Core 2 Duo, i686-apple-darwin, I get:

    [ibook-dhum] bug/timing% gfc -m64 -O3 -ffast-math pr40766_db.f90
    [ibook-dhum] bug/timing% time a.out
      4.36921651E+09
    157.568u 0.454s 2:38.39 99.7% 0+0k 0+0io 27pf+0w
    [ibook-dhum] bug/timing% gfc -m64 -O3 -mfpmath=sse,387 -ffast-math pr40766_db.f90
    [ibook-dhum] bug/timing% time a.out
      6.78342144E+08
    127.528u 0.411s 2:08.08 99.8% 0+0k 0+0io 0pf+0w
    [ibook-dhum] bug/timing% time a.out
      4.3692155E+09
    31.441u 0.288s 0:31.79 99.7% 0+0k 0+0io 1pf+0w

So depending on the options, only a factor 4 to 5.

--- Comment #6 from burnus at gcc dot gnu dot org 2009-07-15 20:27 ---
You should also add -march=native to the command line; it probably does not help much, but it should help a bit.

I recall also that the standard GLIBC misses some optimized versions of math functions on x86-64, while AMD provides patches for those (applied by default on SUSE Linux). Though, I am not sure whether this is still an issue.

With openSUSE Factory (x86_64, glibc 2.10.1, GCC 4.5.0) I get on an AMD Athlon 64 x2 4800+ the following timings, which do not look too bad:

    $ ifort -O3 -xHost aa.f90; time ./a.out
    real    1m59.997s
    user    1m59.651s
    sys     0m0.252s

    $ gfortran -O3 -ffast-math -march=native aa.f90; time ./a.out
    real    2m29.711s
    user    2m28.841s
    sys     0m0.236s

    $ gfortran -O3 -ffast-math -mveclibabi=acml -march=native aa.f90 \
        -L /opt/acml4.2.0/gfortran64_mp/lib/ -lacml_mv   #(Note: current is ACML 4.3)
    real    2m29.693s
    user    2m29.373s
    sys     0m0.192s

    $ gfortran -O3 -ffast-math -mveclibabi=svml -march=native aa.f90 \
        -L /opt/intel/Compiler/11.1/038/lib/intel64 -lsvml -limf -lintlc; \
        time ./a.out
    real    3m56.189s
    user    3m55.839s
    sys     0m0.200s

Thus with the GLIBC (with AMD patches) or with the ACML, one gets only a slowdown of 25%, which is still acceptable. Why the Intel routines are so slow on my AMD, I do not know.

With -mveclibabi=svml, sincosf and tanf are linked; for -mveclibabi=acml and no -mvec* option, sincosf and tanf@@GLIBC_2.2.5. ifort by contrast calls:

    vmlsSinCos4
    vmlsTan4

Thus the question is really: Why are neither vmlsSinCos4 nor vmlsTan4 - nor for ACML vrs4_sincosf/vrsa_sincosf (vrs*_tan* does not exist) called?

--- Comment #7 from rguenth at gcc dot gnu dot org 2009-07-15 21:00 ---
icc can vectorize the function, gcc cannot. Use an operating system which has sincos available and you'll get at least that bit. You definitely want -O3 -ffast-math.

That we can't vectorize sin/cos/tan is RMS's fault.

--- Comment #8 from linuxl4 at sohu dot com 2009-07-16 04:37 ---
Compilation is also very slow, isn't it?

Can anybody confirm my results, with and without the -O3 option alone? I think the difference between SSE and x87 is a factor of 4 at most.

--- Comment #9 from kargl at gcc dot gnu dot org 2009-07-16 05:06 ---
(In reply to comment #8)
> compilation is also very slow, isn't it?

It's due to the initialization expression. How much memory do you have? You're most likely swapping. Your code when compiled with 4.5.0 shows

      PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
     2092 kargl       1  98    0  1040M   807M CPU1   0   0:07 37.98% f951

in top(1). Changing your code to something a little more sane like

    integer,parameter :: N=5000
    real :: x(N)
    x = 0.0

uses no swap and compiles in less than a second.

If you reduce 5000 to something sane like 50 and use the -fdump-tree-original option you might get a clue to the problem.
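
Putting that suggestion together with the program from comment #0 gives the following sketch. It is an assembled illustration, not a file posted in the thread: the array is declared without an initializer and zeroed by an ordinary assignment, and N is reduced to 50 as proposed for inspecting the dump.

```fortran
program main
  implicit none
  integer :: i, j
  integer, parameter :: N = 50   ! reduced from 5000 to keep the dump readable
  real :: x(N)                   ! declaration only, no "= 0.0" initializer

  x = 0.0                        ! run-time assignment replaces the initialization expression
  do j = 1, 20
    do i = 1, N
      x(i) = x(i) + sin(real(i)) + cos(real(i)) - tan(real(i))
    end do
  end do
  print *, sum(x)
end program main
```

Compiling both this variant and the original declaration form with -fdump-tree-original, then comparing the two dump files, should make visible how the initialization expression is represented.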