[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139

Dominique d'Humieres changed:

           What    |Removed     |Added
             Status|WAITING     |RESOLVED
         Resolution|---         |FIXED

--- Comment #9 from Dominique d'Humieres ---
> The change occurred between 4.5 and 4.6 (note that 4.6 and 4.7 give
> 0.263675928 without -funroll-loops). Is this still an issue?

No feedback for over two years. Closing as FIXED.
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139

Dominique d'Humieres dominiq at lps dot ens.fr changed:

           What    |Removed     |Added
             Status|NEW         |WAITING

--- Comment #8 from Dominique d'Humieres dominiq at lps dot ens.fr ---
> I have not tested this with latest trunk, but I wonder if any of the recent
> optimization work has improved this. Can it be closed yet?

A quick test on a 2.5 GHz Core2Duo at revision 200321 with -Ofast shows

    Fortran:  0.330040932
    c_sse:    0.225150943
    c_struct: 0.227035046

and with -Ofast -funroll-loops

    Fortran:  0.213014960
    c_sse:    0.223238945
    c_struct: 0.209081888

The change occurred between 4.5 and 4.6 (note that 4.6 and 4.7 give 0.263675928 without -funroll-loops). Is this still an issue?
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
--- Comment #7 from jvdelisle at gcc dot gnu dot org 2010-09-12 15:45 --- I have not tested this with latest trunk, but I wonder if any of the recent optimization work has improved this. Can it be closed yet? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
fxcoudert at gcc dot gnu dot org changed:

           What    |Removed             |Added
             Status|UNCONFIRMED         |NEW
     Ever Confirmed|0                   |1
   Last reconfirmed|-00-00 00:00:00     |2007-04-18 06:56:03
               date|                    |

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
--- Comment #5 from burnus at gcc dot gnu dot org 2007-03-12 07:58 ---
> complex * complex is not a simple cross product in FP world.

Well, the program calculates: real * complex

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
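To make the real * complex versus complex * complex distinction concrete, here is a minimal C sketch. The function names `scale_by_real` and `mul_textbook` are hypothetical and do not come from the attached test case. The first needs only two multiplications; the second is the textbook four-multiplication formula, on top of which C99 Annex G asks for extra infinity/NaN handling unless something like -ffast-math relaxes it.

```c
#include <complex.h>

/* real * complex: each component is scaled independently,
   so there are no cross terms and only two multiplications. */
double complex scale_by_real(double w, double complex z)
{
    return w * creal(z) + w * cimag(z) * I;
}

/* complex * complex, textbook formula: four multiplications plus
   an add and a subtract. This sketch deliberately omits the
   infinity/NaN recovery a strictly conforming compiler performs. */
double complex mul_textbook(double complex x, double complex y)
{
    double re = creal(x) * creal(y) - cimag(x) * cimag(y);
    double im = creal(x) * cimag(y) + cimag(x) * creal(y);
    return re + im * I;
}
```

For finite inputs such as `scale_by_real(2.0, 3.0 - 4.0*I)` the result is what one would expect from componentwise scaling; the two functions only start to behave differently from the built-in `*` when infinities or NaNs are involved.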
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
--- Comment #6 from burnus at gcc dot gnu dot org 2007-03-12 08:16 ---
> Can someone try instead of doing __real__ a += w[j] * __real__ mfi[*index];
> Use a += xxx * yyy and also use -std=c99 to get the correct multiplication?

Well, -std=c99 was used already and the real(!) * complex calculation was already correct. c_cmplx below now uses:

    a += w[j] * mfi[*index++];

Compiled with:

    gcc -std=c99 -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64
    gfortran -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64

    Fortran:  0.4360271    Fortran:  0.4280267
    c_nosse:  0.2440166    c_nosse:  0.2320151
    c_sse:    0.2320137    c_sse:    0.2400150
    c_struct: 0.2320151    c_struct: 0.2320147
    c_cmplx:  0.2360163    c_cmplx:  0.2320147

And using a non-manually unrolled version: 0.3760242, 0.3760242

    for (i = 0; i < np; i++) {
        for (j = 1; j < n; j++)
            a += w[j] * mfi[*index++];
        fo[i] = a;
    }

Thus the manual unrolling seems to account for most of the speed-up. With -funroll-all-loops, the timings of the Fortran version and the non-unrolled C version remain the same.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
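Since the attachment itself is not reproduced in this thread, the following is only a hedged sketch of what "manually unrolled" versus "non-manually unrolled" could look like for this kernel. The function names, the unroll factor of 4, and the use of `index[j]` instead of a moving pointer are all assumptions made for illustration. Note that the separate accumulators change the floating-point summation order, which is the kind of reassociation -ffast-math permits the compiler to do on its own.

```c
#include <complex.h>
#include <stddef.h>

/* Straightforward loop: one accumulator, one element per iteration. */
double complex sum_rolled(const double *w, const double complex *mfi,
                          const int *index, size_t n)
{
    double complex a = 0.0;
    for (size_t j = 0; j < n; j++)
        a += w[j] * mfi[index[j]];
    return a;
}

/* Hand-unrolled by 4 with independent accumulators, so the four
   multiply-adds in each iteration do not depend on one another. */
double complex sum_unrolled4(const double *w, const double complex *mfi,
                             const int *index, size_t n)
{
    double complex a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        a0 += w[j]     * mfi[index[j]];
        a1 += w[j + 1] * mfi[index[j + 1]];
        a2 += w[j + 2] * mfi[index[j + 2]];
        a3 += w[j + 3] * mfi[index[j + 3]];
    }
    for (; j < n; j++)            /* remainder when n is not a multiple of 4 */
        a0 += w[j] * mfi[index[j]];
    return (a0 + a1) + (a2 + a3);
}
```

Both functions compute the same mathematical sum; for general floating-point data the results may differ in the last bits because of the different summation order.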
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
--- Comment #1 from burnus at gcc dot gnu dot org 2007-03-11 22:45 --- Contains the test case. The hand-made SSE version (USE_VECTORS) crashes here for -m32, but as it is C vs. Fortran, one can completely ignore that test case (for -m64 USE_VECTORS is about as fast as the other C versions anyhow). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
--- Comment #2 from burnus at gcc dot gnu dot org 2007-03-11 22:50 ---
Created an attachment (id=13191)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13191&action=view)
test.tar.gz

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
--- Comment #3 from pinskia at gcc dot gnu dot org 2007-03-12 05:33 ---
The problem here is obvious: in the Fortran case a temporary array is created, while in the C case there is none. Also, in the optimized C case the multiplication of the complex numbers is incorrect unless you add -ffast-math. Actually, I think it is incorrect in both C cases. Can someone try with -ffast-math for both the C and Fortran cases? complex * complex is not a simple cross product in FP world.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
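As a rough illustration of the temporary-array point, the sketch below contrasts a fused gather-multiply-sum with a version that first materializes the gathered elements in a scratch buffer. This is written in C for comparison only; it is an assumption about the general shape of the work, not the code gfortran actually emits for sum(w_re(1:nn,1)*fi(i(1:nn, ii))). The function names and argument types are hypothetical.

```c
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>

/* Fused form: gather, multiply and accumulate in a single pass. */
double complex dot_fused(const double *w, const double complex *fi,
                         const int *idx, size_t n)
{
    double complex acc = 0.0;
    for (size_t j = 0; j < n; j++)
        acc += w[j] * fi[idx[j]];
    return acc;
}

/* Temporary form: copy the gathered elements into a scratch array
   first, then reduce. This costs an allocation plus an extra pass
   over memory compared with the fused loop. */
double complex dot_with_temp(const double *w, const double complex *fi,
                             const int *idx, size_t n)
{
    double complex *tmp = malloc((n ? n : 1) * sizeof *tmp);
    if (tmp == NULL)
        abort();                      /* keep the sketch simple */
    for (size_t j = 0; j < n; j++)    /* pass 1: gather into the temporary */
        tmp[j] = fi[idx[j]];
    double complex acc = 0.0;
    for (size_t j = 0; j < n; j++)    /* pass 2: multiply and sum */
        acc += w[j] * tmp[j];
    free(tmp);
    return acc;
}
```

Both functions return the same value; the difference is purely in memory traffic and allocation overhead, which is what would matter in a timing comparison like the one in this report.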
[Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
--- Comment #4 from pinskia at gcc dot gnu dot org 2007-03-12 05:37 ---
> The hand-made SSE version (USE_VECTORS) crashes here for -m32

That is because complex(8) has the same alignment as double, which means it is only 8-byte aligned and not 16-byte aligned; it is just magical that the SSE case works for -m64 at all.

I am thinking about declaring this bug invalid, as right now the C test case is not even closely related to the Fortran case.

Can someone try instead of doing __real__ a += w[j] * __real__ mfi[*index]; Use a += xxx * yyy and also use -std=c99 to get the correct multiplication?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
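The alignment remark can be checked from C, where a complex double has the same alignment requirement as a plain double. The sketch below assumes a C11 toolchain with `aligned_alloc` and a platform where `sizeof(double complex)` is 16; the helper names `is_aligned16` and `alloc_complex16` are hypothetical and only show one way to request the 16-byte alignment that aligned SSE loads need.

```c
#include <complex.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption of this sketch: two 8-byte doubles per complex value. */
_Static_assert(sizeof(double complex) == 16,
               "sketch assumes a 16-byte double complex");

/* The language only guarantees the alignment of double, which is
   typically 8 on x86-64, not the 16 an aligned SSE load expects. */
_Static_assert(alignof(double complex) == alignof(double),
               "complex double shares the alignment of double");

/* Returns nonzero if p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Allocates n complex doubles on a 16-byte boundary; release with free().
   The byte count is a multiple of 16, as aligned_alloc expects. */
double complex *alloc_complex16(size_t n)
{
    size_t bytes = n * sizeof(double complex);
    return aligned_alloc(16, bytes ? bytes : 16);
}
```

A buffer obtained this way can be handed to aligned vector loads safely, whereas an arbitrary `double complex` array, or an element at an arbitrary address, carries no such guarantee.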