The problem came up with octopus (http://www.tddft.org/programs/octopus/); see
http://www.tddft.org/pipermail/octopus-devel/2007-March/003398.html
(Somehow, all my messages are not archived?)
The problem is that "sum(w_re(1:nn,1)*fi(i(1:nn, ii)))" can be much slower than the equivalent C loop.

For the original program, one finds the following timings (Core 2 Duo, gfortran + gcc 4.1.x, total CPU time):

  SSE2      120 s
  PLAIN C   160 s
  FORTRAN   331 s   (Fortran = assumed-shape arrays)

I tried to reproduce this with a smaller test case (see attachment), this time with explicit-shape arrays. Here, the SSE and non-SSE versions made little difference.

Results for gcc/gfortran 4.3.0 20070311 on an Athlon 64 X2 4800+:

-O3 -march=opteron -funroll-loops -msse3 -ftree-vectorize -m64:
  Fortran: 0.8240519, real 0m7.661s, user 0m7.232s
  Fortran: 0.8240528, real 0m7.654s, user 0m7.232s
  c_nosse: 0.2320137, real 0m7.071s, user 0m6.652s
  c_nosse: 0.2320151, real 0m7.062s, user 0m6.672s

-O3 -march=opteron -msse3 -ftree-vectorize -m32:
  Fortran: 0.3840241, real 0m7.714s, user 0m7.280s
  Fortran: 0.3840246, real 0m7.701s, user 0m7.328s
  c_nosse: 0.3480220, real 0m7.687s, user 0m7.256s
  c_nosse: 0.3400207, real 0m7.670s, user 0m7.236s

And with ifort/x86-64:
  gcc -std=c99 -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -m64
  ifort -xW -O3

  Fortran: 0.3280210, real 0m0.855s, user 0m0.624s
  Fortran: 0.3280210, real 0m0.856s, user 0m0.624s
  c_nosse: 0.2320140, real 0m0.753s, user 0m0.492s
  c_nosse: 0.2280150, real 0m0.756s, user 0m0.464s

and with ifort/ia32:
  Fortran: 0.3000200, real 0m0.818s, user 0m0.516s
  Fortran: 0.2960190, real 0m0.826s, user 0m0.528s
  c_nosse: 0.3760230, real 0m0.904s, user 0m0.652s
  c_nosse: 0.3800240, real 0m0.902s, user 0m0.624s

I have not yet checked which of the problems are Fortran, backend, and target problems.

Summary:
- GCC -m32 is much slower than -m64
- gfortran is slower (-m32) / much slower (-m64) than the C version
- ifort is faster than gfortran and similarly fast on both -m32 and -m64.
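For readers without the attachment, here is a minimal sketch of what the Fortran expression computes, written as a plain C loop. This is not the attached test case: the function name gather_dot, the parameter names, the use of double precision, and the 0-based index array are all assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical C counterpart of the Fortran expression
 *     sum(w_re(1:nn,1) * fi(i(1:nn, ii)))
 * w   : the nn weights (first column of w_re)
 * fi  : the array that is read through the index list
 * idx : the nn indices for one stencil point (column ii of i),
 *       assumed here to be already converted to 0-based values
 * The element type (double) is an assumption; the original may differ.
 */
double gather_dot(size_t nn, const double *w, const double *fi, const int *idx)
{
    double acc = 0.0;
    for (size_t k = 0; k < nn; ++k) {
        /* indirect (gather) load from fi, multiplied by the weight */
        acc += w[k] * fi[idx[k]];
    }
    return acc;
}
```

The point of the sketch is only that the C side is a single loop with an indirect load and no array temporary; it makes no claim about what code gfortran actually generates for the array expression.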
--
Summary: sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: fortran
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: burnus at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139