[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #23 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-02-21 12:53:30 UTC ---
(In reply to comment #22)
> What is the performance with 4.3 -O2?

4.3:
gfortran -O2 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:4.373

4.6:
gfortran -O2 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:4.347

So, same performance. Given that vectorization is only enabled by default at -O3, it is an important optimization level for numerical codes. Nevertheless, I would propose to remove the regression tag, and instead refocus the bug on what current trunk does at -O3 vs -O2 -ftree-vectorize, as noted in comment #21:

gfortran -O2 -march=native -funroll-loops -ffast-math -ftree-vectorize test.f90 ; ./a.out
Time for evaluation [s]:2.694

gfortran -O3 -march=native -funroll-loops -ffast-math -ftree-vectorize test.f90 ; ./a.out
Time for evaluation [s]:4.536
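The ratios implied by the timings in comment #23 can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary keys and variable names are made up here, and the numbers are copied verbatim from the comment (single runs on one machine, so the ratios are indicative at best).

```python
# Timings in seconds, copied from comment #23.
# The dictionary keys are illustrative labels, not GCC terminology.
timings = {
    "4.3 -O2": 4.373,
    "4.6 -O2": 4.347,
    "trunk -O2 -ftree-vectorize": 2.694,
    "trunk -O3 -ftree-vectorize": 4.536,
}

# 4.6 relative to 4.3 at plain -O2 (a value near 1.0 means no regression).
o2_ratio = timings["4.6 -O2"] / timings["4.3 -O2"]

# How much slower -O3 is than -O2 when both use -ftree-vectorize.
o3_vs_o2 = timings["trunk -O3 -ftree-vectorize"] / timings["trunk -O2 -ftree-vectorize"]

print(f"4.6 / 4.3 at -O2:                {o2_ratio:.3f}")
print(f"-O3 / -O2 with -ftree-vectorize: {o3_vs_o2:.3f}")
```

A ratio close to 1 at -O2 supports dropping the regression tag, while the second ratio quantifies the -O3 vs -O2 -ftree-vectorize gap the comment proposes to focus on.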
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Steven Bosscher steven at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING
                 CC|                            |steven at gcc dot gnu.org

--- Comment #18 from Steven Bosscher steven at gcc dot gnu.org 2011-02-20 15:22:26 UTC ---
Hello Joost, could you please check if this is still a problem in GCC 4.6?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #19 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-02-20 16:17:33 UTC ---
(In reply to comment #18)
> Hello Joost, could you please check if this is still a problem in GCC 4.6?

I think it still is a minor problem, but (without -fschedule-insns) somewhat less pronounced (the old hardware is gone, this might make a difference):

4.3 branch
gfortran -O3 -march=native -funroll-loops -ffast-math -fschedule-insns test.f90 ; ./a.out
Time for evaluation [s]:3.478
gfortran -O3 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:4.367

4.5 branch
gfortran -O3 -march=native -funroll-loops -ffast-math -fschedule-insns test.f90 ; ./a.out
Time for evaluation [s]:4.839
gfortran -O3 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:4.524

4.6 branch
gfortran -O3 -march=native -funroll-loops -ffast-math -fschedule-insns test.f90 ; ./a.out
Time for evaluation [s]:4.997
gfortran -O3 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:4.547

FYI:
-march=amdfam10 -mcx16 -msahf -mpopcnt -mabm
model name : AMD Opteron(tm) Processor 6176 SE
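To put the numbers in comment #19 on a common scale, the short Python sketch below expresses each branch's time relative to the 4.3 branch for the same flag set. The data layout and the names `times` and `slowdown` are assumptions made for illustration; the timings are copied from the comment and are single measurements, so small differences should not be over-interpreted.

```python
# Timings in seconds from comment #19, all at
# -O3 -march=native -funroll-loops -ffast-math.
# Key: (branch, whether -fschedule-insns was added).
times = {
    ("4.3", True): 3.478, ("4.3", False): 4.367,
    ("4.5", True): 4.839, ("4.5", False): 4.524,
    ("4.6", True): 4.997, ("4.6", False): 4.547,
}

# Slowdown factor of each newer branch relative to 4.3 with the same flags
# (values above 1.0 mean the newer branch is slower).
slowdown = {
    (branch, sched): t / times[("4.3", sched)]
    for (branch, sched), t in times.items()
    if branch != "4.3"
}

for (branch, sched), factor in sorted(slowdown.items()):
    flags = "with -fschedule-insns" if sched else "without -fschedule-insns"
    print(f"{branch} vs 4.3, {flags}: {factor:.2f}x")
```

Read this way, the gap without -fschedule-insns is a few percent, whereas with -fschedule-insns it is much larger, because that flag helps 4.3 but not 4.5 or 4.6 on this kernel.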
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #20 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-02-20 16:28:00 UTC ---
Additionally, for trunk, lto/profile-use seem not to help:

gfortran -O3 -march=native -funroll-loops -ffast-math -flto -fprofile-use test.f90 ; ./a.out
Time for evaluation [s]:4.664

gfortran -O3 -march=native -funroll-loops -ffast-math -fprofile-use test.f90 ; ./a.out
Time for evaluation [s]:4.665
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #21 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 2011-02-20 16:32:38 UTC ---
... however, the following works great:

gfortran -O2 -march=native -funroll-loops -ffast-math -ftree-vectorize test.f90 ; ./a.out
Time for evaluation [s]:2.700

(Notice -O2 instead of -O3: -O2 with -ftree-vectorize is thus nearly 1.7 times as fast as -O3, 2.700 s vs 4.547 s.)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Steven Bosscher steven at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #22 from Paolo Bonzini bonzini at gnu dot org 2011-02-21 07:55:35 UTC ---
What is the performance with 4.3 -O2? A regression that is limited to -O3 is (a bit) less important, since -O3 is still a mixed bag of optimizations that might or might not be profitable.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.5                       |4.4.6
-- jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306