[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2011-02-21 Thread Joost.VandeVondele at pci dot uzh.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #23 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 
2011-02-21 12:53:30 UTC ---
(In reply to comment #22)
> What is the performance with 4.3 -O2?

4.3:
 gfortran -O2 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:4.373

4.6:
  gfortran -O2 -march=native -funroll-loops -ffast-math test.f90 ; ./a.out
Time for evaluation [s]:4.347

so, same performance. 

Given that vectorization is only enabled by default at -O3, it is an important
optimization level for numerical codes. Nevertheless, I would propose to remove
the regression tag, and instead refocus the bug on what current trunk does at
-O3 vs -O2 -ftree-vectorize, as noted in comment #21:

 gfortran -O2 -march=native -funroll-loops  -ffast-math  -ftree-vectorize 
 test.f90 ; ./a.out
Time for evaluation [s]:2.694

 gfortran -O3 -march=native -funroll-loops  -ffast-math  -ftree-vectorize 
 test.f90 ; ./a.out
Time for evaluation [s]:4.536
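
For readers who want the gap as a ratio, the timings quoted in this comment can be compared directly. This is only an arithmetic sketch over the numbers above; the variable names and the slowdown() helper are made up for illustration and are not part of the bug report.

```python
# Timings copied from the comment above (GCC 4.6, seconds).
t_o2_vect = 2.694   # -O2 ... -ftree-vectorize
t_o3_vect = 4.536   # -O3 ... -ftree-vectorize
t_o2_plain = 4.347  # -O2 without -ftree-vectorize


def slowdown(t, baseline):
    """Return how many times slower t is than baseline."""
    return t / baseline


# Hypothetical names, chosen only for this illustration.
o3_vs_o2_vect = slowdown(t_o3_vect, t_o2_vect)
o2_plain_vs_o2_vect = slowdown(t_o2_plain, t_o2_vect)

print(f"-O3 vs -O2 -ftree-vectorize: {o3_vs_o2_vect:.2f}x")
print(f"-O2 vs -O2 -ftree-vectorize: {o2_plain_vs_o2_vect:.2f}x")
```

A ratio above 1 means the first configuration is slower than the vectorized -O2 build.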


[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2011-02-20 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Steven Bosscher steven at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING
                 CC|                            |steven at gcc dot gnu.org

--- Comment #18 from Steven Bosscher steven at gcc dot gnu.org 2011-02-20 
15:22:26 UTC ---
Hello Joost, could you please check if this is still a problem in GCC 4.6?


[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2011-02-20 Thread Joost.VandeVondele at pci dot uzh.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #19 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 
2011-02-20 16:17:33 UTC ---
(In reply to comment #18)
> Hello Joost, could you please check if this is still a problem in GCC 4.6?

I think it is still a minor problem, but (without -fschedule-insns) somewhat
less pronounced (the old hardware is gone, which might make a difference):

4.3 branch

 gfortran -O3 -march=native -funroll-loops  -ffast-math   -fschedule-insns 
 test.f90 ; ./a.out 
Time for evaluation [s]:3.478
 gfortran -O3 -march=native -funroll-loops  -ffast-math   test.f90 ; ./a.out 
Time for evaluation [s]:4.367

4.5 branch

 gfortran -O3 -march=native -funroll-loops  -ffast-math   -fschedule-insns 
 test.f90 ; ./a.out 
Time for evaluation [s]:4.839
 gfortran -O3 -march=native -funroll-loops  -ffast-math  test.f90 ; ./a.out 
Time for evaluation [s]:4.524

4.6 branch
 gfortran -O3 -march=native -funroll-loops  -ffast-math   -fschedule-insns 
 test.f90 ; ./a.out 
Time for evaluation [s]:4.997
 gfortran -O3 -march=native -funroll-loops  -ffast-math   test.f90 ; ./a.out 
Time for evaluation [s]:4.547

FYI: -march=amdfam10 -mcx16 -msahf -mpopcnt -mabm
model name  : AMD Opteron(tm) Processor 6176 SE
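
The six timings above can be normalised against the 4.3 branch to see how large the regression is in each configuration. The sketch below only rearranges the numbers reported in this comment; the dictionary layout, the "sched"/"plain" keys and the relative_to() helper are assumptions made for illustration.

```python
# Timings from this comment, in seconds.
# "sched" = with -fschedule-insns, "plain" = without it.
timings = {
    "4.3": {"sched": 3.478, "plain": 4.367},
    "4.5": {"sched": 4.839, "plain": 4.524},
    "4.6": {"sched": 4.997, "plain": 4.547},
}


def relative_to(base, table, key):
    """Return each branch's time for `key` divided by the base branch's time."""
    ref = table[base][key]
    return {branch: row[key] / ref for branch, row in table.items()}


rel_plain = relative_to("4.3", timings, "plain")
rel_sched = relative_to("4.3", timings, "sched")

for branch in timings:
    print(f"{branch}: plain {rel_plain[branch]:.3f}x, "
          f"-fschedule-insns {rel_sched[branch]:.3f}x")
```

On these numbers the slowdown without -fschedule-insns is roughly 4%, while with -fschedule-insns it is roughly 40%, mostly because 4.3 benefits from the flag and the later branches do not.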


[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2011-02-20 Thread Joost.VandeVondele at pci dot uzh.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #20 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 
2011-02-20 16:28:00 UTC ---
Additionally, for trunk, -flto and -fprofile-use do not seem to help:

 gfortran -O3 -march=native -funroll-loops  -ffast-math  -flto -fprofile-use 
 test.f90 ; ./a.out 
Time for evaluation [s]:4.664

 gfortran -O3 -march=native -funroll-loops  -ffast-math   -fprofile-use 
 test.f90 ; ./a.out 
Time for evaluation [s]:4.665


[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2011-02-20 Thread Joost.VandeVondele at pci dot uzh.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #21 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 
2011-02-20 16:32:38 UTC ---
... however, the following works great:

 gfortran -O2 -march=native -funroll-loops  -ffast-math  -ftree-vectorize 
 test.f90 ; ./a.out 
Time for evaluation [s]:2.700

(notice -O2 instead of -O3; -O2 with -ftree-vectorize is thus about 1.7 times
as fast as -O3, 2.700 s vs 4.547 s)
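
Repeating these measurements by hand across flag sets is tedious, so a small driver can help. The following is a minimal sketch, assuming gfortran is on the PATH, that test.f90 is the attached test case, and that the program prints the "Time for evaluation [s]:" line shown above. The function names and the CASES table are hypothetical.

```python
import re
import subprocess

# Matches the timing line printed by the test program in this thread.
TIME_RE = re.compile(r"Time for evaluation \[s\]:\s*([0-9.]+)")


def parse_time(output):
    """Extract the reported time in seconds from the program output."""
    match = TIME_RE.search(output)
    if match is None:
        raise ValueError("no timing line found in output")
    return float(match.group(1))


def build_and_time(flags, source="test.f90", compiler="gfortran"):
    """Compile `source` with `flags`, run ./a.out, return the reported time."""
    subprocess.run([compiler, *flags, source], check=True)
    result = subprocess.run(["./a.out"], check=True,
                            capture_output=True, text=True)
    return parse_time(result.stdout)


# Flag sets taken from the commands quoted in this thread.
COMMON = ["-march=native", "-funroll-loops", "-ffast-math"]
CASES = {
    "O2+vect": ["-O2", *COMMON, "-ftree-vectorize"],
    "O3": ["-O3", *COMMON],
}
```

Calling build_and_time(CASES["O3"]) in a directory containing test.f90 would compile and run one configuration. Nothing is executed on import, so the parsing can be checked separately from the compiler.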


[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2011-02-20 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Steven Bosscher steven at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW


[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2011-02-20 Thread bonzini at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

--- Comment #22 from Paolo Bonzini bonzini at gnu dot org 2011-02-21 07:55:35 
UTC ---
What is the performance with 4.3 -O2?  A regression that is limited to -O3 is
(a bit) less important since -O3 is still a mixed bag of optimizations that
might or might not be profitable.


[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2010-10-01 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.5                       |4.4.6


[Bug target/38306] [4.4/4.5/4.6 Regression] 15% slowdown w.r.t. 4.3 of computational kernel on some architectures

2010-04-30 Thread jakub at gcc dot gnu dot org


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306