Tested on an AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ (openSUSE
Factory, x86-64 mode).

Compiling the Polyhedron "induct.f90" test case with vectorization gives a
run time about 30% longer than compiling it without vectorization. I think the
vectorization cost model needs to be tuned for this processor. (By comparison,
on a Core2Duo, the run time doubles without vectorization.)

gfortran -march=native -ffast-math -O3 -ftree-vectorize -fvect-cost-model
induct.f90
user    0m35.626s

gfortran -march=opteron -ffast-math -funroll-loops -ftree-vectorize
-ftree-loop-linear -msse3 -O3 induct.f90; time ./a.out
real    0m36.676s, user    0m36.390s

gfortran -march=opteron -ffast-math -funroll-loops -fno-tree-vectorize
-ftree-loop-linear -msse3 -O3 induct.f90; time ./a.out
real    0m28.000s, user    0m27.830s
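
As a quick check of the "30% longer" figure, the two user times from the
-march=opteron runs above can be compared directly. This is only a sketch of
the arithmetic; the function name slowdown_percent is made up for
illustration.

```python
# User times in seconds, copied from the two -march=opteron runs above.
USER_VECTORIZED = 36.390       # built with -ftree-vectorize
USER_NOT_VECTORIZED = 27.830   # built with -fno-tree-vectorize


def slowdown_percent(slow, fast):
    """Return how much longer `slow` is than `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


if __name__ == "__main__":
    pct = slowdown_percent(USER_VECTORIZED, USER_NOT_VECTORIZED)
    print(f"vectorized build is {pct:.1f}% slower")
```

The result is a little under 31%, consistent with the roughly 30% quoted in
the summary.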

(If you don't have the benchmark, it is available from
http://www.polyhedron.co.uk/MFL6VW74649 )


The problem was detected when applying the patch
http://gcc.gnu.org/ml/fortran/2009-08/msg00208.html. With that patch, the
vectorizer reports:

induct.f90:5062: note: LOOP VECTORIZED.
induct.f90:5061: note: LOOP VECTORIZED.
induct.f90:5060: note: LOOP VECTORIZED.
induct.f90:5059: note: LOOP VECTORIZED.
induct.f90:5058: note: LOOP VECTORIZED.
induct.f90:5057: note: LOOP VECTORIZED.
induct.f90:4893: note: LOOP VECTORIZED.

and without the patch (where the program runs 30% slower):

induct.f90:1772: note: LOOP VECTORIZED.
induct.f90:1660: note: LOOP VECTORIZED.
induct.f90:2220: note: LOOP VECTORIZED.
induct.f90:2077: note: LOOP VECTORIZED.
induct.f90:3060: note: LOOP VECTORIZED.
induct.f90:2918: note: LOOP VECTORIZED.
induct.f90:2724: note: LOOP VECTORIZED.
induct.f90:2582: note: LOOP VECTORIZED.
induct.f90:5062: note: LOOP VECTORIZED.
induct.f90:5061: note: LOOP VECTORIZED.
induct.f90:5060: note: LOOP VECTORIZED.
induct.f90:5059: note: LOOP VECTORIZED.
induct.f90:5058: note: LOOP VECTORIZED.
induct.f90:5057: note: LOOP VECTORIZED.
induct.f90:4893: note: LOOP VECTORIZED.
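
To see which loops differ between the two listings, one can parse the line
numbers out of the vectorizer notes and take the set difference. The
following is a minimal sketch, assuming the note format shown above; the
helper name vectorized_lines and the variable names are hypothetical.

```python
import re

# Vectorizer notes copied from the two listings above.
WITH_PATCH = """
induct.f90:5062: note: LOOP VECTORIZED.
induct.f90:5061: note: LOOP VECTORIZED.
induct.f90:5060: note: LOOP VECTORIZED.
induct.f90:5059: note: LOOP VECTORIZED.
induct.f90:5058: note: LOOP VECTORIZED.
induct.f90:5057: note: LOOP VECTORIZED.
induct.f90:4893: note: LOOP VECTORIZED.
"""

WITHOUT_PATCH = """
induct.f90:1772: note: LOOP VECTORIZED.
induct.f90:1660: note: LOOP VECTORIZED.
induct.f90:2220: note: LOOP VECTORIZED.
induct.f90:2077: note: LOOP VECTORIZED.
induct.f90:3060: note: LOOP VECTORIZED.
induct.f90:2918: note: LOOP VECTORIZED.
induct.f90:2724: note: LOOP VECTORIZED.
induct.f90:2582: note: LOOP VECTORIZED.
""" + WITH_PATCH  # the remaining seven notes are identical in both listings

NOTE_RE = re.compile(r":(\d+): note: LOOP VECTORIZED\.")


def vectorized_lines(log_text):
    """Return the set of source line numbers reported as vectorized."""
    return {int(n) for n in NOTE_RE.findall(log_text)}


# Loops vectorized only in the slower build (without the patch).
only_without_patch = vectorized_lines(WITHOUT_PATCH) - vectorized_lines(WITH_PATCH)

if __name__ == "__main__":
    print(sorted(only_without_patch))
```

The eight loops this reports are the ones vectorized only in the slower
build, so they are the natural candidates to inspect when tuning the cost
model.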


-- 
           Summary: Tree-vectorizer: VecCost tuning for X2: Without
                    vectorization 30% faster
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: burnus at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41115
