--- Comment #11 from ubizjak at gmail dot com 2009-10-28 10:33 ---
Author: revitale
Date: Tue Oct 27 11:46:07 2009
New Revision: 153590
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153590
Log:
Fix PR40648 -- Fix misaligned store vectorizer patch
Modified:
--- Comment #12 from ubizjak at gmail dot com 2009-10-28 10:36 ---
The patch fixed the regression, see test_fpu chart [1] between
2009-10-27 and 2009-10-28.
[1] http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html
--
ubizjak at gmail dot com changed:
--- Comment #10 from eres at il dot ibm dot com 2009-10-25 12:41 ---
(In reply to comment #0)
Hello!
The [patch, vectorizer] misaligned store support patch [1] resulted in more
than 10% longer execution time for Polyhedron test_fpu test on Core2.
The test is compiled with
--- Comment #9 from eres at il dot ibm dot com 2009-07-09 07:32 ---
Not using unaligned stores for this kind of data dependence or peeling
for alignment will probably help here.
The decision of how to vectorize can be changed for x86 (or any other target).
Instead of first checking
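For readers unfamiliar with the term, "peeling for alignment" as mentioned in this comment can be sketched roughly as follows. This is only an illustration in C, not the vectorizer's actual code: the names peel_count and axpy_peeled are hypothetical, the 16-byte vector width is an assumption matching SSE2, and the sketch assumes the store pointer is at least 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading scalar iterations needed so that
 * &a[peel] is 16-byte aligned. Assumes `a` is at least 8-byte aligned. */
static size_t peel_count(const double *a, size_t n)
{
    uintptr_t misalign = (uintptr_t)a % 16u;
    size_t peel = misalign ? (16u - misalign) / sizeof(double) : 0;
    return peel < n ? peel : n;
}

/* y[i] += alpha * x[i], split into three parts:
 *   1. a scalar prologue (the "peeled" iterations),
 *   2. a main loop handling two doubles per step, where y + i is
 *      16-byte aligned so an aligned vector store could be used,
 *   3. a scalar epilogue for a leftover element. */
static void axpy_peeled(double *y, const double *x, double alpha, size_t n)
{
    size_t i = 0;
    size_t peel = peel_count(y, n);

    for (; i < peel; ++i)          /* prologue: reach alignment of y */
        y[i] += alpha * x[i];

    for (; i + 1 < n; i += 2) {    /* main loop: stores to y are aligned */
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
    }

    for (; i < n; ++i)             /* epilogue: leftover element, if any */
        y[i] += alpha * x[i];
}
```

Note that peeling only aligns one access (here the store to y); the loads from x may still be misaligned, which is one reason the choice between peeling and unaligned stores is target dependent.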
--- Comment #8 from rguenth at gcc dot gnu dot org 2009-07-07 15:47 ---
The issue is likely the sequence
load upper half of cache line 1
load lower half of cache line 2
store upper half of cache line 1
store lower half of cache line 2 ---
load upper half of cache line 2
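To make the "upper half / lower half" wording in this sequence concrete, here is a small C sketch of how a single misaligned 16-byte access can touch two cache lines. The 64-byte line size and the name straddles_cache_line are assumptions made for illustration; they do not come from the bug report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; 64 is an assumption for illustration. */
#define CACHE_LINE_BYTES 64u

/* Returns true if an access of `size` bytes (size >= 1) starting at `addr`
 * touches two different cache lines. */
static bool straddles_cache_line(uintptr_t addr, size_t size)
{
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

Under these assumptions, a 16-byte access at byte offset 56 covers bytes 56 to 71 and so spans two lines, while the same access at offset 48 or 64 stays within one.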
--- Comment #7 from eres at il dot ibm dot com 2009-07-05 08:12 ---
Testing test_fpu on Power7 with the power7 branch shows no significant
difference between the version compiled with the misaligned store support patch
and without it. (using -mcpu=power7 -ffast-math -funroll-loops -O3)
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-07-04 12:05 ---
Can you check numbers with vectorization disabled? I see the regression as
well on an AMD Fam 10 machine which supposedly has unaligned moves as fast
as aligned moves (if the data turns out to be aligned). Which
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-07-04 12:33 ---
One loop is
139 0.0046 : DO l = 1 , K
622 0.0208 : IF ( B(l,j)/=ZERO ) THEN
: temp = Alpha*B(l,j)
21380 0.7146 : DO i
--- Comment #3 from rguenth at gcc dot gnu dot org 2009-07-04 12:36 ---
Tuned for Core2 I get for the innermost loop
.L19:
leal (%eax,%ebx), %edx
movsd (%eax,%ecx), %xmm1
movsd (%edx), %xmm7
movhpd 8(%eax,%ecx), %xmm1
movhpd 8(%edx),
--- Comment #4 from ubizjak at gmail dot com 2009-07-04 12:43 ---
(In reply to comment #1)
Can you check numbers with vectorization disabled? I see the regression as
well on an AMD Fam 10 machine which supposedly has unaligned moves as fast
as aligned moves (if the data turns out to
--- Comment #5 from ubizjak at gmail dot com 2009-07-04 13:40 ---
(In reply to comment #4)
and in regressed case:
... in NON-regressed case. The regressed code is the first dump.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648
--- Comment #6 from dominiq at lps dot ens dot fr 2009-07-04 14:02 ---
I have seen this problem also. From a crude profiling, it seems that the slow
routines are dgemm, as pointed out in comment #2, and gauss. This is a regression
with respect to 4.4.0 and it has started between June 5 and 6.