[Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu

2009-10-28 ubizjak at gmail dot com
--- Comment #11 from ubizjak at gmail dot com 2009-10-28 10:33 ---
Author: revitale
Date: Tue Oct 27 11:46:07 2009
New Revision: 153590
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153590
Log: Fix PR40648 -- Fix misaligned store vectorizer patch
Modified:

2009-10-28 ubizjak at gmail dot com
--- Comment #12 from ubizjak at gmail dot com 2009-10-28 10:36 --- The patch fixed the regression, see test_fpu chart [1] between 2009-10-27 and 2009-10-28. [1] http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html -- ubizjak at gmail dot com changed:

2009-10-25 eres at il dot ibm dot com
--- Comment #10 from eres at il dot ibm dot com 2009-10-25 12:41 --- (In reply to comment #0) Hello! The [patch, vectorizer] misaligned store support patch [1] resulted in more than 10% longer execution time for Polyhedron test_fpu test on Core2. The test is compiled with

2009-07-09 eres at il dot ibm dot com
--- Comment #9 from eres at il dot ibm dot com 2009-07-09 07:32 --- Not using unaligned stores for this kind of data dependence, or peeling for alignment, will probably help here. The decision of how to vectorize can be changed for x86 (or any other target). Instead of first checking
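Comment #9 mentions peeling for alignment as an alternative to unaligned stores. The following is a minimal sketch of that idea in plain C, not actual vectorizer output: a scalar prologue runs until the store target `y` is 16-byte aligned, and then a two-doubles-per-step body (standing in for one 16-byte vector) performs only aligned stores. The names `axpy_peeled` and `peel_count` are hypothetical, and the code assumes `y` is at least 8-byte aligned, as arrays of doubles normally are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN 16 /* assumed vector width in bytes (one SSE2 register) */

/* Hypothetical helper: number of scalar iterations needed before y + k is
   ALIGN-byte aligned. Assumes y is at least sizeof(double)-aligned. */
static size_t peel_count(const double *y, size_t n) {
    assert((uintptr_t)y % sizeof(double) == 0);
    size_t misalign = (uintptr_t)y % ALIGN;
    size_t k = misalign ? (ALIGN - misalign) / sizeof(double) : 0;
    return k < n ? k : n;
}

/* Hypothetical y[i] += a * x[i], with the store stream y peeled to 16-byte
   alignment. Returns the number of peeled (prologue) iterations. */
static size_t axpy_peeled(size_t n, double a, const double *x, double *y) {
    size_t k = peel_count(y, n);
    size_t i;
    for (i = 0; i < k; i++)          /* scalar prologue: the peeled iterations */
        y[i] += a * x[i];
    for (; i + 2 <= n; i += 2) {     /* body: each pair of stores to y is 16-byte aligned */
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
    }
    for (; i < n; i++)               /* scalar epilogue for a leftover element */
        y[i] += a * x[i];
    return k;
}
```

Peeling only aligns one stream. If `x` and `y` have different misalignments, the loads from `x` stay misaligned, which is why the choice between peeling and unaligned accesses is a per-target decision.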

2009-07-07 rguenth at gcc dot gnu dot org
--- Comment #8 from rguenth at gcc dot gnu dot org 2009-07-07 15:47 --- The issue is likely the sequence
  load upper half of cache line 1
  load lower half of cache line 2
  store upper half of cache line 1
  store lower half of cache line 2
  ---
  load upper half of cache line 2
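The sequence in comment #8 comes from 16-byte accesses that straddle a cache-line boundary. A small sketch makes the arithmetic concrete, assuming 64-byte cache lines (as on Core2) and accesses no larger than one line: data that is 8-byte aligned but not 16-byte aligned produces one split 16-byte access in every four. The names `classify_access` and `count_split_accesses` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64 /* assumed cache-line size in bytes */

/* How one memory access is distributed over cache lines. */
struct split {
    size_t first_line;   /* index of the first line touched */
    size_t bytes_first;  /* bytes that fall in the first line */
    size_t bytes_second; /* bytes that fall in the next line (0 if not split) */
};

/* Hypothetical helper: classify an access of `size` bytes at byte address `addr`. */
static struct split classify_access(size_t addr, size_t size) {
    assert(size <= LINE_SIZE);
    struct split s;
    size_t offset = addr % LINE_SIZE;
    s.first_line = addr / LINE_SIZE;
    if (offset + size <= LINE_SIZE) {
        s.bytes_first = size;
        s.bytes_second = 0;
    } else {
        s.bytes_first = LINE_SIZE - offset;
        s.bytes_second = size - s.bytes_first;
    }
    return s;
}

/* Hypothetical helper: how many of n consecutive 16-byte accesses starting
   at byte address `base` cross a cache-line boundary. */
static size_t count_split_accesses(size_t base, size_t n) {
    size_t splits = 0;
    for (size_t i = 0; i < n; i++)
        if (classify_access(base + 16 * i, 16).bytes_second != 0)
            splits++;
    return splits;
}
```

For example, a 16-byte access at byte 56 puts 8 bytes in line 0 and 8 bytes in line 1, while the same stream started at a 16-byte-aligned address never splits.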

2009-07-05 eres at il dot ibm dot com
--- Comment #7 from eres at il dot ibm dot com 2009-07-05 08:12 --- Testing test_fpu on Power7 with the power7 branch shows no significant difference between the version compiled with the misaligned store support patch and without it. (using -mcpu=power7 -ffast-math -funroll-loops -O3)

2009-07-04 rguenth at gcc dot gnu dot org
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-07-04 12:05 --- Can you check numbers with vectorization disabled? I see the regression as well on an AMD Family 10h machine, which supposedly has unaligned moves as fast as aligned moves (if the data turns out to be aligned). Which

2009-07-04 rguenth at gcc dot gnu dot org
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-07-04 12:33 --- One loop is
  139  0.0046 :  DO l = 1 , K
  622  0.0208 :    IF ( B(l,j)/=ZERO ) THEN
              :      temp = Alpha*B(l,j)
21380  0.7146 :      DO i
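The profile excerpt is cut off at the innermost `DO i`. Assuming the loop follows the reference BLAS DGEMM (non-transposed case, C := alpha*A*B + C), whose innermost statement is `C(i,j) = C(i,j) + temp*A(i,l)`, the hot loop reads and writes the same column of C on every iteration, so its vectorized form contains both loads from and stores to C. The sketch below is a C translation under that assumption with 0-based, column-major indexing. `dgemm_column` and `IDX` are hypothetical names, not code from the benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major index of element (i, j) in an array with leading dimension ld. */
#define IDX(i, j, ld) ((i) + (size_t)(j) * (ld))

/* Hypothetical sketch of the update of column j of C in a reference-BLAS-style
   DGEMM (no transposes). m: rows of A and C; k: columns of A / rows of B. */
static void dgemm_column(size_t m, size_t k, double alpha,
                         const double *a, size_t lda,
                         const double *b, size_t ldb,
                         double *c, size_t ldc, size_t j) {
    assert(lda >= m && ldb >= k && ldc >= m);
    const double zero = 0.0;
    for (size_t l = 0; l < k; l++) {              /* DO l = 1 , K */
        if (b[IDX(l, j, ldb)] != zero) {          /* IF ( B(l,j)/=ZERO ) */
            double temp = alpha * b[IDX(l, j, ldb)];
            /* Assumed innermost loop: loads and stores C(i,j) each iteration. */
            for (size_t i = 0; i < m; i++)
                c[IDX(i, j, ldc)] += temp * a[IDX(i, l, lda)];
        }
    }
}
```

Because the column of C is both read and written, whether its stores are aligned (or made aligned by peeling) matters directly for this loop.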

2009-07-04 rguenth at gcc dot gnu dot org
--- Comment #3 from rguenth at gcc dot gnu dot org 2009-07-04 12:36 --- Tuned for Core2 I get for the innermost loop:
.L19:
        leal    (%eax,%ebx), %edx
        movsd   (%eax,%ecx), %xmm1
        movsd   (%edx), %xmm7
        movhpd  8(%eax,%ecx), %xmm1
        movhpd  8(%edx),
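In this dump each 16-byte vector load is built from two 8-byte halves: `movsd` loads the low double and `movhpd` loads the high double into the same register, presumably because the address is not known to be 16-byte aligned. A sketch with the SSE2 intrinsics that correspond to these instructions (`_mm_load_sd`, `_mm_loadh_pd`), next to a single unaligned load (`_mm_loadu_pd`), is shown below. The compiler is not guaranteed to emit exactly these instructions for the intrinsics. `load_pair_split` and `load_pair_unaligned` are hypothetical names, and both fall back to scalar copies when SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical: load p[0..1] as one vector in two halves, like movsd + movhpd. */
static void load_pair_split(const double *p, double out[2]) {
#ifdef __SSE2__
    __m128d v = _mm_load_sd(p);     /* low half = p[0], high half = 0.0 */
    v = _mm_loadh_pd(v, p + 1);     /* high half = p[1], low half kept */
    _mm_storeu_pd(out, v);          /* write both halves back, no alignment needed */
#else
    out[0] = p[0];
    out[1] = p[1];
#endif
}

/* Hypothetical: the same load done as one unaligned 16-byte access. */
static void load_pair_unaligned(const double *p, double out[2]) {
#ifdef __SSE2__
    _mm_storeu_pd(out, _mm_loadu_pd(p));
#else
    out[0] = p[0];
    out[1] = p[1];
#endif
}
```

Both functions produce the same values. They differ only in how the hardware performs the access, which is what the regression discussion turns on.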

2009-07-04 ubizjak at gmail dot com
--- Comment #4 from ubizjak at gmail dot com 2009-07-04 12:43 --- (In reply to comment #1) Can you check numbers with vectorization disabled? I see the regression as well on an AMD Family 10h machine, which supposedly has unaligned moves as fast as aligned moves (if the data turns out to

2009-07-04 ubizjak at gmail dot com
--- Comment #5 from ubizjak at gmail dot com 2009-07-04 13:40 --- (In reply to comment #4) and in regressed case: ... in NON-regressed case. The regressed code is the first dump. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648

2009-07-04 dominiq at lps dot ens dot fr
--- Comment #6 from dominiq at lps dot ens dot fr 2009-07-04 14:02 --- I have seen this problem too. From crude profiling, it seems that the slow routines are dgemm, as pointed out in comment #2, and gauss. This is a regression with respect to 4.4.0, and it started between June 5 and 6.