[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #22 from rguenth at gcc dot gnu dot org 2009-07-03 14:11 ---
Subject: Bug 34163

Author: rguenth
Date: Fri Jul 3 14:11:14 2009
New Revision: 149207

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149207
Log:
2009-07-03  Richard Guenther

        PR middle-end/34163
        * tree-chrec.c (chrec_convert_1): Fold (T2)(t +- x) to
        (T2)t +- (T2)x if t +- x is known to not overflow and
        the conversion widens the operation.
        * Makefile.in (tree-chrec.o): Add $(FLAGS_H) dependency.

        * gfortran.dg/pr34163.f90: New testcase.

Added:
    trunk/gcc/testsuite/gfortran.dg/pr34163.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-chrec.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #21 from rguenth at gcc dot gnu dot org 2009-07-03 11:22 ---
Created an attachment (id=18133)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18133&action=view)
patch

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #20 from rguenth at gcc dot gnu dot org 2009-07-03 11:14 ---
Before:

 Time for setup         0.139
 Time per iteration     0.271
 Total Time             6.649

 Time for setup         0.136
 Time per iteration     0.265
 Total Time            10.210

 Time for setup         0.134
 Time per iteration     0.265
 Total Time             7.276

 Time for setup         0.134
 Time per iteration     0.260
 Total Time            11.572

After:

 Time for setup         0.114
 Time per iteration     0.238
 Total Time             5.834

 Time for setup         0.111
 Time per iteration     0.233
 Total Time             8.948

 Time for setup         0.110
 Time per iteration     0.237
 Total Time             6.504

 Time for setup         0.112
 Time per iteration     0.235
 Total Time            10.454

which seems to exactly recover this regression.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #19 from rguenth at gcc dot gnu dot org 2009-07-03 11:05 ---
In fact, in this case we have the C equivalent

  int i;
  long j = (long)(i - 1);

vs.

  long j = (long)i - 1;

which I believe are equivalent if overflow is undefined (or i - 1 does
not wrap).  It is just that fold obviously considers (long)i - 1 to be
more expensive than (long)(i - 1) and thus does not transform the latter
into the former (and it can't transform (long)i - 1 to (long)(i - 1),
because if (long)i - 1 does not overflow there is no guarantee that
i - 1 does not).

We should be able to do the former transformation during SCEV analysis
though.

I have a patch which results in (-O3 -ffast-math -funroll-loops)

.L6:
        mulss   (%rcx), %xmm0
        movss   (%rdx), %xmm5
        movss   4(%rdx), %xmm4
        addl    $4, %ebp
        subss   %xmm0, %xmm5
        movss   8(%rdx), %xmm0
        mulss   (%rsi), %xmm5
        movss   %xmm5, (%rdx)
        mulss   4(%rcx), %xmm5
        subss   %xmm5, %xmm4
        mulss   4(%rsi), %xmm4
        movss   %xmm4, 4(%rdx)
        movss   8(%rcx), %xmm3
        mulss   %xmm4, %xmm3
        subss   %xmm3, %xmm0
        mulss   8(%rsi), %xmm0
        movss   %xmm0, 8(%rdx)
        movss   12(%rcx), %xmm2
        addq    $16, %rcx
        mulss   %xmm0, %xmm2
        movss   12(%rdx), %xmm0
        subss   %xmm2, %xmm0
        mulss   12(%rsi), %xmm0
        addq    $16, %rsi
        movss   %xmm0, 12(%rdx)
        addq    $16, %rdx
        cmpl    %r8d, %ebp
        jne     .L6

--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-04-21 07:11:35         |2009-07-03 11:05:43
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
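
The equivalence argued in comment #19 can be checked with a small C sketch. The function names below are hypothetical and are not part of GCC. The sketch assumes an LP64 target such as AMD64 Linux, where long is wider than int. It only illustrates why the two forms agree whenever i - 1 does not overflow in int.

```c
#include <assert.h>
#include <limits.h>

/* Narrow form: subtract in int, then widen to long.
   Undefined behaviour for i == INT_MIN, because i - 1 overflows int. */
long widen_after_sub(int i)
{
    return (long)(i - 1);
}

/* Wide form: widen to long first, then subtract in long.
   On an LP64 target this is well defined for every int value. */
long widen_before_sub(int i)
{
    return (long)i - 1;
}

/* Returns 1 when both forms yield the same value for i.
   Precondition: i - 1 must not overflow in int, which is the condition
   under which the narrow form may be rewritten into the wide form. */
int forms_agree(int i)
{
    assert(i > INT_MIN);
    return widen_after_sub(i) == widen_before_sub(i);
}
```

Calling forms_agree(INT_MAX) or forms_agree(INT_MIN + 1) exercises both ends of the valid range. The reverse rewrite is not safe for the reason given above: on LP64, widen_before_sub(INT_MIN) is well defined while widen_after_sub(INT_MIN) is not.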
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #18 from rguenther at suse dot de 2009-07-03 09:08 ---
Subject: Re:  [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64

On Fri, 3 Jul 2009, ubizjak at gmail dot com wrote:

> --- Comment #17 from ubizjak at gmail dot com 2009-07-03 08:46 ---
> (In reply to comment #16)
>
> > One of the cases SCEV is confused about pointer-plus offsets being sizetype:
>
> Do we have a solution for this problem...?

My hope is that no-undefined-overflow will somehow magically solve
these problems ... otherwise no, there is unfortunately no way out here.

Richard.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #17 from ubizjak at gmail dot com 2009-07-03 08:46 ---
(In reply to comment #16)

> One of the cases SCEV is confused about pointer-plus offsets being sizetype:

Do we have a solution for this problem...?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #16 from rguenth at gcc dot gnu dot org 2009-06-25 09:01 ---
Executing predictive commoning without unrolling.

with -m32.  One of the cases SCEV is confused about pointer-plus offsets
being sizetype:

(Data Ref:
  stmt: (*x_58(D))[D.1627_54] = D.1638_71;
  ref: (*x_58(D))[D.1627_54];
  base_object: (*x_58(D))[0];
  Access function 0: {(integer(kind=8)) i_43 + -1, +, 1}_1
  Access function 1: 0B

vs.

(Data Ref:
  stmt: D.1634_67 = (*x_58(D))[D.1632_62];
  ref: (*x_58(D))[D.1632_62];
  base_object: (*x_58(D))[0];
  Access function 0: {(integer(kind=8)) (i_43 + -1) + -1, +, 1}_1
  Access function 1: 0B

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
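
To illustrate why the form of these two access functions matters, here is a minimal C sketch. The struct and function are hypothetical and do not reflect GCC's internal chrec representation. It assumes both accesses can be written as {(integer(kind=8)) i_43 + offset, +, step}, which for the second data reference only holds once the conversion has been distributed over the inner i_43 + -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an affine access function
   {BASE + offset, +, step}, where BASE is the same widened symbolic
   start value for both accesses. Only the constant parts are stored. */
struct affine_access {
    long offset; /* constant added to the common symbolic base */
    long step;   /* increment per loop iteration */
};

/* Computes how many iterations later access b touches the element that
   access a touches now. Returns false when the steps differ or the
   offset difference is not a multiple of the step, in which case no
   constant distance exists. */
bool constant_distance(struct affine_access a, struct affine_access b,
                       long *dist)
{
    assert(dist != NULL);
    if (a.step != b.step || a.step == 0)
        return false;
    long diff = a.offset - b.offset;
    if (diff % a.step != 0)
        return false;
    *dist = diff / a.step;
    return true;
}
```

With the store modelled as offset -1, step 1 and the load as offset -2, step 1, the distance is one iteration: the load reads what the store wrote in the previous iteration. In the dump above the load's start value is (integer(kind=8)) (i_43 + -1) + -1, with the inner addition still in the narrow type, so the two start values do not share an obvious common base.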
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #15 from ubizjak at gmail dot com 2009-06-25 08:31 ---
(In reply to comment #14)
> (In reply to comment #13)
> > Predictive commoning does exactly what you want.

Predictive commoning failed: no suitable chains

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
--- Comment #14 from ubizjak at gmail dot com 2009-06-25 08:25 ---
(In reply to comment #13)
> Predictive commoning does exactly what you want.

It is not effective for the testcase in Comment #9.

The dumps for innermost loop are the same for -O2 -funroll-loops
[-fpredictive-commoning]:

.L6:
        movss   (%rsi), %xmm9
        addl    $4, %r8d
        mulss   (%rcx), %xmm9
        movss   (%rdx), %xmm8
        movss   4(%rdx), %xmm6
        movss   8(%rdx), %xmm4
        movss   12(%rdx), %xmm2
        subss   %xmm9, %xmm8
        mulss   0(%rbp), %xmm8
        movss   %xmm8, (%rdx)
        movss   4(%rsi), %xmm7
        mulss   4(%rcx), %xmm7
        subss   %xmm7, %xmm6
        mulss   4(%rbp), %xmm6
        movss   %xmm6, 4(%rdx)
        movss   8(%rsi), %xmm5
        mulss   8(%rcx), %xmm5
        subss   %xmm5, %xmm4
        mulss   8(%rbp), %xmm4
        movss   %xmm4, 8(%rdx)
        movss   12(%rsi), %xmm3
        addq    $16, %rsi
        mulss   12(%rcx), %xmm3
        addq    $16, %rcx
        subss   %xmm3, %xmm2
        mulss   12(%rbp), %xmm2
        addq    $16, %rbp
        movss   %xmm2, 12(%rdx)
        addq    $16, %rdx
        cmpl    %r9d, %r8d
        jne     .L6

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
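
For readers unfamiliar with the optimization, the following C sketch shows the kind of rewrite predictive commoning is expected to perform. The testcase from Comment #9 is not reproduced in this part of the thread, so the recurrence below is only an assumption inferred from the shape of the assembly listings. The function names are hypothetical, and the arrays are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: every iteration reloads x[i - 1] from memory,
   although the previous iteration has just stored that value. */
void recurrence_reload(float *x, const float *a, const float *b, size_t n)
{
    assert(x != NULL && a != NULL && b != NULL);
    for (size_t i = 1; i < n; i++)
        x[i] = (x[i] - a[i] * x[i - 1]) * b[i];
}

/* Hand-applied equivalent of the commoned form: the value stored in one
   iteration is carried in a scalar and reused in the next iteration,
   which removes one load per iteration. */
void recurrence_carried(float *x, const float *a, const float *b, size_t n)
{
    assert(x != NULL && a != NULL && b != NULL);
    if (n < 2)
        return;
    float prev = x[0];
    for (size_t i = 1; i < n; i++) {
        prev = (x[i] - a[i] * prev) * b[i];
        x[i] = prev;
    }
}
```

Both functions perform the same floating-point operations in the same order, so they should produce identical results. The second form corresponds to a loop that keeps the previous result in a register instead of issuing a separate load for it.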