[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-07-03 Thread ubizjak at gmail dot com


--- Comment #17 from ubizjak at gmail dot com  2009-07-03 08:46 ---
(In reply to comment #16)

> One of the cases where SCEV is confused about pointer-plus offsets being sizetype:

Do we have a solution for this problem...?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163



[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-07-03 Thread rguenther at suse dot de


--- Comment #18 from rguenther at suse dot de  2009-07-03 09:08 ---
Subject: Re: [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

On Fri, 3 Jul 2009, ubizjak at gmail dot com wrote:

> --- Comment #17 from ubizjak at gmail dot com  2009-07-03 08:46 ---
> (In reply to comment #16)
> 
> > One of the cases where SCEV is confused about pointer-plus offsets being sizetype:
> 
> Do we have a solution for this problem...?

My hope is that no-undefined-overflow will somehow magically solve
these problems ... otherwise no, there is unfortunately no way out
here.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163



[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-07-03 Thread rguenth at gcc dot gnu dot org


--- Comment #19 from rguenth at gcc dot gnu dot org  2009-07-03 11:05 ---
In fact, in this case we have the C equivalent

  int i;
  long j = (long)(i - 1);

vs.

  long j = (long)i - 1;

which I believe are equivalent if overflow is undefined (or i - 1 does not
wrap).

It is just that fold obviously considers (long)i - 1 to be more expensive
than (long)(i - 1) and thus does not transform the latter into the former
(and it cannot transform (long)i - 1 into (long)(i - 1): even if
(long)i - 1 does not overflow, there is no guarantee that i - 1 does not).

We should be able to do the former transformation during SCEV analysis
though.
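
To make this concrete, here is a minimal C sketch (an illustration assuming
an LP64 target where long is wider than int; not code from this report):

  #include <assert.h>
  #include <limits.h>

  int main (void)
  {
    int i = 42;
    /* Safe: i - 1 does not wrap here, so widening before or after the
       subtraction gives the same value; (long)(i - 1) may become
       (long)i - 1.  */
    assert ((long) (i - 1) == (long) i - 1);

    /* Not safe the other way: for i == INT_MIN, (long)i - 1 is well
       defined (-2147483649 on LP64), but i - 1 would be signed overflow,
       i.e. undefined behaviour, so (long)i - 1 must not be rewritten as
       (long)(i - 1).  */
    i = INT_MIN;
    assert ((long) i - 1 == (long) INT_MIN - 1);

    return 0;
  }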

I have a patch which, at -O3 -ffast-math -funroll-loops, results in:

.L6:
        mulss   (%rcx), %xmm0
        movss   (%rdx), %xmm5
        movss   4(%rdx), %xmm4
        addl    $4, %ebp
        subss   %xmm0, %xmm5
        movss   8(%rdx), %xmm0
        mulss   (%rsi), %xmm5
        movss   %xmm5, (%rdx)
        mulss   4(%rcx), %xmm5
        subss   %xmm5, %xmm4
        mulss   4(%rsi), %xmm4
        movss   %xmm4, 4(%rdx)
        movss   8(%rcx), %xmm3
        mulss   %xmm4, %xmm3
        subss   %xmm3, %xmm0
        mulss   8(%rsi), %xmm0
        movss   %xmm0, 8(%rdx)
        movss   12(%rcx), %xmm2
        addq    $16, %rcx
        mulss   %xmm0, %xmm2
        movss   12(%rdx), %xmm0
        subss   %xmm2, %xmm0
        mulss   12(%rsi), %xmm0
        addq    $16, %rsi
        movss   %xmm0, 12(%rdx)
        addq    $16, %rdx
        cmpl    %r8d, %ebp
        jne     .L6
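
Note how the patched loop carries values across iterations in registers
instead of reloading them: %xmm5 is stored to (%rdx) and immediately reused
for the next element, and the %xmm0 computed at the bottom of one iteration
feeds the mulss at the top of the next. This is predictive commoning taking
effect once SCEV can relate the accesses.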


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-04-21 07:11:35         |2009-07-03 11:05:43
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163



[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-07-03 Thread rguenth at gcc dot gnu dot org


--- Comment #20 from rguenth at gcc dot gnu dot org  2009-07-03 11:14 ---
Before:

 Time for setup  0.139
 Time per iteration  0.271
 Total Time  6.649
 Time for setup  0.136
 Time per iteration  0.265
 Total Time 10.210
 Time for setup  0.134
 Time per iteration  0.265
 Total Time  7.276
 Time for setup  0.134
 Time per iteration  0.260
 Total Time 11.572

After:

 Time for setup  0.114
 Time per iteration  0.238
 Total Time  5.834
 Time for setup  0.111
 Time per iteration  0.233
 Total Time  8.948
 Time for setup  0.110
 Time per iteration  0.237
 Total Time  6.504
 Time for setup  0.112
 Time per iteration  0.235
 Total Time 10.454

which seems to exactly recover this regression.
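
(Per-iteration time drops from 0.260-0.271 s to 0.233-0.238 s, i.e. roughly
10-12%, which matches the size of the reported regression.)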


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163



[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-07-03 Thread rguenth at gcc dot gnu dot org


--- Comment #21 from rguenth at gcc dot gnu dot org  2009-07-03 11:22 ---
Created an attachment (id=18133)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18133&action=view)
patch


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163



[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-07-03 Thread rguenth at gcc dot gnu dot org


--- Comment #22 from rguenth at gcc dot gnu dot org  2009-07-03 14:11 ---
Subject: Bug 34163

Author: rguenth
Date: Fri Jul  3 14:11:14 2009
New Revision: 149207

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149207
Log:
2009-07-03  Richard Guenther  <rguenther@suse.de>

        PR middle-end/34163
        * tree-chrec.c (chrec_convert_1): Fold (T2)(t +- x) to
        (T2)t +- (T2)x if t +- x is known to not overflow and
        the conversion widens the operation.
        * Makefile.in (tree-chrec.o): Add $(FLAGS_H) dependency.

        * gfortran.dg/pr34163.f90: New testcase.

Added:
trunk/gcc/testsuite/gfortran.dg/pr34163.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-chrec.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163



[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-06-25 Thread ubizjak at gmail dot com


--- Comment #14 from ubizjak at gmail dot com  2009-06-25 08:25 ---
(In reply to comment #13)
> Predictive commoning does exactly what you want.

It is not effective for the testcase in Comment #9. The code generated for
the innermost loop is the same with -O2 -funroll-loops, with or without
-fpredictive-commoning:

.L6:
        movss   (%rsi), %xmm9
        addl    $4, %r8d
        mulss   (%rcx), %xmm9
        movss   (%rdx), %xmm8
        movss   4(%rdx), %xmm6
        movss   8(%rdx), %xmm4
        movss   12(%rdx), %xmm2
        subss   %xmm9, %xmm8
        mulss   0(%rbp), %xmm8
        movss   %xmm8, (%rdx)
        movss   4(%rsi), %xmm7
        mulss   4(%rcx), %xmm7
        subss   %xmm7, %xmm6
        mulss   4(%rbp), %xmm6
        movss   %xmm6, 4(%rdx)
        movss   8(%rsi), %xmm5
        mulss   8(%rcx), %xmm5
        subss   %xmm5, %xmm4
        mulss   8(%rbp), %xmm4
        movss   %xmm4, 8(%rdx)
        movss   12(%rsi), %xmm3
        addq    $16, %rsi
        mulss   12(%rcx), %xmm3
        addq    $16, %rcx
        subss   %xmm3, %xmm2
        mulss   12(%rbp), %xmm2
        addq    $16, %rbp
        movss   %xmm2, 12(%rdx)
        addq    $16, %rdx
        cmpl    %r9d, %r8d
        jne     .L6
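
For reference, predictive commoning targets loops in which one iteration
reuses a value the previous iteration computed. A C sketch of such a
first-order recurrence (a hypothetical loop shape inferred from the
instruction stream above, not the actual testcase from comment #9):

  /* The store to x[i] in one iteration is the x[i-1] load of the next,
     so the value can be carried in a register instead of reloaded.  */
  void
  solve (float *x, const float *a, const float *b, int n)
  {
    for (int i = 1; i < n; i++)
      x[i] = (x[i] - a[i] * x[i - 1]) * b[i];
  }

In the unrolled loop above the pass does not fire: every element is
reloaded from memory because the dependence between the store and the next
iteration's load is not proven (see the mismatched access functions in
comment #16).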


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163



[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-06-25 Thread ubizjak at gmail dot com


--- Comment #15 from ubizjak at gmail dot com  2009-06-25 08:31 ---
(In reply to comment #14)
> (In reply to comment #13)
> > Predictive commoning does exactly what you want.

The pass dump reports:

  Predictive commoning failed: no suitable chains


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163



[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's NF on AMD64

2009-06-25 Thread rguenth at gcc dot gnu dot org


--- Comment #16 from rguenth at gcc dot gnu dot org  2009-06-25 09:01 ---
With -m32 the dump reports

  Executing predictive commoning without unrolling.

One of the cases where SCEV is confused about pointer-plus offsets being
sizetype:

(Data Ref:
  stmt: (*x_58(D))[D.1627_54] = D.1638_71;
  ref: (*x_58(D))[D.1627_54];
  base_object: (*x_58(D))[0];
  Access function 0: {(integer(kind=8)) i_43 + -1, +, 1}_1
  Access function 1: 0B

vs.

(Data Ref:
  stmt: D.1634_67 = (*x_58(D))[D.1632_62];
  ref: (*x_58(D))[D.1632_62];
  base_object: (*x_58(D))[0];
  Access function 0: {(integer(kind=8)) (i_43 + -1) + -1, +, 1}_1
  Access function 1: 0B
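
Assuming i_43 + -1 does not wrap, the two bases differ by exactly 1; a
small C check (an illustration with integer(kind=8) modeled as long, not
GCC code):

  #include <assert.h>

  int main (void)
  {
    /* Bases of the two access functions from the dump, keeping the
       chrec notation "+ -1" literally:
         store:  (long) i + -1
         load:   (long) (i - 1) + -1
       For any i where i - 1 does not wrap they are 1 apart.  */
    for (int i = 1; i < 100000; i++)
      {
        long store_base = (long) i + -1;
        long load_base  = (long) (i - 1) + -1;
        assert (store_base - load_base == 1);
      }
    return 0;
  }

With the fold from comment #22 above, the load base becomes
(long) i_43 + -2, so the constant distance of 1 becomes visible to
dependence analysis and predictive commoning can build its chains.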


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163