[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-27 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #37 from Jakub Jelinek jakub at gcc dot gnu.org 2011-01-27 
15:55:57 UTC ---
/usr/src/gcc/objr/gcc/f951 -quiet -ftime-report -fbounds-check -g -O3
-ffast-math -funroll-loops -ftree-vectorize -march=amdfam10 pr45422.f90 2>&1 |
grep ':[ ]*[1-9]\|TOTAL'
 garbage collection:   1.34 ( 1%) usr   0.00 ( 0%) sys   1.32 ( 1%) wall       0 kB ( 0%) ggc
 cfg cleanup   :   2.24 ( 2%) usr   0.01 ( 0%) sys   2.26 ( 2%) wall    7301 kB ( 0%) ggc
 df reaching defs  :   1.46 ( 1%) usr   0.02 ( 1%) sys   1.34 ( 1%) wall       0 kB ( 0%) ggc
 df live regs  :   8.28 ( 6%) usr   0.02 ( 1%) sys   8.49 ( 6%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   2.46 ( 2%) usr   0.00 ( 0%) sys   2.98 ( 2%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   1.31 ( 1%) usr   0.00 ( 0%) sys   1.13 ( 1%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   4.01 ( 3%) usr   0.00 ( 0%) sys   4.03 ( 3%) wall    7770 kB ( 0%) ggc
 register information  :   1.48 ( 1%) usr   0.00 ( 0%) sys   1.53 ( 1%) wall       0 kB ( 0%) ggc
 alias analysis:   1.86 ( 1%) usr   0.00 ( 0%) sys   1.89 ( 1%) wall   46655 kB ( 3%) ggc
 tree VRP  :   2.25 ( 2%) usr   0.08 ( 4%) sys   2.27 ( 2%) wall   74472 kB ( 4%) ggc
 tree SSA incremental  :   1.43 ( 1%) usr   0.25 (11%) sys   1.34 ( 1%) wall    7187 kB ( 0%) ggc
 complete unrolling:   1.19 ( 1%) usr   0.14 ( 6%) sys   1.24 ( 1%) wall   91809 kB ( 5%) ggc
 tree prefetching  :   1.31 ( 1%) usr   0.12 ( 5%) sys   1.50 ( 1%) wall   92179 kB ( 5%) ggc
 tree iv optimization  :  15.43 (11%) usr   0.09 ( 4%) sys  15.62 (11%) wall  303704 kB (17%) ggc
 expand:   1.11 ( 1%) usr   0.03 ( 1%) sys   1.11 ( 1%) wall   81489 kB ( 5%) ggc
 forward prop  :   1.17 ( 1%) usr   0.01 ( 0%) sys   1.19 ( 1%) wall   16030 kB ( 1%) ggc
 CSE   :   1.58 ( 1%) usr   0.01 ( 0%) sys   1.42 ( 1%) wall     667 kB ( 0%) ggc
 dead code elimination :   1.24 ( 1%) usr   0.00 ( 0%) sys   1.30 ( 1%) wall       0 kB ( 0%) ggc
 dead store elim1  :   1.37 ( 1%) usr   0.00 ( 0%) sys   1.31 ( 1%) wall   23509 kB ( 1%) ggc
 dead store elim2  :   1.10 ( 1%) usr   0.00 ( 0%) sys   1.08 ( 1%) wall   22323 kB ( 1%) ggc
 loop unrolling:   3.99 ( 3%) usr   0.03 ( 1%) sys   4.11 ( 3%) wall  185245 kB (11%) ggc
 CPROP :   2.25 ( 2%) usr   0.01 ( 0%) sys   2.00 ( 1%) wall   25084 kB ( 1%) ggc
 PRE   :   1.20 ( 1%) usr   0.00 ( 0%) sys   1.13 ( 1%) wall    1576 kB ( 0%) ggc
 web   :   1.09 ( 1%) usr   0.00 ( 0%) sys   1.09 ( 1%) wall    8368 kB ( 0%) ggc
 CSE 2 :   2.10 ( 2%) usr   0.01 ( 0%) sys   2.17 ( 2%) wall    2122 kB ( 0%) ggc
 combiner  :   3.97 ( 3%) usr   0.00 ( 0%) sys   3.96 ( 3%) wall   60594 kB ( 3%) ggc
 integrated RA :  10.18 ( 7%) usr   0.01 ( 0%) sys  10.27 ( 7%) wall   44477 kB ( 3%) ggc
 reload:   6.31 ( 5%) usr   0.01 ( 0%) sys   6.24 ( 4%) wall   10153 kB ( 1%) ggc
 reload CSE regs   :   4.39 ( 3%) usr   0.01 ( 0%) sys   4.17 ( 3%) wall   37354 kB ( 2%) ggc
 rename registers  :   1.13 ( 1%) usr   0.00 ( 0%) sys   1.18 ( 1%) wall    2500 kB ( 0%) ggc
 scheduling 2  :   5.84 ( 4%) usr   0.02 ( 1%) sys   5.81 ( 4%) wall    1160 kB ( 0%) ggc
 final :   4.29 ( 3%) usr   0.04 ( 2%) sys   4.66 ( 3%) wall   10463 kB ( 1%) ggc
 variable tracking :   2.76 ( 2%) usr   0.01 ( 0%) sys   2.73 ( 2%) wall   64964 kB ( 4%) ggc
 var-tracking dataflow :   3.86 ( 3%) usr   0.02 ( 1%) sys   3.90 ( 3%) wall       0 kB ( 0%) ggc
 var-tracking emit :   3.89 ( 3%) usr   0.01 ( 0%) sys   3.85 ( 3%) wall   19488 kB ( 1%) ggc
 rest of compilation   :   2.27 ( 2%) usr   0.08 ( 4%) sys   2.28 ( 2%) wall   21438 kB ( 1%) ggc
 remove unused locals  :   1.02 ( 1%) usr   0.01 ( 0%) sys   0.92 ( 1%) wall       0 kB ( 0%) ggc
 unaccounted todo  :   1.21 ( 1%) usr   0.05 ( 2%) sys   1.19 ( 1%) wall       8 kB ( 0%) ggc
 TOTAL : 137.09   2.28   139.39   1741129 kB

/usr/src/gcc-4.5/objr/gcc/f951 -quiet -ftime-report -fbounds-check -g -O3
-ffast-math -funroll-loops -ftree-vectorize -march=amdfam10 pr45422.f90 2>&1 |
grep ':[  ]*[1-9]\|TOTAL'
 df live regs  :   2.05 ( 4%) usr   0.00 ( 0%) sys   1.95 ( 4%) wall       0 kB ( 0%) ggc
 tree VRP  :   1.43 ( 3%) usr   0.15 ( 8%) sys   1.47 ( 3%) wall   56376 kB ( 6%) ggc
 complete unrolling:   1.14 ( 2%) usr   0.18 (10%) sys   1.39 ( 3%) wall   98554 kB (11%) ggc
 tree iv optimization  :   5.31 (10%) usr   0.05 ( 3%) sys   5.40 (10%) wall   95356 kB (11%) ggc
 expand:   2.98 ( 6%) usr   0.11 ( 6%) sys   3.29 ( 6%) wall   69642 kB ( 8%) ggc
 combiner  :   1.49 ( 3%) usr   0.00 ( 0%) sys   1.22 ( 2%) wall   19980 kB ( 2%) ggc
 integrated RA :   3.60 ( 7%) usr   0.01 ( 1%) sys   3.56 ( 6%) wall   12746 kB 
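The grep above keeps only the -ftime-report entries whose usr time is at least 1s, plus TOTAL. The thread then compares passes across compilers by eye. Below is a small Python sketch of the same filter-and-compare step. It assumes each entry has been joined onto one line in the `name : usr (pct) usr ...` layout shown here. The helpers `usr_times` and `compare` are hypothetical and not part of GCC.

```python
import re

# One -ftime-report entry on a single line, e.g.
#   " tree iv optimization  :  15.43 (11%) usr   0.09 ( 4%) sys ..."
# Only the pass name and the usr column are captured. The layout is assumed
# from the reports quoted in this thread.
ENTRY = re.compile(r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr")


def usr_times(report: str, threshold: float = 1.0) -> dict[str, float]:
    """Map pass name -> usr seconds for entries at or above `threshold`.

    With the default of 1.0 this roughly mirrors grep ':[ ]*[1-9]', which keeps
    entries whose usr time starts with a nonzero digit. TOTAL lines have no
    '(pct) usr' column, so they are skipped."""
    times: dict[str, float] = {}
    for line in report.splitlines():
        m = ENTRY.match(line)
        if m and float(m.group("usr")) >= threshold:
            times[m.group("name")] = float(m.group("usr"))
    return times


def compare(new: str, old: str) -> list[tuple[str, float, float]]:
    """Passes reported in both runs, sorted by usr-time growth (largest first)."""
    a, b = usr_times(new), usr_times(old)
    rows = [(name, a[name], b[name]) for name in a if name in b]
    return sorted(rows, key=lambda r: r[1] - r[2], reverse=True)


# Two entries each from the trunk and 4.5 reports above.
trunk = """\
 tree iv optimization  :  15.43 (11%) usr   0.09 ( 4%) sys  15.62 (11%) wall
 integrated RA :  10.18 ( 7%) usr   0.01 ( 0%) sys  10.27 ( 7%) wall"""
gcc45 = """\
 tree iv optimization  :   5.31 (10%) usr   0.05 ( 3%) sys   5.40 (10%) wall
 integrated RA :   3.60 ( 7%) usr   0.01 ( 1%) sys   3.56 ( 6%) wall"""

for name, t_new, t_old in compare(trunk, gcc45):
    print(f"{name:22s} {t_old:6.2f}s -> {t_new:6.2f}s")
```

Like the grep, the 1s threshold applies to both reports. A pass that drops below 1s in one run is therefore not compared.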

[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-27 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #38 from Jakub Jelinek jakub at gcc dot gnu.org 2011-01-27 
16:02:49 UTC ---
The *.gimple dump is roughly the same size between 4.5 and 4.6, but the resulting
assembly size is 15MB in 4.5 and 23MB in 4.6 (varying by only 100KB with
-fno-ivopts).  -fno-inline helps neither compile time nor assembly size on
4.6, though.


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-27 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #39 from Richard Guenther rguenth at gcc dot gnu.org 2011-01-27 
16:16:48 UTC ---
The size difference is likely from prefetching, it's 1.5MB vs. 1.1MB without
that (-O3 -fbounds-check -ffast-math -funroll-loops).  Prefetching usually
causes another set of (then RTL unrolled) loop copies.  See PR44688.


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-27 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #40 from Richard Guenther rguenth at gcc dot gnu.org 2011-01-27 
16:19:26 UTC ---
Btw, when I remove -fbounds-check the sizes are comparable (without prefetching),
so I guess we are just better at removing bounds checking in 4.6, and that
triggers size-costly loop opts such as vectorization and unrolling.


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-27 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #41 from Jakub Jelinek jakub at gcc dot gnu.org 2011-01-27 
16:28:49 UTC ---
With additional -fno-prefetch-loop-arrays the TOTAL goes down from 137s to
92.23s, and judging from tree dumps between 4.5 and 4.6 we do significantly more
vectorization too (the 4.6 *.ifcvt dump is 4.7MB compared to 5.3MB for 4.5,
while the 4.6 *.vect dump grows to 8.3MB and the 4.5 *.vect dump stays at 5.3MB).


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-27 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #42 from Richard Guenther rguenth at gcc dot gnu.org 2011-01-27 
16:30:52 UTC ---
Comparing -O3 -ffast-math -funroll-loops -fno-inline -fno-partial-inlining
(thus generic arch, without prefetching):

trunk:

 df live regs  :   4.22 ( 6%) usr   0.04 ( 2%) sys   4.11 ( 5%) wall       0 kB ( 0%) ggc
 tree iv optimization  :   3.92 ( 5%) usr   0.13 ( 5%) sys   4.29 ( 6%) wall   91066 kB (11%) ggc
 integrated RA :   5.57 ( 8%) usr   0.10 ( 4%) sys   5.93 ( 8%) wall   26408 kB ( 3%) ggc
 scheduling 2  :   3.73 ( 5%) usr   0.04 ( 2%) sys   3.85 ( 5%) wall     939 kB ( 0%) ggc
 TOTAL :  73.68   2.37   76.91   852775 kB

4.5:

 df live regs  :   4.60 ( 7%) usr   0.02 ( 1%) sys   4.62 ( 6%) wall       0 kB ( 0%) ggc
 expand:   3.94 ( 6%) usr   0.17 ( 8%) sys   3.94 ( 6%) wall   62218 kB ( 8%) ggc
 integrated RA :   5.73 ( 8%) usr   0.02 ( 1%) sys   5.76 ( 8%) wall   22920 kB ( 3%) ggc
 reload:   3.78 ( 5%) usr   0.08 ( 4%) sys   3.86 ( 5%) wall    9291 kB ( 1%) ggc
 TOTAL :  68.98   2.01   71.22   828137 kB

It would be nice to confirm that we are indeed much better at
optimizing bounds-checking code.  The prefetching issue is
tracked as PR44688.  So I'd close this either as a dup or as
WONTFIX (it's a feature that we optimize loops with bounds-checking).


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-27 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||WONTFIX

--- Comment #43 from Jakub Jelinek jakub at gcc dot gnu.org 2011-01-27 
16:43:17 UTC ---
Yeah, I agree.


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-27 Thread xinliangli at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #44 from davidxl xinliangli at gmail dot com 2011-01-27 17:33:42 
UTC ---
Nice triaging.

David


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-25 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #32 from Jakub Jelinek jakub at gcc dot gnu.org 2011-01-25 
09:02:57 UTC ---
IMHO for P1 purposes we should just look at compile time regressions from 4.5
here at this point.  On the #c1 testcase I get with --enable-checking=release
current trunk and current 4.5 branch on x86_64-linux:

4.6 x86_64 -m64 -O3 -fbounds-check -ftime-report
 df live regs  :   1.87 ( 3%) usr   0.02 ( 1%) sys   1.66 ( 3%) wall       0 kB ( 0%) ggc
 parser:   1.04 ( 2%) usr   0.20 ( 9%) sys   1.24 ( 2%) wall   53425 kB ( 6%) ggc
 tree VRP  :   1.82 ( 3%) usr   0.09 ( 4%) sys   2.02 ( 3%) wall   63870 kB ( 8%) ggc
 tree PTA  :   1.02 ( 2%) usr   0.01 ( 0%) sys   0.98 ( 2%) wall    5498 kB ( 1%) ggc
 tree SSA incremental  :   1.23 ( 2%) usr   0.12 ( 6%) sys   1.11 ( 2%) wall    6733 kB ( 1%) ggc
 tree CCP  :   1.33 ( 2%) usr   0.03 ( 1%) sys   1.33 ( 2%) wall    4989 kB ( 1%) ggc
 complete unrolling:   1.07 ( 2%) usr   0.16 ( 8%) sys   1.28 ( 2%) wall   88755 kB (11%) ggc
 tree iv optimization  :  10.99 (19%) usr   0.09 ( 4%) sys  11.09 (19%) wall  138994 kB (16%) ggc
 CSE   :   1.28 ( 2%) usr   0.01 ( 0%) sys   1.28 ( 2%) wall     229 kB ( 0%) ggc
 combiner  :   2.00 ( 3%) usr   0.00 ( 0%) sys   1.95 ( 3%) wall   31554 kB ( 4%) ggc
 integrated RA :   3.68 ( 6%) usr   0.01 ( 0%) sys   3.78 ( 6%) wall   19906 kB ( 2%) ggc
 reload:   2.04 ( 4%) usr   0.00 ( 0%) sys   2.18 ( 4%) wall    7106 kB ( 1%) ggc
 reload CSE regs   :   2.04 ( 4%) usr   0.02 ( 1%) sys   2.01 ( 3%) wall   12188 kB ( 1%) ggc
 scheduling 2  :   2.55 ( 4%) usr   0.01 ( 0%) sys   2.61 ( 4%) wall     895 kB ( 0%) ggc
 TOTAL :  57.47   2.11   59.60   845009 kB

4.5 x86_64 -m64 -O3 -fbounds-check -ftime-report
 df live regs  :   1.58 ( 4%) usr   0.00 ( 0%) sys   1.39 ( 3%) wall       0 kB ( 0%) ggc
 parser:   1.02 ( 2%) usr   0.18 ( 9%) sys   1.21 ( 3%) wall   55472 kB ( 7%) ggc
 tree VRP  :   1.39 ( 3%) usr   0.13 ( 6%) sys   1.73 ( 4%) wall   56478 kB ( 8%) ggc
 tree PRE  :   1.03 ( 2%) usr   0.04 ( 2%) sys   1.24 ( 3%) wall    7286 kB ( 1%) ggc
 complete unrolling:   1.32 ( 3%) usr   0.21 (10%) sys   1.55 ( 3%) wall   91137 kB (12%) ggc
 tree iv optimization  :   5.45 (12%) usr   0.09 ( 4%) sys   5.43 (12%) wall   95576 kB (13%) ggc
 expand:   2.62 ( 6%) usr   0.16 ( 8%) sys   2.76 ( 6%) wall   58104 kB ( 8%) ggc
 CSE   :   1.18 ( 3%) usr   0.01 ( 0%) sys   0.94 ( 2%) wall     261 kB ( 0%) ggc
 combiner  :   1.53 ( 3%) usr   0.00 ( 0%) sys   1.48 ( 3%) wall   19953 kB ( 3%) ggc
 integrated RA :   3.21 ( 7%) usr   0.00 ( 0%) sys   3.55 ( 8%) wall   11410 kB ( 2%) ggc
 reload:   2.13 ( 5%) usr   0.04 ( 2%) sys   2.00 ( 4%) wall    7273 kB ( 1%) ggc
 reload CSE regs   :   1.67 ( 4%) usr   0.01 ( 0%) sys   1.55 ( 3%) wall   10032 kB ( 1%) ggc
 scheduling 2  :   2.65 ( 6%) usr   0.02 ( 1%) sys   2.66 ( 6%) wall    1063 kB ( 0%) ggc
 TOTAL :  44.55   2.05   46.62   747832 kB

4.6 x86_64 -m32 -O3 -fbounds-check -ftime-report
 df live regs  :   1.24 ( 2%) usr   0.02 ( 1%) sys   1.05 ( 2%) wall       0 kB ( 0%) ggc
 parser:   1.05 ( 2%) usr   0.18 ( 9%) sys   1.23 ( 2%) wall   53861 kB ( 7%) ggc
 tree VRP  :   1.48 ( 3%) usr   0.05 ( 2%) sys   1.78 ( 3%) wall   52970 kB ( 7%) ggc
 tree iv optimization  :   9.92 (19%) usr   0.15 ( 7%) sys   9.98 (18%) wall  125735 kB (17%) ggc
 CSE   :   1.46 ( 3%) usr   0.00 ( 0%) sys   1.42 ( 3%) wall     329 kB ( 0%) ggc
 combiner  :   1.41 ( 3%) usr   0.01 ( 0%) sys   1.35 ( 2%) wall   20981 kB ( 3%) ggc
 integrated RA :   2.89 ( 6%) usr   0.00 ( 0%) sys   2.83 ( 5%) wall   14083 kB ( 2%) ggc
 reload:   2.59 ( 5%) usr   0.02 ( 1%) sys   2.58 ( 5%) wall   18918 kB ( 3%) ggc
 reload CSE regs   :   2.62 ( 5%) usr   0.00 ( 0%) sys   2.91 ( 5%) wall   13557 kB ( 2%) ggc
 scheduling 2  :   2.49 ( 5%) usr   0.01 ( 0%) sys   2.45 ( 5%) wall     953 kB ( 0%) ggc
 TOTAL :  52.36   2.02   54.39   744417 kB

4.5 x86_64 -m32 -O3 -fbounds-check -ftime-report
 df live regs  :   1.41 ( 3%) usr   0.02 ( 1%) sys   1.43 ( 3%) wall       0 kB ( 0%) ggc
 parser:   1.02 ( 2%) usr   0.18 ( 9%) sys   1.19 ( 2%) wall   55913 kB ( 8%) ggc
 tree VRP  :   1.44 ( 3%) usr   0.14 ( 7%) sys   1.39 ( 3%) wall   54451 kB ( 8%) ggc
 tree iv optimization  :   7.76 (17%) usr   0.11 ( 5%) sys   8.02 (17%) wall  107362 kB (15%) ggc
 expand:   2.66 ( 6%) usr   0.08 ( 4%) sys   2.73 ( 6%) wall   56088 kB ( 8%) ggc
 CSE   :   1.41 ( 3%) usr   0.00 ( 

[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-25 Thread Joost.VandeVondele at pci dot uzh.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #33 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 
2011-01-25 09:47:10 UTC ---
I just note that the timings reported by David and Jakub are not for the
compile options I originally reported.

With 4.6 (20110117) I now have 

gfortran -c -ftime-report -cpp -fbounds-check -g -O3 -ffast-math -funroll-loops
-ftree-vectorize -march=native -ffree-form PR45422.F90
TOTAL : 102.15 

while with the options used by David / Jakub I have timings similar to theirs.

gfortran -O3 -fbounds-check -ftime-report -c PR45422.F90
 TOTAL :  42.87

With 4.5, timings remain ~44s.


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-25 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #34 from Jakub Jelinek jakub at gcc dot gnu.org 2011-01-25 
09:52:23 UTC ---
-march=native is ambiguous; please check with -v what is actually being used.


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-25 Thread Joost.VandeVondele at pci dot uzh.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #35 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 
2011-01-25 10:03:02 UTC ---
(In reply to comment #34)
 -march=native is ambiguous; please check with -v what is actually being used.

This was mentioned in the initial comment:
-march=k8-sse3 -mcx16 -msahf
--param l1-cache-size=64 --param l1-cache-line-size=64 --param
l2-cache-size=1024 -mtune=k8

The latest timings are on a newer machine (old one is gone now) which has:
-march=amdfam10 -mcx16 -msahf -mpopcnt -mabm --param l1-cache-size=64 --param
l1-cache-line-size=64 --param l2-cache-size=512 -mtune=amdfam10


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-25 Thread xinliangli at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #36 from davidxl xinliangli at gmail dot com 2011-01-25 17:28:30 
UTC ---
(In reply to comment #35)
 (In reply to comment #34)
  -march=native is ambiguous; please check with -v what is actually being used.
 
 This was mentioned in the initial comment:
 -march=k8-sse3 -mcx16 -msahf
 --param l1-cache-size=64 --param l1-cache-line-size=64 --param
 l2-cache-size=1024 -mtune=k8
 
 The latest timings are on a newer machine (old one is gone now) which has:
 -march=amdfam10 -mcx16 -msahf -mpopcnt -mabm --param l1-cache-size=64 --param
 l1-cache-line-size=64 --param l2-cache-size=512 -mtune=amdfam10

I did use the options you originally posted: -ftime-report -cpp -fbounds-check
-g -O3 -ffast-math -funroll-loops -ftree-vectorize -march=native -ffree-form.
The timing is consistently 58s on my 2.4GHz Core 2 box, and 42s on the 2.67GHz
Xeon machine.


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-21 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #28 from Jakub Jelinek jakub at gcc dot gnu.org 2011-01-21 
09:50:25 UTC ---
David, any progress with this?


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-21 Thread xinliangli at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #29 from davidxl xinliangli at gmail dot com 2011-01-21 16:27:43 
UTC ---
(In reply to comment #28)
 David, any progress with this?

The cost function fix to make sure the solution set does not become too big will
probably be very involved and won't be available in the 4.6 time frame. I will
implement a workaround using Richard's suggestion -- terminate the iterating
loop when slow convergence is detected and some limit is reached.

David
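As described here, David's workaround caps the iterative improvement loop rather than fixing the cost function. Below is a minimal Python sketch of that stopping rule. It is not GCC's ivopts code, which is written in C. The names `bounded_improve`, `improve_once`, `max_iters` and `min_gain` are hypothetical and chosen for illustration. The defaults (5 iterations, 1% minimum relative gain) are likewise assumptions.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")  # whatever represents a candidate IV set (hypothetical)


def bounded_improve(
    state: S,
    cost: float,
    improve_once: Callable[[S], Tuple[S, float]],
    max_iters: int = 5,
    min_gain: float = 0.01,
) -> Tuple[S, float, int]:
    """Repeat a local-improvement step, giving up early on slow convergence.

    Stops when a step no longer lowers the cost, when a step's relative gain
    drops below `min_gain`, or after `max_iters` steps. Returns the last
    accepted state, its cost and the number of steps attempted."""
    steps = 0
    while steps < max_iters:
        new_state, new_cost = improve_once(state)
        steps += 1
        if new_cost >= cost:
            break  # no improvement: treat as converged
        gain = (cost - new_cost) / cost if cost > 0 else 0.0
        state, cost = new_state, new_cost
        if gain < min_gain:
            break  # still improving, but too slowly to justify more compile time
    return state, cost, steps


# Toy step: state n has cost 100 + 1000 / (n + 1), so gains shrink each step.
def toy_step(n: int) -> Tuple[int, float]:
    return n + 1, 100 + 1000 / (n + 2)


print(bounded_improve(0, 1100.0, toy_step, max_iters=5, min_gain=0.0))
print(bounded_improve(0, 1100.0, toy_step, max_iters=100, min_gain=0.05))
```

The first call stops at the iteration cap after 5 steps. The second call stops after 10 steps, once a step improves the cost by less than 5%. The cap bounds compile time, but the returned set may be worse than the one full convergence would find.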


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-21 Thread xinliangli at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #30 from davidxl xinliangli at gmail dot com 2011-01-21 19:58:41 
UTC ---
(In reply to comment #29)
 (In reply to comment #28)
  David, any progress with this?
 
 The cost function fix to make sure the solution set does not become too big will
 probably be very involved and won't be available in the 4.6 time frame. I will
 implement a workaround using Richard's suggestion -- terminate the iterating
 loop when slow convergence is detected and some limit is reached.
 
 David


Two observations:
1) I cannot reproduce Joost's timing -- see below. Can someone else
measure the time independently?

2) Limiting the iteration count of the ivopt improvement loop does not help
much: going from unlimited (which can be ~40 iterations in this case) to a
maximum of 5 iterations only cut total compile time by 2s.


The following is the timing of the trunk compiler. Options: 
-O2 -ftime-report -cpp -fbounds-check -g -O3 -ffast-math -funroll-loops
-ftree-vectorize -march=native -ffree-form 

 parser:   0.67 ( 1%) usr   0.09 ( 6%) sys   0.77 ( 1%) wall   53556 kB ( 5%) ggc
 inline heuristics :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify :   0.35 ( 1%) usr   0.03 ( 2%) sys   0.38 ( 1%) wall   48426 kB ( 4%) ggc
 tree eh   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall   11978 kB ( 1%) ggc
 tree CFG cleanup  :   0.68 ( 1%) usr   0.02 ( 1%) sys   0.64 ( 1%) wall    2484 kB ( 0%) ggc
 tree VRP  :   0.83 ( 1%) usr   0.02 ( 1%) sys   1.28 ( 2%) wall   64371 kB ( 6%) ggc
 tree copy propagation :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall    1267 kB ( 0%) ggc
 tree find ref. vars   :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    3806 kB ( 0%) ggc
 tree PTA  :   0.82 ( 1%) usr   0.00 ( 0%) sys   0.80 ( 1%) wall    5497 kB ( 0%) ggc
 tree PHI insertion:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    3194 kB ( 0%) ggc
 tree SSA rewrite  :   0.23 ( 0%) usr   0.01 ( 1%) sys   0.21 ( 0%) wall   14021 kB ( 1%) ggc
 tree SSA other:   0.06 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 0%) wall     435 kB ( 0%) ggc
 tree SSA incremental  :   0.65 ( 1%) usr   0.02 ( 1%) sys   0.65 ( 1%) wall    6735 kB ( 1%) ggc
 tree operand scan :   0.37 ( 1%) usr   0.14 ( 9%) sys   0.53 ( 1%) wall   47156 kB ( 4%) ggc
 dominator optimization:   0.38 ( 1%) usr   0.02 ( 1%) sys   0.50 ( 1%) wall    6948 kB ( 1%) ggc
 tree SRA  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP  :   0.93 ( 1%) usr   0.01 ( 1%) sys   1.02 ( 2%) wall    4975 kB ( 0%) ggc
 tree PHI const/copy prop:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     124 kB ( 0%) ggc
 tree split crit edges :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1743 kB ( 0%) ggc
 tree reassociation:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall    5095 kB ( 0%) ggc
 tree PRE  :   0.64 ( 1%) usr   0.00 ( 0%) sys   0.64 ( 1%) wall    9790 kB ( 1%) ggc
 tree FRE  :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall    5410 kB ( 0%) ggc
 tree code sinking :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     956 kB ( 0%) ggc
 tree linearize phis   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.17 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall   11005 kB ( 1%) ggc
 tree phiprop  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.04 ( 0%) usr   0.02 ( 1%) sys   0.06 ( 0%) wall     944 kB ( 0%) ggc
 tree aggressive DCE   :   0.31 ( 0%) usr   0.03 ( 2%) sys   0.40 ( 1%) wall   15336 kB ( 1%) ggc
 tree DSE  :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     225 kB ( 0%) ggc
 tree loop bounds  :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall    6744 kB ( 1%) ggc
 tree loop invariant motion:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     485 kB ( 0%) ggc
 tree canonical iv :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    3128 kB ( 0%) ggc
 scev constant prop:   0.04 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall    1924 kB ( 0%) ggc
 complete unrolling:   0.79 ( 1%) usr   0.05 ( 3%) sys   0.85 ( 1%) wall   91364 kB ( 8%) ggc
 tree vectorization:   0.34 ( 1%) usr   0.00 ( 0%) sys   0.37 ( 1%) wall   25117 kB ( 2%) ggc
 tree slp vectorization:   0.41 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 1%) wall   19256 kB ( 2%) ggc
 tree loop distribution:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall     850 kB ( 0%) ggc
 tree iv optimization  :  11.14 (18%) usr   0.33 (22%) sys  12.24 (18%) wall  141300 kB (12%) ggc
 predictive commoning  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall    2696 kB ( 0%) ggc
 tree loop init:   0.02 ( 0%) usr   0.00 

[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-21 Thread xinliangli at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

--- Comment #31 from davidxl xinliangli at gmail dot com 2011-01-21 20:08:11 
UTC ---
Comparing this timing with the 4.6 results (164s), it looks like many passes
other than ivopt also became slower (e.g. IRA increases from 3.5s to 11s); ivopt
only accounts for a small part of the 110s increase.

David



[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2011-01-17 Thread Joost.VandeVondele at pci dot uzh.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422

Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch changed:

   What|Removed |Added

   Last reconfirmed|2010-08-29 09:25:52 |2011-01-17 9:25:52

--- Comment #27 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch 
2011-01-17 11:38:36 UTC ---
timings with current trunk (release checking). 

out.4_3
 TOTAL :  34.62   0.43   35.27    837034 kB
out.4_5
 TOTAL :  45.30   0.70   46.02    897447 kB
out.trunk
 TOTAL : 165.89   0.99  166.97   1743679 kB

So time is up by 5x and memory by 2x relative to 4.3.
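The 5x and 2x figures come from the TOTAL lines above. The first column is usr seconds and the last is memory in kB. Here is a quick Python check that uses only the numbers in this comment:

```python
# TOTAL figures from this comment: (usr seconds, memory in kB)
totals = {
    "4.3": (34.62, 837034),
    "4.5": (45.30, 897447),
    "trunk": (165.89, 1743679),
}


def ratio(new: str, base: str) -> tuple[float, float]:
    """Return (time ratio, memory ratio) of `new` relative to `base`."""
    (t_new, m_new), (t_base, m_base) = totals[new], totals[base]
    return t_new / t_base, m_new / m_base


for base in ("4.3", "4.5"):
    t, m = ratio("trunk", base)
    print(f"trunk vs {base}: time x{t:.2f}, memory x{m:.2f}")
```

Relative to 4.3 this gives about 4.8x time and 2.1x memory. The same arithmetic relative to 4.5 gives about 3.7x time and 1.9x memory.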


[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2010-09-02 Thread rguenth at gcc dot gnu dot org


-- 

rguenth at gcc dot gnu dot org changed:

   What|Removed |Added

   Priority|P3  |P1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422



[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2010-08-31 Thread davidxl at gcc dot gnu dot org


--- Comment #26 from davidxl at gcc dot gnu dot org  2010-08-31 17:45 
---
Good observation re. the number of IVs in the final set. This usually points to
some problem/bug in the cost function. I briefly looked at this case -- it
indeed exposes two more bugs in the cost model:

1) the computation costs of all the cost pairs in an assignment cannot simply
be added together, because many rewrite expressions can be commoned. We now
have a mechanism that takes common loop invariants into account for register
pressure estimation, and this mechanism needs to be extended to computation
cost.

2) the offset is not stripped when computing loop invariant expression ids --
this can cause register pressure to be overestimated. (The case arises more
often with loop unrolling.)

David
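Below is a toy Python model of the two problems described above. It is not the ivopts cost model. `Expr`, `naive_cost`, `commoned_cost` and `invariant_ids` are hypothetical, and every expression is assumed to cost one unit. Stripping the offset also assumes the constant part can be folded into an addressing-mode displacement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical loop-invariant expression: a base symbol plus a constant offset."""
    base: str
    offset: int = 0


def naive_cost(use_exprs: list[list[Expr]], unit_cost: int = 1) -> int:
    """Add up each use's rewrite expressions independently, so an expression
    shared by several uses is charged once per use (problem 1)."""
    return sum(unit_cost * len(exprs) for exprs in use_exprs)


def commoned_cost(use_exprs: list[list[Expr]], unit_cost: int = 1) -> int:
    """Charge each distinct expression once, assuming later CSE shares it."""
    distinct = {e for exprs in use_exprs for e in exprs}
    return unit_cost * len(distinct)


def invariant_ids(exprs: list[Expr], strip_offset: bool) -> int:
    """Count distinct loop-invariant ids (a crude register-pressure proxy).

    With strip_offset=True, p+0, p+8, ... share one id, assuming the constant
    can be folded into an addressing-mode displacement off one register
    (problem 2)."""
    if strip_offset:
        return len({e.base for e in exprs})
    return len(set(exprs))


# Three uses; "n*4" is needed by all of them, "m" by one.
uses = [[Expr("n*4")], [Expr("n*4")], [Expr("n*4"), Expr("m")]]
print(naive_cost(uses), commoned_cost(uses))

# Four copies of an address after unrolling by 4: p+0, p+8, p+16, p+24.
unrolled = [Expr("p", 8 * k) for k in range(4)]
print(invariant_ids(unrolled, strip_offset=False), invariant_ids(unrolled, strip_offset=True))
```

In both cases the unshared count is larger: it overstates computation cost in the first and register pressure in the second. Either can push the search toward a different IV set.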


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422



[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2010-08-30 Thread rguenth at gcc dot gnu dot org


--- Comment #23 from rguenth at gcc dot gnu dot org  2010-08-30 07:11 
---
(In reply to comment #22)
 Given the fact that the solution space is really large -- M^N where M is the
 number of candidates and N is the number of uses (here M == 70 and N == 48), 
 and the cost function is complicated, it will be challenging to come up with an
 algorithm that converges really fast, and most importantly -- 'guarantees' an
 optimal solution.

Well - we can't guarantee an optimal solution.  We have to take compile-time
into account which means that O(M^N) is not acceptable but we need to come
up with something that can complete in O((M+N) log (M+N)) time at most.

BTW, I doubt that the solution found is anywhere near optimal for 32-bit
x86 - using 15 IVs instead of 2 can't be cheaper.
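The gap between the two bounds becomes concrete with the numbers quoted from comment #22 (M = 70 candidates, N = 48 uses). Using base-2 logarithms and dropping constant factors in the target bound are assumptions made only for this illustration.

```latex
\begin{aligned}
\text{exhaustive: } M^N &= 70^{48} = 10^{48 \log_{10} 70} \approx 10^{88.56} \approx 3.7 \times 10^{88} \\
\text{target: } (M+N)\log_2(M+N) &= 118 \log_2 118 \approx 118 \times 6.88 \approx 812
\end{aligned}
```

Any search that fits the second budget can therefore examine only a vanishingly small fraction of the candidate-to-use assignments.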


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422



[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2010-08-30 Thread rguenth at gcc dot gnu dot org


--- Comment #24 from rguenth at gcc dot gnu dot org  2010-08-30 07:12 
---
(In reply to comment #20)
 (In reply to comment #16)
  adjust summary according to the last timings
  
 
 I am surprised to see such big differences between trunk and previous releases.
 Compiling this test case with those options on my Core 2 box (2.4GHz) took
 only 56 seconds, which is comparable with the timing of a 4.4.3 compiler (with
 Google local patches including ivopt improvements).

Of course - because the ivopt improvement patches are the problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422



[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2010-08-30 Thread davidxl at gcc dot gnu dot org


--- Comment #25 from davidxl at gcc dot gnu dot org  2010-08-30 16:41 
---
(In reply to comment #24)
 (In reply to comment #20)
  (In reply to comment #16)
   adjust summary according to the last timings
   
  
  I am surprised to see such big differences between trunk and previous releases.
  Compiling this test case with those options on my Core 2 box (2.4GHz) took
  only 56 seconds, which is comparable with the timing of a 4.4.3 compiler (with
  Google local patches including ivopt improvements).
 
 Of course - because the ivopt improvement patches are the problem.
 

It is just the total time diff from Joost's measure can be just explained by
ivopt component.

David


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422



[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2010-08-29 Thread jv244 at cam dot ac dot uk


--- Comment #16 from jv244 at cam dot ac dot uk  2010-08-29 06:38 ---
adjust summary according to the last timings


-- 

jv244 at cam dot ac dot uk changed:

   What|Removed |Added

   Last reconfirmed|2010-08-29 05:31:37 |2010-08-29 06:38:26
   date||
Summary|[4.6 Regression] compile|[4.6 Regression] compile
   |time increases 5x.  |time increases 3x.


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422



[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.

2010-08-29 Thread rguenth at gcc dot gnu dot org


--- Comment #17 from rguenth at gcc dot gnu dot org  2010-08-29 09:25 ---
 tree iv optimization  :  32.57 (20%) usr   0.10 ( 5%) sys  32.73 (20%) wall 322095 kB (18%) ggc


20% is still completely unreasonable for IV optimization.


-- 

rguenth at gcc dot gnu dot org changed:

What                  |Removed             |Added
------------------------------------------------------------------
Status                |WAITING             |NEW
Last reconfirmed date |2010-08-29 06:38:26 |2010-08-29 09:25:52
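For scale: comment #17 reports 32.57 s of user time for "tree iv optimization" as 20% of the total. Assuming -ftime-report rounds that share to the nearest whole percent (so the true share lies in [19.5%, 20.5%)), the implied total user time for the whole compile is roughly:

```latex
T_{\mathrm{usr}} \approx \frac{32.57\ \mathrm{s}}{0.20} \approx 163\ \mathrm{s},
\qquad
\frac{32.57\ \mathrm{s}}{0.205} \approx 158.9\ \mathrm{s} \;<\; T_{\mathrm{usr}} \;\le\; \frac{32.57\ \mathrm{s}}{0.195} \approx 167.0\ \mathrm{s}
```

So IV optimization alone would account for about a fifth of a compile of roughly 160 s.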



2010-08-29 Thread jv244 at cam dot ac dot uk


--- Comment #18 from jv244 at cam dot ac dot uk  2010-08-29 15:07 ---
FYI, these are the 4.5 branch timings:

Execution times (seconds)
 garbage collection         :   0.47 ( 1%) usr   0.00 ( 0%) sys   0.47 ( 1%) wall       0 kB ( 0%) ggc
 callgraph construction     :   0.05 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 0%) wall    5996 kB ( 1%) ggc
 callgraph optimization     :   0.21 ( 0%) usr   0.02 ( 1%) sys   0.26 ( 0%) wall     606 kB ( 0%) ggc
 ipa cp                     :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    1381 kB ( 0%) ggc
 ipa reference              :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const             :   0.06 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup                :   0.39 ( 1%) usr   0.00 ( 0%) sys   0.51 ( 1%) wall    2459 kB ( 0%) ggc
 trivially dead code        :   0.34 ( 1%) usr   0.00 ( 0%) sys   0.30 ( 1%) wall       0 kB ( 0%) ggc
 df multiple defs           :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs           :   0.33 ( 1%) usr   0.00 ( 0%) sys   0.27 ( 1%) wall       0 kB ( 0%) ggc
 df live regs               :   2.08 ( 4%) usr   0.01 ( 1%) sys   2.19 ( 4%) wall       0 kB ( 0%) ggc
 df liveinitialized regs    :   0.98 ( 2%) usr   0.00 ( 0%) sys   0.92 ( 2%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.24 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes   :   0.93 ( 2%) usr   0.00 ( 0%) sys   1.04 ( 2%) wall    5756 kB ( 1%) ggc
 register information       :   0.51 ( 1%) usr   0.01 ( 1%) sys   0.39 ( 1%) wall       0 kB ( 0%) ggc
 alias analysis             :   0.78 ( 1%) usr   0.01 ( 1%) sys   0.91 ( 2%) wall   22384 kB ( 3%) ggc
 alias stmt walking         :   0.50 ( 1%) usr   0.03 ( 2%) sys   0.38 ( 1%) wall    5563 kB ( 1%) ggc
 register scan              :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 rebuild jump labels        :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 parser                     :   0.82 ( 2%) usr   0.13 ( 9%) sys   0.94 ( 2%) wall   55603 kB ( 6%) ggc
 inline heuristics          :   0.20 ( 0%) usr   0.01 ( 1%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify              :   0.38 ( 1%) usr   0.03 ( 2%) sys   0.40 ( 1%) wall   46588 kB ( 5%) ggc
 tree eh                    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction      :   0.04 ( 0%) usr   0.02 ( 1%) sys   0.05 ( 0%) wall   11964 kB ( 1%) ggc
 tree CFG cleanup           :   0.47 ( 1%) usr   0.00 ( 0%) sys   0.79 ( 1%) wall    1829 kB ( 0%) ggc
 tree VRP                   :   1.46 ( 3%) usr   0.05 ( 4%) sys   1.27 ( 2%) wall   56376 kB ( 6%) ggc
 tree copy propagation      :   0.09 ( 0%) usr   0.02 ( 1%) sys   0.22 ( 0%) wall     746 kB ( 0%) ggc
 tree find ref. vars        :   0.09 ( 0%) usr   0.01 ( 1%) sys   0.07 ( 0%) wall    3806 kB ( 0%) ggc
 tree PTA                   :   0.30 ( 1%) usr   0.00 ( 0%) sys   0.33 ( 1%) wall    3836 kB ( 0%) ggc
 tree PHI insertion         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    3194 kB ( 0%) ggc
 tree SSA rewrite           :   0.24 ( 0%) usr   0.01 ( 1%) sys   0.29 ( 1%) wall   13860 kB ( 2%) ggc
 tree SSA other             :   0.13 ( 0%) usr   0.02 ( 1%) sys   0.11 ( 0%) wall     418 kB ( 0%) ggc
 tree SSA incremental       :   0.89 ( 2%) usr   0.06 ( 4%) sys   0.97 ( 2%) wall    6811 kB ( 1%) ggc
 tree operand scan          :   0.34 ( 1%) usr   0.23 (17%) sys   0.59 ( 1%) wall   44776 kB ( 5%) ggc
 dominator optimization     :   0.29 ( 1%) usr   0.01 ( 1%) sys   0.35 ( 1%) wall    5152 kB ( 1%) ggc
 tree CCP                   :   0.51 ( 1%) usr   0.02 ( 1%) sys   0.43 ( 1%) wall    4620 kB ( 1%) ggc
 tree PHI const/copy prop   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     106 kB ( 0%) ggc
 tree split crit edges      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    2019 kB ( 0%) ggc
 tree reassociation         :   0.12 ( 0%) usr   0.01 ( 1%) sys   0.12 ( 0%) wall    2946 kB ( 0%) ggc
 tree PRE                   :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.95 ( 2%) wall    7315 kB ( 1%) ggc
 tree FRE                   :   0.45 ( 1%) usr   0.04 ( 3%) sys   0.35 ( 1%) wall    5518 kB ( 1%) ggc
 tree code sinking          :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1400 kB ( 0%) ggc
 tree linearize phis        :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate     :   0.18 ( 0%) usr   0.02 ( 1%) sys   0.16 ( 0%) wall   10006 kB ( 1%) ggc
 tree conservative DCE      :   0.05 ( 0%) usr   0.01 ( 1%) sys   0.13 ( 0%) wall     576 kB ( 0%) ggc
 tree aggressive DCE        :   0.28 ( 1%) usr   0.01 ( 1%) sys   0.37 ( 1%) wall    8853 kB ( 1%) ggc
 tree buildin call DCE      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE                   :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall     132 kB ( 0%) ggc
 PHI merge                  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      37 kB ( 0%) ggc
 tree loop bounds           :   0.22 ( 0%)
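To compare reports like the 4.5-branch one above with the trunk numbers elsewhere in this thread, it can help to parse them mechanically. The following is a minimal Python sketch under stated assumptions: the row layout is inferred from the reports pasted here (not from a documented GCC format), and `parse_report`, `top_passes`, and `Timing` are hypothetical helpers, not part of GCC. Rows that the mail archive wrapped across two lines should be rejoined first; a wrapped row whose first half still ends in the `wall` column is parsed with `ggc_kb` set to `None`.

```python
import re
from typing import List, NamedTuple, Optional

# One -ftime-report row as pasted in this thread, e.g.
#  " tree VRP  :   1.46 ( 3%) usr   0.05 ( 4%) sys   1.27 ( 2%) wall  56376 kB ( 6%) ggc"
# Assumption: this layout is inferred from the samples, not from GCC docs.
# The trailing "kB ... ggc" part is optional.
ROW = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall"
    r"(?:\s+(?P<ggc_kb>\d+)\s*kB\s*\(\s*\d+%\)\s*ggc)?"
)


class Timing(NamedTuple):
    name: str
    usr: float              # user CPU seconds
    sys: float              # system CPU seconds
    wall: float             # wall-clock seconds
    ggc_kb: Optional[int]   # GC-allocated kB, None if that column is missing


def parse_report(text: str) -> List[Timing]:
    """Return one Timing per row that has usr/sys/wall; skip everything else."""
    rows: List[Timing] = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # e.g. "Execution times (seconds)" or a truncated row
        ggc = m.group("ggc_kb")
        rows.append(Timing(
            name=m.group("name").strip(),
            usr=float(m.group("usr")),
            sys=float(m.group("sys")),
            wall=float(m.group("wall")),
            ggc_kb=int(ggc) if ggc is not None else None,
        ))
    return rows


def top_passes(rows: List[Timing], n: int = 5) -> List[Timing]:
    """The n rows with the largest user time, largest first."""
    return sorted(rows, key=lambda t: t.usr, reverse=True)[:n]


# A few rows copied from the 4.5-branch report above, plus its truncated last row.
SAMPLE = """\
Execution times (seconds)
 df live regs  :   2.08 ( 4%) usr   0.01 ( 1%) sys   2.19 ( 4%) wall       0 kB ( 0%) ggc
 tree VRP  :   1.46 ( 3%) usr   0.05 ( 4%) sys   1.27 ( 2%) wall  56376 kB ( 6%) ggc
 tree PRE  :   0.92 ( 2%) usr   0.00 ( 0%) sys   0.95 ( 2%) wall   7315 kB ( 1%) ggc
 tree loop bounds  :   0.22 ( 0%)
"""

if __name__ == "__main__":
    for t in top_passes(parse_report(SAMPLE), n=2):
        print(f"{t.name}: {t.usr:.2f} s usr")
```

Running the same two functions over a trunk report and a 4.5 report and joining on `name` would give a per-pass comparison.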


2010-08-29 Thread davidxl at gcc dot gnu dot org


--- Comment #20 from davidxl at gcc dot gnu dot org  2010-08-30 03:10 ---
(In reply to comment #16)
> adjust summary according to the last timings
>

I am surprised to see such big differences between trunk and previous releases.
Compiling this test case with those options on my core2 box (2.4 GHz) took
only 56 seconds, which is comparable to the timing of a 4.4.3 compiler (with
Google local patches including ivopt improvements).

David



2010-08-29 Thread davidxl at gcc dot gnu dot org


--- Comment #21 from davidxl at gcc dot gnu dot org  2010-08-30 03:19 ---
(In reply to comment #17)
>  tree iv optimization  :  32.57 (20%) usr   0.10 ( 5%) sys  32.73 (20%) wall 322095 kB (18%) ggc
>
>
> 20% is still completely unreasonable for IV optimization.
>

There was a patch on trunk that may double the time spent in ivopt:
find_optimal_iv_set_1 is run twice, once starting from the original IV set and
once starting from the full set. This probably needs to be revisited.

David
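Why running the search from two starting points costs roughly twice as much can be seen in a toy Python model. This is not GCC's find_optimal_iv_set_1: the integer candidate sets, the single add/remove moves, and `toy_cost` are hypothetical stand-ins. The sketch only shows that a "search from the original IVs, then search again from the full set, keep the cheaper" strategy pays for both searches in full.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

IVSet = FrozenSet[int]


def local_search(start: Iterable[int], candidates: List[int],
                 cost: Callable[[IVSet], float]) -> Tuple[IVSet, float, int]:
    """Greedy descent: apply the best single add/remove move until none helps.

    Returns (final set, its cost, number of cost evaluations)."""
    current = frozenset(start)
    current_cost = cost(current)
    evals = 1
    while True:
        best_set, best_cost = current, current_cost
        for c in candidates:
            # Toggle candidate c in or out of the current set.
            trial = current - {c} if c in current else current | {c}
            if not trial:
                continue  # never drop the last IV
            evals += 1
            trial_cost = cost(trial)
            if trial_cost < best_cost:
                best_set, best_cost = trial, trial_cost
        if best_set == current:  # local minimum reached
            return current, current_cost, evals
        current, current_cost = best_set, best_cost


def two_start_search(original: Iterable[int], candidates: List[int],
                     cost: Callable[[IVSet], float]) -> Tuple[IVSet, float, int]:
    """Search from the original IVs and from the full candidate set and keep
    the cheaper result; the work done is the sum of both searches."""
    s1, c1, e1 = local_search(original, candidates, cost)
    s2, c2, e2 = local_search(candidates, candidates, cost)
    if c1 <= c2:
        return s1, c1, e1 + e2
    return s2, c2, e1 + e2


def toy_cost(s: IVSet) -> float:
    """Hypothetical cost: one unit per IV, plus 5 if candidate 3 is missing."""
    return len(s) + (0 if 3 in s else 5)


CANDIDATES = [0, 1, 2, 3]

if __name__ == "__main__":
    best, best_cost, evals = two_start_search([0], CANDIDATES, toy_cost)
    print(sorted(best), best_cost, evals)
```

In this toy run the search from the original set needs 11 cost evaluations and the search from the full set needs 16, so the two-start strategy pays 27 while both starts reach the same answer.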

