[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-10-02 Thread bergner at gcc dot gnu dot org
--- Comment #112 from bergner at gcc dot gnu dot org 2009-10-03 01:39 --- Subject: Bug 33928 Author: bergner Date: Sat Oct 3 01:39:14 2009 New Revision: 152430 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152430 Log: Backport from mainline. 2009-08-30 Alan

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-27 Thread lucier at math dot purdue dot edu
--- Comment #111 from lucier at math dot purdue dot edu 2009-08-27 17:02 --- I can compile gambit 4.1.2 with -fschedule-insns except for the function noted in PR41164. On model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz with gcc version 4.5.0 20090803

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-26 Thread lucier at math dot purdue dot edu
--- Comment #108 from lucier at math dot purdue dot edu 2009-08-27 01:18 --- direct.c contains a direct FFT; I've compiled the direct and inverse fft and I ran it on arrays with 2^23 double-precision complex elements and heine:~/programs/gcc/objdirs/bench-mainline-on-fft

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-26 Thread lucier at math dot purdue dot edu
--- Comment #109 from lucier at math dot purdue dot edu 2009-08-27 01:22 --- Created an attachment (id=18432) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18432&action=view) inner loop of direct.c with -fschedule-insns -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-26 Thread lucier at math dot purdue dot edu
--- Comment #110 from lucier at math dot purdue dot edu 2009-08-27 01:22 --- Created an attachment (id=18433) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18433&action=view) inner loop of direct.c without -fschedule-insns -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-08-04 Thread rguenth at gcc dot gnu dot org
--- Comment #107 from rguenth at gcc dot gnu dot org 2009-08-04 12:28 --- GCC 4.3.4 is being released, adjusting target milestone. -- rguenth at gcc dot gnu dot org changed: What|Removed |Added

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-16 Thread bonzini at gnu dot org
--- Comment #104 from bonzini at gnu dot org 2009-06-16 06:47 --- I understood that with -frename-registers the regression is fixed. As I said, without a pre-regalloc scheduling pass and without register renaming, the scheduling quality you get is more or less random. --

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-16 Thread bonzini at gnu dot org
--- Comment #105 from bonzini at gnu dot org 2009-06-16 07:01 --- Marking PR39157 as a duplicate of PR26854 is not exact (only the fwprop part is a duplicate, because we were getting large compile times because of building large data structures; the CFG Cleanup part is not exactly a

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-16 Thread lucier at math dot purdue dot edu
--- Comment #106 from lucier at math dot purdue dot edu 2009-06-16 07:24 --- This machine has 4ms ticks, so we're getting down to a few ticks difference with a benchmark of this size. It's 156ms with 4.2.4, 168ms with 4.5.0, and 164 ms when -frename-registers is added to the command
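The arithmetic behind "a few ticks difference" can be checked with a minimal sketch. The three timings and the 4ms tick are taken from comment #106; the names `TICK_MS`, `timings_ms`, `gap_in_ticks`, and `slowdown_percent` are hypothetical, and the sketch assumes each quoted timing is a whole number of ticks.

```python
# Timings quoted in comment #106, in milliseconds.
# All names below are illustrative, not part of any GCC tooling.
TICK_MS = 4
timings_ms = {
    "4.2.4": 156,
    "4.5.0": 168,
    "4.5.0 -frename-registers": 164,
}

baseline = timings_ms["4.2.4"]


def gap_in_ticks(ms):
    """Difference from the 4.2.4 baseline, in whole timer ticks."""
    return (ms - baseline) // TICK_MS


def slowdown_percent(ms):
    """Slowdown relative to the 4.2.4 baseline, in percent."""
    return 100.0 * (ms - baseline) / baseline


for name, ms in timings_ms.items():
    print(f"{name}: {ms} ms, +{gap_in_ticks(ms)} ticks, "
          f"+{slowdown_percent(ms):.1f}%")
```

With a resolution of one tick, gaps of two or three ticks are close to the measurement granularity, which is the point the comment makes about a benchmark of this size.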

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-15 Thread bonzini at gnu dot org
--- Comment #97 from bonzini at gnu dot org 2009-06-15 15:14 --- Brad, could you try to time compiler.i with and without -ftime-report to see how much of the tree stmt walking timevar is just accounting overhead? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-15 Thread lucier at math dot purdue dot edu
--- Comment #98 from lucier at math dot purdue dot edu 2009-06-15 16:11 --- I don't quite understand how you would like me to configure and run the test. First, I've applied your patches to speed up computing DF to my tree; do you want them included in the test, or should I use a

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-15 Thread paolo dot bonzini at gmail dot com
--- Comment #99 from paolo dot bonzini at gmail dot com 2009-06-15 16:20 --- Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475 > First, I've applied your patches to speed up computing DF to my tree; do you want them included in

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-15 Thread lucier at math dot purdue dot edu
--- Comment #103 from lucier at math dot purdue dot edu 2009-06-15 20:21 --- Regarding comment #101 ... With heine:~/programs/gcc/objdirs/gsc-fft-tests/gambc-v4_1_2 /pkgs/gcc-mainline/bin/gcc -v Using built-in specs. Target: x86_64-unknown-linux-gnu Configured with:

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-14 Thread lucier at math dot purdue dot edu
--- Comment #95 from lucier at math dot purdue dot edu 2009-06-14 14:59 --- The test case is compiler.i.gz -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-14 Thread lucier at math dot purdue dot edu
--- Comment #96 from lucier at math dot purdue dot edu 2009-06-14 15:02 --- Sorry, the gcc options are in comment 87 (the -fforward-propagate is now redundant), and without Paolo's recently proposed patch it requires about 9GB of memory to compile. --

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-13 Thread rguenth at gcc dot gnu dot org
--- Comment #93 from rguenth at gcc dot gnu dot org 2009-06-13 14:18 --- I would say that was the new SRA. -- rguenth at gcc dot gnu dot org changed: What|Removed |Added

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-13 Thread jamborm at gcc dot gnu dot org
--- Comment #94 from jamborm at gcc dot gnu dot org 2009-06-14 04:43 --- (In reply to comment #92) "In the meanwhile something caused tree incremental SSA to jump up from 10s to 26s. Sob." (In reply to comment #93) "I would say that was the new SRA." OK, I'll try to

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-12 Thread bonzini at gnu dot org
--- Comment #92 from bonzini at gnu dot org 2009-06-12 14:50 --- In the meanwhile something caused tree incremental SSA to jump up from 10s to 26s. Sob. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-08 Thread bonzini at gnu dot org
--- Comment #88 from bonzini at gnu dot org 2009-06-08 08:40 --- Created an attachment (id=17963) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17963&action=view) patch I'm testing Here is a patch I'm testing that completes the rewrite of fwprop's dataflow. This should make it

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-08 Thread bonzini at gnu dot org
--- Comment #89 from bonzini at gnu dot org 2009-06-08 08:59 --- Created an attachment (id=17964) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17964&action=view) correct version oops, the previous one didn't work at -O1 even though it bootstrapped :-) -- bonzini at gnu dot org

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-08 Thread bonzini at gnu dot org
--- Comment #90 from bonzini at gnu dot org 2009-06-08 16:35 --- Yo, with the patch the time to compile compiler.i with the given options is 331s on my machine (with a checking compiler). Fwprop takes only 1% (including computation of the new dataflow problem). I'd estimate around

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-06-08 Thread lucier at math dot purdue dot edu
--- Comment #91 from lucier at math dot purdue dot edu 2009-06-08 18:19 --- Created an attachment (id=17968) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17968&action=view) time and memory report for compiler.i after Paolo's patch The patch cut the total bitmaps used compiling

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-15 Thread bonzini at gnu dot org
--- Comment #84 from bonzini at gnu dot org 2009-05-15 10:35 --- Ok, I am working on a patch to add a multiple-definitions DF problem and use that together with a domwalk to find the single definitions (instead of reaching-definitions, which is the remaining slow part). The new problem

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-15 Thread lucier at math dot purdue dot edu
--- Comment #85 from lucier at math dot purdue dot edu 2009-05-16 00:20 --- Created an attachment (id=17878) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17878&action=view) Large test file for testing time and memory usage This is the file compiler.i used in the previous tests.

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-08 Thread bonzini at gcc dot gnu dot org
--- Comment #78 from bonzini at gnu dot org 2009-05-08 06:51 --- Subject: Bug 33928 Author: bonzini Date: Fri May 8 06:51:12 2009 New Revision: 147270 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147270 Log: 2009-05-08 Paolo Bonzini bonz...@gnu.org PR

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-08 Thread bonzini at gnu dot org
--- Comment #79 from bonzini at gnu dot org 2009-05-08 07:18 --- I'm cobbling up the DIY dataflow patch and it is all but ugly, actually. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-08 Thread bonzini at gcc dot gnu dot org
--- Comment #80 from bonzini at gnu dot org 2009-05-08 07:51 --- Subject: Bug 33928 Author: bonzini Date: Fri May 8 07:51:46 2009 New Revision: 147274 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147274 Log: 2009-05-08 Paolo Bonzini bonz...@gnu.org PR

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-08 Thread bonzini at gnu dot org
--- Comment #81 from bonzini at gnu dot org 2009-05-08 07:55 --- Created an attachment (id=17825) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17825&action=view) speed up fwprop and enable it at -O1 Here is a patch I'm bootstrapping to remove fwprop's usage of UD chains. It does

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-08 Thread bonzini at gnu dot org
--- Comment #82 from bonzini at gnu dot org 2009-05-08 09:41 --- Hm, looking at the time reports the patch will save about 30-40% of the fwprop execution time, and should fix the memory hog problem, but will still leave in the 70s needed to compute reaching definitions. I guess it's a

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-08 Thread bonzini at gcc dot gnu dot org
--- Comment #83 from bonzini at gnu dot org 2009-05-08 12:22 --- Subject: Bug 33928 Author: bonzini Date: Fri May 8 12:22:30 2009 New Revision: 147282 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147282 Log: 2009-05-08 Paolo Bonzini bonz...@gnu.org PR

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-07 Thread bonzini at gnu dot org
--- Comment #67 from bonzini at gnu dot org 2009-05-07 13:40 --- I'm thinking of enabling -frename-registers on x86; since it does not enable the first scheduling pass, the live ranges will be shorter and the register allocator may reuse the same register over and over with no freedom

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-07 Thread lucier at math dot purdue dot edu
--- Comment #71 from lucier at math dot purdue dot edu 2009-05-07 16:02 --- Created an attachment (id=17820) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17820&action=view) time for 31957, with rename-registers -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-07 Thread bonzini at gnu dot org
--- Comment #74 from bonzini at gnu dot org 2009-05-07 16:21 --- Ok. One step at a time. :-) To recap, here is the situation: - the CSE optimization you mention was *not* removed, it was moved to fwprop, so it does not run at -O1. - once this was done, the way to go is to tune new

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-07 Thread lucier at math dot purdue dot edu
--- Comment #75 from lucier at math dot purdue dot edu 2009-05-07 16:31 --- Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475 On May 7, 2009, at 12:21 PM, bonzini at gnu dot org wrote: --- Comment #74 from bonzini at gnu

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-07 Thread bonzini at gnu dot org
--- Comment #76 from bonzini at gnu dot org 2009-05-07 16:37 --- It should be possible to modify fwprop to avoid excessive memory usage (doing its own dataflow, basically, instead of using UD chains) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-07 Thread steven at gcc dot gnu dot org
--- Comment #77 from steven at gcc dot gnu dot org 2009-05-07 17:50 --- Re. comment #75: Just the fact that an option is enabled in both releases doesn't mean the pass behind it is doing the same thing in both releases. What the scheduler does, depends heavily on the code you feed it.

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread jakub at gcc dot gnu dot org
--- Comment #61 from jakub at gcc dot gnu dot org 2009-05-06 13:05 --- Also see PR39871, maybe that's related (though on ARM). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread bonzini at gnu dot org
--- Comment #62 from bonzini at gnu dot org 2009-05-06 15:07 --- No, totally unrelated to PR39871 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread lucier at math dot purdue dot edu
--- Comment #63 from lucier at math dot purdue dot edu 2009-05-06 19:57 --- Was the patch in comment 55 meant for me to bootstrap and test with today's mainline? It crashes at the gcc_assert at /* Subroutine of canon_reg. Pass *XLOC through canon_reg, and validate the result if

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread lucier at math dot purdue dot edu
--- Comment #64 from lucier at math dot purdue dot edu 2009-05-06 20:43 --- In answer to comment 60, here's the command line where I added -fforward-propagate -fno-move-loop-invariants: /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused -O1 -fno-math-errno

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread bonzini at gnu dot org
--- Comment #65 from bonzini at gnu dot org 2009-05-07 05:03 --- Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475 lucier at math dot purdue dot edu wrote: --- Comment #64 from lucier at math dot purdue dot edu 2009-05-06

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-05-06 Thread lucier at math dot purdue dot edu
--- Comment #66 from lucier at math dot purdue dot edu 2009-05-07 05:27 --- Adding -frename-registers gives a significant speedup (sometimes as fast as 4.1.2 on this shared machine, i.e., it sometimes hits 108 ms instead of 132-140ms), the command line with -fforward-propagate