--- Comment #112 from bergner at gcc dot gnu dot org 2009-10-03 01:39 ---
Subject: Bug 33928
Author: bergner
Date: Sat Oct 3 01:39:14 2009
New Revision: 152430
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152430
Log:
Backport from mainline.
2009-08-30 Alan
--- Comment #111 from lucier at math dot purdue dot edu 2009-08-27 17:02 ---
I can compile gambit 4.1.2 with -fschedule-insns except for the function noted
in PR41164.
On
model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
with
gcc version 4.5.0 20090803
--- Comment #108 from lucier at math dot purdue dot edu 2009-08-27 01:18 ---
direct.c contains a direct FFT; I've compiled the direct and inverse fft and I
ran it on arrays with 2^23 double-precision complex elements and
heine:~/programs/gcc/objdirs/bench-mainline-on-fft
--- Comment #109 from lucier at math dot purdue dot edu 2009-08-27 01:22 ---
Created an attachment (id=18432)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18432&action=view)
inner loop of direct.c with -fschedule-insns
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
--- Comment #110 from lucier at math dot purdue dot edu 2009-08-27 01:22 ---
Created an attachment (id=18433)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18433&action=view)
inner loop of direct.c without -fschedule-insns
--- Comment #107 from rguenth at gcc dot gnu dot org 2009-08-04 12:28 ---
GCC 4.3.4 is being released, adjusting target milestone.
--- Comment #104 from bonzini at gnu dot org 2009-06-16 06:47 ---
I understood that with -frename-registers the regression is fixed. As I said,
without a pre-regalloc scheduling pass and without register renaming, the
scheduling quality you get is more or less random.
--- Comment #105 from bonzini at gnu dot org 2009-06-16 07:01 ---
Marking PR39157 as a duplicate of PR26854 is not exact (only the fwprop part is
a duplicate, because we were getting large compile times because of building
large data structures; the CFG Cleanup part is not exactly a
--- Comment #106 from lucier at math dot purdue dot edu 2009-06-16 07:24 ---
This machine has 4 ms ticks, so we're getting down to a difference of a few
ticks with a benchmark of this size. It's 156 ms with 4.2.4, 168 ms with 4.5.0,
and 164 ms when -frename-registers is added to the command
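To make the "few ticks" remark concrete, the arithmetic can be spelled out. This is a small Python sketch that uses only the timings and the 4 ms tick size quoted above; the helper name `slowdown` is illustrative, not part of any benchmark harness from the thread.

```python
TICK_MS = 4  # timer resolution reported in comment #106

# Run times reported in comment #106, in milliseconds.
TIMINGS_MS = {
    "4.2.4": 156,
    "4.5.0": 168,
    "4.5.0 -frename-registers": 164,
}


def slowdown(ms, baseline_ms):
    """Return (extra ms, extra clock ticks, percent slower) versus the baseline."""
    delta = ms - baseline_ms
    return delta, delta / TICK_MS, 100.0 * delta / baseline_ms


baseline = TIMINGS_MS["4.2.4"]
for label, ms in TIMINGS_MS.items():
    delta, ticks, pct = slowdown(ms, baseline)
    print(f"{label:<26} {ms:4d} ms  +{delta:2d} ms = {ticks:.0f} ticks ({pct:.1f}%)")
```

The 4.5.0 regression works out to 3 ticks (about 7.7%), and -frename-registers recovers one of those ticks (leaving about 5.1%), so single-run differences of this size sit close to the timer's resolution.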
--- Comment #97 from bonzini at gnu dot org 2009-06-15 15:14 ---
Brad, could you try to time compiler.i with and without -ftime-report to see
how much of the tree stmt walking timevar is just accounting overhead?
--- Comment #98 from lucier at math dot purdue dot edu 2009-06-15 16:11 ---
I don't quite understand how you would like me to configure and run the test.
First, I've applied your patches to speed up computing DF to my tree; do you
want them included in the test, or should I use a
--- Comment #99 from paolo dot bonzini at gmail dot com 2009-06-15 16:20 ---
Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
> First, I've applied your patches to speed up computing DF to my tree; do you
> want them included in
--- Comment #103 from lucier at math dot purdue dot edu 2009-06-15 20:21 ---
Regarding comment #101 ...
With
heine:~/programs/gcc/objdirs/gsc-fft-tests/gambc-v4_1_2
/pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with:
--- Comment #95 from lucier at math dot purdue dot edu 2009-06-14 14:59 ---
The test case is compiler.i.gz
--- Comment #96 from lucier at math dot purdue dot edu 2009-06-14 15:02 ---
Sorry, the gcc options are in comment 87 (the -fforward-propagate is now
redundant), and without Paolo's recently proposed patch it requires about 9GB
of memory to compile.
--- Comment #93 from rguenth at gcc dot gnu dot org 2009-06-13 14:18 ---
I would say that was the new SRA.
--- Comment #94 from jamborm at gcc dot gnu dot org 2009-06-14 04:43 ---
(In reply to comment #92)
> In the meanwhile something caused tree incremental SSA to jump up from 10s to
> 26s. Sob.
(In reply to comment #93)
> I would say that was the new SRA.
OK, I'll try to
--- Comment #92 from bonzini at gnu dot org 2009-06-12 14:50 ---
In the meanwhile something caused tree incremental SSA to jump up from 10s to
26s. Sob.
--- Comment #88 from bonzini at gnu dot org 2009-06-08 08:40 ---
Created an attachment (id=17963)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17963&action=view)
patch I'm testing
Here is a patch I'm testing that completes the rewrite of fwprop's dataflow.
This should make it
--- Comment #89 from bonzini at gnu dot org 2009-06-08 08:59 ---
Created an attachment (id=17964)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17964&action=view)
correct version
oops, the previous one didn't work at -O1 even though it bootstrapped :-)
--- Comment #90 from bonzini at gnu dot org 2009-06-08 16:35 ---
Yo, with the patch the time to compile compiler.i with the given options is
331s on my machine (with a checking compiler). Fwprop takes only 1% (including
computation of the new dataflow problem). I'd estimate around
--- Comment #91 from lucier at math dot purdue dot edu 2009-06-08 18:19 ---
Created an attachment (id=17968)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17968&action=view)
time and memory report for compiler.i after Paolo's patch
The patch cut the total bitmaps used compiling
--- Comment #84 from bonzini at gnu dot org 2009-05-15 10:35 ---
Ok, I am working on a patch to add a multiple-definitions DF problem and use
that together with a domwalk to find the single definitions (instead of
reaching-definitions, which is the remaining slow part). The new problem
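For readers unfamiliar with the analysis being replaced, here is a textbook iterative reaching-definitions computation in Python. It is a sketch of the classic gen/kill worklist algorithm, not GCC's df implementation, and the CFG, block names, and definition IDs in the example are hypothetical. It shows what "single definition" means for a use: at the join block, one register is reached by two definitions and the other by exactly one.

```python
from collections import defaultdict


def reaching_definitions(blocks, succ, defs):
    """Textbook iterative reaching-definitions analysis.

    blocks: list of basic-block names
    succ:   dict mapping a block to its successor blocks
    defs:   dict mapping a block to its (def_id, register) pairs, in program order
    Returns (IN, OUT): dicts mapping each block to the set of def_ids that
    reach its entry and its exit.
    """
    # Every definition of each register; needed to build the KILL sets.
    defs_of_reg = defaultdict(set)
    for b in blocks:
        for d, reg in defs.get(b, []):
            defs_of_reg[reg].add(d)

    gen, kill = {}, {}
    for b in blocks:
        last = {}
        for d, reg in defs.get(b, []):
            last[reg] = d  # a later def in the block shadows an earlier one
        gen[b] = set(last.values())
        # The surviving def of reg in b kills every other def of reg.
        kill[b] = set()
        for reg, d in last.items():
            kill[b] |= defs_of_reg[reg] - {d}

    preds = defaultdict(list)
    for b in blocks:
        for s in succ.get(b, []):
            preds[s].append(b)

    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    worklist = list(blocks)
    while worklist:
        b = worklist.pop()
        IN[b] = set().union(*(OUT[p] for p in preds[b]))
        new_out = gen[b] | (IN[b] - kill[b])
        if new_out != OUT[b]:
            OUT[b] = new_out
            # OUT[b] grew, so its successors must be recomputed.
            for s in succ.get(b, []):
                if s not in worklist:
                    worklist.append(s)
    return IN, OUT


# Hypothetical CFG: "entry" branches to "then" and "else", which rejoin.
blocks = ["entry", "then", "else", "join"]
succ = {"entry": ["then", "else"], "then": ["join"], "else": ["join"]}
defs = {
    "entry": [("d1", "r1"), ("d2", "r2")],
    "then": [("d3", "r1")],
}
IN, OUT = reaching_definitions(blocks, succ, defs)
print(sorted(IN["join"]))  # ['d1', 'd2', 'd3']
```

At "join", r1 is reached by d1 and d3 while r2 is reached only by d2, so only r2 has a single definition there. Each block carries a set that can range over every definition in the function, which is one plausible reason this step stays costly on a function as large as those in compiler.i.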
--- Comment #85 from lucier at math dot purdue dot edu 2009-05-16 00:20 ---
Created an attachment (id=17878)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17878&action=view)
Large test file for testing time and memory usage
This is the file compiler.i used in the previous tests.
--- Comment #78 from bonzini at gnu dot org 2009-05-08 06:51 ---
Subject: Bug 33928
Author: bonzini
Date: Fri May 8 06:51:12 2009
New Revision: 147270
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147270
Log:
2009-05-08 Paolo Bonzini bonz...@gnu.org
PR
--- Comment #79 from bonzini at gnu dot org 2009-05-08 07:18 ---
I'm cobbling up the DIY dataflow patch and it is anything but ugly, actually.
--- Comment #80 from bonzini at gnu dot org 2009-05-08 07:51 ---
Subject: Bug 33928
Author: bonzini
Date: Fri May 8 07:51:46 2009
New Revision: 147274
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147274
Log:
2009-05-08 Paolo Bonzini bonz...@gnu.org
PR
--- Comment #81 from bonzini at gnu dot org 2009-05-08 07:55 ---
Created an attachment (id=17825)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17825&action=view)
speed up fwprop and enable it at -O1
Here is a patch I'm bootstrapping to remove fwprop's usage of UD chains. It
does
--- Comment #82 from bonzini at gnu dot org 2009-05-08 09:41 ---
Hm, looking at the time reports the patch will save about 30-40% of the fwprop
execution time, and should fix the memory hog problem, but will still leave in
the 70s needed to compute reaching definitions. I guess it's a
--- Comment #83 from bonzini at gnu dot org 2009-05-08 12:22 ---
Subject: Bug 33928
Author: bonzini
Date: Fri May 8 12:22:30 2009
New Revision: 147282
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147282
Log:
2009-05-08 Paolo Bonzini bonz...@gnu.org
PR
--- Comment #67 from bonzini at gnu dot org 2009-05-07 13:40 ---
I'm thinking of enabling -frename-registers on x86; since it does not enable
the first scheduling pass, the live ranges will be shorter and the register
allocator may reuse the same register over and over with no freedom
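Comment #67's reasoning, as I read it, is that without a first scheduling pass the register allocator tends to reuse one register for unrelated values, and that reuse adds write-after-read and write-after-write dependences that leave the later scheduler little freedom (comment #104 makes a similar point). The toy Python sketch below illustrates the effect on a hypothetical four-instruction block. The `(dest, srcs)` encoding and the `false_deps`/`rename` helpers are illustrative assumptions; GCC's actual -frename-registers pass works with a finite set of hard registers and must preserve values live across block boundaries, which this sketch ignores.

```python
from itertools import count


def false_deps(insns):
    """Count write-after-read and write-after-write pairs in a straight-line block.

    insns: list of (dest, srcs) tuples in program order.
    """
    n = 0
    for i, (dest_i, srcs_i) in enumerate(insns):
        for dest_j, _ in insns[i + 1:]:
            if dest_j in srcs_i:  # a later insn overwrites a register this one reads
                n += 1
            if dest_j == dest_i:  # a later insn overwrites this one's result
                n += 1
    return n


def rename(insns):
    """Give every definition a fresh name and rewrite later uses to match.

    Toy model: assumes unlimited registers and that no value is live
    out of the block, so the original names never need to be restored.
    """
    fresh = count()
    current = {}  # original register -> name of its latest definition
    renamed = []
    for dest, srcs in insns:
        new_srcs = [current.get(s, s) for s in srcs]  # read before redefining
        new_dest = f"v{next(fresh)}"
        current[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


# Hypothetical block in which the allocator reused xmm0 for two unrelated values.
block = [
    ("xmm0", ["a"]),          # xmm0 = load a
    ("xmm1", ["xmm0", "b"]),  # xmm1 = xmm0 * b
    ("xmm0", ["c"]),          # xmm0 = load c  (reuse of xmm0)
    ("xmm2", ["xmm0", "d"]),  # xmm2 = xmm0 * d
]

print(false_deps(block))          # 2
print(false_deps(rename(block)))  # 0
```

With xmm0 reused, the second load cannot be hoisted above the first multiply; after renaming, the two load/multiply chains are independent and a scheduler may interleave them.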
--- Comment #71 from lucier at math dot purdue dot edu 2009-05-07 16:02 ---
Created an attachment (id=17820)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17820&action=view)
time for 31957, with rename-registers
--- Comment #74 from bonzini at gnu dot org 2009-05-07 16:21 ---
Ok. One step at a time. :-) To recap, here is the situation:
- the CSE optimization you mention was *not* removed, it was moved to fwprop,
so it does not run at -O1.
- once this was done, the way to go is to tune new
--- Comment #75 from lucier at math dot purdue dot edu 2009-05-07 16:31 ---
Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
On May 7, 2009, at 12:21 PM, bonzini at gnu dot org wrote:
> --- Comment #74 from bonzini at gnu
--- Comment #76 from bonzini at gnu dot org 2009-05-07 16:37 ---
It should be possible to modify fwprop to avoid excessive memory usage (doing
its own dataflow, basically, instead of using UD chains)
--- Comment #77 from steven at gcc dot gnu dot org 2009-05-07 17:50 ---
Re. comment #75: Just the fact that an option is enabled in both releases
doesn't mean the pass behind it is doing the same thing in both releases. What
the scheduler does depends heavily on the code you feed it.
--- Comment #61 from jakub at gcc dot gnu dot org 2009-05-06 13:05 ---
Also see PR39871, maybe that's related (though on ARM).
--- Comment #62 from bonzini at gnu dot org 2009-05-06 15:07 ---
No, totally unrelated to PR39871
--- Comment #63 from lucier at math dot purdue dot edu 2009-05-06 19:57 ---
Was the patch in comment 55 meant for me to bootstrap and test with today's
mainline? It crashes at the gcc_assert at
/* Subroutine of canon_reg. Pass *XLOC through canon_reg, and validate
the result if
--- Comment #64 from lucier at math dot purdue dot edu 2009-05-06 20:43 ---
In answer to comment 60, here's the command line where I added
-fforward-propagate -fno-move-loop-invariants:
/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused
-O1 -fno-math-errno
--- Comment #65 from bonzini at gnu dot org 2009-05-07 05:03 ---
Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
lucier at math dot purdue dot edu wrote:
> --- Comment #64 from lucier at math dot purdue dot edu 2009-05-06
--- Comment #66 from lucier at math dot purdue dot edu 2009-05-07 05:27 ---
Adding -frename-registers gives a significant speedup (sometimes as fast as
4.1.2 on this shared machine, i.e., it sometimes hits 108 ms instead of
132-140 ms), the command line with -fforward-propagate