[Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
--- Comment #44 from bonzini at gnu dot org 2009-02-13 16:05 ---
A simplified (local, noncascading) fwprop not using UD chains would not be hard to do... Basically, at -O1 use FOR_EACH_BB/FOR_EACH_BB_INSN instead of walking the uses, keep a (regno, insn) map of pseudos (cleared at the beginning of every basic block), and use that info instead of UD chains in use_killed_between...

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
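For readers unfamiliar with the idea, the scheme outlined in comment #44 might look roughly like the sketch below. It is a toy model, not GCC code: `toy_insn`, `local_fwprop`, and `propagate_use` are hypothetical names, registers are plain integers standing in for pseudos, and the inline "killed" test is only a stand-in for what use_killed_between would check. Reading from an unmodified input array and writing to a separate output array is one simple way to keep the pass noncascading.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGS 64   /* assumption: toy register file size */
#define NO_DEF   (-1) /* no definition seen yet in this basic block */

/* Hypothetical toy instruction, not GCC's rtx. */
typedef struct {
    int dest;    /* register written */
    int src1;    /* first register read */
    int src2;    /* second register read, or -1 if none */
    int is_copy; /* nonzero if the insn is a plain copy: dest = src1 */
    int bb;      /* index of the basic block containing the insn */
} toy_insn;

/* Return the register that should replace a use of `reg`, or `reg`
   itself if no local propagation is possible. */
static int propagate_use(const toy_insn *in, const int *last_def, int reg)
{
    if (reg < 0)
        return reg; /* operand slot is unused */
    assert(reg < MAX_REGS);

    int d = last_def[reg];
    if (d == NO_DEF || !in[d].is_copy)
        return reg; /* no local def, or the def is not a copy */

    int x = in[d].src1;
    assert(x >= 0 && x < MAX_REGS);

    /* Stand-in for the "killed between" test: the copy source must not
       have been redefined at or after the copy within this block. */
    if (last_def[x] != NO_DEF && last_def[x] >= d)
        return reg;
    return x;
}

/* Walk the insns block by block, keeping a (regno -> insn index) map
   that is cleared at the start of every basic block. Results go to
   `out`; `in` is never modified, so replacements do not cascade.
   Returns the number of operands replaced. */
size_t local_fwprop(const toy_insn *in, toy_insn *out, size_t n)
{
    int last_def[MAX_REGS];
    size_t replaced = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || in[i].bb != in[i - 1].bb)
            for (int r = 0; r < MAX_REGS; r++)
                last_def[r] = NO_DEF; /* new basic block: forget all defs */

        out[i] = in[i];
        out[i].src1 = propagate_use(in, last_def, in[i].src1);
        out[i].src2 = propagate_use(in, last_def, in[i].src2);
        replaced += (out[i].src1 != in[i].src1) + (out[i].src2 != in[i].src2);

        assert(in[i].dest >= 0 && in[i].dest < MAX_REGS);
        last_def[in[i].dest] = (int)i; /* record the latest def of dest */
    }
    return replaced;
}
```

Given a block containing `r1 = r0` followed by a use of `r1`, the use is rewritten to `r0` unless `r0` is redefined in between or the use sits in a later basic block. The real pass would of course operate on RTL and handle far more than register copies.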
--- Comment #45 from lucier at math dot purdue dot edu 2009-02-13 16:09 ---
Subject: Re: [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475

On Fri, 2009-02-13 at 16:05 +, bonzini at gnu dot org wrote:
> --- Comment #44 from bonzini at gnu dot org 2009-02-13 16:05 ---
> A simplified (local, noncascading) fwprop not using UD chains would not be hard to do... Basically, at -O1 use FOR_EACH_BB/FOR_EACH_BB_INSN instead of walking the uses, keep a (regno, insn) map of pseudos (cleared at the beginning of every basic block), and use that info instead of UD chains in use_killed_between...

As noted in comment 42, enabling FWPROP on this test case does not fix the performance problem.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
--- Comment #48 from bonzini at gnu dot org 2009-02-13 20:09 ---
Subject: Re: [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475

> Yes. I don't see why the optimizations in CSE, which were relatively cheap and which were effective for this case, needed to be disabled when FWPROP was added without, evidently, understanding why FWPROP does not do what CSE was already doing.

Just to mention it, fwprop saved 3% of compile time. That's not cheap. It was also tested with SPEC and Nullstone on several architectures.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
--- Comment #43 from rguenth at gcc dot gnu dot org 2009-01-24 10:19 ---
GCC 4.3.3 is being released, adjusting target milestone.

--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.3                       |4.3.4

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
--- Comment #42 from lucier at math dot purdue dot edu 2008-12-07 19:39 ---
Just a comment that -fforward-propagate isn't enabled at -O1 (the main optimization option in the test), while the cse code it replaces was enabled at -O1. This is presumably why adding -fno-forward-propagate to the command line in the test a year ago didn't affect the generated code.

Adding -fforward-propagate to the command line of the test case with revision r118475 of gcc changes the generated code, but doesn't improve the problem code in the main loop.

Updated the title to report the performance hit on Intel(R) Xeon(R) CPU X5460 @ 3.16GHz, as reported by /proc/cpuinfo.

--
lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.3/4.4 Regression] 22%    |[4.3/4.4 Regression] 30%
                   |performance slowdown from   |performance slowdown in
                   |4.2.2 to 4.3/4.4.0 in       |floating-point code caused
                   |floating-point code         |by r118475

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928