[Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475

2009-02-13 Thread bonzini at gnu dot org


--- Comment #44 from bonzini at gnu dot org  2009-02-13 16:05 ---
A simplified (local, noncascading) fwprop not using UD chains would not be hard
to do...  Basically, at -O1 use FOR_EACH_BB/FOR_EACH_BB_INSN instead of walking
the uses, keep a (regno, insn) map of pseudos (cleared at the beginning of
every basic block), and use that info instead of UD chains in
use_killed_between...
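
A minimal sketch of the idea above, written as a toy model in Python and not
as GCC code.  Everything in it is an assumption made for illustration: the
Insn class, local_fwprop and try_propagate are hypothetical names, the
"instructions" are simple (dest, op, srcs) records and not RTL, and the kill
check only stands in for what use_killed_between would decide.  It shows the
three ingredients mentioned: a walk over blocks and insns, a per-block
(regno, insn) map, and propagation that does not cascade.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Operand = Union[int, str]  # int = constant, str = pseudo register name


@dataclass
class Insn:
    """Toy instruction: dest = op(srcs).  Not GCC's RTL."""
    dest: str
    op: str
    srcs: Tuple[Operand, ...]


def try_propagate(src: Operand, block: List[Insn],
                  last_def: Dict[str, int]) -> Operand:
    """Return a replacement for src, or src itself if propagation is unsafe."""
    if not isinstance(src, str) or src not in last_def:
        return src                      # constant, or no def seen in this block
    def_idx = last_def[src]
    def_insn = block[def_idx]           # ORIGINAL def, so results never cascade
    if def_insn.op != "mov":
        return src                      # only simple copies are forwarded here
    value = def_insn.srcs[0]
    if isinstance(value, int):
        return value                    # constants cannot be killed
    # Toy stand-in for the kill check: the copied register must not have
    # been redefined at or after the copy.
    if last_def.get(value, -1) < def_idx:
        return value
    return src


def local_fwprop(blocks: List[List[Insn]]) -> List[List[Insn]]:
    out = []
    for block in blocks:                        # plays the role of FOR_EACH_BB
        last_def: Dict[str, int] = {}           # (regno, insn) map, fresh per block
        new_block = []
        for idx, insn in enumerate(block):      # plays the role of FOR_EACH_BB_INSN
            srcs = tuple(try_propagate(s, block, last_def) for s in insn.srcs)
            new_block.append(Insn(insn.dest, insn.op, srcs))
            last_def[insn.dest] = idx           # record the def after its uses
        out.append(new_block)
    return out
```

In this model, the block "t1 = mov x; t2 = mov t1; t3 = add t2, 1" becomes
"t1 = mov x; t2 = mov x; t3 = add t1, 1": the use of t2 is replaced by t1
and not by x, because the map points at the original definitions.  A
register defined in one block is never forwarded into another, since the map
starts empty for each block.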


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928




2009-02-13 Thread lucier at math dot purdue dot edu


--- Comment #45 from lucier at math dot purdue dot edu  2009-02-13 16:09 ---

On Fri, 2009-02-13 at 16:05, bonzini at gnu dot org wrote:
> --- Comment #44 from bonzini at gnu dot org  2009-02-13 16:05 ---
> A simplified (local, noncascading) fwprop not using UD chains would not be
> hard to do...  Basically, at -O1 use FOR_EACH_BB/FOR_EACH_BB_INSN instead of
> walking the uses, keep a (regno, insn) map of pseudos (cleared at the
> beginning of every basic block), and use that info instead of UD chains in
> use_killed_between...

As noted in comment 42, enabling FWPROP on this test case does not fix
the performance problem.






2009-02-13 Thread bonzini at gnu dot org


--- Comment #48 from bonzini at gnu dot org  2009-02-13 20:09 ---

> Yes.  I don't see why the optimizations in CSE, which were relatively
> cheap and which were effective for this case, needed to be disabled when
> FWPROP was added without, evidently, understanding why FWPROP does not
> do what CSE was already doing.

Just to mention it, moving that work from CSE to fwprop saved 3% of compile
time, so the CSE optimizations were not cheap.  fwprop was also tested with
SPEC and Nullstone on several architectures.






2009-01-24 Thread rguenth at gcc dot gnu dot org


--- Comment #43 from rguenth at gcc dot gnu dot org  2009-01-24 10:19 ---
GCC 4.3.3 is being released, adjusting target milestone.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.3                       |4.3.4






2008-12-07 Thread lucier at math dot purdue dot edu


--- Comment #42 from lucier at math dot purdue dot edu  2008-12-07 19:39 ---
Just a comment that -fforward-propagate isn't enabled at -O1 (the main
optimization option in the test) while the cse code it replaces was enabled at
-O1.  This is presumably why adding -fno-forward-propagate to the command line
in the test a year ago didn't affect the generated code.

Adding -fno-forward-propagate to the command line of the test case with
revision r118475 of gcc changes the generated code, but doesn't improve the
problem code in the main loop.

Updated the title to report the performance hit on

Intel(R) Xeon(R) CPU   X5460  @ 3.16GHz

as reported by /proc/cpuinfo


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.3/4.4 Regression] 22%    |[4.3/4.4 Regression] 30%
                   |performance slowdown from   |performance slowdown in
                   |4.2.2 to 4.3/4.4.0 in       |floating-point code caused
                   |floating-point code         |by  r118475

