[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-06 Thread ubizjak at gmail dot com
--- Comment #21 from ubizjak at gmail dot com 2007-04-06 07:37 --- Strange things happen. I have fully removed the gcc build directory and bootstrapped gcc from scratch. To my surprise, the difference with -msse and without -msse is now gone and optimized dumps are now the same. For

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread bonzini at gnu dot org
--- Comment #13 from bonzini at gnu dot org 2007-04-05 11:01 --- So this is an unstable sorting. Adding dnovillo.
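
As a general illustration of what "unstable sorting" means (this is not GCC's code, and the excerpts here do not show which sort is involved), the sketch below sorts records with the C library's qsort, which is not required to be stable. The struct item type, its seq field, and the function names are hypothetical. Comparing on the key alone leaves the relative order of equal-key items unspecified; adding a tie-breaker on a unique field makes the result fully determined by the data.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical record: 'key' is the sort key, 'seq' is a unique
   sequence number recording the original position. */
struct item {
    int key;
    int seq;
};

/* Compares the key only. Items with equal keys may come out in any
   relative order, because qsort is not guaranteed to be stable. */
static int cmp_key_only(const void *a, const void *b)
{
    const struct item *x = a, *y = b;
    return (x->key > y->key) - (x->key < y->key);
}

/* Breaks ties on 'seq', giving a total order, so the output no longer
   depends on how the library treats equal elements. */
static int cmp_key_then_seq(const void *a, const void *b)
{
    const struct item *x = a, *y = b;
    int c = (x->key > y->key) - (x->key < y->key);
    if (c != 0)
        return c;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Order among equal keys is unspecified. */
void sort_items_by_key(struct item *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_key_only);
}

/* Order is fully determined by (key, seq). */
void sort_items_deterministic(struct item *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_key_then_seq);
}
```

Note that a tie-breaker only helps if the tie-breaking field is itself the same from run to run; if it changes with compile flags, the order can still differ.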

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread ubizjak at gmail dot com
--- Comment #12 from ubizjak at gmail dot com 2007-04-05 11:00 --- (In reply to comment #11) with -msse compile flag. Note the different variable suffixes that create a different sort order. This is (IMO) due to the fact that -msse enables lots of additional __builtin functions (these can

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread ubizjak at gmail dot com
--- Comment #11 from ubizjak at gmail dot com 2007-04-05 10:58 --- (In reply to comment #10) I would look at the lreg output, which contains the results of regclass. No, the difference is due to ssa pass that generates: # v1z_10 = PHI v1z_13(2), v1z_32(3) # v1y_9 = PHI v1y_12(2),

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread dnovillo at gcc dot gnu dot org
--- Comment #14 from dnovillo at gcc dot gnu dot org 2007-04-05 12:49 --- (In reply to comment #11) So, why does SSA pass have to interfere with computation dataflow? This interference makes things worse and effectively takes away user's control on the flow of data. Huh? How

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread bonzini at gnu dot org
--- Comment #15 from bonzini at gnu dot org 2007-04-05 13:03 --- Transformations do not, but out-of-SSA could. Is there a way to ensure ordering of PHI functions unlike what Uros's dumps suggest? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread dnovillo at redhat dot com
--- Comment #16 from dnovillo at redhat dot com 2007-04-05 13:15 --- Subject: Re: Floating point computation far slower for -mfpmath=sse bonzini at gnu dot org wrote on 04/05/07 08:03: Is there a way to ensure ordering of PHI functions unlike what Uros's dumps suggest? No. I

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread amacleod at redhat dot com
--- Comment #17 from amacleod at redhat dot com 2007-04-05 14:23 --- Is the output from .optimized different? (once the SSA version numbers have been stripped). Those PHIs should be irrelevant, the question is whether the different versioning has any effect. The only way I can

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread ubizjak at gmail dot com
--- Comment #18 from ubizjak at gmail dot com 2007-04-05 16:39 --- (In reply to comment #17) Is the output from .optimized different? (once the SSA version numbers have been stripped). Those PHIs should be irrelevant, the question is whether the different versioning has any

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread amacleod at redhat dot com
--- Comment #19 from amacleod at redhat dot com 2007-04-05 17:24 --- What are you using for a compiler? I'm using a mainline from mid March, and with it, my .optimized files diff exactly the same, and I get the aforementioned time differences in the executables. (sse.c and sse-bad.c are

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-05 Thread ubizjak at gmail dot com
--- Comment #20 from ubizjak at gmail dot com 2007-04-05 19:39 --- (In reply to comment #19) What are you using for a compiler? I'm using a mainline from mid March, and gcc version 4.3.0 20070404 (experimental) on i686-pc-linux-gnu with it, my .optimized files diff exactly the same,

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-03 Thread bonzini at gnu dot org
--- Comment #8 from bonzini at gnu dot org 2007-04-03 12:43 --- What's the generated code for -ffast-math? In principle I don't see a reason why it should make any difference... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-03 Thread ubizjak at gmail dot com
--- Comment #9 from ubizjak at gmail dot com 2007-04-03 13:32 --- (In reply to comment #8) What's the generated code for -ffast-math? In principle I don't see a reason why it should make any difference... Trying to answer your question, I have played a bit with compile flags and

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2007-04-03 Thread bonzini at gnu dot org
--- Comment #10 from bonzini at gnu dot org 2007-04-03 13:36 --- I would look at the lreg output, which contains the results of regclass. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2006-10-25 Thread uros at kss-loka dot si
--- Comment #6 from uros at kss-loka dot si 2006-10-25 12:04 --- (In reply to comment #5) With more registers (x86_64) the stack moves are gone, but: (!) (testing done on AMD Athlon fam 15 model 35 stepping 2) On Xeon 3.6, SSE is now faster: gcc -O2 -march=pentium4 -mfpmath=387

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2006-10-25 Thread uros at kss-loka dot si
--- Comment #7 from uros at kss-loka dot si 2006-10-25 12:18 --- (In reply to comment #6) On Xeon 3.6, SSE is now faster: ... but for -ffast-math: SSE: user 0m0.756s x87: user 0m0.612s Yes, x87 is faster for -ffast-math by some 20%. --
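
The "some 20%" figure can be checked from the two user times quoted above. A minimal sketch, with hypothetical helper names, that expresses the gap both ways, since "faster by N%" depends on which time is taken as the baseline:

```c
/* Fraction of the slower run's time that the faster run saves:
   (slow - fast) / slow. */
double time_saved_fraction(double slow, double fast)
{
    return (slow - fast) / slow;
}

/* How many times longer the slower run takes: slow / fast. */
double slowdown_ratio(double slow, double fast)
{
    return slow / fast;
}
```

With slow = 0.756 (SSE) and fast = 0.612 (x87), the first helper gives about 0.19 and the second about 1.24, so x87 saves roughly 19% of the SSE time, or equivalently SSE takes roughly 24% longer.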

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2006-10-24 Thread rguenth at gcc dot gnu dot org
--- Comment #5 from rguenth at gcc dot gnu dot org 2006-10-24 13:28 --- With more registers (x86_64) the stack moves are gone, but: (!) [EMAIL PROTECTED]:/abuild/rguenther/trunk-g/gcc ./xgcc -B. -O2 -o t t.c -mfpmath=387 [EMAIL PROTECTED]:/abuild/rguenther/trunk-g/gcc /usr/bin/time ./t

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2006-08-11 Thread bonzini at gnu dot org
--- Comment #4 from bonzini at gnu dot org 2006-08-11 10:22 --- Except that PPC uses 12 registers f0 f6 f7 f8 f9 f10 f11 f12 f13 f29 f30 f31. Not that we can blame GCC for using 12, but it is not a fair comparison. :-) In fact, 8 registers are enough, but it is quite tricky to obtain

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2005-09-28 Thread pinskia at gcc dot gnu dot org
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-29 04:06 --- Oh, and this looks very related to the two-operand instruction issue. PPC gives optimal code: L2: fmul f0,f6,f9 fmul f13,f7,f10 fmul f12,f8,f11 fmsub f29,f8,f10,f0

[Bug rtl-optimization/19780] Floating point computation far slower for -mfpmath=sse

2005-09-28 Thread pinskia at gcc dot gnu dot org
--- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-29 04:05 --- Confirmed. This is weird and this is an RA issue. I don't understand why the RA is spilling it to the stack as there are enough SSE registers to hold the 6 registers.
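
The test case itself is not shown in these excerpts. The sketch below is only an assumed shape for the kind of loop under discussion: a floating-point kernel that keeps six values live across iterations (two 3-component vectors) and combines them with multiplies and multiply-subtracts, here as a repeated cross product. The type and function names are hypothetical.

```c
/* Hypothetical 3-component vector; two of these give six live floats. */
struct vec3 {
    float x, y, z;
};

/* Cross product a x b: three pairs of multiplies, each pair combined
   with a subtraction. */
struct vec3 cross(struct vec3 a, struct vec3 b)
{
    struct vec3 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}

/* Repeats (v1, v2) <- (v2, v1 x v2) n times and returns the final v2.
   Only six floats are carried from one iteration to the next. */
struct vec3 iterate_cross(struct vec3 v1, struct vec3 v2, long n)
{
    long i;
    for (i = 0; i < n; i++) {
        struct vec3 t = cross(v1, v2);
        v1 = v2;
        v2 = t;
    }
    return v2;
}
```

Building a driver around such a loop once with -mfpmath=387 and once with -mfpmath=sse (plus -msse on 32-bit x86) and comparing the generated assembly is one way to see whether the register allocator spills the six values to the stack.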