Hi,

On Thu, Mar 6, 2014, at 2:23, Adam Conrad wrote:
> I wouldn't be entirely against this option, if the performance hit is
> measurably not awful in general purpose usage.

So I did some measurements against the 'radial-perf-test' in pixman,
compiled with all of the special asm/mmx/sse2/etc. backends disabled
(i.e. plain C floating point code). I have no idea what this code is
doing, but I figured it might be a good test. I might have accidentally
picked something hideously non-representative. I only wanted to get a
rough idea, without spending too much time on this.

The baseline for 32-bit with -march=i686 is "Average time to composite:
0.037647". Adding -fexcess-precision=standard gives 0.040273 (+7%).
That's a reasonable hit on FP-heavy code.

SSE2 beats -fexcess-precision=standard, but it doesn't really improve on
the baseline. In fact, '-march=pentium4 -mfpmath=sse -mtune=generic'
gives almost exactly the same result as where we are today: "Average
time to composite: 0.037669". The advantage here is that we now have a
standards-compliant C compiler.

We get a slight improvement if we turn on '-march=pentium4
-mtune=generic' without forcing the compiler into SSE for math: "Average
time to composite: 0.036601". That's ~3% better than today.

I'm slightly surprised that pentium4+sse2 only ties the existing
-march=i686 flags (although it beats them by actually being
standards-correct), and in particular I'm surprised that forcing SSE
math slows things down vs. -march=pentium4 alone. I'm not sure of the
reason for this. It could be that the SSE2 instructions are truly a
slower way of doing the math. It could also be that the compiler has
received less optimisation attention here due to it being a non-default
option.

I did another test with a simple program that approximates the tight
inner loop in a Mandelbrot set calculation. It saw similar results in
terms of i686 vs. pentium4 and sse (i686 ~= sse, plain pentium4 ~2%
faster). In this case the performance hit of -fexcess-precision=standard
was much worse, though: +40%.
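
For anyone who wants to try something similar: the test program itself
is not included here, so the following is only a minimal sketch of what
a Mandelbrot-style inner loop might look like. The function name
mandel_iterations and its parameters are made up for illustration and
are not taken from the program that produced the numbers above.

```c
/* Hypothetical sketch of a Mandelbrot-style inner loop. This is NOT the
 * program used for the measurements in this mail; the name and the
 * parameters are assumptions made for illustration. */
#include <assert.h>

/* Iterate z = z*z + c starting from z = 0 and return how many
 * iterations complete before |z|^2 exceeds 4, capped at max_iter. */
int mandel_iterations(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int i;

    assert(max_iter > 0);

    for (i = 0; i < max_iter; i++) {
        double zr2 = zr * zr;
        double zi2 = zi * zi;

        if (zr2 + zi2 > 4.0)
            break;                  /* the point has escaped */

        /* Update zi first: it needs the old value of zr. */
        zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;
    }
    return i;
}
```

Calling this over a grid of points from a small driver, and building it
once per flag set discussed above (-march=i686 with and without
-fexcess-precision=standard, and -march=pentium4 -mtune=generic with and
without -mfpmath=sse), should give a comparable kind of test, though not
necessarily comparable numbers.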

In short: I'm dismayed to report that turning on '-march=pentium4
-mfpmath=sse -mtune=generic' gives no performance improvement on this
particular piece of code.

If we approach this problem from the standpoint of "we must provide a C
compiler that adheres to standards", then using these options does give
a substantial improvement on FP-heavy code over the alternative of using
-fexcess-precision=standard.

Cheers
