[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML
Philip Armstrong wrote:

> On Sat, Jun 23, 2007 at 08:49:15AM +0100, Andrew Coppin wrote:
>> Donald Bruce Stewart wrote:
>>> Don't use -O3, it's *worse* than -O2, and somewhere between -Onot and -O iirc,
>>
>> Is this likely to be fixed ever?
>
> There is at least a bug report for it IIRC.

It was fixed yesterday.

Cheers,
Simon

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML
Philip Armstrong wrote:

> On Thu, Jun 21, 2007 at 08:42:57PM +0100, Philip Armstrong wrote:
>> On Thu, Jun 21, 2007 at 03:29:17PM -0400, Mark T.B. Carroll wrote:
>>
>> That's the old wiki. The new one gives the opposite advice! (As does the ghc manual):
>>
>> http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html
>> http://www.haskell.org/haskellwiki/Performance/Floating_Point
>
> Incidentally, the latter page implies that ghc is being overly pessimistic when compiling FP code without -fexcess-precision:
>
>   On x86 (and other platforms with GHC prior to version 6.4.2), use the -fexcess-precision flag to improve performance of floating-point intensive code (up to 2x speedups have been seen). This will keep more intermediates in registers instead of memory, at the expense of occasional differences in results due to unpredictable rounding.
>
> IIRC, it is possible to issue an instruction to the x86 FP unit which makes all operations work on 64-bit Doubles, even though there are 80 bits available internally. Which then means there's no requirement to spill intermediate results to memory in order to get the rounding correct.

For some background on why GHC doesn't do this, see the comment MORE FLOATING POINT MUSINGS... in

http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

The main problem is floats: even if you put the FPU into 64-bit mode, your float operations will be done at 64-bit precision. There are other technical problems that we found with doing this, the comment above elaborates.

GHC passes -ffloat-store to GCC, unless you give the flag -fexcess-precision. The idea is to try to get reproducible floating-point results. The native code generator is unaffected by -fexcess-precision, but it produces rubbish floating-point code on x86 anyway.

> Ideally, -fexcess-precision should just affect whether the FP unit uses 80 or 64 bit Doubles. It shouldn't make any performance difference, although obviously the generated results may be different.
>
> As an aside, if you use the -optc-mfpmath=sse option, then you only get 64-bit Doubles anyway (on x86).

You probably want SSE2. If I ever get around to finishing it, the GHC native code generator will be able to generate SSE2 code on x86 someday, like it currently does for x86-64. For now, to get good FP performance on x86, you probably want

-fvia-C -fexcess-precision -optc-mfpmath=sse2

Cheers,
Simon
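[Editorial illustration] The point about floats being computed at 64-bit precision can be sketched in plain Haskell. This is only an emulation under stated assumptions: the function names roundedEachStep and widenedIntermediate are made up for this note, and the Double arithmetic stands in for an FPU running in 64-bit mode. It does not reproduce what GHC or GCC actually emit.

```haskell
module Main where

-- Every intermediate result is rounded to a 32-bit Float, which is
-- what the Float type nominally promises.
roundedEachStep :: Float -> Float
roundedEachStep a = (a + 1) - a

-- Hypothetical stand-in for an FPU in 64-bit mode: the same arithmetic
-- is carried out in Double and rounded to Float only at the very end.
widenedIntermediate :: Float -> Float
widenedIntermediate a = realToFrac ((d + 1) - d)
  where
    d = realToFrac a :: Double

main :: IO ()
main = do
  -- 2^24: from here on, Float can no longer represent every integer,
  -- so a + 1 rounds back to a when the sum is stored as a Float.
  let a = 16777216 :: Float
  putStrLn ("rounded each step:    " ++ show (roundedEachStep a))
  putStrLn ("widened intermediate: " ++ show (widenedIntermediate a))
```

With a GHC that does Float arithmetic in SSE registers, the first line should report 0.0 and the second 1.0: same source expression, different answer, depending only on the precision of the intermediate.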
[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML
Philip Armstrong wrote:

> On Thu, Jun 21, 2007 at 08:15:36PM +0200, peterv wrote:
>> So float math is *slower* than double math in Haskell? That is interesting. Why is that?
>>
>> BTW, does Haskell support 80-bit long doubles? The Intel CPU seems to use that format internally.
>
> As I understand things, that is the effect of using -fexcess-precision.
>
> Obviously this means that the behaviour of your program can change with seemingly trivial code rearrangements,

Not just code rearrangements: your program will give different results depending on the optimisation settings, whether you compile with -fvia-C or -fasm, and the results will be different from those on a machine using fixed 32-bit or 64-bit precision floating point operations.

Cheers,
Simon
[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML
On Fri, Jun 22, 2007 at 01:16:54PM +0100, Simon Marlow wrote:
> Philip Armstrong wrote:
>> IIRC, it is possible to issue an instruction to the x86 FP unit which makes all operations work on 64-bit Doubles, even though there are 80-bits available internally. Which then means there's no requirement to spill intermediate results to memory in order to get the rounding correct.
>
> For some background on why GHC doesn't do this, see the comment MORE FLOATING POINT MUSINGS... in
>
> http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

Twisty. I guess 'slow, but correct, with switches to go faster at the price of correctness' is about the best option.

> You probably want SSE2. If I ever get around to finishing it, the GHC native code generator will be able to generate SSE2 code on x86 someday, like it currently does for x86-64. For now, to get good FP performance on x86, you probably want
>
> -fvia-C -fexcess-precision -optc-mfpmath=sse2

Reading the gcc manpage, I think you mean -optc-msse2 -optc-mfpmath=sse. -mfpmath=sse2 doesn't appear to be an option.

(I note in passing that the ghc darcs head produces binaries from ray.hs which are about 15% slower than ghc 6.6.1 ones btw. Same optimisation options used both times.)

cheers, Phil

--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
Re: [Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML
On 22/06/07, Claus Reinke wrote:
> perhaps this should be generalised to ghc flag profiles, to cover things like '-fno-monomorphism-restriction -fno-mono-pat-binds' or '-fglasgow-exts -fallow-undecidable-instances' and the like?

You just *know* someone's gonna abuse that to make a genuine -funroll-loops, right? ;-)

D.
Re: [Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML
> -fvia-C -fexcess-precision -optc-mfpmath=sse2

is there, or should there be a way to define -O profiles for ghc? so that -O would refer to the standard profile, -Ofp would refer to the combination above as a floating point optimisation profile, other profiles might include things like -funbox-strict-fields, and -Omy42 would refer to my own favourite combination of flags..

perhaps this should be generalised to ghc flag profiles, to cover things like '-fno-monomorphism-restriction -fno-mono-pat-binds' or '-fglasgow-exts -fallow-undecidable-instances' and the like?

just a thought,
claus
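[Editorial illustration] The profile idea could be approximated outside the compiler by a small wrapper that expands profile names before calling ghc. The sketch below is purely hypothetical: GHC has no such feature, the names profiles, expandArg and expandArgs are invented here, and the contents of the "my42" profile are an arbitrary example. The "fp" profile uses the flag spelling Phil suggested elsewhere in the thread (-optc-msse2 -optc-mfpmath=sse).

```haskell
module Main where

-- Hypothetical profile table. Both the profile names and their
-- contents are assumptions made for this illustration.
profiles :: [(String, [String])]
profiles =
  [ ("fp", ["-fvia-C", "-fexcess-precision", "-optc-msse2", "-optc-mfpmath=sse"])
  , ("my42", ["-O2", "-funbox-strict-fields"])
  ]

-- Expand an argument of the form -O<name> when <name> is a known
-- profile; every other argument (including -O2) passes through as is.
expandArg :: String -> [String]
expandArg arg = case arg of
  '-' : 'O' : name
    | Just flags <- lookup name profiles -> flags
  _ -> [arg]

-- Expand a whole command line, preserving argument order.
expandArgs :: [String] -> [String]
expandArgs = concatMap expandArg

main :: IO ()
main = putStrLn (unwords ("ghc" : expandArgs ["-Ofp", "ray.hs"]))
```

Keeping the table as ordinary data means a user-defined profile such as -Omy42 is just one more entry, at the cost of the expansion happening outside ghc itself.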