[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

2007-06-26 Thread Simon Marlow


Philip Armstrong wrote:

On Sat, Jun 23, 2007 at 08:49:15AM +0100, Andrew Coppin wrote:

Donald Bruce Stewart wrote:
Don't use -O3, it's *worse* than -O2, and somewhere between -Onot and 
-O iirc,


Is this likely to be fixed ever?


There is at least a bug report for it IIRC.


It was fixed yesterday.

Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

2007-06-22 Thread Simon Marlow


Philip Armstrong wrote:

On Thu, Jun 21, 2007 at 08:42:57PM +0100, Philip Armstrong wrote:

On Thu, Jun 21, 2007 at 03:29:17PM -0400, Mark T.B. Carroll wrote:



That's the old wiki. The new one gives the opposite advice! (As does
the ghc manual):

 http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html
 http://www.haskell.org/haskellwiki/Performance/Floating_Point


Incidentally, the latter page implies that ghc is being overly
pessimistic when compiling FP code without -fexcess-precision:

On x86 (and other platforms with GHC prior to version 6.4.2), use
 the -fexcess-precision flag to improve performance of floating-point
 intensive code (up to 2x speedups have been seen). This will keep
 more intermediates in registers instead of memory, at the expense of
 occasional differences in results due to unpredictable rounding.

IIRC, it is possible to issue an instruction to the x86 FP unit which
makes all operations work on 64-bit Doubles, even though there are
80-bits available internally. Which then means there's no requirement
to spill intermediate results to memory in order to get the rounding
correct.


For some background on why GHC doesn't do this, see the comment MORE FLOATING 
POINT MUSINGS... in


  http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

The main problem is floats: even if you put the FPU into 64-bit mode, your float 
operations will be done at 64-bit precision.  There are other technical problems 
that we found with doing this, the comment above elaborates.


GHC passes -ffloat-store to GCC, unless you give the flag -fexcess-precision. 
The idea is to try to get reproducible floating-point results.  The native code 
generator is unaffected by -fexcess-precision, but it produces rubbish 
floating-point code on x86 anyway.



Ideally, -fexcess-precision should just affect whether the FP unit
uses 80 or 64 bit Doubles. It shouldn't make any performance
difference, although obviously the generated results may be different.



As an aside, if you use the -optc-mfpmath=sse option, then you only
get 64-bit Doubles anyway (on x86).


You probably want SSE2.  If I ever get around to finishing it, the GHC native 
code generator will be able to generate SSE2 code on x86 someday, like it 
currently does for x86-64.  For now, to get good FP performance on x86, you 
probably want


  -fvia-C -fexcess-precision -optc-mfpmath=sse2

Cheers,
Simon
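
If the flags are only wanted for the floating-point-heavy modules, one option is an OPTIONS_GHC pragma at the top of the file rather than a global command line. The following is only a sketch under assumptions: the `Vec` type and the `dot` and `norm` functions are made-up stand-ins, not code from ray.hs, and only -O2 and -fexcess-precision are placed in the pragma. The -fvia-C and -optc-... options discussed in this thread depend on the back end and GHC version in use, so they are left for the command line.

```haskell
{-# OPTIONS_GHC -O2 -fexcess-precision #-}
-- Per-module flags: the pragma must appear before the module header.
-- Vec, dot and norm are hypothetical stand-ins for a ray tracer's
-- floating-point kernel; they are not taken from ray.hs.

-- Strict fields so the components are evaluated when a Vec is built.
data Vec = Vec !Double !Double !Double

-- Dot product of two vectors.
dot :: Vec -> Vec -> Double
dot (Vec a b c) (Vec x y z) = a * x + b * y + c * z

-- Euclidean length of a vector.
norm :: Vec -> Double
norm v = sqrt (dot v v)

main :: IO ()
main = print (norm (Vec 3 4 12))  -- prints 13.0
```

The input was chosen so that every intermediate (9, 16, 144, 169) is exactly representable, which means the printed result should not depend on the precision flags.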

[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

2007-06-22 Thread Simon Marlow


Philip Armstrong wrote:

On Thu, Jun 21, 2007 at 08:15:36PM +0200, peterv wrote:
So float math is *slower* than double math in Haskell? That is
interesting. Why is that?

BTW, does Haskell support 80-bit long doubles? The Intel CPU seems to use
that format internally.


As I understand things, that is the effect of using -fexcess-precision.

Obviously this means that the behaviour of your program can change
with seemingly trivial code rearrangements,


Not just code rearrangements: your program will give different results depending 
on the optimisation settings, whether you compile with -fvia-C or -fasm, and the 
results will be different from those on a machine using fixed 32-bit or 64-bit 
precision floating point operations.


Cheers,
Simon
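
To make the point about intermediate precision concrete, here is a minimal sketch in Haskell. It is not from the thread, and the names `addSubFloat` and `addSubViaDouble` are made up for illustration. It evaluates the same expression twice: once with every intermediate rounded to a 32-bit Float, and once with the intermediates held as 64-bit Doubles and only the final result rounded back to Float. The second version roughly mimics what happens when a compiler keeps intermediates in wider registers.

```haskell
-- (big + small) - big with every intermediate rounded to Float.
-- Assumes the target really performs 32-bit Float arithmetic here.
addSubFloat :: Float -> Float -> Float
addSubFloat big small = (big + small) - big

-- The same expression, but intermediates are held as Double and only
-- the final result is rounded back to Float.
addSubViaDouble :: Float -> Float -> Float
addSubViaDouble big small = realToFrac ((b + s) - b)
  where
    b = realToFrac big   :: Double
    s = realToFrac small :: Double

main :: IO ()
main = do
  -- Near 1e8 adjacent Floats are 8 apart, so 1e8 + 1 rounds back to 1e8.
  print (addSubFloat 1e8 1)      -- prints 0.0
  -- A Double represents 100000001 exactly, so the 1 survives.
  print (addSubViaDouble 1e8 1)  -- prints 1.0
```

Both functions have the same type and the same source-level expression, yet they disagree, which is why results can shift with optimisation settings or the choice of back end.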


[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

2007-06-22 Thread Philip Armstrong


On Fri, Jun 22, 2007 at 01:16:54PM +0100, Simon Marlow wrote:

Philip Armstrong wrote:

IIRC, it is possible to issue an instruction to the x86 FP unit which
makes all operations work on 64-bit Doubles, even though there are
80-bits available internally. Which then means there's no requirement
to spill intermediate results to memory in order to get the rounding
correct.


For some background on why GHC doesn't do this, see the comment MORE 
FLOATING POINT MUSINGS... in


  http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs


Twisty. I guess 'slow, but correct, with switches to go faster at the
price of correctness' is about the best option.

You probably want SSE2.  If I ever get around to finishing it, the GHC 
native code generator will be able to generate SSE2 code on x86 someday, 
like it currently does for x86-64.  For now, to get good FP performance on 
x86, you probably want


  -fvia-C -fexcess-precision -optc-mfpmath=sse2


Reading the gcc manpage, I think you mean -optc-msse2
-optc-mfpmath=sse. -mfpmath=sse2 doesn't appear to be an option.

(I note in passing that the ghc darcs head produces binaries from
ray.hs which are about 15% slower than ghc 6.6.1 ones btw. Same
optimisation options used both times.)

cheers, Phil

--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

Re: [Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

2007-06-22 Thread Dougal Stanton


On 22/06/07, Claus Reinke [EMAIL PROTECTED] wrote:


perhaps this should be generalised to ghc flag profiles, to cover
things like '-fno-monomorphism-restriction -fno-mono-pat-binds'
or '-fglasgow-exts -fallow-undecidable-instances' and the like?


You just *know* someone's gonna abuse that to make a genuine
-funroll-loops, right? ;-)

D.

Re: [Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

2007-06-22 Thread Claus Reinke


  -fvia-C -fexcess-precision -optc-mfpmath=sse2


is there, or should there be a way to define -O profiles for ghc?
so that -O would refer to the standard profile, -Ofp would refer
to the combination above as a floating point optimisation profile,
other profiles might include things like -funbox-strict-fields, and
-Omy42 would refer to my own favourite combination of flags..

perhaps this should be generalised to ghc flag profiles, to cover
things like '-fno-monomorphism-restriction -fno-mono-pat-binds'
or '-fglasgow-exts -fallow-undecidable-instances' and the like?

just a thought,
claus
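
The profile idea could be prototyped outside the compiler as a small wrapper that expands profile names before invoking ghc. The sketch below is purely hypothetical: GHC has no such feature as far as this thread shows, and the profile names (`fp`, `poly`) and the functions `expandArg` and `expandArgs` are invented here. The `fp` profile uses the -optc-msse2 -optc-mfpmath=sse spelling suggested elsewhere in the thread.

```haskell
-- Hypothetical table of named flag profiles; not a GHC feature.
profiles :: [(String, [String])]
profiles =
  [ ("fp",   ["-fvia-C", "-fexcess-precision", "-optc-msse2", "-optc-mfpmath=sse"])
  , ("poly", ["-fno-monomorphism-restriction", "-fno-mono-pat-binds"])
  ]

-- Expand "-O<name>" when <name> is a known profile; pass everything
-- else (including plain -O and -O2) through unchanged.
expandArg :: String -> [String]
expandArg arg = case arg of
  '-' : 'O' : name | Just flags <- lookup name profiles -> flags
  _ -> [arg]

-- Expand a whole command line.
expandArgs :: [String] -> [String]
expandArgs = concatMap expandArg

main :: IO ()
main = putStrLn (unwords (expandArgs ["-O2", "-Ofp", "ray.hs"]))
-- prints: -O2 -fvia-C -fexcess-precision -optc-msse2 -optc-mfpmath=sse ray.hs
```

Keeping the table as plain data means a personal profile such as -Omy42 would just be one more entry in the list.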

