Re: [Haskell-cafe] Profiling nested case

2008-07-18 Thread Mitar
Hi!

On Sat, Jul 12, 2008 at 3:33 AM, Max Bolingbroke
[EMAIL PROTECTED] wrote:
>  If findColor had been a function defined in terms of foldr rather than
>  using explicit recursion, then there's a good chance GHC 6.9 would have
>  fused it with the list to yield your optimized, loop-unrolled,
>  version:

My first version was with msum. Is this also covered by this fusion?

(And it is interesting that my own recursion version is faster than
the version with msum. Why?)

>  Incidentally, if in your most recent email castRayScene2 was your only
>  use of castRay, GHC would have inlined the whole definition into that
>  use site and you would have got castRayScene1 back again.

It is a little more tricky. I choose, in the IO monad, which scene will
be rendered (selected by a program argument). So at compile time it is
not yet known which one will be used. But there is a finite number of
possibilities (in my case two), so why not inline both versions and
choose one at run time?


Mitar
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Profiling nested case

2008-07-18 Thread Chaddaï Fouché
2008/7/12 Mitar [EMAIL PROTECTED]:
>  So that I can easily change the type everywhere. But it would be much
>  nicer to write:
> 
>  data Quaternion a = Q !a !a !a !a deriving (Eq,Show)
> 
>  Only the performance of Quaternion's Num instance functions is then
>  noticeably worse.

You can probably use a specialization pragma to get around that.
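For illustration, a minimal sketch of what such a pragma could look like for a parametrized Quaternion (the qAdd function here is made up for the example, not Mitar's actual code):

```haskell
-- Hypothetical sketch: a SPECIALIZE pragma asks GHC to compile an
-- extra monomorphic copy of a polymorphic function, so calls at
-- Double avoid dictionary passing.
data Quaternion a = Q !a !a !a !a deriving (Eq, Show)

qAdd :: Num a => Quaternion a -> Quaternion a -> Quaternion a
qAdd (Q a b c d) (Q w x y z) = Q (a + w) (b + x) (c + y) (d + z)
{-# SPECIALIZE qAdd :: Quaternion Double -> Quaternion Double -> Quaternion Double #-}
```

GHC then rewrites call sites at Quaternion Double to use the specialized copy automatically.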

-- 
Jedaï


Re: [Haskell-cafe] Profiling nested case

2008-07-18 Thread Mitar
Hi!

On Fri, Jul 18, 2008 at 3:54 PM, Chaddaï Fouché
[EMAIL PROTECTED] wrote:
> >  So that I can easily change the type everywhere. But it would be much
> >  nicer to write:
> > 
> >  data Quaternion a = Q !a !a !a !a deriving (Eq,Show)
> > 
> >  Only the performance of Quaternion's Num instance functions is then
> >  noticeably worse.
> 
>  You can probably use a specialization pragma to get around that.

But why is this not automatic? If I use Quaternions of only one type
in the whole program, then why does it not make a specialized version
for it? At least with the -O2 switch.

Why exactly are polymorphic functions slower? Is this not just a
question of type checking (and of writing portable/reusable code)?
Later on in the compilation process we do know exactly at which type
the function is called, so we could use the function as if it had been
written only for that type. That is what specialization does, as I
understand it. I thought this was always done.
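For what it is worth, the usual answer can be sketched by hand (this NumDict record is an illustrative model of what dictionary passing means, not GHC's actual internals):

```haskell
-- Illustrative model only: a class constraint like `Num a => ...`
-- compiles into an extra hidden dictionary argument, so every (+)
-- and (*) becomes a field lookup plus an indirect call.
data NumDict a = NumDict { dAdd :: a -> a -> a, dMul :: a -> a -> a }

-- "Polymorphic" version: operations go through the dictionary.
normSq :: NumDict a -> a -> a -> a
normSq d x y = dAdd d (dMul d x x) (dMul d y y)

-- Specialized version: the operations are statically known and can
-- be inlined down to machine instructions.
normSqDouble :: Double -> Double -> Double
normSqDouble x y = x * x + y * y
```

Specialization replaces the first form with the second at a known type, which is exactly what the pragma requests.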


Mitar


Re: [Haskell-cafe] Profiling nested case

2008-07-18 Thread Max Bolingbroke
2008/7/18 Mitar [EMAIL PROTECTED]:
>  On Sat, Jul 12, 2008 at 3:33 AM, Max Bolingbroke
>  [EMAIL PROTECTED] wrote:
> >  If findColor had been a function defined in terms of foldr rather than
> >  using explicit recursion, then there's a good chance GHC 6.9 would have
> >  fused it with the list to yield your optimized, loop-unrolled,
> >  version:
> 
>  My first version was with msum. Is this also covered by this fusion?

Note that as I said the fusion only applies to 6.9 onwards. However,
assuming that you were using msum at the list monad then since msum =
concat is defined in terms of foldr there is a chance it could happen.
The best guide to this kind of thing is not asking me but rather
looking at the Core output by GHC for your particular program.
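To see the equivalence concretely, here is a small sketch (firstJust is a stand-in for the explicit recursion in findColor, not code from the thread):

```haskell
import Control.Monad (msum)

-- Explicit recursion: return the first Just in the list.
firstJust :: [Maybe a] -> Maybe a
firstJust []             = Nothing
firstJust (Just v  : _)  = Just v
firstJust (Nothing : xs) = firstJust xs

-- msum (essentially foldr mplus mzero) at the Maybe monad does the
-- same thing; its foldr-based definition is what makes fusion possible.
viaMsum :: [Maybe a] -> Maybe a
viaMsum = msum
```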

>  (And it is interesting that my own recursion version is faster than
>  the version with msum. Why?)

I don't know why you find this surprising :-). Your own version is
specialized exactly for the situation you wish to use it for. msum is
a generic combinator, which naturally makes it less amenable to
optimization because there is less information available about its
usage pattern. Note that GHC will of course try its best to
specialize the msum function for any particular scenario it's used
in, through inlining etc., but there are no guarantees.

>  It is a little more tricky. I choose, in the IO monad, which scene
>  will be rendered (selected by a program argument). So at compile time
>  it is not yet known which one will be used. But there is a finite
>  number of possibilities (in my case two), so why not inline both
>  versions and choose one at run time?

If there was only one static occurrence of each identifier in your
module then it would do that. This is because there are no code-size
implications for inlining a function into its unique use site.

Inlining is not a cure-all for performance though: if you inline too
much then you increase code size and hence increase the amount of main
memory you're reading and reduce instruction cache hits, not to
mention fill up your disk with multi-MB binaries.

Max


Re: [Haskell-cafe] Profiling nested case

2008-07-12 Thread Mitar
Hi!

(I will reply properly later; I have a project to finish and GHC is
playing games with me and does not want to cooperate.)

This project of mine is getting really interesting. It is like playing
table tennis with GHC. Sometimes it returns a nice ball, sometimes I
have to run around after it. But I just wanted to make a simple
raycasting engine, not debug GHC. It is interesting, though - I just
do not have time right now to play the change code / compile / run on
known input / check whether elapsed time increased game (I have to do
this even when I think I am optimizing things, or even when I think I
am just refactoring code - moving constants to definitions...). And
this is a slow process because every iteration runs for a few minutes.

The next beautiful example in this series is this function for
computing 4D Julia set fractal:

julia4DFractal :: BasicReal -> World
julia4DFractal param (x,y,z) = julia4D (Q (x / scale) (y / scale) (z / scale) param) iterations
  where c          = Q (-0.08) 0.0 (-0.8) (-0.03)
        alphaBlue  = VoxelColor 0 0 (2 / scale) (2 / scale)
        scale      = fromIntegral sceneHeight / 1.8
        threshold  = 16
        iterations = 100 :: Int
        julia4D _ 0 = (# alphaBlue, 1 #) -- point is (probably) not in the set
        julia4D q it | qMagnitudeSquared q > threshold = (# noColor, 1 #) -- point is in the set
                     | otherwise                       = julia4D (qSquared q + c) (it - 1)
          where distance = scale * qMagnitude qN / (2 * qMagnitude qN') * log (qMagnitude qN)
                (# qN, qN' #) = disIter q (Q 1 0 0 0) iterations
                  where disIter qn qn' 0 = (# qn, qn' #)
                        disIter qn qn' i | qMagnitudeSquared qn > threshold = (# qn, qn' #)
                                         | otherwise = disIter (qSquared qn + c) (2 * qn * qn') (i - 1)

Please observe that distance is never used, and this is also what GHC
warns about. But the trick is that with this part of the code in
there, the program virtually never finishes (I killed it after 15
minutes). If I remove everything on and after the "where distance"
line, it finishes in 30 seconds. OK, the problem is with (# qN, qN' #):
if this is changed to a normal (qN, qN'), then it works. But to notice
this ... this is something you have to spend a day on.


Mitar


Re: [Haskell-cafe] Profiling nested case

2008-07-12 Thread Luke Palmer
On Sat, Jul 12, 2008 at 8:57 PM, Mitar [EMAIL PROTECTED] wrote:
>  julia4DFractal :: BasicReal -> World
>  julia4DFractal param (x,y,z) = julia4D (Q (x / scale) (y / scale) (z / scale) param) iterations
>    where c          = Q (-0.08) 0.0 (-0.8) (-0.03)
>          alphaBlue  = VoxelColor 0 0 (2 / scale) (2 / scale)
>          scale      = fromIntegral sceneHeight / 1.8
>          threshold  = 16
>          iterations = 100 :: Int
>          julia4D _ 0 = (# alphaBlue, 1 #) -- point is (probably) not in the set
>          julia4D q it | qMagnitudeSquared q > threshold = (# noColor, 1 #) -- point is in the set
>                       | otherwise                       = julia4D (qSquared q + c) (it - 1)
>            where distance = scale * qMagnitude qN / (2 * qMagnitude qN') * log (qMagnitude qN)
>                  (# qN, qN' #) = disIter q (Q 1 0 0 0) iterations
>                    where disIter qn qn' 0 = (# qn, qn' #)
>                          disIter qn qn' i | qMagnitudeSquared qn > threshold = (# qn, qn' #)
>                                           | otherwise = disIter (qSquared qn + c) (2 * qn * qn') (i - 1)

>  Please observe that distance is never used, and this is also what GHC
>  warns about. But the trick is that with this part of the code in
>  there, the program virtually never finishes (I killed it after 15
>  minutes). If I remove everything on and after the "where distance"
>  line, it finishes in 30 seconds. OK, the problem is with
>  (# qN, qN' #): if this is changed to a normal (qN, qN'), then it
>  works. But to notice this ... this is something you have to spend a
>  day on.

My guess is that it was premature optimization that created this bug.
Unboxed tuples are not the best answer for every situation.  They are
evaluated strictly!  Which means:

unboxedBottom x
  | False     = (# 0, 0 #)
  | otherwise = unboxedBottom x

let (# x, y #) = unboxedBottom 0 in 42

is an infinite loop, not 42 as you would expect. So when you write:

where  (# ... #) = something

You are requiring your program to evaluate 'something' regardless of
whether it is needed.
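Luke's point can be seen from the other direction with a tiny runnable sketch using ordinary boxed tuples (names made up; with an unboxed (# x, y #) binding the same program would loop):

```haskell
-- A binding that never terminates if it is ever evaluated.
diverge :: (Int, Int)
diverge = diverge

-- With a lazy (boxed) tuple pattern the right-hand side is never
-- forced, because neither x nor y is demanded, so this returns 42.
answer :: Int
answer = 42
  where (x, y) = diverge
```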

Unboxed tuples should be taken in the same vein as explicit strictness
annotations: almost never use them, and let GHC do the work for you.
If you are in that phase where you are doing performance tweaks and
you think GHC's strictness analysis might not be picking up on some
strict behavior in your program, add the annotation. If it makes it
faster, great; if it doesn't change things, take it out! Best to
underconstrain your program.

But these days I try to make my programs fast by making the structure
of my program apparent to the compiler, not by forcing it to do things
in a certain way.  Admittedly making the structure of a program
apparent to the compiler is a rather subtle and brittle process.  I'm
sure people have at least brainstormed ways to help the compiler more.

Luke


Re: [Haskell-cafe] Profiling nested case

2008-07-12 Thread Mitar
Hi!

>  My guess is that it was premature optimization that created this bug.

It is the root of all evil. ;-)

>  Unboxed tuples are not the best answer for every situation.  They are
>  evaluated strictly!

Then I have not understood the last paragraph correctly:

http://www.haskell.org/ghc/docs/latest/html/users_guide/primitives.html

Oh, no. It is as you say. I also use -funbox-strict-fields and Q is
defined with strict fields. But I also tried without the switch and it
is the same (it takes forever). But then qN and qN' do not have
unboxed types. So it should be lazy?

>  If you are in that phase where you are doing performance tweaks and
>  you think GHC's strictness analysis might not be picking up on some
>  strict behavior in your program, add the annotation.  If it makes it
>  faster, great; if it doesn't change things, take it out!  Best to
>  underconstrain your program.

I completely agree. I am also a firm believer in clean and pure code
where I would leave all optimization to the compiler and just write
the semantics into the program. But this project has shown me that
there is still a long way of compiler development before that will be
possible (and usable): some simple refactorings which do not really
change the semantics have a big influence on performance because the
compiler treats the code differently (polymorphic types instead of
hardcoded types, passing a function as a parameter instead of
hardcoding it).

For example I have now defined my types as:

type BasicReal = Double

data Quaternion = Q !BasicReal !BasicReal !BasicReal !BasicReal
deriving (Eq,Show)

So that I can easily change the type everywhere. But it would be much
nicer to write:

data Quaternion a = Q !a !a !a !a deriving (Eq,Show)

Only the performance of Quaternion's Num instance functions is then
noticeably worse.


Mitar


Re: [Haskell-cafe] Profiling nested case

2008-07-11 Thread Mitar
Hi!

This is not all. If I compare the performance of these two
semantically identical functions:

castRayScene1 :: Ray -> ViewportDotColor
castRayScene1 (Ray vd o d) = ViewportDotColor vd (castRay' noColor 0)
  where castRay' color@(VoxelColor _ _ _ alpha) depth
          | depth > depthOfField = color
          | alpha < 0.001        = castRay' pointColor (depth + distance')
          | alpha > 0.999        = color
          | otherwise            = castRay' newColor (depth + distance')
          where (# pointColor, distance #) = worldScene (o + (d * depth))
                distance' = max 1 distance
                newColor  = addColor color pointColor

and:

castRay :: World -> Ray -> ViewportDotColor
castRay world (Ray vd o d) = ViewportDotColor vd (castRay' noColor 0)
  where castRay' color@(VoxelColor _ _ _ alpha) depth
          | depth > depthOfField = color
          | alpha < 0.001        = castRay' pointColor (depth + distance')
          | alpha > 0.999        = color
          | otherwise            = castRay' newColor (depth + distance')
          where (# pointColor, distance #) = world (o + (d * depth))
                distance' = max 1 distance
                newColor  = addColor color pointColor

castRayScene2 :: Ray -> ViewportDotColor
castRayScene2 = castRay worldScene

the program which uses castRayScene1 is 1.35 times faster than the
program which uses castRayScene2 (37 seconds versus 50 seconds).

(Compiled with GHC 6.8.3 and the -O2 switch. The program executes
almost exclusively this function over and over again.)

It is somehow awkward that passing a function as an argument slows
down the program so much. Is Haskell not a functional language, and is
not such (functional) code reuse one of its main points?

Of course, I could use some preprocessor/template engine to
change/find/replace the castRay-like function into castRayScene1
before compiling. But does this not somehow kill the idea of a
compiler - a smart compiler, which should do such things for you?

The same goes for my previous example, where a hard-coded list was not
optimized (as if it would change during execution). It looks like my
program were interpreted and not compiled.


Mitar


Re: [Haskell-cafe] Profiling nested case

2008-07-11 Thread Brandon S. Allbery KF8NH


On 2008 Jul 11, at 19:46, Mitar wrote:


>  It is somehow awkward that passing a function as an argument slows
>  down the program so much. Is Haskell not a functional language, and
>  is not such (functional) code reuse one of its main points?


That is in fact the case; GHC's version of various Prelude functions  
refactors them to avoid passing functional arguments.  IIRC the  
problem is that, while Haskell is indeed functional, passing a  
polymorphic function as an argument causes the runtime to have to look  
up which type is needed for every call, whereas if it's factored out  
it can be computed only once and (implicitly) let-bound.


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon universityKF8NH




Re: [Haskell-cafe] Profiling nested case

2008-07-11 Thread Max Bolingbroke
>  It is somehow awkward that passing a function as an argument slows
>  down the program so much. Is Haskell not a functional language, and
>  is not such (functional) code reuse one of its main points?

I can think of a few reasons why function passing is slow:
* There is an overhead to closure creation (I don't know how big this
is in practice, but it can be significant)
* GHC has little information about what function-typed variables mean,
because the function they are bound to is not known until runtime.
This precludes the use of inlining, rewrite rules etc, which are
absolutely key factors in making Haskell fast

Regarding your first example: currently GHC does not do loop
unrolling. It probably should though, because loop unrolling reduces
branches and increases the amount of information about the execution
path that is available statically (as in e.g. the liberate-case
transformation), which probably explains the increased performance you
saw by doing it manually in your first email.  Earlier this year I
implemented something somewhat similar though: fusable list literals.
In this example from your first email:
In this example from your first email:

world :: SpacePoint -> VoxelColor
world point = findColor [redSphere (0,50,0) 50, greenSphere (25,-50,0) 50, blueSphere (-150,0,150) 50]
  where findColor []     = noColor
        findColor (f:fs) = case f point of
                             Just v  -> v
                             Nothing -> findColor fs

If findColor had been a function defined in terms of foldr rather than
using explicit recursion, then there's a good chance GHC 6.9 would
have fused it with the list to yield your optimized, loop-unrolled,
version:

world :: SpacePoint -> VoxelColor
world point = case redSphere (0,50,0) 50 point of
                Just v  -> v
                Nothing -> case greenSphere (25,-50,0) 50 point of
                             Just v  -> v
                             Nothing -> case blueSphere (-150,0,150) 50 point of
                                          Just v  -> v
                                          Nothing -> noColor

Incidentally, if in your most recent email castRayScene2 was your only
use of castRay, GHC would have inlined the whole definition into that
use site and you would have got castRayScene1 back again.
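When such a function has more than one use site, an INLINE pragma can still request the same treatment at every saturated call. A toy sketch (applyTwice is made up for illustration, not from the thread):

```haskell
-- The INLINE pragma asks GHC to inline applyTwice at each call site,
-- so the functional argument f becomes statically visible there and
-- can itself be optimized.
applyTwice :: (a -> a) -> a -> a
applyTwice f = f . f
{-# INLINE applyTwice #-}
```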

So, GHC does try its best to make higher order functions fast :-). But
it's quite tricky!

All the best,
Max


Re: [Haskell-cafe] Profiling nested case

2008-07-10 Thread Justin Bailey
2008/7/9 Mitar [EMAIL PROTECTED]:


>  And it took 15 s. And the profiling was also as I would anticipate:
>  calculating point coordinates and checking spheres takes almost all
>  the time.
> 
>  So, any suggestions how I could build a list of objects to check at
>  runtime and still have this third performance? Why this big
>  difference?

I think the speed difference really comes from using a list to hold
the spheres in your first two examples, versus referring to them
directly in case statements in your last example. Lists introduce
indirections and are therefore slower than referring directly to the
values themselves.

Maybe you would have better luck using arrays? Template Haskell is
also an option - if you want to hard-code your scene in another
module, TH can turn it into that kind of case statement. Of course, as
scenes get more complex, a series of nested cases isn't going to be
too efficient.

Justin


[Haskell-cafe] Profiling nested case

2008-07-09 Thread Mitar
Hi!

I am making a simple raycasting engine and have a function which takes
a point in space and returns the color of an object (if there is any)
at that point in space.

And because the whole thing is really slow (or was really slow) on
simple examples, I decided to profile it. It takes around 60 seconds
for a 640x480 px image with a depth of field of 400. This is at worst
122,880,000 calculations (if the scene is rather empty) of a point's
coordinates in space and then checking for a color. And 60 seconds
looks like really a lot to me for that.

So I went profiling and found out that the strange part of the code is
the main color-checking function, which has a list of objects (at this
time the list is hardcoded). It looks like this:

world :: SpacePoint -> VoxelColor
world point = case msum . sequence elements $ point of
                Just v  -> v
                Nothing -> noColor
  where elements = [redSphere (0,50,0) 50, greenSphere (25,-50,0) 50, blueSphere (-150,0,150) 50]

So three spheres in a world, and I check whether the point is in any
of them. Like this:

sphere :: SpacePoint -> BasicReal -> VoxelColor -> WorldElement -- center of the sphere, its radius, its color
sphere (x0,y0,z0) r color (x,y,z)
  | x' * x' + y' * y' + z' * z' <= r * r = Just color
  | otherwise                            = Nothing
  where x' = x - x0
        y' = y - y0
        z' = z - z0

redSphere :: SpacePoint -> BasicReal -> WorldElement
redSphere c r = sphere c r redColor

So profiling told me that the world function takes 38.4 % of all
running time. So I decided to play with it. Maybe a more direct
approach would be better:

world :: SpacePoint -> VoxelColor
world point = findColor [redSphere (0,50,0) 50, greenSphere (25,-50,0) 50, blueSphere (-150,0,150) 50]
  where findColor []     = noColor
        findColor (f:fs) = case f point of
                             Just v  -> v
                             Nothing -> findColor fs

Great, it improved. To 40 s. But still it was too much. I tried this:

world :: SpacePoint -> VoxelColor
world point = case redSphere (0,50,0) 50 point of
                Just v  -> v
                Nothing -> case greenSphere (25,-50,0) 50 point of
                             Just v  -> v
                             Nothing -> case blueSphere (-150,0,150) 50 point of
                                          Just v  -> v
                                          Nothing -> noColor

And it took 15 s. And the profiling was also as I would anticipate:
calculating point coordinates and checking spheres takes almost all
the time.

So, any suggestions how I could build a list of objects to check at
runtime and still have this third performance? Why this big
difference?

(I am using GHC 6.8.3 with -O2 compile switch.)

(The * operator is casting a ray, that is, multiplying the ray
direction vector by a scalar factor.)


Mitar

