[Haskell-cafe] Understanding GHC allocations

2010-06-17 Thread Roman Cheplyaka
I'm trying to optimize the following program:
http://github.com/feuerbach/particles/blob/303c8a17c9b732e22457b5409bdce4b7520be94a/run.hs

Of course general suggestions are welcome (BTW I'm going to give a try
to vector), but currently I'm concerned with two questions:

1. Heavy allocations in 'distance' function. Here is (part of) the profile:

COST CENTRE   MODULE%time %alloc  ticks bytes

d2Main9.0   22.0290 6
d Main8.6   65.9278 18
d1Main7.5   11.0242 29970

From reading core I got the impression that everything is strict 
unboxed. Perhaps this is related to creating some closures? How to get
rid of those allocations?

2. Again from reading the core I learned that although 'l' and other
constants are inlined, their type is boxed Double. This makes sense
since CAFs are evaluated on demand, but obviously in this particular
case it does not make sense, so can I somehow make them unboxed?

-- 
Roman I. Cheplyaka :: http://ro-che.info/
Don't let school get in the way of your education. - Mark Twain
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Understanding GHC allocations

2010-06-17 Thread Roman Cheplyaka
* Roman Cheplyaka r...@ro-che.info [2010-06-17 12:40:59+0300]
 I'm trying to optimize the following program:
 http://github.com/feuerbach/particles/blob/303c8a17c9b732e22457b5409bdce4b7520be94a/run.hs
 
 Of course general suggestions are welcome (BTW I'm going to give a try
 to vector), but currently I'm concerned with two questions:
 
 1. Heavy allocations in 'distance' function. Here is (part of) the profile:
 
 COST CENTRE   MODULE%time %alloc  ticks bytes
 
 d2Main9.0   22.0290 6
 d Main8.6   65.9278 18
 d1Main7.5   11.0242 29970
 
 From reading core I got the impression that everything is strict 
 unboxed. Perhaps this is related to creating some closures? How to get
 rid of those allocations?
 
 2. Again from reading the core I learned that although 'l' and other
 constants are inlined, their type is boxed Double. This makes sense
 since CAFs are evaluated on demand, but obviously in this particular
 case it does not make sense, so can I somehow make them unboxed?

Forgot to mention, I'm using ghc 6.12.1, compiling with -O2.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
Don't let school get in the way of your education. - Mark Twain
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Understanding GHC allocations

2010-06-17 Thread Daniel Fischer
On Thursday 17 June 2010 11:43:09, Roman Cheplyaka wrote:
 * Roman Cheplyaka r...@ro-che.info [2010-06-17 12:40:59+0300]

  I'm trying to optimize the following program:
  http://github.com/feuerbach/particles/blob/303c8a17c9b732e22457b5409bd
 ce4b7520be94a/run.hs
 
  Of course general suggestions are welcome (BTW I'm going to give a try
  to vector), but currently I'm concerned with two questions:
 
  1. Heavy allocations in 'distance' function. Here is (part of) the
  profile:
 
  COST CENTRE   MODULE%time %alloc  ticks bytes
 
  d2Main9.0   22.0290 6
  d Main8.6   65.9278 18
  d1Main7.5   11.0242 29970
 

I suspect the distance function is not what you intended,


distance :: Double - Double - Double
distance !x1 !x2 = {-# SCC min #-} min d1 d2
where
d = {-# SCC d #-} x1 - x2
d1 = {-# SCC d1 #-} abs d
d2 = {-# SCC d2 #-} abs $ l - d

that would give

distance 0.2 24.8 = 24.6, while the wrapping suggests that it should be 
0.4, so in d2, it should be d1 instead of d.
Either way, both d and d1 are = 25, so the 'abs' in d2 is superfluous, 
removing that alone reduces the allocations drastically and the running 
time by ~40% (astonishingly, not in the profiling version, I suspect it's 
because profiling needs a few registers so that there aren't enough left 
for the loops on my box).

Further, if you export only main from the module, you allow GHC to be more 
aggressive with optimising. On my box, that leads to more allocation again 
because there aren't enough registers, but things become a little faster.

Also, a few more bangs here and there plus a couple of INLINE and UNPACK 
pragmas speed things up, the (on my box) fastest combination I've found is 
attached, it has the same semantics for distance as the original code, 
changing distance to what I believe it should be unfortunately slows it 
down significantly.

On my box, I get a further big speedup by compiling with

-O2 -fexcess-precision -fvia-C -optc-O3

  From reading core I got the impression that everything is strict 
 
  unboxed.

Not everything, there lurk a few boxed Doubles e.g. in average.

  Perhaps this is related to creating some closures? How to get
  rid of those allocations?
 

Do you need to? Sometimes an allocating loop is faster than a non-
allocating one (of course, if you have enough registers for the allocating 
loop to run entirely in registers, it'll be much faster still).

IMO, the important criteria are time and resident memory, not allocation.

  2. Again from reading the core I learned that although 'l' and other
  constants are inlined, their type is boxed Double. This makes sense
  since CAFs are evaluated on demand, but obviously in this particular
  case it does not make sense, so can I somehow make them unboxed?

Putting bangs in the loops where they are used likely uses the unboxed 
values; not exporting them too.


 Forgot to mention, I'm using ghc 6.12.1, compiling with -O2.

{-# LANGUAGE BangPatterns #-}

module Main (main)
where
import System.Random
import Text.Printf
import Data.List
import System.IO


{-# INLINE r #-}
r = 1
{-# INLINE r2 #-}
r2 = r*r
n = 1000
time = 100
{-# INLINE l #-}
l = 25
{-# INLINE h #-}
h = 0.01

data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double
data Particle = Particle { point :: !Point, angle :: !Double } -- point, angle

{-# INLINE distance #-}
distance :: Double - Double - Double
distance !x1 !x2
| x1  x2   = x2-x1
| otherwise = let !d = x1-x2 in min d (l-d)
-- distance !x1 !x2 = {-# SCC min #-} min d1 d2
-- where
-- !d  = {-# SCC d #-}  x1 - x2
-- !d1 = {-# SCC d1 #-} abs d
-- !d2 = {-# SCC d2 #-} abs $ l - d

{-# INLINE pointsAreClose #-}
pointsAreClose :: Point - Point - Bool
pointsAreClose (Point x1 y1) (Point x2 y2) = sqr (distance x1 x2) + sqr (distance y1 y2)  r2
--where sqr !x = x * x

{-# INLINE sqr #-}
sqr :: Double - Double
sqr !x = x*x

{-# INLINE average #-}
average :: [Double] - Double
average list = -- let (!s,!n) = foldl' (\(!s,!n) x - (s+x,n+1)) (0,0) list in s / n
case foldl' (\(!s, !n) !x - (s + x, n + 1)) (0.0, 0.0) list of
  (!s, !n) - s / n

{-# INLINE wrap #-}
wrap :: Double - Double
wrap x | x  0 = x + l
   | x  l = x - l
   | otherwise = x

makeStep :: [Particle] - Double - [Particle]
makeStep allParticles dt = map (makeStep1 dt allParticles) allParticles

makeStep1 :: Double - [Particle] - Particle - Particle
makeStep1 dt allParticles particle = updateParticle dt particle newAngle
where
  !newAngle = average . map angle . filter (pointsAreClose (point particle) . point) $ allParticles

updateParticle dt (Particle (Point x0 y0) _) newAngle = Particle (Point x1 y1) newAngle
where
x1 = wrap $ x0 + cos newAngle * dt
y1 = wrap $ y0 + sin newAngle * dt

create = zipWith3 (\x y a - Particle (Point x y) a)

move t list = foldl' makeStep list (replicate t h)


Re: [Haskell-cafe] Understanding GHC allocations

2010-06-17 Thread Roman Beslik

On 17.06.10 12:40, Roman Cheplyaka wrote:

 From reading core I got the impression that everything is strict
unboxed. Perhaps this is related to creating some closures? How to get
rid of those allocations?
   
Yes, distance creates a closure of type @Double - Double# - Double@ 
which is obviously not necessary. I do not know why.



2. Again from reading the core I learned that although 'l' and other
constants are inlined, their type is boxed Double. This makes sense
since CAFs are evaluated on demand, but obviously in this particular
case it does not make sense, so can I somehow make them unboxed
Hmm, I learned from -ddump-core that distance function uses constant 
25.0.


There is another way to optimize — make GHC use floating point abs 
processor instruction. Now it uses

{{{
abs x | x = 0.0 = x
| otherwise = negateDouble x
}}}
http://hackage.haskell.org/packages/archive/base/latest/doc/html/src/GHC-Float.html

--
Best regards,
  Roman Beslik.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Understanding GHC allocations

2010-06-17 Thread Roman Cheplyaka
* Daniel Fischer daniel.is.fisc...@web.de [2010-06-17 16:27:01+0200]
 On Thursday 17 June 2010 11:43:09, Roman Cheplyaka wrote:
  * Roman Cheplyaka r...@ro-che.info [2010-06-17 12:40:59+0300]
 
   I'm trying to optimize the following program:
   http://github.com/feuerbach/particles/blob/303c8a17c9b732e22457b5409bd
  ce4b7520be94a/run.hs
  
   Of course general suggestions are welcome (BTW I'm going to give a try
   to vector), but currently I'm concerned with two questions:
  
   1. Heavy allocations in 'distance' function. Here is (part of) the
   profile:
  
   COST CENTRE   MODULE%time %alloc  ticks bytes
  
   d2Main9.0   22.0290 6
   d Main8.6   65.9278 18
   d1Main7.5   11.0242 29970
  
 
 I suspect the distance function is not what you intended,
 distance 0.2 24.8 = 24.6, while the wrapping suggests that it should be 
 0.4, so in d2, it should be d1 instead of d.

Good catch! :)

 Either way, both d and d1 are = 25, so the 'abs' in d2 is superfluous, 

Correct

 removing that alone reduces the allocations drastically and the running 
 time by ~40%

That's exactly what I'm asking about. 'abs' in C does not require any
allocations, does it? So why does it require any allocations in Haskell,
assuming we've got no lazyness, typeclass indirection (I assume 'abs'
was specialized and inlined) or other high-level features in resulted
low-level code?

 Further, if you export only main from the module, you allow GHC to be more 
 aggressive with optimising. On my box, that leads to more allocation again 
 because there aren't enough registers, but things become a little faster.

Good idea indeed.

   Perhaps this is related to creating some closures? How to get
   rid of those allocations?
  
 
 Do you need to? Sometimes an allocating loop is faster than a non-
 allocating one (of course, if you have enough registers for the allocating 
 loop to run entirely in registers, it'll be much faster still).
 
 IMO, the important criteria are time and resident memory, not allocation.

Maybe, but what bothers me is that I can't answer myself where are those
allocation from. What problem do they solve?

   2. Again from reading the core I learned that although 'l' and other
   constants are inlined, their type is boxed Double. This makes sense
   since CAFs are evaluated on demand, but obviously in this particular
   case it does not make sense, so can I somehow make them unboxed?
 
 Putting bangs in the loops where they are used likely uses the unboxed 
 values; not exporting them too.

I'll play with this, thanks.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
Don't let school get in the way of your education. - Mark Twain
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Understanding GHC allocations

2010-06-17 Thread Brandon S. Allbery KF8NH

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Jun 17, 2010, at 12:35 , Roman Cheplyaka wrote:

That's exactly what I'm asking about. 'abs' in C does not require any
allocations, does it? So why does it require any allocations in  
Haskell,

assuming we've got no lazyness, typeclass indirection (I assume 'abs'
was specialized and inlined) or other high-level features in resulted
low-level code?



Quite a few obvious specializations / rules are missing; you can  
propose them.  Answering your question about the implementation of  
abs, my guess is that the current definition works for every type (I'm  
pretty sure C doesn't know what to do with a Data.Ratio).


- --
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon universityKF8NH


-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.10 (Darwin)

iEYEARECAAYFAkwacCwACgkQIn7hlCsL25XS/wCgvZWVoxZLrIlNywg4ZLA6tEwW
UpEAnjTLW5E5EHYcOIu0Eq1Rclh2wxfR
=BFE2
-END PGP SIGNATURE-
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Understanding GHC allocations

2010-06-17 Thread Roman Beslik

On 17.06.10 19:12, Roman Beslik wrote:

On 17.06.10 12:40, Roman Cheplyaka wrote:

 From reading core I got the impression that everything is strict
unboxed. Perhaps this is related to creating some closures? How to get
rid of those allocations?
Yes, distance creates a closure of type @Double - Double# - 
Double@ which is obviously not necessary. I do not know why. 

-funfolding-use-threshold=7 removes that closure.

--
Best regards,
  Roman Beslik.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe