Re: [Haskell-cafe] Performance with do notation, mwc-random and unboxed vector

2012-06-15 Thread Bryan O'Sullivan
On Wed, Jun 13, 2012 at 12:56 AM, Roman Leshchinskiy 
r...@cse.unsw.edu.auwrote:


 It doesn't change the semantics of your program but it can make it
 significantly slower (or faster, as in this case). The various state hack
 related tickets on trac might give you an idea of what is happening here.


I filed a bug: http://hackage.haskell.org/trac/ghc/ticket/6166

(I'd CC myself on an existing bug, but trac's search feature gives me tons
of irrelevant hits.)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance with do notation, mwc-random and unboxed vector

2012-06-13 Thread Roman Leshchinskiy
On 12 Jun 2012, at 12:52, Dmitry Dzhus d...@dzhus.org wrote:

 12.06.2012, 01:08, Roman Leshchinskiy r...@cse.unsw.edu.au:
 
 perhaps the state hack is getting in the way.
 
 I don't quite understand the internals of this yet, but `-fno-state-hack` 
 leads to great performance in both cases!
 How safe is that?

It doesn't change the semantics of your program but it can make it 
significantly slower (or faster, as in this case). The various state hack 
related tickets on trac might give you an idea of what is happening here.

We really need some proper arity analysis!

Roman



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance with do notation, mwc-random and unboxed vector

2012-06-12 Thread Dmitry Dzhus
12.06.2012, 01:08, Roman Leshchinskiy r...@cse.unsw.edu.au:

 perhaps the state hack is getting in the way.

I don't quite understand the internals of this yet, but `-fno-state-hack` leads 
to great performance in both cases!
How safe is that?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Performance with do notation, mwc-random and unboxed vector

2012-06-11 Thread Dmitry Dzhus
Hello everyone.

I wonder why using do notation with `-` can ruin the performance.

In essence the problem is that, for some action `f :: m Double`,
running the code (in my case, `standard` from mwc-random).

f

for million times is fast but the code

do
  v - f
  return v

is slower about a hundred times.

Consider this simple source where we generate an unboxed vector with million
pseudo-random numbers:

 8 -
import qualified Data.Vector.Unboxed as VU

import System.Random.MWC
import System.Random.MWC.Distributions (standard)

count = 100

main = do
  g - create
  e' - VU.replicateM count $ standard g
  return ()
 8 -

Being compiled with -O2, this runs for 0.052 s on my machine.

Changing the replicateM line to use do notation brings the runtime down to 
11.257 s!
See below:

 8 -
import qualified Data.Vector.Unboxed as VU

import System.Random.MWC
import System.Random.MWC.Distributions (standard)

count = 100

main = do
  g - create
  e' - VU.replicateM count $ do
   v - standard g
   return v
  return ()
 8 -

I don't quite understand why this happens. I'm using GHC 7.4.1 on Linux x86_64 
system.

Compiling *both* versions with profiling enabled changes runtime to 5.673 sec,
which is exactly half the runtime of slow version without profiling, and this 
is awkward
(double calculations occuring in do block?).

Does anybody have an idea if this is a problem with my do, or with mwc-random, 
or with vector
(my notation disallowing efficient unboxing?).

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance with do notation, mwc-random and unboxed vector

2012-06-11 Thread MigMit
Well, it's not do notation, since replacing standard g with standard g = 
return gives the same poor performance. I wonder if it has something to do 
with error checking.

On 11 Jun 2012, at 13:38, Dmitry Dzhus wrote:

 Hello everyone.
 
 I wonder why using do notation with `-` can ruin the performance.
 
 In essence the problem is that, for some action `f :: m Double`,
 running the code (in my case, `standard` from mwc-random).
 
f
 
 for million times is fast but the code
 
do
  v - f
  return v
 
 is slower about a hundred times.
 
 Consider this simple source where we generate an unboxed vector with million
 pseudo-random numbers:
 
  8 -
 import qualified Data.Vector.Unboxed as VU
 
 import System.Random.MWC
 import System.Random.MWC.Distributions (standard)
 
 count = 100
 
 main = do
  g - create
  e' - VU.replicateM count $ standard g
  return ()
  8 -
 
 Being compiled with -O2, this runs for 0.052 s on my machine.
 
 Changing the replicateM line to use do notation brings the runtime down to 
 11.257 s!
 See below:
 
  8 -
 import qualified Data.Vector.Unboxed as VU
 
 import System.Random.MWC
 import System.Random.MWC.Distributions (standard)
 
 count = 100
 
 main = do
  g - create
  e' - VU.replicateM count $ do
   v - standard g
   return v
  return ()
  8 -
 
 I don't quite understand why this happens. I'm using GHC 7.4.1 on Linux 
 x86_64 system.
 
 Compiling *both* versions with profiling enabled changes runtime to 5.673 sec,
 which is exactly half the runtime of slow version without profiling, and this 
 is awkward
 (double calculations occuring in do block?).
 
 Does anybody have an idea if this is a problem with my do, or with 
 mwc-random, or with vector
 (my notation disallowing efficient unboxing?).
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance with do notation, mwc-random and unboxed vector

2012-06-11 Thread Malcolm Wallace

On 11 Jun 2012, at 10:38, Dmitry Dzhus wrote:

 main = do
  g - create
  e' - VU.replicateM count $ standard g
  return ()

In all likelhood, ghc is spotting that the value e' is not used, and that there 
are no side-effects, so it does not do anything at runtime.  If you expand the 
action argument to replicateM, such that it uses do-notation instead, perhaps 
ghc can no longer prove the lack of side-effects, and so actually runs the 
computation before throwing away its result.

When writing toy benchmarks in a lazy language, it is always important to 
understand to what extent your program _uses_ the data from a generator, or you 
are bound to get misleading performance measurements.

Regards,
Malcolm


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance with do notation, mwc-random and unboxed vector

2012-06-11 Thread Dmitry Dzhus
11.06.2012, 14:17, Malcolm Wallace malcolm.wall...@me.com:
 that there are no side-effects

There are — PRNG state is updated for RealWorld, that's why monadic replicateM 
is used.

You can add something like

  print $ (VU.!) e 50

after e is bound and still get 0.057 sec with do-less version.
This quite matches the performance claimed by mwc-random package
and seems reasonable since modern hardware shouldn't have any problem
with generating  twenty million random variates in a second with one execution 
thread.

Your note on laziness would be correct in case like
-- 8 --
import qualified Data.Vector.Unboxed as VU
import Data.Functor

import System.Random.MWC
import System.Random.MWC.Distributions (standard)

count = 1

main = do
  g - create
  e - return $ VU.replicate count (212.8506 :: Double)
  return ()
-- 8 ---
Where unused `e` is truly left unevaluated (you could force it
by matching with `!e` for example).

Profiling indicates that random number sampling really occurs for
both of original versions with `replicateM`, expectedly taking most of time:

Mon Jun 11 14:24 2012 Time and Allocation Profiling Report  (Final)

   slow-mwc-vector +RTS -p -RTS

total time  =5.45 secs   (5453 ticks @ 1000 us, 1 processor)
total alloc = 3,568,827,856 bytes  (excludes profiling overheads)

COST CENTRE   MODULE  %time %alloc

uniform2  System.Random.MWC45.0   53.7
uniformWord32 System.Random.MWC31.3   31.5
standard.loop System.Random.MWC.Distributions   4.11.1
uniform1  System.Random.MWC 3.94.5
nextIndex System.Random.MWC 3.61.4
uniform   System.Random.MWC 2.83.3
uniform   System.Random.MWC 2.51.4
wordsToDouble System.Random.MWC 2.10.5

I could drop do notation and go with the simpler version if I wanted just 
a vector of variates. But in reality I want a vector of tuples with random
components:
-- 8 --
import qualified Data.Vector.Unboxed as VU
import Control.Monad

import System.Random.MWC
import System.Random.MWC.Distributions (standard)

count = 100

main = do
  g - create
  e - VU.replicateM count $ do
 v1 - standard g
 v2 - standard g
 v3 - standard g
 return (v1, v2, v3)
  return ()
-- 8 ---
which runs for the same 11.412 seconds.
Since three times more variates are generated and run time stays the same,
this implies that perhaps some optimizations of vector package interfere
with mwc-random — can this be the case?
This becomes quite a bottleneck in my application.

On the other hand, mwc-random has `normal` function implemented as follows:

-- 8 --
normal m s gen = do
  x - standard gen
  return $! m + s * x
-- 8 ---
which again uses explicit `do`. Both standard and normal are marked with INLINE.

Now if I try to write
-- 8 --
  e - VU.replicateM count $ normal 0 1 g
-- 8 ---
in my test case, quite expectedly I get horrible performance of 11 seconds,
even though I'm not using do myself.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance with do notation, mwc-random and unboxed vector

2012-06-11 Thread Roman Leshchinskiy
On 11/06/2012, at 10:38, Dmitry Dzhus wrote:

 Consider this simple source where we generate an unboxed vector with million
 pseudo-random numbers:
 
  8 -
 import qualified Data.Vector.Unboxed as VU
 
 import System.Random.MWC
 import System.Random.MWC.Distributions (standard)
 
 count = 100
 
 main = do
  g - create
  e' - VU.replicateM count $ standard g
  return ()
  8 -
 
 Being compiled with -O2, this runs for 0.052 s on my machine.
 
 Changing the replicateM line to use do notation brings the runtime down to 
 11.257 s!
 See below:
 
  8 -
 import qualified Data.Vector.Unboxed as VU
 
 import System.Random.MWC
 import System.Random.MWC.Distributions (standard)
 
 count = 100
 
 main = do
  g - create
  e' - VU.replicateM count $ do
   v - standard g
   return v
  return ()
  8 -

The former essentially generates this:

  replicateM n ((letrec f = ... in f) `cast` ...)

and the latter this:

  replicateM n (\(s :: State# RealWorld) - (letrec f = ... in f s) `cast` ...)

I'd look further into this but mwc-random just inlines too much stuff. Could 
you perhaps find a smaller example that doesn't use mwc-random? In any case, it 
looks like a GHC bug, perhaps the state hack is getting in the way.

Roman



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe