Duncan Coutts wrote:
On Fri, 2008-12-19 at 10:42 -0600, Jake McArthur wrote:

Paul Keir wrote:
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
This is a CAF (Constant Applicative Form). Since it is actually a
constant, it is never garbage collected and is always shared, so the
work of computing it is done only once, no matter how many threads
read it. You have essentially created a lookup table.
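
For concreteness, the shared-CAF setup being described would look roughly
like this (a sketch, not Paul's exact program):

import Control.Concurrent

-- fibs is a top-level CAF: the list is built once and shared, so every
-- forked thread just reads the same, already-evaluated elements.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

main :: IO ()
main = do
  ms <- sequence (replicate 8 newEmptyMVar)
  sequence_ [ forkIO (putMVar m $! (fibs !! 100000)) | m <- ms ]
  mapM takeMVar ms >>= mapM_ print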

Note, though, that even with all our obvious suggestions there is still
no speedup:

import Control.Concurrent

heavytask m n = putMVar m $! (fibs !! 100000)
  where
    fibs = n : (n+1) : zipWith (+) fibs (tail fibs)

-- Now fibs is not globally shared; each heavytask builds its own copy.
-- It is also evaluated by heavytask itself, rather than just putting a
-- thunk into the MVar.

main = do ms <- sequence $ replicate 8 newEmptyMVar
          sequence_
            [ forkIO (heavytask m n)
            | (m, n) <- zip ms [0..] ]
          ms' <- mapM takeMVar ms
          mapM_ print ms'

Looking at the GC stats (+RTS -t -RTS), we see that the majority of this
program's time is spent doing GC, and that running with -N4 pushes the
GC time even higher.
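
For example (Fibs.hs is just a placeholder name; on newer GHCs you also
need -rtsopts to pass RTS flags at run time, and -s gives a fuller report
than -t):

  $ ghc -O2 -threaded Fibs.hs
  $ ./Fibs +RTS -N4 -t -RTS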

This is an interesting example. It shows up a weakness in the GC that I'm working on fixing right now.

The interesting aspect of this example is that the thread stacks get large. You can see this by using +RTS -hT: a large chunk of the heap is taken up by TSOs. Each of those (fibs !! 100000) computations requires stack space linear in the index, because (fibs !! 100000) depends on (fibs !! 99999), and so on. That could probably be fixed by adding some strictness, but that's not the goal here; we should still be able to run the program in parallel.
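
For reference, one way to add that strictness would be to replace the list
indexing with a strict accumulating loop, so the stack (and hence the TSO)
stays small. This is just a sketch, not what the example is meant to
demonstrate:

-- fibAt n k corresponds to fibs !! k where
-- fibs = n : (n+1) : zipWith (+) fibs (tail fibs),
-- but runs in constant stack because the loop is tail-recursive
-- and both accumulators are forced on every step.
fibAt :: Integer -> Int -> Integer
fibAt n k = go k n (n + 1)
  where
    go 0 a _ = a
    go i a b = a `seq` b `seq` go (i - 1) b (a + b)

-- heavytask m n = putMVar m $! fibAt n 100000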

So when there are large stacks around, GC takes a long time because it has to traverse those stacks. But we should be able to alleviate the problem by (a) using a larger heap, and (b) using parallel GC. Not doing parallel GC is seriously going to hurt performance, because the data will have to be moved from one CPU's cache to another. But it turns out that parallel GC is misbehaving on this example too, because it doesn't force each stack to be scanned by the same thread that is executing it; I'm working on fixing that.
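
Concretely, (a) means running with a larger heap or allocation area, e.g.
something like the following (exact flags vary between GHC versions; in
later GHCs the parallel GC is on by default when running with -N):

  $ ./Fibs +RTS -N4 -H256m -s -RTS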

Having each CPU be able to GC independently would be a big improvement, of course. We think we understand how this can be done in the context of GHC; it's just a matter of doing it, but it's a big job.

Parallel performance is something we expect to improve dramatically over the next few months as we investigate more programs and improve the tools. The current HEAD is already a lot better than 6.10.1.

Cheers,
        Simon