[Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-15 Thread Simon Marlow

On 15/12/09 00:37, Philip Beadling wrote:



If you still have trouble, then try using ThreadScope

   http://code.haskell.org/ThreadScope/

with GHC 6.12.1.  You can use ThreadScope directly from the darcs
repository on code.haskell.org, and we hope to do a proper release soon.

Cheers,
Simon


Thanks for the advice, just downloaded ThreadScope and it's pretty
useful (before I was using Ubuntu's System Monitor which isn't ideal).

I've moved onto 6.12 and I now have my program working nicely over 2
cores - the problem was at least in part my own design - I was
generating large thunks in my parallel version which was killing
performance.  With this solved 2 cores gives me ~50% performance
increase.

What I'm doing now is taking a list I am going to fold over and
splitting it up so I have a list of lists, where each parent list
element representing work for 1 core.  I then fold lazily and only
parallelise on the final sum operation which (as far as I can see) sends
each chunk of folds to a different core and sums the results.

Can I confirm - what you are suggesting is that although I can't
parallelise fold itself, I could force evaluation on the list I am about
to fold in parallel and then merely accumulate the result at the end --
thus most the donkey work is done in parallel?


Yes.  If it turns out that the list elements are too small to spark 
individually, then you may want to split the list into chunks and 
evaluate/sum the chunks in parallel, before summing the result.  This is 
a typical map/reduce pattern.


Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-14 Thread Simon Marlow

On 13/12/2009 18:09, Philip Beadling wrote:


I've come to the conclusion that, yep, you can't (directly) parallelise
of fold operation, as fold guarantees order of processing.


True, but you can evaluate the elements in the input list to foldl' in 
parallel, as you were doing.  Presumably this doesn't give you enough 
parallelism in your case, though.


May I recommend that you pick up the latest parallel package:

http://hackage.haskell.org/package/parallel

and also the GHC 6.12.1 release (due any time now).  The new parallel 
package in particular cures a nasty space leak when using Strategies, 
and the new GHC includes a host of improvements to parallel performance.


If you still have trouble, then try using ThreadScope

 http://code.haskell.org/ThreadScope/

with GHC 6.12.1.  You can use ThreadScope directly from the darcs 
repository on code.haskell.org, and we hope to do a proper release soon.


Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-14 Thread Philip Beadling

 If you still have trouble, then try using ThreadScope
 
   http://code.haskell.org/ThreadScope/
 
 with GHC 6.12.1.  You can use ThreadScope directly from the darcs 
 repository on code.haskell.org, and we hope to do a proper release soon.
 
 Cheers,
   Simon

Thanks for the advice, just downloaded ThreadScope and it's pretty
useful (before I was using Ubuntu's System Monitor which isn't ideal).

I've moved onto 6.12 and I now have my program working nicely over 2
cores - the problem was at least in part my own design - I was
generating large thunks in my parallel version which was killing
performance.  With this solved 2 cores gives me ~50% performance
increase.

What I'm doing now is taking a list I am going to fold over and
splitting it up so I have a list of lists, where each parent list
element representing work for 1 core.  I then fold lazily and only
parallelise on the final sum operation which (as far as I can see) sends
each chunk of folds to a different core and sums the results.

Can I confirm - what you are suggesting is that although I can't
parallelise fold itself, I could force evaluation on the list I am about
to fold in parallel and then merely accumulate the result at the end --
thus most the donkey work is done in parallel?  

If this is possible, it may be more flexible then my method.  I'm lucky
as each fold operation will take give-or-take the same amount of time so
I can just chunk up fold jobs for each core equally.  If this wasn't the
case (and it certainly won't always be!), parallelising on individual
items would be the way to go.

Thanks,

Phil.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-13 Thread Philip Beadling
On Sat, 2009-12-12 at 13:46 +, Ben Millwood wrote:
 On Sat, Dec 12, 2009 at 10:08 AM, Maciej Piechotka
 uzytkown...@gmail.com wrote:
  If operation is associative it can be done using divide et impera
  spliting list in half and operating on it pararerlly then split in half
  etc.
 

Thank you very much for the replies.

I've come to the conclusion that, yep, you can't (directly) parallelise
of fold operation, as fold guarantees order of processing.

With something like map the runtime is free to continue sparking
function application to each element without waiting for the result.
So we spark f x, force evaluation of the remainder of the xs and
recurse.
I'm *guessing* at a detailed level when we are creating the output list,
haskell can concat each result element before f x returns due to
laziness - that is, haskell doesn't need to wait for evaluation of f x,
before continuing?

With fold, and specifically with foldl (+), this isn't the case as (+)
is strict on both arguments and thus it cannot continue until each
sparked evaluation has completed and combined with the accumulator.  If
(+) was not strict on both arguments, I'm not sure if could solider
on... assuming I've understood map correctly!?


Writing it out long hand (sorry if this is tedious!), we have:

using :: a - Strategy a - a
using x s = s x `seq` x

rwhnf :: Strategy a 
rwhnf x = x `seq` ()  

parList :: Strategy a - Strategy [a]
parList strat [] = ()
parList strat (x:xs) = strat x `par` (parList strat xs)

parMap :: Strategy b - (a - b) - [a] - [b]
parMap strat f = (`using` parList strat) . map f 


'using' applies a strategy to an item, and then returns the item.
'parList' is a (combinator) strategy which applies an atomic strategy to
each element in the list *in parallel* (for example forcing each element
to WHNF).

So for parMap we have xs passed into 'map f' - the result is then passed
to 'using' which will force application of 'f' on each element in
parallel by way of 'parList'.  No forced evaluation is dependant on a
previous evaluation.

Now for parFoldl - a crude and wrong representation for my purposes
could be:

parFoldl :: Num b = Strategy b - (a - b) - [a] - b
parFoldl strat f = sum . (`using` parList strat) . map f

This isn't really a fold of course, but it is doing roughly the same
thing, it's summing the results of applying function 'f' to each element
in a list.

The problem here is that sum will only allow one spark at a time,
because

sum [] = 0
sum (x:xs) = x + sum xs

So we get something like:
0 + (x4 + (x3 + (x2 + (x1

For example the result for (x4 + previous) can only be evaluated after
x3, x2 and x1 have been evaluated.  This means it won't spark evaluation
on x4 until (x3 + ) has been evaluated, thus only one core is ever
used.

I believe fold is just the general case of sum and the same logic
applies.


I suppose my questions are:

Have I got this right, if not very succinct!?  

Is it purely the strictness of (+) that causes this situation?

Ignoring DPH, is it possible to write a parallel fold avoiding something
like the technique below?


Anyhow, a workaround similar to those suggested I came up with is to
divide the folds up across the cores and then sum the sub-folds - this
produces approximately double the performance across two cores:

import Control.Parallel.Strategies (parMap,rwhnf)
import Data.List (foldl')
import Data.List.Split (chunk)
import GHC.Conc (numCapabilities)


-- Prepare to share work to be 
-- done across available cores
chunkOnCpu :: [a] - [[a]]
chunkOnCpu xs = chunk (length xs `div` numCapabilities) xs
 
-- Spark a fold of each chunk and
-- sum the results. Only works because
-- for associative folds.
foldChunks :: ([a] - a) - (a - b - a) - a - [[b]] - a
foldChunks combineFunc foldFunc acc = 
  combineFunc . (parMap rwhnf $ foldl' foldFunc acc)

-- Some pointless work to keep thread busy
workFunc :: Int - Int
workFunc 1 = 1
workFunc x = workFunc $ x - 1

-- Do some work on element x and append
foldFunc :: Int - Int - Int
foldFunc acc x = acc + workFunc x 

testList = repeat 10
answer =  foldChunks sum foldFunc 0 $ chunkOnCpu (take 50 testList)

main :: IO()
main = print answer













___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-13 Thread Philip Beadling
 -- Prepare to share work to be 
 -- done across available cores
 chunkOnCpu :: [a] - [[a]]
 chunkOnCpu xs = chunk (length xs `div` numCapabilities) xs
  
 -- Spark a fold of each chunk and
 -- sum the results. Only works because
 -- for associative folds.
 foldChunks :: ([a] - a) - (a - b - a) - a - [[b]] - a
 foldChunks combineFunc foldFunc acc = 
   combineFunc . (parMap rwhnf $ foldl' foldFunc acc)


I should probably point out that use of chunk above isn't a good idea in
anything beyond a toy example.  If you have used a list comprehension to
create your input then splitting it like the above results in thunks
that grow with list size as chunk forces generation of the list.  This
rapidly negates any advantage gained from processing across 1 core!
This is easily solved - just alter the generating function to create a
*list* of list comprehensions equal in length to the number of cores you
wish to process across, rather than create one list that is split across
the cores later.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-12 Thread Maciej Piechotka
On Sat, 2009-12-12 at 02:57 +, Philip Beadling wrote:
 Hi,
 
 Can anyone put me right here.  I am trying to use a setup similar to
 parMap to spark each valuation in a list in parallel, where the
 resulting (evaluated) list is folded to produce a final single result.
 
 Having done the obligatory google, I modified a few common examples to
 give:
 
 pfoldl f acc xs =  foldl' f acc (xs `using` parList rwhnf)
 
 
 This compiles and if I monitor my CPUs it starts to use both cores, but
 after approx 10 seconds, one core drops to low rate (I'm using a 2 core
 machine).
 
 The end result is that -N2 is actually a bit slower than -N1!
 
 I'm guessing I haven't grasped the concept properly - although as map is
 just 'foldl (+) 0' I'm at a loss to see why this approach wouldn't work
 given it is pretty similar to parMap - could anyone point out what I'm
 missing?
 
 If it's any use the context of the code is below.
 
 Many thanks!
 
 
 Phil.

At least IMHO foldl cannot be pararalized since it have rather strong
dependencies i.e. only calculation of acc and first value of list.

If operation is associative it can be done using divide et impera
spliting list in half and operating on it pararerlly then split in half
etc.
8 stages (the same number (O n) as normally + cost of synchronization
etc).
 1
  +
 2
   +
  3
+
   4
 +
5
  +
 6
   +
  7
+
   8

3 operations (to be more specific O (log n)):
1
+
2
+
3
+
4
+  
5
+
6
+
7
+
8


Regards
PS. I don't know nor understend the pararell library. Maybe I don't
understend something but without associativity IMHO data is too
interdependent IMHO to do something.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Parallel foldl doesn't work correctly

2009-12-12 Thread Ben Millwood
On Sat, Dec 12, 2009 at 10:08 AM, Maciej Piechotka
uzytkown...@gmail.com wrote:
 If operation is associative it can be done using divide et impera
 spliting list in half and operating on it pararerlly then split in half
 etc.

I implemented something like this as an exercise:

http://benmachine.co.uk/parconcat.hs

It took me a little while to get everything to par as it should and
I'm still not sure I'm doing it in the most efficient way, but there
it is.

(If the output is nonsense, you might try changing hPutStrLn stderr
into putStrLn so that it's buffered and arrives in blocks).
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe