[Haskell-cafe] Seen on reddit: or, foldl and foldr considered slightly harmful

2010-02-11 Thread Johann Höchtl
In a presentation of Guy Steele for ICFP 2009 in Edinburgh:
http://www.vimeo.com/6624203
he considers foldl and foldr harmful as they hinder parallelism
because of their "process the first element, then the rest" structure. Instead
he proposes a divide-and-merge approach, especially in the light of going parallel.

The slides at
http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.sun.com%2Fprojects%2Fplrg%2FPublications%2FICFPAugust2009Steele.pdf
[Beware: Google docs]
are somewhat geared towards Fortress, but I wonder what Haskellers
have to say about his position.

Greetings,

   Johann
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Seen on reddit: or, foldl and foldr considered slightly harmful

2010-02-11 Thread Ketil Malde
Johann Höchtl johann.hoec...@gmail.com writes:

 In a presentation of Guy Steele for ICFP 2009 in Edinburgh:
 http://www.vimeo.com/6624203
 he considers foldl and foldr harmful as they hinder parallelism
 because of their "process the first element, then the rest" structure. Instead
 he proposes a divide-and-merge approach, especially in the light of going parallel.

In Haskell, foldl/foldr apply to linked lists (or lazy streams, if you
prefer), which are already inherently sequential, and these get a rather
harsh treatment.  I notice he points to finger trees, which I thought were
implemented in Data.Sequence.

 are somewhat  geared towards Fortress, but I wonder what Haskellers
 have to say about his position.

Can we (easily) parallelize operations on Data.Sequence?  Often, the
devil is in the details, and there's a lot of ground to cover between
'trees are easier to parallelize' and an efficient and effective high-level
interface.  (E.g. non-strict semantics allowing speculative
evaluation -- you still need to insert manual `par`s, right?)
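For the record, a minimal sketch of what those manual `par`s might look like over a tree (my own illustration, using `par`/`pseq` from the parallel package, not anything from Data.Sequence):

```haskell
import Control.Parallel (par, pseq)

-- A simple binary tree of values.
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Divide-and-conquer fold: spark the reduction of the left
-- subtree while this thread reduces the right one.  The
-- combining function f is assumed to be associative.
parFold :: (a -> a -> a) -> Tree a -> a
parFold _ (Leaf x)   = x
parFold f (Node l r) =
  let lv = parFold f l
      rv = parFold f r
  in lv `par` (rv `pseq` f lv rv)
```

Without -threaded and multiple capabilities the sparks just run sequentially, which is part of the point: the non-strict semantics permit the speculation, but nothing happens by itself.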

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants


Re: [Haskell-cafe] Seen on reddit: or, foldl and foldr considered slightly harmful

2010-02-11 Thread Thomas Girod
Isn't this the kind of thing Data Parallel Haskell achieves?  I'm in
no way an expert in the field, but from what I've read on the subject it
looks like this:

I have a list of N elements and I want to map the function F over it.
Technically, I could spawn N processes and build the result from that,
but it would be highly inefficient.  So the really hard part is to guess
how I should split my data to get the best performance.

Well, I guess it's pretty easy for a flat structure if you have access
to its length, but for a recursive one it is complicated, as you don't
know whether a branch of the tree will lead to a leaf or a huge subtree --
the evil detail!
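For the flat case, one common heuristic is simply to cut the list into fixed-size chunks so each spark gets a worthwhile amount of work. A sketch (my own, using Control.Parallel.Strategies from the parallel package; the chunk size 100 is an arbitrary tuning knob):

```haskell
import Control.Parallel.Strategies (parListChunk, rseq, using)

-- Map f over the list, evaluating the results in parallel in
-- chunks of 100 elements rather than one spark per element.
parMapChunked :: (a -> b) -> [a] -> [b]
parMapChunked f xs = map f xs `using` parListChunk 100 rseq
```

For a recursive structure there is no such easy cut, which is Tom's point.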

Tom


On Thu, Feb 11, 2010 at 11:00:51AM +0100, Ketil Malde wrote:
 Johann Höchtl johann.hoec...@gmail.com writes:
 
  In a presentation of Guy Steele for ICFP 2009 in Edinburgh:
  http://www.vimeo.com/6624203
  he considers foldl and foldr harmful as they hinder parallelism
  because of their "process the first element, then the rest" structure. Instead
  he proposes a divide-and-merge approach, especially in the light of going parallel.
 
 In Haskell, foldl/foldr apply to linked lists (or lazy streams, if you
 prefer), which are already inherently sequential, and these get a rather
 harsh treatment.  I notice he points to finger trees, which I thought were
 implemented in Data.Sequence.
 
  are somewhat  geared towards Fortress, but I wonder what Haskellers
  have to say about his position.
 
 Can we (easily) parallelize operations on Data.Sequence?  Often, the
 devil is in the details, and there's a lot of ground to cover between
 'trees are easier to parallelize' and an efficient and effective high-level
 interface.  (E.g. non-strict semantics allowing speculative
 evaluation -- you still need to insert manual `par`s, right?)
 
 -k
 -- 
 If I haven't seen further, it is by standing in the footprints of giants


Re: [Haskell-cafe] Seen on reddit: or, foldl and foldr considered slightly harmful

2010-02-11 Thread Ross Paterson
On Thu, Feb 11, 2010 at 11:00:51AM +0100, Ketil Malde wrote:
 Johann Höchtl johann.hoec...@gmail.com writes:
  In a presentation of Guy Steele for ICFP 2009 in Edinburgh:
  http://www.vimeo.com/6624203
  he considers foldl and foldr harmful as they hinder parallelism
  because of their "process the first element, then the rest" structure. Instead
  he proposes a divide-and-merge approach, especially in the light of going parallel.
 
 In Haskell, foldl/foldr apply to linked lists (or lazy streams, if you
 prefer), which are already inherently sequential, and these get a rather
 harsh treatment.  I notice he points to finger trees, which I thought were
 implemented in Data.Sequence.

Direct URL for the slides:
http://research.sun.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf

As he says, associativity is the key to parallelism -- an old observation,
but still underappreciated.  Even without parallelism, associativity also
gives us greater freedom in structuring our solutions.  The moral is that
our datatypes need associative binary operations more than asymmetric ones.
We use lists too much (because they're so convenient), and apart from the
loss of parallelism, this constrains our thinking to the sequential style
criticised by Backus back in 1978.
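As a toy illustration of that point (my own, using only Data.Monoid): because the monoid operation is associative, a reduction can be regrouped freely, so the two halves below could just as well be computed in parallel:

```haskell
import Data.Monoid (Sum(..))

-- One reduction, two bracketings: associativity guarantees they
-- agree, which is exactly the freedom a parallel scheduler needs.
whole, halves :: Sum Int
whole  = mconcat (map Sum [1 .. 10])
halves = mconcat (map Sum [1 .. 5]) <> mconcat (map Sum [6 .. 10])
-- getSum whole == getSum halves == 55
```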

Regarding finger trees, he's just referring to the idea of caching the
results of monoidal folds in nodes of a tree.  That's crucial to the
applications of finger trees, but it can be applied to any kind of tree.
As he mentions, it's related to the Ladner-Fischer parallel prefix
algorithm, which has an upward pass accumulating sums for each subtree
followed by a downward accumulation passing each sum into the subtree
to the right.   But it's not just for parallelism: when you have these
cached values in a balanced tree, you can compute the sum of any prefix
in O(log n) steps.
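A minimal sketch of that caching idea (the names and representation are mine, not Data.Sequence's): each node stores the leaf count and the monoidal sum of its subtree, and a prefix query descends past whole cached subtrees:

```haskell
import Data.Monoid (Sum(..))

-- A leaf-valued tree whose nodes cache the subtree's leaf count
-- and its monoidal measure.
data Ann m a = Leaf m a
             | Node Int m (Ann m a) (Ann m a)

size :: Ann m a -> Int
size (Leaf _ _)     = 1
size (Node n _ _ _) = n

measure :: Ann m a -> m
measure (Leaf m _)     = m
measure (Node _ m _ _) = m

node :: Monoid m => Ann m a -> Ann m a -> Ann m a
node l r = Node (size l + size r) (measure l <> measure r) l r

-- Balanced build from a nonempty list of (measure, value) pairs.
fromList :: Monoid m => [(m, a)] -> Ann m a
fromList [(m, a)] = Leaf m a
fromList xs       = node (fromList l) (fromList r)
  where (l, r) = splitAt (length xs `div` 2) xs

-- Sum of the first n leaves: whenever a whole subtree lies inside
-- the prefix we use its cached measure, so in a balanced tree only
-- O(log n) nodes are visited.
prefix :: Monoid m => Int -> Ann m a -> m
prefix n t
  | n <= 0      = mempty
  | n >= size t = measure t
prefix n (Node _ _ l r) = prefix n l <> prefix (n - size l) r
prefix _ t              = measure t  -- unreachable for well-formed input
```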


Re: [Haskell-cafe] Seen on reddit: or, foldl and foldr considered slightly harmful

2010-02-11 Thread Jan-Willem Maessen

On Feb 11, 2010, at 3:41 AM, Johann Höchtl wrote:

 In a presentation of Guy Steele for ICFP 2009 in Edinburgh:
 http://www.vimeo.com/6624203
 he considers foldl and foldr harmful as they hinder parallelism
 because of their "process the first element, then the rest" structure. Instead
 he proposes a divide-and-merge approach, especially in the light of going parallel.
 
 The slides at
 http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.sun.com%2Fprojects%2Fplrg%2FPublications%2FICFPAugust2009Steele.pdf
 [Beware: Google docs]

There's no need to use Google docs.  A direct url for the pdf:

http://research.sun.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf

I recently gave a followup talk at Portland State, arguing that notation 
matters, and that even with better notation programmer mindset is also going to 
be hard to change:

http://research.sun.com/projects/plrg/Publications/PSUJan2010-Maessen.pdf

The key thing here isn't *just* the handedness of lists, but the handedness of 
foldl/foldr *irrespective of the underlying data structure*.  So switching to 
tree-structured data a la fingertrees is a necessary step, but not a sufficient 
one.  The use of monoidal reductions has always been an important part of 
parallel programming.

 are somewhat  geared towards Fortress, but I wonder what Haskellers
 have to say about his position.

Now, what if list comprehensions were really shorthand for construction of 
Applicative or Monoid structures by traversing a mixture of data types with a 
common interface (something like this one)?

class Generator t e | t -> e where
  mapReduce :: Monoid m => t -> (e -> m) -> m
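To make that concrete, here is a sketch of an instance and a use (my own illustration, not code from the talk; it needs a few GHC extensions, and the mail archive stripped the `->`/`=>` arrows from the class above):

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
import Data.Monoid (Sum(..))

class Generator t e | t -> e where
  mapReduce :: Monoid m => t -> (e -> m) -> m

-- The sequential instance; a tree-shaped Generator could combine
-- the two halves of its mapReduce in parallel instead.
instance Generator [e] e where
  mapReduce xs f = mconcat (map f xs)

-- The comprehension "sum [ x*x | x <- [1..4] ]" phrased as a
-- monoidal reduction over a Generator.
sumSquares :: Int
sumSquares = getSum (mapReduce [1 .. 4 :: Int] (Sum . (^ (2 :: Int))))
```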


-Jan-Willem Maessen
 Another Fortress/Haskell crossover

 Greetings,
 
   Johann



Re: [Haskell-cafe] Seen on reddit: or, foldl and foldr considered slightly harmful

2010-02-11 Thread Richard O'Keefe


On Feb 11, 2010, at 9:41 PM, Johann Höchtl wrote:


In a presentation of Guy Steele for ICFP 2009 in Edinburgh:
http://www.vimeo.com/6624203
he considers foldl and foldr harmful as they hinder parallelism
because of their "process the first element, then the rest" structure. Instead
he proposes a divide-and-merge approach, especially in the light of going parallel.


I think I first heard about that issue in the 80s.

Let me just take an Erlang perspective on this for a few minutes.

Ignoring types, the classic foldl in Erlang (which is strict) is

foldl(F, A, [X|Xs]) -> foldl(F, F(X, A), Xs);
foldl(_, A, []) -> A.

In a strict language, you have to wait for F to finish
before you can go on to the next element.
In a non-strict language, you don't.  The interesting
question is how far F can get just given X, before it
has to look at A.

Suppose we can factor F(X, A) into G(H(X), A)
where H is rather time-consuming, but G is cheap (maybe it's an add).

So we write

foldl_map(G, H, A, [X|Xs]) -> foldl_map(G, H, G(H(X), A), Xs);
foldl_map(_, _, A, []) -> A.

Now we can parallelise it.  We'll spark off a separate thread
for each call of H.

pfoldl_map(G, H, A, Xs) ->
    Outer = self(),
    Pids = [spawn(fun () -> Outer ! {self(), H(X)} end) || X <- Xs],
    foldl(G, A, [receive {Pid, Val} -> Val end || Pid <- Pids]).

If N is the length of the list and G is O(1)
we have O(N) time to traverse the list
and O(N) time to collect and fold the results.
The H calculations can take place in parallel.

Provided each H calculation is expensive enough, we may not _care_
about the O(N) bits.  In fact, if they aren't, we probably shouldn't
be worrying about parallelism here.

The problem exists when we *can't* factor F(X, A) into G(H(X), A)
with a cheap G and costly H.  Then divide-and-parallel-conquer
using k-way trees gives us a computation depth of $\log_k N$
calls of G waiting for results to come back.
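In Haskell terms, that divide-and-parallel-conquer shape might be sketched like this (my own sketch, using par/pseq from the parallel package; f must be associative and z its unit):

```haskell
import Control.Parallel (par, pseq)

-- Divide-and-conquer reduction of a list.  The call tree has
-- depth O(log n), so only O(log n) combines with f sit on the
-- critical path, versus O(n) for foldl/foldr.
reduceDC :: (a -> a -> a) -> a -> [a] -> a
reduceDC _ z []  = z
reduceDC _ _ [x] = x
reduceDC f z xs  =
  let (l, r) = splitAt (length xs `div` 2) xs
      lv     = reduceDC f z l
      rv     = reduceDC f z r
  in lv `par` (rv `pseq` f lv rv)
```

Note that splitting a linked list is itself O(n) at every level, which is exactly why the talk wants the data tree-shaped in the first place.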

As I say, I met this stuff in the 80s.  It's hardly news.

Indeed, if I remember correctly, back when Occam was hot stuff
people were worried about
PAR i = 0 for n
   ...
which forks n processes.  Doing that one at a time in the parent
process takes O(n) time.  I believe something was done about
making this work by recursive bisection.


