subject:"\[Haskell\-cafe\] mapM is supralinear\?"

Re: [Haskell-cafe] mapM is supralinear?

2011-09-27 Thread Malcolm Wallace


On 26 Sep 2011, at 23:14, Arseniy Alekseyev wrote:

 Garbage collection takes amortized O(1) per allocation, doesn't it?

No.  For Mark-Sweep GC, the cost is proportional to

(H+R) / (H-R)
where H is the total heap size
  R is the reachable (i.e. live) heap

This formula amortises the cost of a collection over the amount of free space 
recovered.
For two-space copying collection, the cost is proportional to

R / ((H/2)-R)

In both cases, as R approaches H (or H/2), the cost of GC becomes rather large. 
 So in essence, the more live data you have, the more GC will cost.

Regards,
Malcolm


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-27 Thread Arseniy Alekseyev

Malcolm, one should amortize the cost of the collection over the
amount of free space allocated rather than recovered (there are cases
when no space is recovered, would you call the GC cost infinite
then?).

If one does that, and also runs the expensive collection not too often
[1], the time amortizes to O(1) easily (notice that the amount
allocated between GCs can be kept proportional to H).

I don't know if GC used in GHC does indeed have amortized O(1) cost
per allocation, but if it doesn't, it should.

[1] - a sufficient condition would be that there exists some real
number q such that q  1 and the next GC runs not sooner than when H
reaches H_0*q where H_0 is the heap size remaining after the last
collection.

On 27 September 2011 10:02, Malcolm Wallace malcolm.wall...@me.com wrote:

 On 26 Sep 2011, at 23:14, Arseniy Alekseyev wrote:

 Garbage collection takes amortized O(1) per allocation, doesn't it?

 No.  For Mark-Sweep GC, the cost is proportional to

 (H+R) / (H-R)
 where H is the total heap size
      R is the reachable (i.e. live) heap

 This formula amortises the cost of a collection over the amount of free space 
 recovered.
 For two-space copying collection, the cost is proportional to

 R / ((H/2)-R)

 In both cases, as R approaches H (or H/2), the cost of GC becomes rather 
 large.  So in essence, the more live data you have, the more GC will cost.

 Regards,
    Malcolm



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-27 Thread Malcolm Wallace


On 27 Sep 2011, at 11:23, Arseniy Alekseyev wrote:

 Malcolm, one should amortize the cost of the collection over the
 amount of free space allocated rather than recovered

They are the same thing.  You can only allocate from the space that has been 
recovered.  It is true that generational GC has a nursery area of largely 
constant size, which is always used for fresh allocation, but that is usually 
considered an optimisation (albeit a considerable one), which does not 
fundamentally change the underlying asymptotic costs of the major collections.  
When you have large heap residency, the proportion of time spent in GC 
increases.

 (there are cases
 when no space is recovered, would you call the GC cost infinite
 then?).

Indeed I would.  When that happens, usually the program aborts without 
completing its computation, so the computation is infinitely delayed. 

Regards,
Malcolm

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-27 Thread Ketil Malde


Here's my feeble understanding of GC:

1. GC runs when after some specified amount of allocations
2. GC runs in time proportional to the live heap, which it needs to
   traverse.

This means that for a long running mapM, like any other computation
generating a long list, (1) GC will run a number of times proportional to
the length of the list, and (2) each run will have a cost
proportional to the length of the list.  I.e. a linear algorithm is now
quadratic.

A lazy mapM (or mapM_), consuming the list as fast as it is generated,
will of course keep the list short/heap small, and thus the cost of each
GC is constant (for some value of constant).

I suppose generational GC will help in practice, since the young
generation gets to start near the end of the list, but it will still be
linear in generated length, and you still need major GCs too,
occasionally.

Also, I guess mapM is more vulnerable to this, since the operations (IO,
say) involved in building the list likely do short-lived allocations,
triggering more GCs than simpler, pure computations would.

Do let me know if this is probably a terribly naive view.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-27 Thread Arseniy Alekseyev

Ketil, I suppose your argument is correct for some implementations of
GC (hopefully not the ones I use). However, a trivial optimisation of
running GC with a frequency logarithmic in the (allocation rate / heap
size) seems to make almost any kind of GC amortized O(1) while keeping
the total heap bounded within constant factor of the reachable heap
size. So, we optimize the (1.) in your algorithm and in case of mapM
we should get a logarithmic (instead of linear) number of GCs with
exponentially (instead of linearly) increasing costs reaching O(N) in
the end and totalling to O(N) too!

Does anyone know if such worst-case complexity precautions are taken
in GHC? If not, why?

Malcolm, I fail to see how They are the same thing is compatible
with Indeed I would. They together imply that O(1) (GC amortized
over the amount allocated) and O(+inf) (GC amortized over the amount
reclaimed) are the same thing. Also, I don't see how OOM condition is
relevant here. I may have enough memory for a lot of useful things
even without GC.

On 27 September 2011 12:42, Ketil Malde ke...@malde.org wrote:

 Here's my feeble understanding of GC:

 1. GC runs when after some specified amount of allocations
 2. GC runs in time proportional to the live heap, which it needs to
   traverse.

 This means that for a long running mapM, like any other computation
 generating a long list, (1) GC will run a number of times proportional to
 the length of the list, and (2) each run will have a cost
 proportional to the length of the list.  I.e. a linear algorithm is now
 quadratic.

 A lazy mapM (or mapM_), consuming the list as fast as it is generated,
 will of course keep the list short/heap small, and thus the cost of each
 GC is constant (for some value of constant).

 I suppose generational GC will help in practice, since the young
 generation gets to start near the end of the list, but it will still be
 linear in generated length, and you still need major GCs too,
 occasionally.

 Also, I guess mapM is more vulnerable to this, since the operations (IO,
 say) involved in building the list likely do short-lived allocations,
 triggering more GCs than simpler, pure computations would.

 Do let me know if this is probably a terribly naive view.

 -k
 --
 If I haven't seen further, it is by standing in the footprints of giants



On 27 September 2011 12:35, Malcolm Wallace malcolm.wall...@me.com wrote:

 On 27 Sep 2011, at 11:23, Arseniy Alekseyev wrote:

 Malcolm, one should amortize the cost of the collection over the
 amount of free space allocated rather than recovered

 They are the same thing.  You can only allocate from the space that has been 
 recovered.  It is true that generational GC has a nursery area of largely 
 constant size, which is always used for fresh allocation, but that is usually 
 considered an optimisation (albeit a considerable one), which does not 
 fundamentally change the underlying asymptotic costs of the major 
 collections.  When you have large heap residency, the proportion of time 
 spent in GC increases.

 (there are cases
 when no space is recovered, would you call the GC cost infinite
 then?).

 Indeed I would.  When that happens, usually the program aborts without 
 completing its computation, so the computation is infinitely delayed.

 Regards,
    Malcolm


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-26 Thread Lennart Augustsson

You seem to ignore garbage collection.

On Sat, Sep 24, 2011 at 6:40 AM, Arseniy Alekseyev 
arseniy.alekse...@gmail.com wrote:

  Apparently it doesn't, and it seems to be fixed now.

 Does anyone know what exactly the bug was? Because this seems like a
 serious bug to me. I've run into it myself today and wasn't happy.
 Linear algorithms should work in linear time however much memory they
 allocate (modulo cache thrashing of course). Existence of people
 claiming otherwise surprises me!


 On 22 September 2011 01:05, John Lato jwl...@gmail.com wrote:
  On Wed, Sep 21, 2011 at 1:57 PM, Tim Docker t...@dockerz.net wrote:
 
  On 09/09/2011, at 8:19 PM, John Lato wrote:
 
  Agreed.  Whenever I'd like to use mapM (or any other function for
  which a *M_ is available), I've found the following rules helpful:
 
  1.  If I can guarantee the list is short (~ n=20), go ahead and use
 mapM
  2.  Otherwise use mapM_, foldM_, or foldM if a real reduction is
  possible (i.e. not foldM snocM []).
 
  Step 2 sometimes requires changing my design, but it's always been for
  the better.  `mapM_` tends to require more pipeline composition, so
  it's leveraging the language's strengths.
 
  This thread is really interesting - it relates directly to problems I am
  currently
  having with mapM over large lists (see the thread stack overflow
 pain).
 
  Can you explain what you mean by mapM_ tends to require more pipeline
  composition?
  In what way is it leveraging the language strengths?
 
  Hmm, that is suitably cryptic.  One way to think of it is an inversion
  of control.  Instead of operating on whole collections of things in a
  monad, you specify monadic actions (pipelines) which are applied
  sequentially to each input.
 
  Here's a simple example.  Suppose you have a bunch of data serialized
  to files, and you want to read each file into a data structure, apply
  some process based upon the last file's data, and write out the output
  to new files.  One way to do that would look like:
 
  do
 dats - mapM readMyData files
 let pairs = zip (mempty:dats) dats
 zipWithM_ (\(last, this) fname - writeMyData (update last this)
  fname) pairs newFiles
 
  However, you could also put everything into a single monadic
  operation, like this
 
  do
 foldM_ (\last (infile, outfile) - do
 this - readMyData
 infile
 writeMyData
  (update last this) outfile
 return this
)
mempty
(zip files newFiles)
 
  The first interleaves control (mapM, zipWIthM_) with monadic actions
  (file IO), whereas the second only has one control function (foldM_)
  which completely processes one input.  I say this is more pipeline
  composition because you have to create an entire pipeline from input
  to output, which is then sequentially fed inputs by the control
  function.
 
  I say this leverages Haskell's strengths because it's quite easy to
  compose functions and monadic actions in Haskell.  It also tends to be
  garbage-collector friendly.  I also find it much easier to reason
  about space usage.  You don't need to worry if part of a list is being
  retained, because the full list of data doesn't appear anywhere.  If
  you need to access prior elements they're specified explicitly so you
  know exactly how much data you're holding on to.
 
  My perspective might be warped by my work on iteratees, but I find
  this a very natural approach.
 
  John L.
 
  ___
  Haskell-Cafe mailing list
  Haskell-Cafe@haskell.org
  http://www.haskell.org/mailman/listinfo/haskell-cafe
 

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-26 Thread Arseniy Alekseyev

Garbage collection takes amortized O(1) per allocation, doesn't it?

On 26 September 2011 18:00, Lennart Augustsson lenn...@augustsson.net wrote:
You seem to ignore garbage collection.

On Sat, Sep 24, 2011 at 6:40 AM, Arseniy Alekseyev
arseniy.alekse...@gmail.com wrote:

Apparently it doesn't, and it seems to be fixed now.

Does anyone know what exactly the bug was? Because this seems like a
serious bug to me. I've run into it myself today and wasn't happy.
Linear algorithms should work in linear time however much memory they
allocate (modulo cache thrashing of course). Existence of people
claiming otherwise surprises me!

On 22 September 2011 01:05, John Lato jwl...@gmail.com wrote:
On Wed, Sep 21, 2011 at 1:57 PM, Tim Docker t...@dockerz.net wrote:

On 09/09/2011, at 8:19 PM, John Lato wrote:

Agreed. Whenever I'd like to use mapM (or any other function for
which a *M_ is available), I've found the following rules helpful:

1. If I can guarantee the list is short (~ n=20), go ahead and use
mapM
2. Otherwise use mapM_, foldM_, or foldM if a real reduction is
possible (i.e. not foldM snocM []).

Step 2 sometimes requires changing my design, but it's always been for
the better. `mapM_` tends to require more pipeline composition, so
it's leveraging the language's strengths.

This thread is really interesting - it relates directly to problems I
am
currently
having with mapM over large lists (see the thread stack overflow
pain).

Can you explain what you mean by mapM_ tends to require more pipeline
composition?
In what way is it leveraging the language strengths?

Hmm, that is suitably cryptic. One way to think of it is an inversion
of control. Instead of operating on whole collections of things in a
monad, you specify monadic actions (pipelines) which are applied
sequentially to each input.

Here's a simple example. Suppose you have a bunch of data serialized
to files, and you want to read each file into a data structure, apply
some process based upon the last file's data, and write out the output
to new files. One way to do that would look like:

do
dats - mapM readMyData files
let pairs = zip (mempty:dats) dats
zipWithM_ (\(last, this) fname - writeMyData (update last this)
fname) pairs newFiles

However, you could also put everything into a single monadic
operation, like this

do
foldM_ (\last (infile, outfile) - do
this - readMyData
infile
writeMyData
(update last this) outfile
return this
)
mempty
(zip files newFiles)

The first interleaves control (mapM, zipWIthM_) with monadic actions
(file IO), whereas the second only has one control function (foldM_)
which completely processes one input. I say this is more pipeline
composition because you have to create an entire pipeline from input
to output, which is then sequentially fed inputs by the control
function.

I say this leverages Haskell's strengths because it's quite easy to
compose functions and monadic actions in Haskell. It also tends to be
garbage-collector friendly. I also find it much easier to reason
about space usage. You don't need to worry if part of a list is being
retained, because the full list of data doesn't appear anywhere. If
you need to access prior elements they're specified explicitly so you
know exactly how much data you're holding on to.

My perspective might be warped by my work on iteratees, but I find
this a very natural approach.

John L.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-23 Thread Arseniy Alekseyev

 Apparently it doesn't, and it seems to be fixed now.

Does anyone know what exactly the bug was? Because this seems like a
serious bug to me. I've run into it myself today and wasn't happy.
Linear algorithms should work in linear time however much memory they
allocate (modulo cache thrashing of course). Existence of people
claiming otherwise surprises me!


On 22 September 2011 01:05, John Lato jwl...@gmail.com wrote:
 On Wed, Sep 21, 2011 at 1:57 PM, Tim Docker t...@dockerz.net wrote:

 On 09/09/2011, at 8:19 PM, John Lato wrote:

 Agreed.  Whenever I'd like to use mapM (or any other function for
 which a *M_ is available), I've found the following rules helpful:

 1.  If I can guarantee the list is short (~ n=20), go ahead and use mapM
 2.  Otherwise use mapM_, foldM_, or foldM if a real reduction is
 possible (i.e. not foldM snocM []).

 Step 2 sometimes requires changing my design, but it's always been for
 the better.  `mapM_` tends to require more pipeline composition, so
 it's leveraging the language's strengths.

 This thread is really interesting - it relates directly to problems I am
 currently
 having with mapM over large lists (see the thread stack overflow pain).

 Can you explain what you mean by mapM_ tends to require more pipeline
 composition?
 In what way is it leveraging the language strengths?

 Hmm, that is suitably cryptic.  One way to think of it is an inversion
 of control.  Instead of operating on whole collections of things in a
 monad, you specify monadic actions (pipelines) which are applied
 sequentially to each input.

 Here's a simple example.  Suppose you have a bunch of data serialized
 to files, and you want to read each file into a data structure, apply
 some process based upon the last file's data, and write out the output
 to new files.  One way to do that would look like:

 do
    dats - mapM readMyData files
    let pairs = zip (mempty:dats) dats
    zipWithM_ (\(last, this) fname - writeMyData (update last this)
 fname) pairs newFiles

 However, you could also put everything into a single monadic
 operation, like this

 do
    foldM_ (\last (infile, outfile) - do
                                                    this - readMyData infile
                                                    writeMyData
 (update last this) outfile
                                                    return this
               )
               mempty
               (zip files newFiles)

 The first interleaves control (mapM, zipWIthM_) with monadic actions
 (file IO), whereas the second only has one control function (foldM_)
 which completely processes one input.  I say this is more pipeline
 composition because you have to create an entire pipeline from input
 to output, which is then sequentially fed inputs by the control
 function.

 I say this leverages Haskell's strengths because it's quite easy to
 compose functions and monadic actions in Haskell.  It also tends to be
 garbage-collector friendly.  I also find it much easier to reason
 about space usage.  You don't need to worry if part of a list is being
 retained, because the full list of data doesn't appear anywhere.  If
 you need to access prior elements they're specified explicitly so you
 know exactly how much data you're holding on to.

 My perspective might be warped by my work on iteratees, but I find
 this a very natural approach.

 John L.

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-21 Thread Tim Docker



On 09/09/2011, at 8:19 PM, John Lato wrote:


Agreed.  Whenever I'd like to use mapM (or any other function for
which a *M_ is available), I've found the following rules helpful:

1.  If I can guarantee the list is short (~ n=20), go ahead and use  
mapM

2.  Otherwise use mapM_, foldM_, or foldM if a real reduction is
possible (i.e. not foldM snocM []).

Step 2 sometimes requires changing my design, but it's always been for
the better.  `mapM_` tends to require more pipeline composition, so
it's leveraging the language's strengths.


This thread is really interesting - it relates directly to problems I  
am currently
having with mapM over large lists (see the thread stack overflow  
pain).


Can you explain what you mean by mapM_ tends to require more pipeline  
composition?

In what way is it leveraging the language strengths?

Thanks,

Tim

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-21 Thread John Lato

On Wed, Sep 21, 2011 at 1:57 PM, Tim Docker t...@dockerz.net wrote:

 On 09/09/2011, at 8:19 PM, John Lato wrote:

 Agreed.  Whenever I'd like to use mapM (or any other function for
 which a *M_ is available), I've found the following rules helpful:

 1.  If I can guarantee the list is short (~ n=20), go ahead and use mapM
 2.  Otherwise use mapM_, foldM_, or foldM if a real reduction is
 possible (i.e. not foldM snocM []).

 Step 2 sometimes requires changing my design, but it's always been for
 the better.  `mapM_` tends to require more pipeline composition, so
 it's leveraging the language's strengths.

 This thread is really interesting - it relates directly to problems I am
 currently
 having with mapM over large lists (see the thread stack overflow pain).

 Can you explain what you mean by mapM_ tends to require more pipeline
 composition?
 In what way is it leveraging the language strengths?

Hmm, that is suitably cryptic.  One way to think of it is an inversion
of control.  Instead of operating on whole collections of things in a
monad, you specify monadic actions (pipelines) which are applied
sequentially to each input.

Here's a simple example.  Suppose you have a bunch of data serialized
to files, and you want to read each file into a data structure, apply
some process based upon the last file's data, and write out the output
to new files.  One way to do that would look like:

do
dats - mapM readMyData files
let pairs = zip (mempty:dats) dats
zipWithM_ (\(last, this) fname - writeMyData (update last this)
fname) pairs newFiles

However, you could also put everything into a single monadic
operation, like this

do
foldM_ (\last (infile, outfile) - do
this - readMyData infile
writeMyData
(update last this) outfile
return this
   )
   mempty
   (zip files newFiles)

The first interleaves control (mapM, zipWIthM_) with monadic actions
(file IO), whereas the second only has one control function (foldM_)
which completely processes one input.  I say this is more pipeline
composition because you have to create an entire pipeline from input
to output, which is then sequentially fed inputs by the control
function.

I say this leverages Haskell's strengths because it's quite easy to
compose functions and monadic actions in Haskell.  It also tends to be
garbage-collector friendly.  I also find it much easier to reason
about space usage.  You don't need to worry if part of a list is being
retained, because the full list of data doesn't appear anywhere.  If
you need to access prior elements they're specified explicitly so you
know exactly how much data you're holding on to.

My perspective might be warped by my work on iteratees, but I find
this a very natural approach.

John L.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-09 Thread Ertugrul Soeylemez

Roman Cheplyaka r...@ro-che.info wrote:

  In general it's a bad idea to use mapM over IO.

 Could you explain why?

Most applications don't require loading the entire result into memory,
so a combinator like foldM is more appropriate.  You should use mapM
over IO only, when the list is short, or when there is really no way
around loading everything into memory.


Greets,
Ertugrul


-- 
nightmare = unsafePerformIO (getWrongWife = sex)
http://ertes.de/



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-09 Thread John Lato

 From: Daniel Fischer daniel.is.fisc...@googlemail.com

 On Friday 09 September 2011, 00:41:11, Roman Cheplyaka wrote:
 * Ertugrul Soeylemez e...@ertes.de [2011-09-07 16:20:03+0200]

  In general it's a bad idea to use mapM over IO.

 Could you explain why?

 Take it with a grain of salt, there's nothing necessarily wrong with using
 mapM over IO on short lists.

Agreed.  Whenever I'd like to use mapM (or any other function for
which a *M_ is available), I've found the following rules helpful:

1.  If I can guarantee the list is short (~ n=20), go ahead and use mapM
2.  Otherwise use mapM_, foldM_, or foldM if a real reduction is
possible (i.e. not foldM snocM []).

Step 2 sometimes requires changing my design, but it's always been for
the better.  `mapM_` tends to require more pipeline composition, so
it's leveraging the language's strengths.

This has served me well, especially in IO, but in other monads as well.

John L.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-08 Thread Roman Cheplyaka

* Ertugrul Soeylemez e...@ertes.de [2011-09-07 16:20:03+0200]
 In general it's a bad idea to use mapM over IO.

Could you explain why?

Thanks.

-- 
Roman I. Cheplyaka :: http://ro-che.info/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-08 Thread Maciej Marcin Piechotka

On Fri, 2011-09-09 at 00:41 +0200, Roman Cheplyaka wrote:
 * Ertugrul Soeylemez e...@ertes.de [2011-09-07 16:20:03+0200]
  In general it's a bad idea to use mapM over IO.
 
 Could you explain why?
 
 Thanks.
 

Hmm. Isn't it explained by next sentence (For [] it will eat lots of
memory quickly and by its mere definition there is nothing you can do
about that.)?

Regards


signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-08 Thread Daniel Fischer

On Friday 09 September 2011, 00:41:11, Roman Cheplyaka wrote:
 * Ertugrul Soeylemez e...@ertes.de [2011-09-07 16:20:03+0200]
 
  In general it's a bad idea to use mapM over IO.
 
 Could you explain why?

Take it with a grain of salt, there's nothing necessarily wrong with using 
mapM over IO on short lists.
The problem is that IO's semantics imply that nothing can be made available 
before the entire list has been consumed and a large thunk is built on the 
way. Thus for longish lists there's a serious risk of stack overflows (or 
even heap exhaustion if you mapM the right [wrong] functions).
The same applies to replicateM, and to other monads with a (=) which 
isn't lazy enough.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-07 Thread Ertugrul Soeylemez

Travis Erdman traviserd...@yahoo.com wrote:

 The performance of mapM appears to be supralinear in the length of the
 list it is mapping on.  Does it need to be this way?  As a comparison,
 both mapM_ and map are linear in the length of the list.

It needs to be this way in most monads.  It's not a problem of mapM
itself, but of its definition in the particular monad.  In general it's
a bad idea to use mapM over IO.  For [] it will eat lots of memory
quickly and by its mere definition there is nothing you can do about
that.

mapM_ is linear, because it can throw away the results, so no
complicated accumulation occurs.  map is usually linear, because used
properly it will be optimized away leaving just a loop, which doesn't
produce any data structures in memory and is just run element by
element.


Greets,
Ertugrul


-- 
nightmare = unsafePerformIO (getWrongWife = sex)
http://ertes.de/



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] mapM is supralinear?

2011-09-06 Thread Travis Erdman



The performance of mapM appears to be supralinear in the length of the list it 
is mapping on.  Does it need to be this way?  As a comparison, both mapM_ and 
map are linear in the length of the list.


To wit:

travis@PW:~/Documents/insurer$ ghci
GHCi, version 7.0.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package ffi-1.0 ... linking ... done.
Prelude :set +s
Prelude :set +t
Prelude :m Data.List Data.Maybe
Prelude Data.List Data.Maybe foldl' (+) 0 $ fromJust $ mapM (\x - Just x) 
[1..50]
12500025
it :: Integer
(2.23 secs, 112875864 bytes)
Prelude Data.List Data.Maybe foldl' (+) 0 $ fromJust $ mapM (\x - Just x) 
[1..100]
5050
it :: Integer
(6.86 secs, 214002204 bytes)
Prelude Data.List Data.Maybe foldl' (+) 0 $ fromJust $ mapM (\x - Just x) 
[1..200]
20100
it :: Integer
(24.39 secs, 429299748 bytes)


Prelude Data.List Data.Maybe foldl' (+) 0 $ map (\x - x - 1) [1..100]
4950
it :: Integer
(0.77 secs, 171213436 bytes)
Prelude Data.List Data.Maybe foldl' (+) 0 $ map (\x - x - 1) [1..1000]
499500
it :: Integer
(7.42 secs, 1723399472 bytes)
Prelude Data.List Data.Maybe foldl' (+) 0 $ map (\x - x - 1) [1..4000]
7998000
it :: Integer
(30.46 secs, 6894835952 bytes)



Prelude Data.List Data.Maybe mapM_ (\x - Just x) [1..100]
Just ()
it :: Maybe ()
(0.42 secs, 82761248 bytes)
Prelude Data.List Data.Maybe mapM_ (\x - Just x) [1..1000]
Just ()
it :: Maybe ()
(3.87 secs, 808012660 bytes)
Prelude Data.List Data.Maybe mapM_ (\x - Just x) [1..1]
Just ()
it :: Maybe ()
(38.40 secs, 8054769564 bytes)
Prelude Data.List Data.Maybe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

2011-09-06 Thread Daniel Fischer

On Wednesday 07 September 2011, 01:01:08, Travis Erdman wrote:
 The performance of mapM appears to be supralinear in the length of the
 list it is mapping on.

Hmm. Could reproduce with 6.12.3 and 7.0.4, but not with 7.2.1.

 Does it need to be this way?  

Apparently it doesn't, and it seems to be fixed now.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

[Haskell-cafe] mapM is supralinear?

Re: [Haskell-cafe] mapM is supralinear?

18 matches

Site Navigation

Mail list logo

Footer information