Re: [Haskell-cafe] Space and time leaks

2007-10-04 Thread Dan Weston

Ronald Guida wrote:

I need some help with space and time leaks.

I know of two types of space leak.  The first type of leak occurs when
a function uses unnecessary stack or heap space.

GHCi sum [1..10^6]
*** Exception: stack overflow

Apparently, the default definition for sum has a space leak.
I can define my own sum in terms of strict foldl ...

  sum' xs = foldl' (+) 0 xs

... and it doesn't overflow the stack.

GHCi sum' [1..10^6]
5050
(0.27 secs, 112403416 bytes)


But it fills the heap? My intuition is that

  foldr (+) 0 [1..10^6]

is the fusion of a good producer and consumer so no heap is wasted 
constructing [1..10^6] but the stack is instead filled with 
(1+),(2+),...,(99+) unary functions, whereas


  foldl' (+) 0 [1..10^6]

does not fill the stack with anything but does fill the heap with 
[1..10^6] because foldl' is no longer a good consumer?


If so, it seems that the stack is smaller than the heap (or else the 
size of (1+) is much larger than that the element of an [Int].


Anyone know the truth of any of this?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Space and time leaks

2007-10-03 Thread Ronald Guida

I need some help with space and time leaks.

I know of two types of space leak.  The first type of leak occurs when
a function uses unnecessary stack or heap space.

GHCi sum [1..10^6]
*** Exception: stack overflow

Apparently, the default definition for sum has a space leak.
I can define my own sum in terms of strict foldl ...

 sum' xs = foldl' (+) 0 xs

... and it doesn't overflow the stack.

GHCi sum' [1..10^6]
5050
(0.27 secs, 112403416 bytes)

GHCi sum' [1..10^7]
500500
(2.73 secs, 1161223384 bytes)

GHCi sum' [1..10^8]
50005000
(27.83 secs, 11645261144 bytes)

I think there's still a space leak; I don't understand why GHCi using
10^8, 10^9, 10^10 bytes of memory for these calculations.

The other type of space leak I know of is when I have a chunk of data
in memory that I no longer need, such that the data is still being
referenced somewhere.  Since, in theory, I might still access the
data, the garbage collector can't reclaim the memory.  I'm not sure
how to construct an example though.

Regarding time leaks, I only know of one kind of leak.  If I have a
calculation that accumulates data over time, and I don't ask for any
results until the end, then, due to laziness, that calculation might
accumulate a chain of unevaluated thunks.  When I get the the end and
demand the final result, I have to wait for it while the RTS
evaluates a long chain of thunks.

The chain of unevaluated thunks is a space leak.  The time leak occurs
because the capture process and the accumulate process are supposed to
be interleaved, such that I perform some computations after I capture
each piece of data.  If I have to wait around at the end, then it
means the capture process and the accumulate process happened in
sequence.

As far as I know, every time leak has a companion space leak; it's not
possible to create a time leak without a space leak to go with it.  Is
this really true?

Now for the hard questions.
1. How do I go about detecting space and time leaks?
2. Once I find a leak, how do I fix it?
3. Are there any programming techniques I can use to avoid leaks?

-- Ron

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe