Re: [Haskell-cafe] Generating a random list

2008-03-01 Thread Milos Hasan

Hi, thanks for the reply..

Hi,

 so let's say I want to generate a list of N random floats. The elegant
 way of doing it would be to create an infinite lazy list of floats and
 take the first N, but for N = 1,000,000 or more, this overflows the
 stack. The reason is apparently that the take function is not
 tail-recursive, and so it uses O(N) stack space..



Not too likely.  take should not be tail recursive, because that is
not lazy (you have to compute all n elements to get the first one) and
thus uses O(n) space, whereas the take in the Prelude is lazy, so uses
O(1) space.  The prelude take is the one you want.

It's likely that the stack overflow is occurring elsewhere in your
program.  For example, if you are adding together all the random
numbers using foldl or foldr, that will eat up your stack (the right
solution in that case is to use the strict foldl').  Perhaps you could
post your code, or a minimal example of what you're experiencing
Summing the list works fine, it uses both O(1) stack space and O(1) heap 
space (so the laziness of foldl is not the problem here).


The problem is that I'm not just trying to sum the list, nor any similar 
producer-consumer scenario that could be done in O(1) heap space. What I 
was really trying to do is create a nearest-neighbor search tree with a 
large number of random points. So I really need the list to physically 
materialize in memory, and I don't mind it taking O(N) heap space. The 
problem I was trying to avoid was using O(N) *stack* space in the 
process of creating the list.


Here's a minimal summing example that illustrates the difference. The 
following works fine, since the elements are generated lazily and summed 
on the fly, as expected:


randFloats :: [Float]
randFloats = randoms (mkStdGen 0)

main = do
   let xs = take 100 randFloats
   print $ sum xs

But this overflows, because the list is created before being summed, and 
the take function goes into awfully deep recursion:


randFloats :: [Float]
randFloats = randoms (mkStdGen 0)

main = do
   xs - return $ take 100 randFloats
   print $ sum xs

Is there a clean way to avoid this problem?

Thanks,
Milos

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-03-01 Thread Milos Hasan

Bryan O'Sullivan wrote:

Milos Hasan wrote:

  

so let's say I want to generate a list of N random floats. The elegant
way of doing it would be to create an infinite lazy list of floats and
take the first N, but for N = 1,000,000 or more, this overflows the
stack. The reason is apparently that the take function is not
tail-recursive, and so it uses O(N) stack space..



You might want to post your code.  The reason take isn't tail recursive
is that it will be evaluated lazily, so it will not consume O(n) stack
space.

  
Yup, see the code I sent to Luke Palmer. The problem is that I really 
need the whole list to do further processing on it, instead of just 
consuming the elements one-by-one as they are produced by the random 
generator. I'm trying to build a search tree, but for simplicity you 
might assume that I just need to sort the list (or any other operation 
that is not a simple fold).



However, using take is the wrong approach anyway, as the user of the
random numbers needs to return the unconsumed portion of the list so
that the next user can consume them.  This is why code that uses random
numbers is usually written in the context of a state monad, such as
MonadRandom: http://www.haskell.org/haskellwiki/New_monads/MonadRandom
  
That sounds interesting, but I'm not sure it's the randomness that's the 
problem here. I could as well have a completely deterministic function f 
:: Int - Float, and I could try to produce the list map f 
[1..100], hitting the same issue.


Cheers,
Milos

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-03-01 Thread Milos Hasan

So, I did one more experiment, and the following overflows too:

import System.Random
import Data.List

randFloats :: [Float]
randFloats = randoms (mkStdGen 0)

main = print $ sum $ sort $ take 100 randFloats

Could it be that Data.List.sort is the culprit that uses O(n) stack 
space here? If so, is this avoidable?


Thanks a lot,
Milos



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-03-01 Thread Kalman Noel
Milos Hasan wrote:
 Here's a minimal summing example that illustrates the difference. The 
 following works fine, since the elements are generated lazily and summed 
 on the fly, as expected:
 
 randFloats :: [Float]
 randFloats = randoms (mkStdGen 0)
 
 main = do
let xs = take 100 randFloats
print $ sum xs
 
 But this overflows, because the list is created before being summed, and 
 the take function goes into awfully deep recursion:
 
 randFloats :: [Float]
 randFloats = randoms (mkStdGen 0)
 
 main = do
xs - return $ take 100 randFloats
print $ sum xs
 
 Is there a clean way to avoid this problem?

There is, and it has already been mentioned: It's the behaviour of
Prelude.sum that is bugging you. ‘sum’ will build an expression like
this, which is responsible for the stack overflow:

...(n1 + n2) + n3) + n4) + ...) + nm)
^ evaluation will start here
   ^ But this is the first addition to be performed

Instead, just use sum', which is defined just like sum, but with a
strict left fold instead of a lazy left fold:

import Data.List

sum' :: (Num a) = [a] - a
sum' = foldl' (+) 0

I don't know exactly why there is a difference between both programs.  
I suppose that in the first one, the strictness analyzer can optimize
sum into sum', but in the second one it cannot.

Kalman

--
Get a free email address with REAL anti-spam protection.
http://www.bluebottle.com/tag/1

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-03-01 Thread Luke Palmer
On Sat, Mar 1, 2008 at 8:18 AM, Milos Hasan [EMAIL PROTECTED] wrote:
  Here's a minimal summing example that illustrates the difference. The
  following works fine, since the elements are generated lazily and summed
  on the fly, as expected:

  randFloats :: [Float]
  randFloats = randoms (mkStdGen 0)

  main = do
 let xs = take 100 randFloats
 print $ sum xs

  But this overflows, because the list is created before being summed, and
  the take function goes into awfully deep recursion:

  randFloats :: [Float]
  randFloats = randoms (mkStdGen 0)

  main = do
 xs - return $ take 100 randFloats
 print $ sum xs

It is definitely the strictness analyzer biting you here.  In ghci,
the behavior of these two programs is identical (stack overflow).  As
kalman said, if you replate sum with foldl' (+) 0 in each of these
programs, the behavior is still identical (correct).

As a side note, one of the monad laws is this:

return x = f  =  f x

So your two programs are semantically equivalent (there's nothing
about IO that forces the evaluation of the list).

If you're building some sort of tree out of these values, you're going
to want to make sure that whatever fold you do (be it using foldl' or
recursion)
is strict, so that you don't get a huge thunk that doesn't have any information.

Luke
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-03-01 Thread Albert Y. C. Lai
The following is in ghci 6.8.2 with default options (e.g., default heap 
and stack). G denotes the ghci prompt.


At some points ghci will use 500MB of memory. Be sure you have enough 
physical memory.


G :m + Data.List System.Random
G let f n = take n randoms (mkStdGen 0)) :: [Float]

I define f to take a parameter so that thunks are created fresh every 
time I use it.


G sum (f 100)
*** Exception: stack overflow

This overflow is known, and its solution too:

G foldl' (+) 0 (f 100)
499985.38

The more interesting part is when we also sort:

G foldl' (+) 0 (sort (f 100))
*** Exception: stack overflow

At this point everyone hastens to assign blames according to his/her 
mental model. But some mental models are more mistaken than others. When 
a mental model is shown to be mistaken by another example, we have a 
name for the feeling that ensues: my brain has exploded. And here is 
an example that likely causes your brain to explode (it also causes ghci 
to use 500MB of memory):


G foldl' (+) 0 (sort (reverse (f 100)))
42.0

(Don't worry about the different sum. We all know that re-ordering 
floating point numbers changes the sum.)


Some brains haven't exploded yet. They believe that reverse helps by 
forcing all 100 cons cells before sorting. To burst them, let's 
reverse twice:


G foldl' (+) 0 (sort (reverse (reverse (f 100
*** Exception: stack overflow

You can also reverse 3 times, 4 times... In general, even number of 
times overflows, odd number of times works. It is now clear that order 
before sorting matters. But why should it?


I now give some hints and step down. I invite you to contemplate and 
discuss further.


* Suppose f 6 is [a,b,c,d,e,f]. What is the size of thunk f? Compare to 
the size of thunk a, b, etc. Also note how much is shared among them. If 
you want, the source code of System.Random is at


http://www.haskell.org/ghc/docs/latest/html/libraries/random/src/System-Random.html

If that is too complicated, a simpler model is available at

http://haskell.org/onlinereport/random.html

For our present purpose, the simpler model is accurate enough.

* Based on that knowledge, you're in a similar situation as

http://www.haskell.org/haskellwiki/Stack_overflow#Scans

Thus for example last (f 100) overflows, while print (f 100) 
works just fine (if you have the patience).


* Debug.Trace.trace is great for verifying relative order of evaluation. 
If I construct this list


[trace 0 a, trace 1 b, trace 2 c, ...]

and pass it to sort, I can see which item is forced in what order when 
sort compares items. This command does:


G :m + Debug.Trace
G sort (zipWith (trace . show) [0..] (f 6))

What do you see? What do you think? Bump up the number 6 to 10, 20... to 
verify what you guess.


* If you are interested in finding out in detail why sort does that, the 
source code is at


http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Data-List.html

Note the code of merge:

merge cmp xs [] = xs
merge cmp [] ys = ys
merge cmp (x:xs) (y:ys) = ...

Suppose you re-order the first two lines, i.e., make it

merge cmp [] ys = ys
merge cmp xs [] = xs
merge cmp (x:xs) (y:ys) = ...

what will happen?

* Prove that the sorting algorithm is only responsible for O(n log n) 
time and O(log n) stack. Thus any extra stack pressure must come from 
thunks in the list items.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-03-01 Thread Milos Hasan



It is definitely the strictness analyzer biting you here.  In ghci,
the behavior of these two programs is identical (stack overflow).  As
kalman said, if you replate sum with foldl' (+) 0 in each of these
programs, the behavior is still identical (correct).

  

OK, I could replicate that result. Awesome, thanks a lot!


As a side note, one of the monad laws is this:

return x = f  =  f x

So your two programs are semantically equivalent (there's nothing
about IO that forces the evaluation of the list).

  

OK, thanks, this is an important point. So maybe I should have done this?

main = print $ foldl1' (+) $! take 100 randFloats

My intuition tells me that the $! (and `seq`) just reduces one level (to 
WHNF?). If so, is there a way to force complete evaluation (so that 
nothing is reducible anymore)?


Thanks,
Milos

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-03-01 Thread Chaddaï Fouché
2008/3/1, Milos Hasan [EMAIL PROTECTED]:
 OK, thanks, this is an important point. So maybe I should have done this?

  main = print $ foldl1' (+) $! take 100 randFloats

  My intuition tells me that the $! (and `seq`) just reduces one level (to
  WHNF?). If so, is there a way to force complete evaluation (so that
  nothing is reducible anymore)?

In fact with this code you won't have any problem since the foldl1'
will consume strictly the elements as soon as take produce them,
avoiding any accumulation of thunk by randoms.

Now if you were to put a sort in there (supposedly to do something
else than a simple sum...), you could have a need for a function that
reduce the list and its elements :

 forceList [] = ()
 forceList (x:xs) = x `seq` forceList xs

  main = print $ foldl1' (+) $ (\xs - forceList xs `seq`  sort xs) $ take 
 100 randFloats

In Ghci it don't work (probably because the tail call in forceList
isn't optimised) but compiled it will work fine.

-- 
Jedaï
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-02-29 Thread Luke Palmer
On Sat, Mar 1, 2008 at 6:50 AM, Milos Hasan [EMAIL PROTECTED] wrote:
 Hi,

  so let's say I want to generate a list of N random floats. The elegant
  way of doing it would be to create an infinite lazy list of floats and
  take the first N, but for N = 1,000,000 or more, this overflows the
  stack. The reason is apparently that the take function is not
  tail-recursive, and so it uses O(N) stack space..

Not too likely.  take should not be tail recursive, because that is
not lazy (you have to compute all n elements to get the first one) and
thus uses O(n) space, whereas the take in the Prelude is lazy, so uses
O(1) space.  The prelude take is the one you want.

It's likely that the stack overflow is occurring elsewhere in your
program.  For example, if you are adding together all the random
numbers using foldl or foldr, that will eat up your stack (the right
solution in that case is to use the strict foldl').  Perhaps you could
post your code, or a minimal example of what you're experiencing.

Luke

  What is the right way to do this? Sure, I could write my own
  tail-recursive generator function. But this seems to be an instance of a
  more general problem - how to avoid algorithms linear in stack space
  when dealing with large lists.

  Thanks a lot!
  Milos

  ___
  Haskell-Cafe mailing list
  Haskell-Cafe@haskell.org
  http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Generating a random list

2008-02-29 Thread Bryan O'Sullivan
Milos Hasan wrote:

 so let's say I want to generate a list of N random floats. The elegant
 way of doing it would be to create an infinite lazy list of floats and
 take the first N, but for N = 1,000,000 or more, this overflows the
 stack. The reason is apparently that the take function is not
 tail-recursive, and so it uses O(N) stack space..

You might want to post your code.  The reason take isn't tail recursive
is that it will be evaluated lazily, so it will not consume O(n) stack
space.

However, using take is the wrong approach anyway, as the user of the
random numbers needs to return the unconsumed portion of the list so
that the next user can consume them.  This is why code that uses random
numbers is usually written in the context of a state monad, such as
MonadRandom: http://www.haskell.org/haskellwiki/New_monads/MonadRandom

b
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe