Re: [Haskell-cafe] How do I get this done in constant mem?

2009-10-11 Thread mf-hcafe-15c311f0c

On Sat, Oct 10, 2009 at 11:11:24PM +0200, Daniel Fischer wrote:
> To: haskell-cafe@haskell.org
> From: Daniel Fischer 
> Date: Sat, 10 Oct 2009 23:11:24 +0200
> Subject: Re: [Haskell-cafe] How do I get this done in constant mem?
> 
> Am Samstag 10 Oktober 2009 22:14:38 schrieb mf-hcafe-15c311...@etc-network.de:
> > On Sat, Oct 10, 2009 at 09:33:52AM -0700, Thomas Hartman wrote:
> > > To: Luke Palmer 
> > > Cc: mf-hcafe-15c311...@etc-network.de, haskell-cafe@haskell.org
> > > From: Thomas Hartman 
> > > Date: Sat, 10 Oct 2009 09:33:52 -0700
> > > Subject: Re: [Haskell-cafe] How do I get this done in constant mem?
> > >
> > > > Yes, you should not do this in IO.  That requires the entire
> > > > computation to finish before the result can be used.
> > >
> > > Not really the entire computation though... whnf, no?
> >
> > In that example, yes.  But readFile takes the entire file into a
> > strict String before it gives you the first Char, right?  (Sorry again
> > for my misleading code "simplification".)
> 
> No, readFile reads the file lazily.

hm?  oh, you are right, now that i fixed all the other problems in my
code readFile isn't a problem any more either...  (-:

(but then how does it know when to close the handle?  gotta go read
the code i guess.)

thanks!
-m
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How do I get this done in constant mem?

2009-10-10 Thread Daniel Fischer
Am Samstag 10 Oktober 2009 22:14:38 schrieb mf-hcafe-15c311...@etc-network.de:
> On Sat, Oct 10, 2009 at 09:33:52AM -0700, Thomas Hartman wrote:
> > To: Luke Palmer 
> > Cc: mf-hcafe-15c311...@etc-network.de, haskell-cafe@haskell.org
> > From: Thomas Hartman 
> > Date: Sat, 10 Oct 2009 09:33:52 -0700
> > Subject: Re: [Haskell-cafe] How do I get this done in constant mem?
> >
> > > Yes, you should not do this in IO.  That requires the entire
> > > computation to finish before the result can be used.
> >
> > Not really the entire computation though... whnf, no?
>
> In that example, yes.  But readFile takes the entire file into a
> strict String before it gives you the first Char, right?  (Sorry again
> for my misleading code "simplification".)

No, readFile reads the file lazily.

>
> > main = do
> >   let thunks :: IO [Int]
> >   thunks = (sequence . replicate (10^6) $ (randomRIO (0,10^9)))
> >   putStrLn . show . head =<< thunks -- prints
> >   putStrLn . show . last =<< thunks -- overflows
>
> Meaning that the entire list needs to be kept?  Is there a reason
> (other than "it's easier to implement and it's legal" :-) why the
> elements that have been traversed by "last" can't be garbage
> collected?
>

The problem is that the randomRIO isn't done before it's needed. When you ask 
for the last 
element of the generated list, you have a stack of nearly one million calls to 
randomRIO 
to get it, that overflows the stack.
If you insert a stricter version of sequence:

{-# LANGUAGE BangPatterns #-}

sequence'   :: Monad m => [m a] -> m [a]
{-# INLINE sequence' #-}
sequence' ms = foldr k (return []) ms
where
  k m m' = do { !x <- m; xs <- m'; return (x:xs) }
-- ^^^ evaluate x now!

main = do
let thunks = sequence' . replicate (10^6) $ randomRIO (0,10^9)
...

it doesn't overflow the stack. But both, sequence and sequence' must construct 
the entire 
list, so they use quite a bit of memory.
You can keep the memory usage low by using unsafeInterleaveIO.

>
>
> -m


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How do I get this done in constant mem?

2009-10-10 Thread mf-hcafe-15c311f0c
On Sat, Oct 10, 2009 at 09:33:52AM -0700, Thomas Hartman wrote:
> To: Luke Palmer 
> Cc: mf-hcafe-15c311...@etc-network.de, haskell-cafe@haskell.org
> From: Thomas Hartman 
> Date: Sat, 10 Oct 2009 09:33:52 -0700
> Subject: Re: [Haskell-cafe] How do I get this done in constant mem?
> 
> > Yes, you should not do this in IO.  That requires the entire
> > computation to finish before the result can be used.
> 
> Not really the entire computation though... whnf, no?

In that example, yes.  But readFile takes the entire file into a
strict String before it gives you the first Char, right?  (Sorry again
for my misleading code "simplification".)

> main = do
>   let thunks :: IO [Int]
>   thunks = (sequence . replicate (10^6) $ (randomRIO (0,10^9)))
>   putStrLn . show . head =<< thunks -- prints
>   putStrLn . show . last =<< thunks -- overflows

Meaning that the entire list needs to be kept?  Is there a reason
(other than "it's easier to implement and it's legal" :-) why the
elements that have been traversed by "last" can't be garbage
collected?



-m
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How do I get this done in constant mem?

2009-10-10 Thread mf-hcafe-15c311f0c

On Fri, Oct 09, 2009 at 05:48:15PM -0600, Luke Palmer wrote:
> To: mf-hcafe-15c311...@etc-network.de
> Cc: 
> From: Luke Palmer 
> Date: Fri, 9 Oct 2009 17:48:15 -0600
> Subject: Re: [Haskell-cafe] How do I get this done in constant mem?
> 
> On Fri, Oct 9, 2009 at 2:05 PM,   wrote:
> > Hi all,
> >
> > I think there is something about my use of the IO monad that bites me,
> > but I am bored of staring at the code, so here you g.  The code goes
> > through a list of records and collects the maximum in each record
> > position.
> >
> >
> > -- test.hs
> > import Random
> > import System.Environment (getArgs)
> > import System.IO (putStr)
> >
> > samples :: Int -> Int -> IO [[Double]]
> > samples i j = sequence . replicate i . sequence . replicate j $ randomRIO 
> > (0, 1000 ** 3)
> 
> Yes, you should not do this in IO.  That requires the entire
> computation to finish before the result can be used.  This computation
> should be pure and lazy.

Yeah.  I also got an excellent reason via private mail why sequence
has to be strict:

sequence [Maybe 3, Maybe 4, Nothing]  = Nothing
sequence [Maybe 3, Maybe 4]   = Just [3, 4]

> > maxima :: [[Double]] -> [Double]
> > maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head 
> > samples) (tail samples)
> 
> FWIW, This function has a beautiful alternate definition:
> 
> maxima :: [[Double]] -> [Double]
> maxima = map maximum . transpose

Beautiful indeed!  But see below.

To be honest, I don't really roll dice, but I am reading from a file.
I just thought that randomRIO would be more concise, but now the
discussion has gone totally in that direction.  Sorry...  (-: reading
the random number code is more fun, though!

Anyhow, I fixed my example to do lazy file processing where before I
used readFile (which has to be strict, as I can see now).  First, I
generate a file with the samples, and then I read that file back (this
is the phase I'm interested in, since my real data is not really
random numbers).


import List
import Monad
import Random
import System.Environment
import System.IO

samples :: Int -> Int -> IO [[Int]]
samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 
1000 * 1000 * 1000)

maxima :: [[Int]] -> [Int]
maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head 
samples) (tail samples)

lazyProcess :: ([[Int]] -> a) -> FilePath -> IO a
lazyProcess f fileName =
do
  h <- openFile fileName ReadMode
  v <- fmap (f . map read . lines) $ hGetContents h
  v `seq` hClose h
  return v

mkSamples = do
  args <- getArgs
  x <- samples (read (head args)) 5
  putStr . (++ "\n") . join . intersperse "\n" . map show $ x

-- main = mkSamples
-- ghc --make -O9 test.hs -o test && ./test 1 > test.data

main = lazyProcess length "test.data" >>= putStr . show


lazyProcess (What would be a better name?  foldSampleFile perhaps?) is
where the IO happens, but the computation is located in a pure
function.  And yet, only those lines are read that are relevant, and
GC on previous lines is allows if the pure function allows it.

This program has constant memory usage.  Unfortunately, if I replace
the length function with implementation of maxima, it explodes again.
I tried a few things, such as

maxima'3 :: [[Int]] -> [Int]
maxima'3 (h:t) = foldr (\ x y -> let v = map (uncurry max) $ zip x y in sum v 
`seq` v) h t

with no luck so far.  Tricky business, that!  But much more curiously,
if I replace maxima'3 in main with this

maxima'4 :: [[Int]] -> [Int]
maxima'4 = map maximum . transpose

(with explicit type signature in both definitions), I get a 'no parse'
error from Prelude.read.  maxima'3 with the same file gives me a
result.  How can there be a difference if the type signatures are
identical?!

Probably something about "don't use Prelude.read" :-)?  I have to play
with this some more...



matthias
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How do I get this done in constant mem?

2009-10-10 Thread Thomas Hartman
also, looking at the following, it does seem to me that it is sequence
that is too strict, and not IO that is to blame, as the Maybe monad
has the same behavior:

t5IO, t6IO :: IO Int
t5Maybe, t6Maybe :: Maybe Int
t5 = return . head =<< sequence [return 1, undefined]
t6 = return . head =<< return [1,undefined]
t5IO = t5
t5Maybe = t5
t6IO = t6
t6Maybe = t6

*Main> t5IO
*** Exception: Prelude.undefined
*Main> t5Maybe
*** Exception: Prelude.undefined
*Main> t6IO
1
*Main> t6Maybe
Just 1

2009/10/10 Thomas Hartman :
>> Yes, you should not do this in IO.  That requires the entire
>> computation to finish before the result can be used.
>
> Not really the entire computation though... whnf, no?
>
> main = do
>  let thunks :: IO [Int]
>  thunks = (sequence . replicate (10^6) $ (randomRIO (0,10^9)))
>  putStrLn . show . head =<< thunks -- prints
>  putStrLn . show . last =<< thunks -- overflows
>
> In the case of [[num]] from the top post, I belive that would be the
> first complete list.
>
>
> 2009/10/9 Luke Palmer :
>> On Fri, Oct 9, 2009 at 2:05 PM,   wrote:
>>> Hi all,
>>>
>>> I think there is something about my use of the IO monad that bites me,
>>> but I am bored of staring at the code, so here you g.  The code goes
>>> through a list of records and collects the maximum in each record
>>> position.
>>>
>>>
>>> -- test.hs
>>> import Random
>>> import System.Environment (getArgs)
>>> import System.IO (putStr)
>>>
>>> samples :: Int -> Int -> IO [[Double]]
>>> samples i j = sequence . replicate i . sequence . replicate j $ randomRIO 
>>> (0, 1000 ** 3)
>>
>> Yes, you should not do this in IO.  That requires the entire
>> computation to finish before the result can be used.  This computation
>> should be pure and lazy.
>>
>> It is possible, using split (and I believe not without it, unless you
>> use mkStdGen), to make a 2D list of randoms where the random
>> generation matches exactly the structure of the list.
>>
>> splits :: (RandomGen g) => Int -> g -> [g]
>> splits 0 _ = []
>> splits n g = let (g1,g2) = split g in g1 : splits (n-1) g2
>>
>> samples :: (RandomGen g) => Int -> Int -> g -> [[Double]]
>> samples i j gen = map row (splits i gen)
>>    where
>>    row g = take j (randomRs (0, 10^9) g)
>>
>> In fact, we could omit all these counts and make an infinite 2D list,
>> which you can cull in the client code.
>>
>> splits :: (RandomGen g) => g -> [g]
>> splits g = let (g1,g2) = split g in g1 : splits g2
>>
>> samples :: (RandomGen g) => g -> [[Double]]
>> samples = map row . splits
>>    where
>>    row = randomRs (0, 10^9)
>>
>> I find the latter to be more straightforward and obvious.  Maintaining
>> the laziness here is a fairly subtle thing, so study, perturb, try to
>> write it yourself in different ways, etc.
>>
>>> maxima :: [[Double]] -> [Double]
>>> maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head 
>>> samples) (tail samples)
>>
>> FWIW, This function has a beautiful alternate definition:
>>
>> maxima :: [[Double]] -> [Double]
>> maxima = map maximum . transpose
>>
>>> main = do
>>>  args <- getArgs
>>>  x <- samples (read (head args)) 5
>>>  putStr . (++ "\n") . show $ maxima x
>>>
>>>
>>> I would expect this to take constant memory (foldr as well as foldl),
>>> but this is what happens:
>>>
>>>
>>> $ ghc -prof --make -O9 -o test test.hs
>>> [1 of 1] Compiling Main             ( test.hs, test.o )
>>> Linking test ...
>>> $ ./test 100 +RTS -p
>>> [9.881155955344708e8,9.910336352165401e8,9.71000686630374e8,9.968532576451201e8,9.996200333115692e8]
>>> $ grep 'total alloc' test.prof
>>>        total alloc =     744,180 bytes  (excludes profiling overheads)
>>> $ ./test 1 +RTS -p
>>> [9.996199711457872e8,9.998928358545277e8,9.99960283632381e8,9.999707142123885e8,9.998952151508758e8]
>>> $ grep 'total alloc' test.prof
>>>        total alloc =  64,777,692 bytes  (excludes profiling overheads)
>>> $ ./test 100 +RTS -p
>>> Stack space overflow: current size 8388608 bytes.
>>> Use `+RTS -Ksize' to increase it.
>>> $
>>>
>>>
>>> so...
>>>
>>> does sequence somehow force the entire list of monads into evaluation
>>> before the head of the result list can be used?
>>
>> Yep.  IO is completely strict; in some sense the same as "call by
>> value" (don't take the analogy too far).  Rule of thumb: keep your
>> distance from it ;-)
>> ___
>> Haskell-Cafe mailing list
>> Haskell-Cafe@haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How do I get this done in constant mem?

2009-10-10 Thread Thomas Hartman
> Yes, you should not do this in IO.  That requires the entire
> computation to finish before the result can be used.

Not really the entire computation though... whnf, no?

main = do
  let thunks :: IO [Int]
  thunks = (sequence . replicate (10^6) $ (randomRIO (0,10^9)))
  putStrLn . show . head =<< thunks -- prints
  putStrLn . show . last =<< thunks -- overflows

In the case of [[num]] from the top post, I belive that would be the
first complete list.


2009/10/9 Luke Palmer :
> On Fri, Oct 9, 2009 at 2:05 PM,   wrote:
>> Hi all,
>>
>> I think there is something about my use of the IO monad that bites me,
>> but I am bored of staring at the code, so here you g.  The code goes
>> through a list of records and collects the maximum in each record
>> position.
>>
>>
>> -- test.hs
>> import Random
>> import System.Environment (getArgs)
>> import System.IO (putStr)
>>
>> samples :: Int -> Int -> IO [[Double]]
>> samples i j = sequence . replicate i . sequence . replicate j $ randomRIO 
>> (0, 1000 ** 3)
>
> Yes, you should not do this in IO.  That requires the entire
> computation to finish before the result can be used.  This computation
> should be pure and lazy.
>
> It is possible, using split (and I believe not without it, unless you
> use mkStdGen), to make a 2D list of randoms where the random
> generation matches exactly the structure of the list.
>
> splits :: (RandomGen g) => Int -> g -> [g]
> splits 0 _ = []
> splits n g = let (g1,g2) = split g in g1 : splits (n-1) g2
>
> samples :: (RandomGen g) => Int -> Int -> g -> [[Double]]
> samples i j gen = map row (splits i gen)
>    where
>    row g = take j (randomRs (0, 10^9) g)
>
> In fact, we could omit all these counts and make an infinite 2D list,
> which you can cull in the client code.
>
> splits :: (RandomGen g) => g -> [g]
> splits g = let (g1,g2) = split g in g1 : splits g2
>
> samples :: (RandomGen g) => g -> [[Double]]
> samples = map row . splits
>    where
>    row = randomRs (0, 10^9)
>
> I find the latter to be more straightforward and obvious.  Maintaining
> the laziness here is a fairly subtle thing, so study, perturb, try to
> write it yourself in different ways, etc.
>
>> maxima :: [[Double]] -> [Double]
>> maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head 
>> samples) (tail samples)
>
> FWIW, This function has a beautiful alternate definition:
>
> maxima :: [[Double]] -> [Double]
> maxima = map maximum . transpose
>
>> main = do
>>  args <- getArgs
>>  x <- samples (read (head args)) 5
>>  putStr . (++ "\n") . show $ maxima x
>>
>>
>> I would expect this to take constant memory (foldr as well as foldl),
>> but this is what happens:
>>
>>
>> $ ghc -prof --make -O9 -o test test.hs
>> [1 of 1] Compiling Main             ( test.hs, test.o )
>> Linking test ...
>> $ ./test 100 +RTS -p
>> [9.881155955344708e8,9.910336352165401e8,9.71000686630374e8,9.968532576451201e8,9.996200333115692e8]
>> $ grep 'total alloc' test.prof
>>        total alloc =     744,180 bytes  (excludes profiling overheads)
>> $ ./test 1 +RTS -p
>> [9.996199711457872e8,9.998928358545277e8,9.99960283632381e8,9.999707142123885e8,9.998952151508758e8]
>> $ grep 'total alloc' test.prof
>>        total alloc =  64,777,692 bytes  (excludes profiling overheads)
>> $ ./test 100 +RTS -p
>> Stack space overflow: current size 8388608 bytes.
>> Use `+RTS -Ksize' to increase it.
>> $
>>
>>
>> so...
>>
>> does sequence somehow force the entire list of monads into evaluation
>> before the head of the result list can be used?
>
> Yep.  IO is completely strict; in some sense the same as "call by
> value" (don't take the analogy too far).  Rule of thumb: keep your
> distance from it ;-)
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How do I get this done in constant mem?

2009-10-10 Thread Thomas Hartman
I don't know if this counts but how about

import Control.Applicative
import Control.Monad
import Random
import Data.List

main'' i j = replicateM j $ maximum' <$> (replicateM i . randomRIO $ (0,10^9))
maximum' = foldl1' max
t = main'' (10^4) 5


2009/10/9  :
>
>
> Hi all,
>
> I think there is something about my use of the IO monad that bites me,
> but I am bored of staring at the code, so here you g.  The code goes
> through a list of records and collects the maximum in each record
> position.
>
>
> -- test.hs
> import Random
> import System.Environment (getArgs)
> import System.IO (putStr)
>
> samples :: Int -> Int -> IO [[Double]]
> samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 
> 1000 ** 3)
>
> maxima :: [[Double]] -> [Double]
> maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head 
> samples) (tail samples)
>
> main = do
>  args <- getArgs
>  x <- samples (read (head args)) 5
>  putStr . (++ "\n") . show $ maxima x
>
>
> I would expect this to take constant memory (foldr as well as foldl),
> but this is what happens:
>
>
> $ ghc -prof --make -O9 -o test test.hs
> [1 of 1] Compiling Main             ( test.hs, test.o )
> Linking test ...
> $ ./test 100 +RTS -p
> [9.881155955344708e8,9.910336352165401e8,9.71000686630374e8,9.968532576451201e8,9.996200333115692e8]
> $ grep 'total alloc' test.prof
>        total alloc =     744,180 bytes  (excludes profiling overheads)
> $ ./test 1 +RTS -p
> [9.996199711457872e8,9.998928358545277e8,9.99960283632381e8,9.999707142123885e8,9.998952151508758e8]
> $ grep 'total alloc' test.prof
>        total alloc =  64,777,692 bytes  (excludes profiling overheads)
> $ ./test 100 +RTS -p
> Stack space overflow: current size 8388608 bytes.
> Use `+RTS -Ksize' to increase it.
> $
>
>
> so...
>
> does sequence somehow force the entire list of monads into evaluation
> before the head of the result list can be used?  what can i do to
> implement this in constant memory?
>
> thanks!
> matthias
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How do I get this done in constant mem?

2009-10-09 Thread Luke Palmer
On Fri, Oct 9, 2009 at 2:05 PM,   wrote:
> Hi all,
>
> I think there is something about my use of the IO monad that bites me,
> but I am bored of staring at the code, so here you g.  The code goes
> through a list of records and collects the maximum in each record
> position.
>
>
> -- test.hs
> import Random
> import System.Environment (getArgs)
> import System.IO (putStr)
>
> samples :: Int -> Int -> IO [[Double]]
> samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 
> 1000 ** 3)

Yes, you should not do this in IO.  That requires the entire
computation to finish before the result can be used.  This computation
should be pure and lazy.

It is possible, using split (and I believe not without it, unless you
use mkStdGen), to make a 2D list of randoms where the random
generation matches exactly the structure of the list.

splits :: (RandomGen g) => Int -> g -> [g]
splits 0 _ = []
splits n g = let (g1,g2) = split g in g1 : splits (n-1) g2

samples :: (RandomGen g) => Int -> Int -> g -> [[Double]]
samples i j gen = map row (splits i gen)
where
row g = take j (randomRs (0, 10^9) g)

In fact, we could omit all these counts and make an infinite 2D list,
which you can cull in the client code.

splits :: (RandomGen g) => g -> [g]
splits g = let (g1,g2) = split g in g1 : splits g2

samples :: (RandomGen g) => g -> [[Double]]
samples = map row . splits
where
row = randomRs (0, 10^9)

I find the latter to be more straightforward and obvious.  Maintaining
the laziness here is a fairly subtle thing, so study, perturb, try to
write it yourself in different ways, etc.

> maxima :: [[Double]] -> [Double]
> maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head 
> samples) (tail samples)

FWIW, This function has a beautiful alternate definition:

maxima :: [[Double]] -> [Double]
maxima = map maximum . transpose

> main = do
>  args <- getArgs
>  x <- samples (read (head args)) 5
>  putStr . (++ "\n") . show $ maxima x
>
>
> I would expect this to take constant memory (foldr as well as foldl),
> but this is what happens:
>
>
> $ ghc -prof --make -O9 -o test test.hs
> [1 of 1] Compiling Main             ( test.hs, test.o )
> Linking test ...
> $ ./test 100 +RTS -p
> [9.881155955344708e8,9.910336352165401e8,9.71000686630374e8,9.968532576451201e8,9.996200333115692e8]
> $ grep 'total alloc' test.prof
>        total alloc =     744,180 bytes  (excludes profiling overheads)
> $ ./test 1 +RTS -p
> [9.996199711457872e8,9.998928358545277e8,9.99960283632381e8,9.999707142123885e8,9.998952151508758e8]
> $ grep 'total alloc' test.prof
>        total alloc =  64,777,692 bytes  (excludes profiling overheads)
> $ ./test 100 +RTS -p
> Stack space overflow: current size 8388608 bytes.
> Use `+RTS -Ksize' to increase it.
> $
>
>
> so...
>
> does sequence somehow force the entire list of monads into evaluation
> before the head of the result list can be used?

Yep.  IO is completely strict; in some sense the same as "call by
value" (don't take the analogy too far).  Rule of thumb: keep your
distance from it ;-)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] How do I get this done in constant mem?

2009-10-09 Thread mf-hcafe-15c311f0c


Hi all,

I think there is something about my use of the IO monad that bites me,
but I am bored of staring at the code, so here you g.  The code goes
through a list of records and collects the maximum in each record
position.


-- test.hs
import Random
import System.Environment (getArgs)
import System.IO (putStr)

samples :: Int -> Int -> IO [[Double]]
samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 
1000 ** 3)

maxima :: [[Double]] -> [Double]
maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head 
samples) (tail samples)

main = do
  args <- getArgs
  x <- samples (read (head args)) 5
  putStr . (++ "\n") . show $ maxima x


I would expect this to take constant memory (foldr as well as foldl),
but this is what happens:


$ ghc -prof --make -O9 -o test test.hs
[1 of 1] Compiling Main ( test.hs, test.o )
Linking test ...
$ ./test 100 +RTS -p
[9.881155955344708e8,9.910336352165401e8,9.71000686630374e8,9.968532576451201e8,9.996200333115692e8]
$ grep 'total alloc' test.prof 
total alloc = 744,180 bytes  (excludes profiling overheads)
$ ./test 1 +RTS -p
[9.996199711457872e8,9.998928358545277e8,9.99960283632381e8,9.999707142123885e8,9.998952151508758e8]
$ grep 'total alloc' test.prof 
total alloc =  64,777,692 bytes  (excludes profiling overheads)
$ ./test 100 +RTS -p
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize' to increase it.
$ 


so...

does sequence somehow force the entire list of monads into evaluation
before the head of the result list can be used?  what can i do to
implement this in constant memory?

thanks!
matthias
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe