Re: [Haskell-cafe] Re: Space usage problems
Hi Bulat, On Wed, Jan 18, 2006 at 12:10:45AM +0300, Bulat Ziganshin wrote: > > Monday, January 16, 2006, 12:52:42 AM, you wrote: > > IL> OK, I have one library which provides > > IL> inflate :: [Word8] -- The input > IL> -> ([Word8], -- A prefix of the input inflated (uncompressed) > IL> [Word8]) -- The remainder of the input > > you can use strict state monad for this task. But then none of the output would be available until all the input was consumed, right? > in particular, looking at your code in MissingH 0.13, i recommend you > try to use DiffUArray instead of Array The code in missingh is old, entirely unoptimised and quite possibly slightly buggy, incidentally. Thanks Ian ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Re: Space usage problems
Hello Ian, Monday, January 16, 2006, 12:52:42 AM, you wrote: IL> OK, I have one library which provides IL> inflate :: [Word8] -- The input IL> -> ([Word8], -- A prefix of the input inflated (uncompressed) IL> [Word8]) -- The remainder of the input you can use strict state monad for this task. "strictness" here designate that it evaluates higher level of state, but it don't fully evaluates the state (which itslef is impossible without using DeepSeq class technique). so, for example, when you perform something like "state = tail state" the lazy state monad may store call to tail function in the state field and not evaluate it before needed, while strict state monad will evaluate this higher-level expression and store exactly the lazy expression that represents remainder of list it's interesting that your work has a lot of common with my Binary/Streams library. i'm used the same monadic types to define char enocding/decoding routines (see DataCharEncoding.hs), i've defined universal "mutable references" interface to transparently work with variables in IO/ST monads, i've extended John's StringReader/StringBuffer types to work in any monad. hust now i'm working on extending my "ByteStream->BitStream" transformer to also support any monads although i'm not sure that Haskell implementation of inflate/deflate algorithms will be a useful (just because it will be 100-1000 times slower than existing C routines), nevertheless i glad to offer my help, especially in optimizing code and making it monad-neutral in particular, looking at your code in MissingH 0.13, i recommend you try to use DiffUArray instead of Array IL> while (some input left) read header call inflate read IL> footer return (concat all the inflate results) IL> Reading headers is a fiddly enough task that passing the input around by IL> hand is undesirable. btw, "reading headers" is an perfect task for my Binary library. the ony difference is what in your library "getBits" is an built-in operation of special InfM monad, while in my library "getBits" operation is applied to the Stream objects, but nevertheless work in any monad (to be exact, it works in the monad, to which this Stream obect belongs): getBits :: (Monad m, Stream m h) => Int -> h -> m Int "getBits bits h" returns 'bits' bits read from stream 'h' most of types supporting Stream interface, works only in IO monad (including Handle and MemoryBuffer), but at least StringReader & StringBuffer types can be specialized to IO and ST monads and, in general, to any monad, which is able to support readRef/writeRef operations if you want, you can try to implement the whole inflate process on top of my Binary/Streams library, using these `getBits` operations. something like these: -- 's' is String containing your input data runST (do h <- openBitAlignedLE =<< newStringReader s ... -- here you can use `getBits bits h` to read `s` as bits sequence return ...) ps: http://freearc.narod.ru/Binary.tar.gz -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Space usage problems
On Wed, Jan 11, 2006 at 03:00:45PM +, Simon Marlow wrote: > Ian Lynagh wrote: > >On Wed, Jan 11, 2006 at 10:36:47AM +, Simon Marlow wrote: > > > >>My suggestion: don't use the lazy state monad if you can help it. > > > >But a strict state monad would force everything to be loaded into memory > >at once, right? > > > >What would you suggest I use instead? > > I'm not sure - can you describe exactly what you want to do from a > higher level? It might help to re-think the problem from the top down. OK, I have one library which provides inflate :: [Word8] -- The input -> ([Word8], -- A prefix of the input inflated (uncompressed) [Word8]) -- The remainder of the input I can't tell how much of the input I'll be inflating in advance, I only know when I reach the end of the compressed part. Inflating has an array and a few other bits of state while it uncompresses the input. (I'm assuming the inflation won't fail for now; later I might want something like inflate :: [Word8] -- The input -> ([Word8], -- A prefix of the input, inflated (uncompressed) [Word8], -- The remainder of the input Bool)-- Inflation failed where you would need to write the inflated data to a file, say, before checking the Bool to see if there was an error (if you want to work in constant space)). I'm happy to have a different type for inflate if necessary (e.g. inflate :: m [Word8] -> ([Word8] -> m ()) -> m [Word8] where inflate uses the Monad of the caller to read and write the remaining input; this leads to something using a monad transformer for inflates other state, along the lines of Test2 in my original message, which lead to a stack overflow) but it shouldn't be wedded to the following library: I then have another library with a function that does: while (some input left) read header call inflate read footer return (concat all the inflate results) Reading headers is a fiddly enough task that passing the input around by hand is undesirable. Thanks Ian ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Space usage problems
Ian Lynagh wrote: On Wed, Jan 11, 2006 at 10:36:47AM +, Simon Marlow wrote: My suggestion: don't use the lazy state monad if you can help it. But a strict state monad would force everything to be loaded into memory at once, right? What would you suggest I use instead? I'm not sure - can you describe exactly what you want to do from a higher level? It might help to re-think the problem from the top down. Or do I just have to tread carefully to keep this optimisation happy until the GCer is improved? I can't see us fixing this in the short term, I'm afraid. Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Space usage problems
On Wed, Jan 11, 2006 at 10:36:47AM +, Simon Marlow wrote: > > My suggestion: don't use the lazy state monad if you can help it. But a strict state monad would force everything to be loaded into memory at once, right? What would you suggest I use instead? Or do I just have to tread carefully to keep this optimisation happy until the GCer is improved? Thanks Ian ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Space usage problems
Ian Lynagh wrote: On Tue, Jan 10, 2006 at 04:44:33PM +, Ian Lynagh wrote: readChunks :: FirstMonad String readChunks = do xs <- get if null xs then return [] else do let (ys, zs) = foo xs put zs rest <- readChunks return (ys ++ rest) It looks like changing this let to a case fixes this example, but at the time I'd experimented with that there must have been other issues clouding the effect, such as the following. Foo1 (attached) uses large amounts of memory whereas Foo2 (also attached) runs in a little constant space. The difference is only changing this: else do chunk <- case foo xs of (ys, zs) -> do put zs return ys chunks <- readChunks return (chunk ++ chunks) to this: else case foo xs of (ys, zs) -> do put zs chunks <- readChunks return (ys ++ chunks) I had great difficulty understanding this, but I think I do now. It's a bit easier to understand if you inline the monads away. Foo1 translates to this: bar [] = ([],[]) bar (x:xs) = let (zs,ys) = bar xs in (x:zs,ys) readChunks [] = ([],[]) readChunks xs = let (ys,zs) = bar xs (chunks,rest) = readChunks zs in (ys ++ chunks, rest) and Foo2: readChunks [] = ([],[]) readChunks xs = case bar xs of (zs,ys) -> let (chunks,rest) = readChunks ys in (zs ++ chunks, rest) This is pretty much what GHC ends up with when you give -O (actually it turns some of the tuples into unboxed tuples, but that's not important). We can see in Foo1 that chunks is a thunk holding on to zs, which is a thunk that holds on to xs, so you never get to release xs until the whole result list (ys) is traversed. GHC's lazy tuple optimisation doesn't kick in, because neither chunks nor rest are evaluated. However, it's not so clear why Foo2 is better. chunks holds on to ys, the second of the pair returned by bar. In fact, ys will point to a chain of thunks that looks like this: ys = snd (_, snd (_, snd (_, snd (_, snd ... every time GC runs, it can completely eliminate this list via the well-known lazy tuple optimisation. Unfortunately it doesn't *completely* eliminate the list, because of a shortcoming in our implementation, actually reported earlier by Ian Lynagh with a very similar program :-) Fortunately in this example we do seem to be reducing enough of the list to eliminate the space leak, though. My suggestion: don't use the lazy state monad if you can help it. Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe