Re: [Haskell-cafe] Re: Space usage problems

2006-01-17 Thread Ian Lynagh

Hi Bulat,

On Wed, Jan 18, 2006 at 12:10:45AM +0300, Bulat Ziganshin wrote:
> 
> Monday, January 16, 2006, 12:52:42 AM, you wrote:
> 
> IL> OK, I have one library which provides
> 
> IL> inflate :: [Word8]   -- The input
> IL> -> ([Word8], -- A prefix of the input inflated (uncompressed)
> IL> [Word8]) -- The remainder of the input
> 
> you can use strict state monad for this task.

But then none of the output would be available until all the input was
consumed, right?

> in particular, looking at your code in MissingH 0.13, i recommend you
> try to use DiffUArray instead of Array

The code in missingh is old, entirely unoptimised and quite possibly
slightly buggy, incidentally.


Thanks
Ian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Re: Space usage problems

2006-01-17 Thread Bulat Ziganshin
Hello Ian,

Monday, January 16, 2006, 12:52:42 AM, you wrote:

IL> OK, I have one library which provides

IL> inflate :: [Word8]   -- The input
IL> -> ([Word8], -- A prefix of the input inflated (uncompressed)
IL> [Word8]) -- The remainder of the input

you can use strict state monad for this task. "strictness" here
designate that it evaluates higher level of state, but it don't fully
evaluates the state (which itslef is impossible without using DeepSeq
class technique). so, for example, when you perform something like
"state = tail state" the lazy state monad may store call to tail
function in the state field and not evaluate it before needed, while
strict state monad will evaluate this higher-level expression and
store exactly the lazy expression that represents remainder of list

it's interesting that your work has a lot of common with my
Binary/Streams library. i'm used the same monadic types to define char
enocding/decoding routines (see DataCharEncoding.hs), i've defined
universal "mutable references" interface to transparently work with
variables in IO/ST monads, i've extended John's
StringReader/StringBuffer types to work in any monad. hust now i'm
working on extending my "ByteStream->BitStream" transformer to also
support any monads

although i'm not sure that Haskell implementation of inflate/deflate
algorithms will be a useful (just because it will be 100-1000 times
slower than existing C routines), nevertheless i glad to offer my
help, especially in optimizing code and making it monad-neutral

in particular, looking at your code in MissingH 0.13, i recommend you
try to use DiffUArray instead of Array

IL> while (some input left) read header call inflate read
IL> footer return (concat all the inflate results)

IL> Reading headers is a fiddly enough task that passing the input around by
IL> hand is undesirable.

btw, "reading headers" is an perfect task for my Binary library. the
ony difference is what in your library "getBits" is an built-in
operation of special InfM monad, while in my library "getBits"
operation is applied to the Stream objects, but nevertheless work in
any monad (to be exact, it works in the monad, to which this Stream
obect belongs):

getBits :: (Monad m, Stream m h) => Int -> h -> m Int

"getBits bits h" returns 'bits' bits read from stream 'h'

most of types supporting Stream interface, works only in IO monad
(including Handle and MemoryBuffer), but at least StringReader &
StringBuffer types can be specialized to IO and ST monads and, in
general, to any monad, which is able to support readRef/writeRef
operations

if you want, you can try to implement the whole inflate process on top
of my Binary/Streams library, using these `getBits` operations.
something like these:

-- 's' is String containing your input data
runST (do h <- openBitAlignedLE =<< newStringReader s
  ... -- here you can use `getBits bits h` to read `s` as bits sequence
  return ...)

ps: http://freearc.narod.ru/Binary.tar.gz

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Space usage problems

2006-01-15 Thread Ian Lynagh
On Wed, Jan 11, 2006 at 03:00:45PM +, Simon Marlow wrote:
> Ian Lynagh wrote:
> >On Wed, Jan 11, 2006 at 10:36:47AM +, Simon Marlow wrote:
> >
> >>My suggestion: don't use the lazy state monad if you can help it.
> >
> >But a strict state monad would force everything to be loaded into memory
> >at once, right?
> >
> >What would you suggest I use instead?
> 
> I'm not sure - can you describe exactly what you want to do from a 
> higher level?  It might help to re-think the problem from the top down.

OK, I have one library which provides

inflate :: [Word8]   -- The input
-> ([Word8], -- A prefix of the input inflated (uncompressed)
[Word8]) -- The remainder of the input

I can't tell how much of the input I'll be inflating in advance, I only
know when I reach the end of the compressed part.

Inflating has an array and a few other bits of state while it
uncompresses the input.

(I'm assuming the inflation won't fail for now; later I might want
something like
inflate :: [Word8]   -- The input
-> ([Word8], -- A prefix of the input, inflated (uncompressed)
[Word8], -- The remainder of the input
Bool)-- Inflation failed
where you would need to write the inflated data to a file, say, before
checking the Bool to see if there was an error (if you want to work in
constant space)).

I'm happy to have a different type for inflate if necessary (e.g.
inflate :: m [Word8] -> ([Word8] -> m ()) -> m [Word8]
where inflate uses the Monad of the caller to read and write the
remaining input; this leads to something using a monad transformer
for inflates other state, along the lines of Test2 in my original
message, which lead to a stack overflow) but it shouldn't be wedded to
the following library:


I then have another library with a function that does:

while (some input left)
read header
call inflate
read footer
return (concat all the inflate results)

Reading headers is a fiddly enough task that passing the input around by
hand is undesirable.


Thanks
Ian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Space usage problems

2006-01-11 Thread Simon Marlow

Ian Lynagh wrote:

On Wed, Jan 11, 2006 at 10:36:47AM +, Simon Marlow wrote:


My suggestion: don't use the lazy state monad if you can help it.


But a strict state monad would force everything to be loaded into memory
at once, right?

What would you suggest I use instead?


I'm not sure - can you describe exactly what you want to do from a 
higher level?  It might help to re-think the problem from the top down.



Or do I just have to tread carefully to keep this optimisation happy
until the GCer is improved?


I can't see us fixing this in the short term, I'm afraid.

Cheers,
Simon

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Space usage problems

2006-01-11 Thread Ian Lynagh
On Wed, Jan 11, 2006 at 10:36:47AM +, Simon Marlow wrote:
> 
> My suggestion: don't use the lazy state monad if you can help it.

But a strict state monad would force everything to be loaded into memory
at once, right?

What would you suggest I use instead?

Or do I just have to tread carefully to keep this optimisation happy
until the GCer is improved?


Thanks
Ian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Space usage problems

2006-01-11 Thread Simon Marlow

Ian Lynagh wrote:

On Tue, Jan 10, 2006 at 04:44:33PM +, Ian Lynagh wrote:


readChunks :: FirstMonad String
readChunks = do xs <- get
   if null xs then return []
  else do let (ys, zs) = foo xs
  put zs
  rest <- readChunks
  return (ys ++ rest)



It looks like changing this let to a case fixes this example, but at the
time I'd experimented with that there must have been other issues
clouding the effect, such as the following.

Foo1 (attached) uses large amounts of memory whereas Foo2 (also
attached) runs in a little constant space. The difference is only
changing this:

else do chunk <- case foo xs of
 (ys, zs) ->
 do put zs
return ys
chunks <- readChunks
return (chunk ++ chunks)

to this:

else case foo xs of
 (ys, zs) ->
 do put zs
chunks <- readChunks
return (ys ++ chunks)


I had great difficulty understanding this, but I think I do now.  It's a 
bit easier to understand if you inline the monads away.  Foo1 translates 
to this:


  bar [] = ([],[])
  bar (x:xs) = let (zs,ys) = bar xs in (x:zs,ys)

  readChunks [] = ([],[])
  readChunks xs = let (ys,zs) = bar xs
  (chunks,rest) = readChunks zs in
  (ys ++ chunks, rest)
and Foo2:

  readChunks [] = ([],[])
  readChunks xs = case bar xs of
   (zs,ys) -> let (chunks,rest) = readChunks ys in
  (zs ++ chunks, rest)


This is pretty much what GHC ends up with when you give -O (actually it 
turns some of the tuples into unboxed tuples, but that's not important).


We can see in Foo1 that chunks is a thunk holding on to zs, which is a 
thunk that holds on to xs, so you never get to release xs until the 
whole result list (ys) is traversed.  GHC's lazy tuple optimisation 
doesn't kick in, because neither chunks nor rest are evaluated.


However, it's not so clear why Foo2 is better.  chunks holds on to ys, 
the second of the pair returned by bar.  In fact, ys will point to a 
chain of thunks that looks like this:


  ys = snd (_, snd (_, snd (_, snd (_, snd ...

every time GC runs, it can completely eliminate this list via the 
well-known lazy tuple optimisation.  Unfortunately it doesn't 
*completely* eliminate the list, because of a shortcoming in our 
implementation, actually reported earlier by Ian Lynagh with a very 
similar program :-)  Fortunately in this example we do seem to be 
reducing enough of the list to eliminate the space leak, though.


My suggestion: don't use the lazy state monad if you can help it.

Cheers,
Simon

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe