Re: [Haskell-cafe] Design suggestion for Data.Binary.Defer

2008-06-19 Thread Neil Mitchell
Hi

 Actually, you ought to be able to pretty easily remove this tradeoff
 by introducing a strict read function as a new method in your class.
 So anyone who wants to strictly read lazy data can use that function
 instead of the lazy one.

Not quite, the library is written so that strict fields are at the
current file pointer while deferred fields require a position jump.

For example, two strict strings would be:

helloworld

But two lazy strings would be:

[ptr 5][ptr 10]helloworld

   :( Lazy reading seems to
   require strict writing, while lazy writing requires strict reading!

The library is all designed around making reading maximally efficient,
and minimizing file seeks, but writing can be totally inefficient. As
you mention, the lazy/strict IO trade off is very interesting, its
just fortunate that for my application (Hoogle) the files are written
once and read many many times.

 I'm wondering if it would be an option to read the file backward. If so
  length annotations could be put at the end.

  Reading  backward  seems  to  only fail with streamed data, which won't 
 support
  the seeking required by a lazy reading anyway.

You still want to minimize seeking, even if it is possible. In my case
having length information at the front helps reading, and I don't care
about writing, so its an easy win.

I think I've got my design nailed down, and have most of it
implemented and used in Hoogle. These discussions and points were very
helpful!

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Design suggestion for Data.Binary.Defer

2008-06-17 Thread David Roundy
On Tue, Jun 17, 2008 at 07:55:51AM +0100, Neil Mitchell wrote:
 Hi,

Hello!

 David:
  Is there any reason not to just write all lazy fields of variable size
   in a deferred manner?
 
 I completely hadn't though of this! You will loose a bit of time, for
 example reading fields which were previously not deferred will require
 a file seek, but the effort/thought/efficiency trade-off is good. I
 think I will go with this.

Actually, you ought to be able to pretty easily remove this tradeoff
by introducing a strict read function as a new method in your class.
So anyone who wants to strictly read lazy data can use that function
instead of the lazy one.  The writing of data is unaffected... or
rather, it *is* affected, but only in the sense that writing deferred
data cannot be as lazy as the writing of undeferred data.  e.g. when
writing a deferred list, you can't output the first word until you've
already read through the entire list in order to find out how long it
is.  That could actually be quite a problem for code that relies on
laziness to handle massive amounts of data.  :( Lazy reading seems to
require strict writing, while lazy writing requires strict reading!

So perhaps you want a pair of read functions, one strict, one lazy,
and a pair of write functions.  That doesn't provide any sort of
fine-grained control, but at least allows streaming in either
direction.  You could use the same data format for either strictly or
lazily pickled data as long as you have a reserved value for the
length-to-be-skipped, so the lazy reader could tell if it reaches data
that whose length it doesn't know.

David
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Design suggestion for Data.Binary.Defer

2008-06-17 Thread Nicolas Pouillard
Excerpts from David Roundy's message of Tue Jun 17 20:27:01 +0200 2008:
 On Tue, Jun 17, 2008 at 07:55:51AM +0100, Neil Mitchell wrote:
  Hi,
 
 Hello!

Hello,

 :( Lazy reading seems to
 require strict writing, while lazy writing requires strict reading!

I'm wondering if it would be an option to read the file backward. If so
length annotations could be put at the end.

Reading  backward  seems  to  only fail with streamed data, which won't support
the seeking required by a lazy reading anyway.

Regards,

-- 
Nicolas Pouillard aka Ertai


signature.asc
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Design suggestion for Data.Binary.Defer

2008-06-16 Thread Neil Mitchell
Hi,

I'm in the process of updating the Deferred Binary library,
http://www-users.cs.york.ac.uk/~ndm/binarydefer/. The idea is that its
does serialisation, but certain elements can be marked as deferred -
instead of being written in the current file stream, they are merely
pointed at and if needed, that pointer will be followed.

Example:(hello,world), where the first field is marked as deferred
would write out:

[6]worldhello

i.e. [skip 6 characters if you want hello], the world data, the
hello data we previously promised to put here. When reading, the
hello would only be seeked to and read if necessary.

So, its like binary, but some fields are lazy. The question is how to
indicate which fields are lazy. There are three schemes I can think
of:

== Simple Instances ==

put (a,b) = putDefer a  put b
get = do a - getDefer; b - get; return (a,b)

If the put/get and putDefer/getDefer items do not line up perfectly it
will go very wrong at runtime - probably resulting in random values
being created. You also can't derive the instances automatically, with
something like Derive or DrIFT.

== Complex Instances ==

This is the current scheme, based on lazy pattern matching and
exceptions - very confusing, probably low performance.

deferBoth = [\~(a,b) - unit (,) ~ a  b]

Where ~ a means write out the a field lazily, and  b means write
out the b field strictly. The advantage over the simple instances is
that a field being deferred is declared once.

== Lazy Data Type ==

Instead of customizing the instance, you can write a data Defer a =
Defer a type, and then instead of the original tuple write:

(Defer hello,world)

But now the code must unwrap the Defer before accessing hello, but
the instance becomes much simpler, and can be derived.

== The Question ==

Is there a simple way of tagging fields in a constructor as deferred,
just once for reading and writing, and ideally outside the instance
definition and not requiring additional code to unwrap? I can't think
of any, but there may be something I have missed.

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Design suggestion for Data.Binary.Defer

2008-06-16 Thread Derek Elkins
On Mon, 2008-06-16 at 17:43 +0100, Neil Mitchell wrote:
 Hi,
 
 I'm in the process of updating the Deferred Binary library,
 http://www-users.cs.york.ac.uk/~ndm/binarydefer/. The idea is that its
 does serialisation, but certain elements can be marked as deferred -
 instead of being written in the current file stream, they are merely
 pointed at and if needed, that pointer will be followed.
 
 Example:(hello,world), where the first field is marked as deferred
 would write out:
 
 [6]worldhello
 
 i.e. [skip 6 characters if you want hello], the world data, the
 hello data we previously promised to put here. When reading, the
 hello would only be seeked to and read if necessary.
 
 So, its like binary, but some fields are lazy. The question is how to
 indicate which fields are lazy. There are three schemes I can think
 of:
 
 == Simple Instances ==
 
 put (a,b) = putDefer a  put b
 get = do a - getDefer; b - get; return (a,b)
 
 If the put/get and putDefer/getDefer items do not line up perfectly it
 will go very wrong at runtime - probably resulting in random values
 being created. You also can't derive the instances automatically, with
 something like Derive or DrIFT.
 
 == Complex Instances ==
 
 This is the current scheme, based on lazy pattern matching and
 exceptions - very confusing, probably low performance.
 
 deferBoth = [\~(a,b) - unit (,) ~ a  b]
 
 Where ~ a means write out the a field lazily, and  b means write
 out the b field strictly. The advantage over the simple instances is
 that a field being deferred is declared once.
 
 == Lazy Data Type ==
 
 Instead of customizing the instance, you can write a data Defer a =
 Defer a type, and then instead of the original tuple write:
 
 (Defer hello,world)
 
 But now the code must unwrap the Defer before accessing hello, but
 the instance becomes much simpler, and can be derived.
 
 == The Question ==
 
 Is there a simple way of tagging fields in a constructor as deferred,
 just once for reading and writing, and ideally outside the instance
 definition and not requiring additional code to unwrap? I can't think
 of any, but there may be something I have missed.

This isn't an answer to your question. This is just my opinion based on
my values.

I'd just immediately do option three.  It seems the simplest, most
composable and most flexible, and I like having things reflected in my
types.  It seems easy enough to code up the first option given the last.
I don't really understand what you are getting at with the second
option, but I suspect it too is easy to do on top of the third option.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Design suggestion for Data.Binary.Defer

2008-06-16 Thread David Roundy
On Mon, Jun 16, 2008 at 9:43 AM, Neil Mitchell [EMAIL PROTECTED] wrote:
 == The Question ==

 Is there a simple way of tagging fields in a constructor as deferred,
 just once for reading and writing, and ideally outside the instance
 definition and not requiring additional code to unwrap? I can't think
 of any, but there may be something I have missed.

Is there any reason not to just write all lazy fields of variable size
in a deferred manner? That would seem like the best option to me, but
maybe that's just because I don't see any downside besides the
requirement that one use some space to write the size.  But that seems
like it would almost always be a trivial issue, as any small data
element will have a fixed size (so that the deferred/non-deferred
distinction doesn't exist).

David
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe