Re: [Haskell-cafe] Lazy object deserialization

2013-03-15 Thread Ketil Malde

Scott Lawrence byt...@gmail.com writes:

 All the object serialization/deserialization libraries I could find (pretty 
 much just binary and cereal) seem to be strict with respect to the actual 
 data 
 being serialized. 

Binary became strict between 0.4.4 and 0.5, I think doing so improved
the speed of GHC.  I had a file format wrapper at the time which
depended on it being lazy, and was stuck at 0.4.4 for a long time, until
somebody helped me with a workaround.  Performance is a bit worse than
it was with 0.4.4, but I think it's better to stay with current code.

The code in question is in the darcs repository at:

  http://malde.org/~ketil/biohaskell/biosff

the relevant patch is named Update to use [...]

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy object deserialization

2013-03-13 Thread Jeff Shaw

On 3/13/2013 12:15 AM, Scott Lawrence wrote:

Hey all,

All the object serialization/deserialization libraries I could find 
(pretty much just binary and cereal) seem to be strict with respect to 
the actual data being serialized. In particular, if I've serialized a 
large [Int] to a file, and I want to get the first element, it seems I 
have no choice but to deserialize the entire data structure. This is 
obviously an issue for large data sets.


There are obvious workarounds (explicitly fetch elements from the 
database instead of relying on unsafeInterleaveIO to deal with it 
all magically), but it seems like it should be possible to build a 
cereal-like library that allows proper lazy deserialization. Does it 
exist, and I've just missed it?


Thanks,

I haven't tested this, but I suspect something like this could give you 
lazy binary serialization and deserialization. It's not tail recursive, 
though.


newtype LazyBinaryList a = LazyBinaryList [a]

instance Binary a = LazyBinaryList a where
put (LazyBinaryList []) = putWord8 0
put (LazyBinaryList (x:xs)) = putWord8 1  put x  put 
(LazyBinaryList xs)

get = do
t - getWord8
case t of
0 - return (LazyBinaryList [])
1 - do
x - get
(LazyBinaryList xs) - get
return $ LazyBinaryList (x:xs)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy object deserialization

2013-03-13 Thread Scott Lawrence
I tried it, but it still goes and reads the whole list. Looking at the 
`binary` package source code it seems that strict evaluation is hard-coded in 
a few places, presumably for performance reasons. It also seems to necessarily 
read the bytestring sequentially, so complex tree-like data structures would 
presumably encounter problems even if it worked for a list.


Ah well. As long as I'm not duplicating someone else's work, I'm more than 
happy to go at this from scratch.


On Wed, 13 Mar 2013, Jeff Shaw wrote:


On 3/13/2013 12:15 AM, Scott Lawrence wrote:

Hey all,

All the object serialization/deserialization libraries I could find (pretty 
much just binary and cereal) seem to be strict with respect to the actual 
data being serialized. In particular, if I've serialized a large [Int] to a 
file, and I want to get the first element, it seems I have no choice but to 
deserialize the entire data structure. This is obviously an issue for large 
data sets.


There are obvious workarounds (explicitly fetch elements from the 
database instead of relying on unsafeInterleaveIO to deal with it all 
magically), but it seems like it should be possible to build a cereal-like 
library that allows proper lazy deserialization. Does it exist, and I've 
just missed it?


Thanks,

I haven't tested this, but I suspect something like this could give you lazy 
binary serialization and deserialization. It's not tail recursive, though.


newtype LazyBinaryList a = LazyBinaryList [a]

instance Binary a = LazyBinaryList a where
   put (LazyBinaryList []) = putWord8 0
   put (LazyBinaryList (x:xs)) = putWord8 1  put x  put (LazyBinaryList 
xs)

   get = do
   t - getWord8
   case t of
   0 - return (LazyBinaryList [])
   1 - do
   x - get
   (LazyBinaryList xs) - get
   return $ LazyBinaryList (x:xs)



--
Scott Lawrence

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy object deserialization

2013-03-13 Thread Ben
that's too bad, i used lazy deserialization for an external sort thing i did 
aeons ago.

http://www.haskell.org/pipermail/haskell-cafe/2007-July/029156.html

that was an amusing exercise in lazy IO.  these days it's probably better off 
doing something with pipes et al instead of unsafeInterleaveIO.

b

On Mar 13, 2013, at 2:54 PM, Scott Lawrence wrote:

 I tried it, but it still goes and reads the whole list. Looking at the 
 `binary` package source code it seems that strict evaluation is hard-coded in 
 a few places, presumably for performance reasons. It also seems to 
 necessarily read the bytestring sequentially, so complex tree-like data 
 structures would presumably encounter problems even if it worked for a list.
 
 Ah well. As long as I'm not duplicating someone else's work, I'm more than 
 happy to go at this from scratch.
 
 On Wed, 13 Mar 2013, Jeff Shaw wrote:
 
 On 3/13/2013 12:15 AM, Scott Lawrence wrote:
 Hey all,
 All the object serialization/deserialization libraries I could find (pretty 
 much just binary and cereal) seem to be strict with respect to the actual 
 data being serialized. In particular, if I've serialized a large [Int] to a 
 file, and I want to get the first element, it seems I have no choice but to 
 deserialize the entire data structure. This is obviously an issue for large 
 data sets.
 There are obvious workarounds (explicitly fetch elements from the 
 database instead of relying on unsafeInterleaveIO to deal with it all 
 magically), but it seems like it should be possible to build a cereal-like 
 library that allows proper lazy deserialization. Does it exist, and I've 
 just missed it?
 Thanks,
 I haven't tested this, but I suspect something like this could give you lazy 
 binary serialization and deserialization. It's not tail recursive, though.
 
 newtype LazyBinaryList a = LazyBinaryList [a]
 
 instance Binary a = LazyBinaryList a where
   put (LazyBinaryList []) = putWord8 0
   put (LazyBinaryList (x:xs)) = putWord8 1  put x  put (LazyBinaryList 
 xs)
   get = do
   t - getWord8
   case t of
   0 - return (LazyBinaryList [])
   1 - do
   x - get
   (LazyBinaryList xs) - get
   return $ LazyBinaryList (x:xs)
 
 
 -- 
 Scott Lawrence
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy object deserialization

2013-03-13 Thread Jeff Shaw

On 3/13/2013 6:14 PM, Ben wrote:

that's too bad, i used lazy deserialization for an external sort thing i did 
aeons ago.

http://www.haskell.org/pipermail/haskell-cafe/2007-July/029156.html

that was an amusing exercise in lazy IO.  these days it's probably better off 
doing something with pipes et al instead of unsafeInterleaveIO.

b

On Mar 13, 2013, at 2:54 PM, Scott Lawrence wrote:


I tried it, but it still goes and reads the whole list. Looking at the `binary` 
package source code it seems that strict evaluation is hard-coded in a few 
places, presumably for performance reasons. It also seems to necessarily read 
the bytestring sequentially, so complex tree-like data structures would 
presumably encounter problems even if it worked for a list.

Ah well. As long as I'm not duplicating someone else's work, I'm more than 
happy to go at this from scratch.

On Wed, 13 Mar 2013, Jeff Shaw wrote:


On 3/13/2013 12:15 AM, Scott Lawrence wrote:

Hey all,
All the object serialization/deserialization libraries I could find (pretty 
much just binary and cereal) seem to be strict with respect to the actual data 
being serialized. In particular, if I've serialized a large [Int] to a file, 
and I want to get the first element, it seems I have no choice but to 
deserialize the entire data structure. This is obviously an issue for large 
data sets.
There are obvious workarounds (explicitly fetch elements from the database 
instead of relying on unsafeInterleaveIO to deal with it all magically), but it seems 
like it should be possible to build a cereal-like library that allows proper lazy 
deserialization. Does it exist, and I've just missed it?
Thanks,

I haven't tested this, but I suspect something like this could give you lazy 
binary serialization and deserialization. It's not tail recursive, though.

newtype LazyBinaryList a = LazyBinaryList [a]

instance Binary a = LazyBinaryList a where
   put (LazyBinaryList []) = putWord8 0
   put (LazyBinaryList (x:xs)) = putWord8 1  put x  put (LazyBinaryList xs)
   get = do
   t - getWord8
   case t of
   0 - return (LazyBinaryList [])
   1 - do
   x - get
   (LazyBinaryList xs) - get
   return $ LazyBinaryList (x:xs)


--
Scott Lawrence

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
I tried really hard to make Binary read a list in lazily, but I failed. 
Here's a full program that demonstrates the idea I had earlier. I used 
binary-0.7.0.1*.*


import Data.Binary
import Data.Binary.Get
import System.Environment
import qualified Data.ByteString.Lazy as L

newtype LazyBinaryList a = LazyBinaryList [a]
deriving (Show)

instance Binary a = Binary (LazyBinaryList a) where
put = putLazy
get = getLazy

putLazy :: Binary a = LazyBinaryList a - Put
putLazy (LazyBinaryList []) = putWord8 0
putLazy (LazyBinaryList (x:xs)) = putWord8 1  put x  putLazy 
(LazyBinaryList xs)

getLazy :: Binary a = Get (LazyBinaryList a)
getLazy = aux []
where aux accum = do
t - getWord8
case t of
0 - return (LazyBinaryList (reverse accum))
1 - do
x - get
aux (x:accum)

list :: LazyBinaryList Int
list = LazyBinaryList [1..1]

main = do
args - getArgs
case args !! 0 of
encode - encodeFile (args !! 1) list
decode - do
bs - L.readFile $ args !! 1
let lOrR = decodeOrFail bs :: Either (L.ByteString, 
ByteOffset, String) (L.ByteString, ByteOffset, LazyBinaryList Int)
case lOrR of
Left l - print l
Right (_,offset,LazyBinaryList l) - putStrLn $ at offset 
 ++ show offset ++  read value  ++ show (head l)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Lazy object deserialization

2013-03-12 Thread Scott Lawrence

Hey all,

All the object serialization/deserialization libraries I could find (pretty 
much just binary and cereal) seem to be strict with respect to the actual data 
being serialized. In particular, if I've serialized a large [Int] to a file, 
and I want to get the first element, it seems I have no choice but to 
deserialize the entire data structure. This is obviously an issue for large 
data sets.


There are obvious workarounds (explicitly fetch elements from the database 
instead of relying on unsafeInterleaveIO to deal with it all magically), but 
it seems like it should be possible to build a cereal-like library that allows 
proper lazy deserialization. Does it exist, and I've just missed it?


Thanks,

--
Scott Lawrence

Linux baidar 3.7.10-1-ARCH #1 SMP PREEMPT Thu Feb 28 09:50:17 CET 2013 x86_64 
GNU/Linux

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe