Re: [Haskell-cafe] attoparsec and vectors

2011-06-29 Thread Roman Leshchinskiy
Gregory Collins wrote:
 On Tue, Jun 28, 2011 at 6:20 PM, Eric Rasmussen ericrasmus...@gmail.com
 wrote:

 It runs quickly, but I naively thought I could outperform it by
 reworking many to build a vector directly, instead of having to build
 a list first and then convert it to a vector:

 manyVec :: Alternative f = f a - f (V.Vector a) manyVec v = many_v  
 where many_v = some_v | pure V.empty         some_v = V.cons
 $ v * many_v


 That's an O(n^2) loop, and a thunk leak to boot. If you don't know the
 size of the vector ahead of time, the only way I can think of to beat
 Vector.fromList is to use a mutable vector with a highwater mark,
 and double the size if you fill it. At the end, you'd use unsafeFreeze to
 turn the mutable vector into a pure one, and unsafeTake to truncate the
 vector into the correct size.

That's basically what fromList does. You could do this at a higher
abstraction level by generating a Stream rather than a list and then using
unstream to create a vector. I don't know if it's possible to do that with
attoparsec. But you'd only save allocating and deallocating a lazily
consumed list anyway. I'm not sure if it will be even noticable compared
to how much parsing costs.

 For an example of a similar technique (minus the freezing part), I did
 a similar thing in the hashtables library:

You might be interested in 'grow' :-)

http://hackage.haskell.org/packages/archive/vector/0.7.1/doc/html/Data-Vector-Generic-Mutable.html#g:8

Roman




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] attoparsec and vectors

2011-06-29 Thread Gregory Collins
On Wed, Jun 29, 2011 at 4:16 AM, Roman Leshchinskiy r...@cse.unsw.edu.au 
wrote:
 Gregory Collins wrote:
 For an example of a similar technique (minus the freezing part), I did
 a similar thing in the hashtables library:

 You might be interested in 'grow' :-)

I would be, except to save a couple of words I used a more primitive
array type than vector :)

G
-- 
Gregory Collins g...@gregorycollins.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] attoparsec and vectors

2011-06-28 Thread Eric Rasmussen
Hi everyone,

I have a task that involves parsing a flat-file into a vector from the
Data.Vector package. In the interest of keeping things simple, I first used
Attoparsec.Text's version of many and then converted the resulting list to
a vector:

import qualified Data.Vector as V
import Data.Attoparsec.Text as A
import Control.Applicative

getData = do results - A.many someParser
return (V.fromList results)

It runs quickly, but I naively thought I could outperform it by reworking
many to build a vector directly, instead of having to build a list first
and then convert it to a vector:

manyVec :: Alternative f = f a - f (V.Vector a)
manyVec v = many_v
  where many_v = some_v | pure V.empty
some_v = V.cons $ v * many_v

Unfortunately, manyVec either quickly leads to a stack space overflow, or it
takes an enormous amount of time to run. Does anyone know of a better way to
build up a vector from parsing results?

Thanks for any tips or insight,
Eric
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] attoparsec and vectors

2011-06-28 Thread Gregory Collins
On Tue, Jun 28, 2011 at 6:20 PM, Eric Rasmussen ericrasmus...@gmail.com wrote:
 It runs quickly, but I naively thought I could outperform it by reworking
 many to build a vector directly, instead of having to build a list first
 and then convert it to a vector:

 manyVec :: Alternative f = f a - f (V.Vector a)
 manyVec v = many_v
   where many_v = some_v | pure V.empty
     some_v = V.cons $ v * many_v

That's an O(n^2) loop, and a thunk leak to boot. If you don't know the
size of the vector ahead of time, the only way I can think of to beat
Vector.fromList is to use a mutable vector with a highwater mark,
and double the size if you fill it. At the end, you'd use
unsafeFreeze to turn the mutable vector into a pure one, and
unsafeTake to truncate the vector into the correct size.

For an example of a similar technique (minus the freezing part), I did
a similar thing in the hashtables library:

https://github.com/gregorycollins/hashtables/blob/master/src/Data/HashTable/Internal/Linear/Bucket.hs#L45

It's unlikely to be worth the extra effort except for extremely
performance-critical code.

G
-- 
Gregory Collins g...@gregorycollins.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe