Re: [Haskell-cafe] attoparsec and vectors
Gregory Collins wrote: On Tue, Jun 28, 2011 at 6:20 PM, Eric Rasmussen ericrasmus...@gmail.com wrote: It runs quickly, but I naively thought I could outperform it by reworking many to build a vector directly, instead of having to build a list first and then convert it to a vector: manyVec :: Alternative f = f a - f (V.Vector a) manyVec v = many_v  where many_v = some_v | pure V.empty        some_v = V.cons $ v * many_v That's an O(n^2) loop, and a thunk leak to boot. If you don't know the size of the vector ahead of time, the only way I can think of to beat Vector.fromList is to use a mutable vector with a highwater mark, and double the size if you fill it. At the end, you'd use unsafeFreeze to turn the mutable vector into a pure one, and unsafeTake to truncate the vector into the correct size. That's basically what fromList does. You could do this at a higher abstraction level by generating a Stream rather than a list and then using unstream to create a vector. I don't know if it's possible to do that with attoparsec. But you'd only save allocating and deallocating a lazily consumed list anyway. I'm not sure if it will be even noticable compared to how much parsing costs. For an example of a similar technique (minus the freezing part), I did a similar thing in the hashtables library: You might be interested in 'grow' :-) http://hackage.haskell.org/packages/archive/vector/0.7.1/doc/html/Data-Vector-Generic-Mutable.html#g:8 Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] attoparsec and vectors
On Wed, Jun 29, 2011 at 4:16 AM, Roman Leshchinskiy r...@cse.unsw.edu.au wrote: Gregory Collins wrote: For an example of a similar technique (minus the freezing part), I did a similar thing in the hashtables library: You might be interested in 'grow' :-) I would be, except to save a couple of words I used a more primitive array type than vector :) G -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] attoparsec and vectors
Hi everyone, I have a task that involves parsing a flat-file into a vector from the Data.Vector package. In the interest of keeping things simple, I first used Attoparsec.Text's version of many and then converted the resulting list to a vector: import qualified Data.Vector as V import Data.Attoparsec.Text as A import Control.Applicative getData = do results - A.many someParser return (V.fromList results) It runs quickly, but I naively thought I could outperform it by reworking many to build a vector directly, instead of having to build a list first and then convert it to a vector: manyVec :: Alternative f = f a - f (V.Vector a) manyVec v = many_v where many_v = some_v | pure V.empty some_v = V.cons $ v * many_v Unfortunately, manyVec either quickly leads to a stack space overflow, or it takes an enormous amount of time to run. Does anyone know of a better way to build up a vector from parsing results? Thanks for any tips or insight, Eric ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] attoparsec and vectors
On Tue, Jun 28, 2011 at 6:20 PM, Eric Rasmussen ericrasmus...@gmail.com wrote: It runs quickly, but I naively thought I could outperform it by reworking many to build a vector directly, instead of having to build a list first and then convert it to a vector: manyVec :: Alternative f = f a - f (V.Vector a) manyVec v = many_v where many_v = some_v | pure V.empty some_v = V.cons $ v * many_v That's an O(n^2) loop, and a thunk leak to boot. If you don't know the size of the vector ahead of time, the only way I can think of to beat Vector.fromList is to use a mutable vector with a highwater mark, and double the size if you fill it. At the end, you'd use unsafeFreeze to turn the mutable vector into a pure one, and unsafeTake to truncate the vector into the correct size. For an example of a similar technique (minus the freezing part), I did a similar thing in the hashtables library: https://github.com/gregorycollins/hashtables/blob/master/src/Data/HashTable/Internal/Linear/Bucket.hs#L45 It's unlikely to be worth the extra effort except for extremely performance-critical code. G -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe