Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-20 Thread Evan Laforge
 I think my short term solution is going to be remove -auto-all from
 attoparsec's cabal---I'm not profiling attoparsec and so I don't want
 my entire profile output to be internal attoparsec functions.  But
 presumably the flag was added there for a reason, so maybe there are
 people who really want that.

 Yes - me :-)

Yes, certainly it makes sense for someone profiling the library
itself, or if I thought your library was too slow and wanted to send a
profile in a bug report.  But I don't think it makes so much sense for
plain users, especially when it's applied only to some libraries.

However, this seems to be a fairly common practice... at least parsec3
and text both do this, which is why I initially thought parsec3 was so
much slower than parsec2, and attoparsec-text was doubly slow.

 I might, though, see if there's a way I could enable that flag only for
 myself (in a way that I wouldn't routinely forget).

This sounds like a good idea to me.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-15 Thread Neil Mitchell
 This isn't completely without basis. For instance, I made some big speed
 improvements to attoparsec's very performance-sensitive takeWhile function
 just the other day, thanks to -auto-all.

 I might, though, see if there's a way I could enable that flag only for
 myself (in a way that I wouldn't routinely forget).

See ~/.cabal/config - I use that to make sure all my packages are
installed globally with profiling, and I think it might have enough
options to force -auto-all in some way.

Thanks, Neil

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-14 Thread Felipe Almeida Lessa
On Thu, Jan 13, 2011 at 12:15 AM, Evan Laforge qdun...@gmail.com wrote:
 Well, I tried it... and it's still slower!

 parsec2, String: (a little faster since last time since I have new computer)
        total time  =        9.10 secs   (455 ticks @ 20 ms)
        total alloc = 2,295,837,512 bytes  (excludes profiling overheads)

 attoparsec-text, Data.Text:
        total time  =       14.72 secs   (736 ticks @ 20 ms)
        total alloc = 2,797,672,844 bytes  (excludes profiling overheads)

Interesting.

 Just in case there's some useful criticism, here's one of the busier parsers:

 p_unsigned_float :: A.Parser Double
 p_unsigned_float = do
    i - A.takeWhile Char.isDigit
    f - A.option  (A.char '.'  A.takeWhile1 Char.isDigit)
    if (Text.null i  Text.null f) then mzero else do
    case (dec i, dec f) of
        (Just i', Just f') - return $ fromIntegral i'
            + fromIntegral f' / fromIntegral (10 ^ (Text.length f))
        _ - mzero
    where
    dec :: Text.Text - Maybe Int
    dec s
        | Text.null s = Just 0
        | otherwise = case Text.Read.decimal s of
            Right (d, rest) | Text.null rest - Just d
            _ - Nothing


I've tried creating a benchmark using this code.  It's on the recently
created attoparsec-text darcs repo [1,2].  There is a 2.7 MiB test
file with many numbers to be parsed.  The attoparsec-text package was
installed using -O (Cabal's default) and the test program was compiled
with ghc -hide-package parsec-3.1.0 --make -O2.

Using parsers that return the parsed number as a double and then sum
everything up, I get the following timings:

attoparsec_text_builtin
   2,241,038,864 bytes allocated in the heap
  46 MB total memory in use (1 MB lost due to fragmentation)
  MUT   time1.10s  (  1.13s elapsed)
  GCtime0.15s  (  0.20s elapsed)
  Total time1.25s  (  1.32s elapsed)

attoparsec_text_laforge
   1,281,603,768 bytes allocated in the heap
 101 MB total memory in use (2 MB lost due to fragmentation)
  MUT   time0.58s  (  0.62s elapsed)
  GCtime0.47s  (  0.54s elapsed)
  Total time1.05s  (  1.16s elapsed)

parsec_laforge
   1,558,621,208 bytes allocated in the heap
  47 MB total memory in use (0 MB lost due to fragmentation)
  MUT   time0.82s  (  0.84s elapsed)
  GCtime0.46s  (  0.51s elapsed)
  Total time1.27s  (  1.35s elapsed)

'attoparsec_text_builtin' uses Data.Attoparsec.Text.double available
on the darcs version of the library.  It tries to handle more cases,
like exponents, and thus it is expected to be slower than your
version.  'attoparsec_text_laforge' and 'parsec_laforge' are very
similar to the one you gave in your e-mail, but with some
modifications (e.g. Text.Read.decimal can't be used with Strings).
Using attoparsec-text is faster and allocates less, but for some
reason the faster version takes up a lot more memory.

As the total memory figures were strange, I created a different
version that parses the input but does not create any Doubles.
Instead of summing them, the number of Doubles (if they were parsed)
is counted.  These are the results:

attoparsec_text_laforge_discarding
 985,843,696 bytes allocated in the heap
  25 MB total memory in use (0 MB lost due to fragmentation)
  MUT   time0.38s  (  0.39s elapsed)
  GCtime0.07s  (  0.10s elapsed)
  Total time0.45s  (  0.49s elapsed)

parsec_laforge_discarding double_test.txt +RTS -s
   1,471,829,664 bytes allocated in the heap
  28 MB total memory in use (0 MB lost due to fragmentation)
  MUT   time0.66s  (  0.68s elapsed)
  GCtime0.44s  (  0.46s elapsed)
  Total time1.10s  (  1.14s elapsed)

Now attoparsec-text is more than twice faster, allocates even less
memory and the total memory figures seem right.

Bottom line: I think this benchmark doesn't really represent the kind
of workload your parser has.  Can you reproduce these results on your
system?

Cheers! =)

[1] http://patch-tag.com/r/felipe/attoparsec-text/
[2] 
http://patch-tag.com/r/felipe/attoparsec-text/snapshot/current/content/pretty/benchmarks/Double.hs

-- 
Felipe.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-14 Thread Evan Laforge
 Now attoparsec-text is more than twice faster, allocates even less
 memory and the total memory figures seem right.

 Bottom line: I think this benchmark doesn't really represent the kind
 of workload your parser has.  Can you reproduce these results on your
 system?

I spent quite a bit of time trying to reduce this down to a minimal
reproduction and getting confusing results.  Then I found out that
compiling with profiling enabled makes attoparsec slow and parsec
fast.  When I compile without any profiling, here's what I get, in CPU
time:

parsec run 100 - time: 1.22s -
atto bs run 100 - time: 0.38s -
atto text run 100 - time: 0.78s -

This looks more like I expect it to.  I don't understand the parsec
thing... one of the first things I did was recompile and reinstall
parsec2, making sure to pass -p to configure, and verify that there is
a /usr/local/lib/parsec-2.1.0.1/ghc-6.12.3/libHSparsec-2.1.0.1_p.a.
However, on closer inspection, I believe I've found the culprit.
Compiling with 'build -v' for attoparsec reveals a ghc cmdline line:
'-prof -hisuf p_hi -osuf p_o -auto-all'.  Compiling parsec has: '-prof
-hisuf p_hi -osuf p_o'.  And indeed, attoparsec cabal has
'ghc-prof-options: -auto-all', which parsec's cabal does not.  And in
fact, parsec3 also has this -auto-all, which both explains why the
profile is full of internal functions and why parsec3 was so much
slower than parsec2.

I'm glad to have finally tracked this down, but unhappy that I spent
so much time on it.  It seems like a trap waiting to be sprung if
various libraries are compiled with their individually specified
flags, which have major effects on performance.  Maybe I should have
noticed, but it seems pretty subtle to me.  GHC will refuse to compile
non-profiling libs against a profiling build, but doesn't go down to
the level of flags.

I think my short term solution is going to be remove -auto-all from
attoparsec's cabal---I'm not profiling attoparsec and so I don't want
my entire profile output to be internal attoparsec functions.  But
presumably the flag was added there for a reason, so maybe there are
people who really want that.  Is there a better solution?  GHC warns
when linking a profiling lib compiled with different profiling flags?
A separate .p_auto-all_o suffix?  Removal of ghc-prof-options from
cabal?  A consensus to standardize on a set of flags?


BTW, yes my situation is a little different from your test.  It's lots
and lots of little expressions for a simple language in an in-memory
structure that get parsed individually.  So I don't care about file
reading speed, but I do care about parser startup overhead, since it's
lots and lots of little parses.  The numbers above are how long it
takes to parse 2.34 1m times.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-14 Thread Bryan O'Sullivan
On Fri, Jan 14, 2011 at 5:54 PM, Evan Laforge qdun...@gmail.com wrote:



Then I found out that
 compiling with profiling enabled makes attoparsec slow and parsec
 fast.


Yes, the SCC annotations added by GHC have a fairly high cost.

I think my short term solution is going to be remove -auto-all from
 attoparsec's cabal---I'm not profiling attoparsec and so I don't want
 my entire profile output to be internal attoparsec functions.  But
 presumably the flag was added there for a reason, so maybe there are
 people who really want that.


Yes - me :-)

I typically turn on profiling for most of my libraries while I think of them
as under development, a period of indefinite length that comes to an end
when I deem the performance good enough. None of my libraries has actually
hit that point yet :-)

This isn't completely without basis. For instance, I made some big speed
improvements to attoparsec's very performance-sensitive takeWhile function
just the other day, thanks to -auto-all.

I might, though, see if there's a way I could enable that flag only for
myself (in a way that I wouldn't routinely forget).
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-13 Thread Maciej Piechotka
On Wed, 2011-01-12 at 18:15 -0800, Evan Laforge wrote:
 On Mon, Dec 27, 2010 at 6:51 AM, Evan Laforge qdun...@gmail.com
 wrote:
  I've uploaded attoparsec-text and attoparsec-text-enumerator to
  Hackage.  I've written those packages late last week and asked for
 
  Very nice!  I'll download this and try it out.  Attoparsec has a bit
  different combinators than parsec so it'll take some rewriting, but
  it's work I'd have to do anyway to try the bytestring+attoparsec
  approach.
 
 Well, I tried it... and it's still slower!
 
 parsec2, String: (a little faster since last time since I have new
 computer)
 total time  =9.10 secs   (455 ticks @ 20 ms)
 total alloc = 2,295,837,512 bytes  (excludes profiling
 overheads)
 
 attoparsec-text, Data.Text:
 total time  =   14.72 secs   (736 ticks @ 20 ms)
 total alloc = 2,797,672,844 bytes  (excludes profiling
 overheads) 

Sorry for asking but just for reference - what is performance of
nanoparsec on your machine in this test?

Regards


signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-13 Thread Permjacov Evgeniy
On 12/23/2010 06:01 AM, Evan Laforge wrote:

 This is not very encouraging!  Especially strange is how Text
 generates *more* allocation... I'd expect less since it doesn't unpack
 all the Texts.  
Errgh. To check against predicate, library HAS to unpack checked
character. There is no way around it.
 There's an obvious problem where I get the digits as a String and then
 parse that with list functions, but I can't see any way to get parsec
 to return a chunk of Text.  This is roughly how parsec itself parses
 numbers, in Text.Parsec.Token.

 Any ideas or experience?

If you wish performance so desperatley, you can try hand-coded parsing.
What I mean is, that if library has to unpack characters to check them
against isDigit predicate, why not to use it in building numeral value
immidiatley? This will eliminate intermidiate list.

However, every back-tracking parser is slow by definition. If you wish
maximum possible speed, consider hand-written lexer (this is not too
hard) and possibly Happy to generate parser.

BTW, how much utf16 text is around? I never found any in wild web.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-13 Thread Albert Y. C. Lai

On 11-01-13 02:07 PM, Permjacov Evgeniy wrote:


BTW, how much utf16 text is around? I never found any in wild web.


There is a lot of utf16 text in memory chips when you use Windows or 
Java. In the case of Windows there is also a lot on disk platters as 
file names. A scanning electron microscope may reveal them.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2011-01-12 Thread Evan Laforge
On Mon, Dec 27, 2010 at 6:51 AM, Evan Laforge qdun...@gmail.com wrote:
 I've uploaded attoparsec-text and attoparsec-text-enumerator to
 Hackage.  I've written those packages late last week and asked for

 Very nice!  I'll download this and try it out.  Attoparsec has a bit
 different combinators than parsec so it'll take some rewriting, but
 it's work I'd have to do anyway to try the bytestring+attoparsec
 approach.

Well, I tried it... and it's still slower!

parsec2, String: (a little faster since last time since I have new computer)
total time  =9.10 secs   (455 ticks @ 20 ms)
total alloc = 2,295,837,512 bytes  (excludes profiling overheads)

attoparsec-text, Data.Text:
total time  =   14.72 secs   (736 ticks @ 20 ms)
total alloc = 2,797,672,844 bytes  (excludes profiling overheads)

Top consumer in the profile is now
Data.Attoparsec.Text.Internal.runParser, followed, several entries
later, by bindP, addS, and mysteriously ?.  Suspicious that parsec
was compiled without profiling and hence not incurring profiling
overhead since parsec never appears in the profile, I tried running
without any profiling flags, but the numbers come about about the
same, I guess the prof output has already subtracted profiling
overhead.

The attoparsec profile output is hard to interpret, it's a huge tree
of internal attoparsec functions that are individually cheap but all
add up under runParser.  runParser itself is simple a newtype accessor
so I don't really understand why it's credited with so much time.  But
there are no clear culprits... my parsers make much use of takeWhile
and skipWhile and combinators like | and 'many' only occur at the
level of complete terms, and are thus called much more rarely.

The greater allocation is pretty mysterious.  I wasn't able to track
it down via heap allocation, the biggest allocator by module that is a
parsing module isn't much of an allocator, it peaks at around 350k.
Intuition says it should be much less because of using packed Text,
but I suppose even the takeWhile combinators have to unpack every
character into a Char, so maybe it's even less efficient because at
least String can directly reuse the Chars?

Actually, I've thought about this problem with haskell libraries
before: I have a packed array which I then do a bsearch over.  The
bsearch generates lots of garbage.  I was originally confused but my
current guess is that every comparison winds up unpacking the array
element, wrapping it in the haskell data type, and then extracting the
(boxed) Int from that.  An efficient implementation would compare the
int in place... perhaps it must inline the comparison and use a 'peek'
specialized to just extract the desired int, and then hope that the
optimizer figures out how to pass it unboxed.

I'll try a few optimizations I can think of.  If those fail, I'll try
with ByteString, maybe it's a problem with attoparsec-text.  If that
fails, I'll give up for real and go back to Parsec 2, still the leader
in speed.


Just in case there's some useful criticism, here's one of the busier parsers:

p_unsigned_float :: A.Parser Double
p_unsigned_float = do
i - A.takeWhile Char.isDigit
f - A.option  (A.char '.'  A.takeWhile1 Char.isDigit)
if (Text.null i  Text.null f) then mzero else do
case (dec i, dec f) of
(Just i', Just f') - return $ fromIntegral i'
+ fromIntegral f' / fromIntegral (10 ^ (Text.length f))
_ - mzero
where
dec :: Text.Text - Maybe Int
dec s
| Text.null s = Just 0
| otherwise = case Text.Read.decimal s of
Right (d, rest) | Text.null rest - Just d
_ - Nothing

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2010-12-27 Thread Evan Laforge
On Thu, Dec 23, 2010 at 12:01 PM, Evan Laforge qdun...@gmail.com wrote:
 Yeah, I know this has been discussed a number of times, but I have
 some concrete questions I haven't seen asked before.  And the parsec
 3 is now as fast as parsec 2 thing I've seen around doesn't seem to
 be true for me.

[ snip responses ]

So it sounds like the consensus is to bite the bullet and try
converting to ByteString + attoparsec and see if that helps.  Or write
attoparsec-text myself, or wait for someone else to do it :)

I might not get around to this real soon, but I'll post my results
when (if) I do.

Thanks for the responses!

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2010-12-27 Thread Felipe Almeida Lessa
Hello!

On Mon, Dec 27, 2010 at 9:41 AM, Evan Laforge qdun...@gmail.com wrote:
 So it sounds like the consensus is to bite the bullet and try
 converting to ByteString + attoparsec and see if that helps.  Or write
 attoparsec-text myself, or wait for someone else to do it :)

 I might not get around to this real soon, but I'll post my results
 when (if) I do.

I've uploaded attoparsec-text and attoparsec-text-enumerator to
Hackage.  I've written those packages late last week and asked for
comments from attoparsec and attoparsec-enumerator's maintainers.
Although both packages weren't stress tested (actually they were very
lightly tested), I'm releasing them so that we don't waste efforts
duplicating work =).

I'll make an announcement later today if no critical bugs are found.
Please test it and try to break it =).  Bonus points if we give us
some numbers about how it compares to your Parsec 2/3 approach.

Links:
http://hackage.haskell.org/package/attoparsec-text
http://hackage.haskell.org/package/attoparsec-text-enumerator

Cheers! =D

-- 
Felipe.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2010-12-27 Thread Evan Laforge
 I've uploaded attoparsec-text and attoparsec-text-enumerator to
 Hackage.  I've written those packages late last week and asked for

Very nice!  I'll download this and try it out.  Attoparsec has a bit
different combinators than parsec so it'll take some rewriting, but
it's work I'd have to do anyway to try the bytestring+attoparsec
approach.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2010-12-24 Thread Maciej Piechotka
On Thu, 2010-12-23 at 18:38 +0200, Michael Snoyman wrote:
 On Thu, Dec 23, 2010 at 6:21 PM, Johan Tibell johan.tib...@gmail.com wrote:
  On Thu, Dec 23, 2010 at 3:03 PM, Felipe Almeida Lessa
  felipe.le...@gmail.com wrote:
  Michael Snoyman wants attoparsec-text as well [1].
 
  [1] http://docs.yesodweb.com/blog/wishlist/
 
  It's on my Christmas wishlist too.
 
  Johan
 
  ___
  Haskell-Cafe mailing list
  Haskell-Cafe@haskell.org
  http://www.haskell.org/mailman/listinfo/haskell-cafe
 
 
 Since I'm sure everyone is thinking it at this point, I'll just say
 it: we're all hoping Bryan O'Sullivan saves the day again and writes
 this package. He wrote both attoparsec *and* text, so if he writes
 attoparsec-text, it will just be double the awesomeness. So Bryan,
 please do tell: how many beers (or any other consumable) will it take
 to get you to write it? I'll start up the collection fund, and throw
 in a six pack ;).
 
 Michael

I may be wrong but the attoparsec/attoparsec-text would be operating on
the same principles. Maybe using typeclass like Data.ListLike would be
solution?

I'd not quite sure how much would it slow down but it should be
possible.

More as proof of concept reimplementation of string parser (for real
life probably needs INLINE and SPECIALISE):

 import Control.Applicative
 import Control.Monad
 import Data.Monoid
 import Data.ListLike as LL
 
 data Result i r
 = Fail !i [String] String
 | Partial (i - Result i r)
 | Done !i r
 
 newtype Parser i a
 = Parser { runParser :: forall r. S i
  - Failure i   r
  - Success i a r
  - Result  i   r }
 
 type Failure i   r = S i - [String] - String - Result i r
 type Success i a r = S i - a - Result i r
 
 data More = Complete | Incomplete deriving (Eq, Show)
 
 instance Monoid More where
 mempty  = Incomplete
 mappend Complete _= Complete
 mappend _Complete = Complete
 mappend __= Incomplete
 
 data S i = S { input :: !i, _added :: !i, more :: !More }
 
 instance Functor (Parser i) where
 fmap p m = Parser (\st0 f k - runParser m st0 f (\s a - k s (p
a)))
 
 instance Applicative (Parser i) where
 pure x = Parser (\st0 _ ks - ks st0 x)
 (*) = ap
 
 instance Monad (Parser i) where
 return = pure
 m = g
 = Parser (\st0 kf ks - runParser m st0 kf (\s a - runParser
(g a) s kf ks))
 fail err = Parser (\st0 kf _ - kf st0 [] err)
 
 string :: (Eq full, LL.ListLike full item) = full - Parser full full
 string s = takeWith (LL.length s) (== s)
 
 takeWith :: (LL.ListLike full item) = Int - (full - Bool) - Parser
full full
 takeWith n p = do
 ensure n
 s - get
 let (h, t) = LL.splitAt n s
 if p h then put t  return h else fail takeWith
 
 ensure :: (LL.ListLike full item) = Int - Parser full ()
 ensure n
 = Parser $ \st0@(S s0 _a0 _c0) kf ks -
 if LL.length s0 = n
 then ks st0 ()
 else runParser (demandInput  ensure n) st0 kf ks
 
 prompt :: LL.ListLike i ii
= S i - (S i - Result i r) - (S i - Result i r) - Result
i r
 prompt (S s0 a0 _) kf ks
 = Partial $ \s -
 if LL.null s
 then kf $! S s0 a0 Complete
 else ks $! S (s0 `mappend` s) (a0 `mappend` s) Incomplete
 
 demandInput :: (LL.ListLike full item) = Parser full ()
 demandInput
 = Parser $ \st0 kf ks -
 if more st0 == Complete
 then kf st0 [demandInput] not enough bytes
 else prompt st0 (\st - kf st [demandInput] not enough
bytes) (`ks` ())
 
 get :: Parser full full
 get = Parser (\st0 _ ks - ks st0 (input st0))
 
 put :: full - Parser full ()
 put s = Parser (\(S _ a0 c0) _ ks - ks (S s a0 c0) ())



signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2010-12-23 Thread Felipe Almeida Lessa
On Thu, Dec 23, 2010 at 1:01 AM, Evan Laforge qdun...@gmail.com wrote:
 So, my current options are either figure out some way to speed up
 parsec3+Text, revert to parsec2+String and give up, or try an entirely
 different parsing library.  I've heard attoparsec is fast but I'd have
 to switch to utf8 bytestring which is a big change, and Text seems
 like the more correct choice anyway.

 Any ideas or experience?

Michael Snoyman wants attoparsec-text as well [1].

[1] http://docs.yesodweb.com/blog/wishlist/

-- 
Felipe.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2010-12-23 Thread Johan Tibell
On Thu, Dec 23, 2010 at 3:03 PM, Felipe Almeida Lessa
felipe.le...@gmail.com wrote:
 Michael Snoyman wants attoparsec-text as well [1].

 [1] http://docs.yesodweb.com/blog/wishlist/

It's on my Christmas wishlist too.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsec2 vs. parsec3... again

2010-12-23 Thread Michael Snoyman
On Thu, Dec 23, 2010 at 6:21 PM, Johan Tibell johan.tib...@gmail.com wrote:
 On Thu, Dec 23, 2010 at 3:03 PM, Felipe Almeida Lessa
 felipe.le...@gmail.com wrote:
 Michael Snoyman wants attoparsec-text as well [1].

 [1] http://docs.yesodweb.com/blog/wishlist/

 It's on my Christmas wishlist too.

 Johan

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


Since I'm sure everyone is thinking it at this point, I'll just say
it: we're all hoping Bryan O'Sullivan saves the day again and writes
this package. He wrote both attoparsec *and* text, so if he writes
attoparsec-text, it will just be double the awesomeness. So Bryan,
please do tell: how many beers (or any other consumable) will it take
to get you to write it? I'll start up the collection fund, and throw
in a six pack ;).

Michael

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] parsec2 vs. parsec3... again

2010-12-22 Thread Evan Laforge
Yeah, I know this has been discussed a number of times, but I have
some concrete questions I haven't seen asked before.  And the parsec
3 is now as fast as parsec 2 thing I've seen around doesn't seem to
be true for me.

I have an app that does a lot of parsing of small expressions.  It's
currently parsec2 operating on lots of little Texts (after an unpack,
of course).  A few parsing functions show up near the top of the
profile output, so I thought an obvious improvement would be to parse
Text directly and avoid the overhead and garbage of unpacking.  Since
parsec3 is now supposed to be as fast as parsec2 I thought I would
give it a try.  Parsec3 is 3.1.0, parsec 2 is 2.1.0.1:

parsec2, String:
total time  =   10.66 secs   (533 ticks @ 20 ms)
total alloc = 2,340,113,404 bytes  (excludes profiling overheads)

parsec3, String: (this is just after upgrading the library and editing
it to fix breakage from Parser being a type alias now)
total time  =   13.76 secs   (688 ticks @ 20 ms)
total alloc = 2,706,625,256 bytes  (excludes profiling overheads)

parsec3, Text: (wrote a Text instance similar to the one for
ByteString, updated imports, no longer unpacking to String)
total time  =   15.96 secs   (798 ticks @ 20 ms)
total alloc = 3,338,005,896 bytes  (excludes profiling overheads)

This is not very encouraging!  Especially strange is how Text
generates *more* allocation... I'd expect less since it doesn't unpack
all the Texts.  The parsing functions are no longer at the top of the
profile, but there are new 'unParser' and 'parsecMap' and 'parserBind'
up at or near the top.  'unParser' just looks like it's unwrapping the
Parsec newtype, so I don't fully understand how it's the most
expensive, but it's called on every bind so it does get called a lot.
There are no obvious super expensive ones, just lots and lots of them
that add up.  Parsec 3's unParser covers up the parsing function I
wrote, so it's now hard to tell what the expensive parsing function
actually is.

I've seen a few remarks that you can't just throw together parsers and
expect them to be fast, you have to profile them, but nothing on how
to actually interpret the results of profiling.

For instance, here's one of the main expensive parsers:

p_unsigned_float :: P.CharParser st Double
p_unsigned_float = do
i - P.many P.digit
f - P.option  (P.char '.'  P.many1 P.digit)
if (null i  null f) then P.pzero else do
let int = List.foldl'
(\total c - 10 * total + fromIntegral (Char.digitToInt c)) 0 i
frac = foldr
(\c total - (total + fromIntegral (Char.digitToInt c)) / 10) 0 f
return (int + frac)

There's an obvious problem where I get the digits as a String and then
parse that with list functions, but I can't see any way to get parsec
to return a chunk of Text.  This is roughly how parsec itself parses
numbers, in Text.Parsec.Token.

So, my current options are either figure out some way to speed up
parsec3+Text, revert to parsec2+String and give up, or try an entirely
different parsing library.  I've heard attoparsec is fast but I'd have
to switch to utf8 bytestring which is a big change, and Text seems
like the more correct choice anyway.

Any ideas or experience?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe