Re: [Pharo-dev] PetitParser speed and cost of PPFailure

Tudor Girba Sun, 16 Aug 2015 09:14:53 -0700

You will probably have a better chance of getting a response on the Moose
mailing list.


Cheers,
Doru

On Sun, Aug 16, 2015 at 5:58 PM, Holger Freyther <[email protected]> wrote:

> Hi,
>
> once again I am not sure if this is the right list. The first parser I
> wrote using PetitParser was a SIP (and then MGCP) parser. I have recently
> ported[1] the code to Pharo, and with Pharo it is very tempting to use
> BlockClosure>>#bench to get an idea of the speed.
>
>
> I have two performance “issues” and wonder if others have had similar
> issues with PetitParser and if there is a general approach to this.
>
>
>
> 1.) Combining two PPCharSetPredicates does not combine their
> “classification” tables. One could create a PPPredicateObjectParser
> subclass that special-cases >>#/ to build a combined classification table.
>
>
> 2.) When blindly following a BNF enumeration of “A or B or C or D or E
> or CatchAll” where each alternative (A, B, and so on) follows a common
> pattern (e.g. token COLON value), one pays a high cost for the backtracking
> and for constructing a PPFailure for each failed case.
>
> In my SIPGrammar I have action parsers for To ==>.. From ==>  and would
> like to keep them. At the same time I would be happy if the token in front
> of the colon were consumed only once and parsing were then delegated to the
> right parser, falling back to the ‘catch all’ one if that parser fails.
>
> I don’t know which abstraction would be needed to allow creating optimized
> PetitParsers for such grammars.
>
> Sorry for the long mail; the full details and context are below.
>
>
> kind regards
>         holger
>
>
>
>
>
>
> Full details:
>
>
> 1.) CharSetPredicate
>
> | aParser bParser combinedParser aTime bTime cTime |
>
> aParser := #digit asParser.
> bParser := #letter asParser.
> combinedParser := aParser / bParser.
>
> aTime := [  aParser parse: 'b'] bench.
> bTime := [ bParser parse: 'b'] bench.
> cTime := [ combinedParser parse: 'b'] bench.
> { aTime. bTime. cTime }
>
> cTime is bounded by the execution time of the slowest
> of these parsers + overhead (e.g. PPFailure creation).
>
> e.g.
>
> #('559,000 per second.' '1,010,000 per second.' '429,000 per second.')
>
> With a proof of concept PPPredicateCharSetParser
>
> #('1,330,000 per second.' '1,550,000 per second.' '1,580,000 per second.')
>
> The noise is pretty strong here, but what is important is that bParser and
> the combinedParser are in the same ballpark.
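>
> For comparison, a minimal sketch of getting a single table without a new
> parser class: build one PPCharSetPredicate from the combined block. This
> is not the PPPredicateCharSetParser proof of concept measured above, and it
> assumes PPCharSetPredicate class>>#on: and PPPredicateObjectParser
> class>>#on:message: behave the way #digit and #letter use them. The
> message string is made up.
>
> ```smalltalk
> | singleTableParser |
> "One predicate, hence one classification table, covering both classes."
> singleTableParser := PPPredicateObjectParser
>     on: (PPCharSetPredicate on: [ :char | char isDigit or: [ char isLetter ] ])
>     message: 'digit or letter expected'.
>
> "No choice parser is involved, so a letter does not first fail as a digit."
> [ singleTableParser parse: 'b' ] bench.
> ```
>
> The drawback is that the combination has to be written by hand, whereas a
> special-cased >>#/ would do it for any two character-set parsers.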
>
> 2.) Choice Parser
>
>
>
> The BNF grammar of the parser is roughly:
>
> Request        =  Request-Line
>                   *( message-header )
>                   CRLF
>                   [ message-body ]
>
> message-header  =  (Accept
>                 …
>                 / To
>                 / From
>                 / Via
>                 /  extension-header) CRLF
>
> Alert-Info   =  "Alert-Info" HCOLON alert-param *(COMMA alert-param)
> Accept         =  "Accept" HCOLON
>                    [ accept-range *(COMMA accept-range) ]
>
>
> So there can be several lines of “message-header”, and each message header
> starts with a token/word, a colon and then the parameter. “extension-header”
> is kind of a catch-all if no other rule matched. E.g. if a client sends a
> To header which is wrongly encoded, it would end up as an extension-header.
>
> I transferred the above naively to PetitParser and ended up with something
> that parses ~500 messages a second. The main cost appears to come from the
> choice parser, which needs to create a PPFailure all the time. E.g. if
> you have a line like this:
>
>         ‘From: “Holger Freyther” <sip:[email protected]>’
>
> The choice parser will start with the “Accept” rule, parse the token
> (“From”) and then create a PPFailure, then try the … rules, then “To”,
> parse the token again, and so on. So we end up parsing the same token more
> than once and creating PPFailures all the time. I ended up creating a
> custom parser that peeks at the token, has a horrible chain of
> token = ‘XYZ’ ifTrue: checks and then dispatches to the matching rule.
>
> It would be nice if PetitParser could be taught to parse the token only
> once and then delegate to the param rule. E.g. a PPAnyOfParser that allows
> specifying the token to match, the parser to continue with and a fallback
> parser?
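>
> To make the idea concrete, here is a rough sketch of the dispatch done
> outside the grammar, with a Dictionary lookup instead of an ifTrue: chain.
> It is not a PPAnyOfParser and not the code in my repository: toValue,
> fromValue and extension are placeholder parsers standing in for the real
> SIPGrammar rules, and it works on one header line that has already been
> split off.
>
> ```smalltalk
> | toValue fromValue extension dispatch parseHeader |
> "Placeholder rules (assumptions); the real ones would be the To/From rules."
> toValue := #any asParser plus flatten ==> [ :value | #to -> value ].
> fromValue := #any asParser plus flatten ==> [ :value | #from -> value ].
> extension := #any asParser star flatten ==> [ :text | #extension -> text ].
>
> "Header name -> parser for the part after the colon."
> dispatch := Dictionary new.
> dispatch at: 'To' put: toValue.
> dispatch at: 'From' put: fromValue.
>
> parseHeader := [ :line | | name parser result |
>     "Read the token in front of the colon exactly once."
>     name := (line copyUpTo: $:) trimBoth.
>     parser := dispatch at: name ifAbsent: [ nil ].
>     result := parser ifNotNil: [ :p | p parse: (line copyAfter: $:) trimBoth ].
>     "Unknown name or failed value: hand the whole line to the catch-all."
>     (result isNil or: [ result isPetitFailure ])
>         ifTrue: [ extension parse: line ]
>         ifFalse: [ result ] ].
>
> parseHeader value: 'To: alice'.
> ```
>
> At most one PPFailure is created per line this way, but the dispatch no
> longer lives inside the grammar, which is why a parser class for it would
> be nicer.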
>
>
>
> [1]
> http://smalltalkhub.com/#!/~osmocom/SIP
> http://smalltalkhub.com/#!/~osmocom/MGCP
>



-- 
www.tudorgirba.com

"Every thing has its own flow"
