Hi holger

At ESUG Jan showed that he is working on a kind of PetitParser compiler.

Le 16/8/15 17:58, Holger Freyther a écrit :
Hi,

once again I am not sure if this is the right list. The first parser I wrote 
using
PetitParser was a SIP (and then MGCP) parser. I have recently ported[1] the
code to Pharo and with Pharo it is very tempting to Use BlockClosure>>#bench
to get an idea of the speed.


I have two performance “issues” and wonder if others hand similar issues with
PetitParser and if there is a general approach to this.


1.) Combining two PPCharsetPredicates does not combine the “classification”
table it had. One could create a PPPredicateObjectParser subclass that is
special casing >>#/ to build a combined classification table.

2.) When blindly following a BNF enumeration of "A or B or C or D or E
or CatchAll” and each “A, B” follow common pattern (e.g. token COLON value)
one pays a high cost in the backtracking and constructing the PPFailure for
each failed case.

In my SIPGrammar I have action parsers for To ==>.. From ==>  and would
like to keep that. At the same time I would be happy if the token in front of 
the
colon is only consumed once and then delegated to the right parser and if that
one failed use the ‘catch all’ one.

I don’t know which abstraction would be needed to allow creating optimized
PetitParsers for such grammars.

sorry for the long mail, long details and context is below.
What thierry will tell you is that if you do not need to compose your grammars and the grammar is not evolving and you need speed then a SmaCC approache with LGR, LALR or something else
is probably faster.

Stef



kind regards
        holger






Full details:


1.) CharSetPredicate

| aParser bParser combinedParser aTime bTime cTime |

aParser := #digit asParser.
bParser := #letter asParser.
combinedParser := aParser / bParser.

aTime := [  aParser parse: 'b'] bench.
bTime := [ bParser parse: 'b'] bench.
cTime := [ combinedParser parse: 'b'] bench.
{ aTime. bTime. cTime }

cTime is bounded by the time execution time of of the slowest
of these parsers + overhead (e.g. PPFailure creation).

e.g.

#('559,000 per second.' '1,010,000 per second.' '429,000 per second.')

With a proof of concept PPPredicateCharSetParser

#('1,330,000 per second.' '1,550,000 per second.' '1,580,000 per second.’)

The noise is pretty string here but what is important is that bParser and the
combinedParser are in the same ballpark.

2.) Choice Parser



The BNF grammar of the parser is roughly:

Request        =  Request-Line
                   *( message-header )
                   CRLF
                   [ message-body ]

message-header  =  (Accept
                …
                / To
                / From
                / Via
                /  extension-header) CRLF

Alert-Info   =  "Alert-Info" HCOLON alert-param *(COMMA alert-param)
Accept         =  "Accept" HCOLON
                    [ accept-range *(COMMA accept-range) ]


So there can be several lines of “message-header”. And each method header
starts with a token/word, a colon and then the parameter. “extension-header”
is kind of a catch all if no other rule matched. E.g. if a client sends a To 
which is
wrongly encoded it would end up with the extension-header.

I transferred the above naively to PetitParser and end up with something like
parsing ~500 messages a second. The main cost appears to come from the
choice parser that needs to create a PetitFailure all the time. E.g. if you 
have a
line like this:

        ‘From: “Holger Freyther” <sip:[email protected]>’

The choice parser will start with the “Accept” rule, parse the token (“From” and
then create a PPFailure, then … rules, then “To”, parse the token.. So we have
parsing the same token more than once and creating PPFailures all the time. I
ended up creating a custom parser that will peek the token, have a horrible 
chain
of token = ‘XYZ’ ifTrue and then dispatch to the other rule.

It would be nice if PetitParser could be taught to only parse the token once and
then delegate to the param rule. E.g. a PPAnyOfParser that allows to specify the
token to match, the parser to continue with and a fallback parser?



[1]
http://smalltalkhub.com/#!/~osmocom/SIP
http://smalltalkhub.com/#!/~osmocom/MGCP



Reply via email to