You will probably have more chances to get a response on the Moose mailing list.
Cheers, Doru On Sun, Aug 16, 2015 at 5:58 PM, Holger Freyther <[email protected]> wrote: > Hi, > > once again I am not sure if this is the right list. The first parser I > wrote using > PetitParser was a SIP (and then MGCP) parser. I have recently ported[1] the > code to Pharo and with Pharo it is very tempting to Use > BlockClosure>>#bench > to get an idea of the speed. > > > I have two performance “issues” and wonder if others hand similar issues > with > PetitParser and if there is a general approach to this. > > > > 1.) Combining two PPCharsetPredicates does not combine the “classification” > table it had. One could create a PPPredicateObjectParser subclass that is > special casing >>#/ to build a combined classification table. > > > 2.) When blindly following a BNF enumeration of "A or B or C or D or E > or CatchAll” and each “A, B” follow common pattern (e.g. token COLON value) > one pays a high cost in the backtracking and constructing the PPFailure for > each failed case. > > In my SIPGrammar I have action parsers for To ==>.. From ==> and would > like to keep that. At the same time I would be happy if the token in front > of the > colon is only consumed once and then delegated to the right parser and if > that > one failed use the ‘catch all’ one. > > I don’t know which abstraction would be needed to allow creating optimized > PetitParsers for such grammars. > > sorry for the long mail, long details and context is below. > > > kind regards > holger > > > > > > > Full details: > > > 1.) CharSetPredicate > > | aParser bParser combinedParser aTime bTime cTime | > > aParser := #digit asParser. > bParser := #letter asParser. > combinedParser := aParser / bParser. > > aTime := [ aParser parse: 'b'] bench. > bTime := [ bParser parse: 'b'] bench. > cTime := [ combinedParser parse: 'b'] bench. > { aTime. bTime. cTime } > > cTime is bounded by the time execution time of of the slowest > of these parsers + overhead (e.g. PPFailure creation). > > e.g. > > #('559,000 per second.' '1,010,000 per second.' '429,000 per second.') > > With a proof of concept PPPredicateCharSetParser > > #('1,330,000 per second.' '1,550,000 per second.' '1,580,000 per second.’) > > The noise is pretty string here but what is important is that bParser and > the > combinedParser are in the same ballpark. > > 2.) Choice Parser > > > > The BNF grammar of the parser is roughly: > > Request = Request-Line > *( message-header ) > CRLF > [ message-body ] > > message-header = (Accept > … > / To > / From > / Via > / extension-header) CRLF > > Alert-Info = "Alert-Info" HCOLON alert-param *(COMMA alert-param) > Accept = "Accept" HCOLON > [ accept-range *(COMMA accept-range) ] > > > So there can be several lines of “message-header”. And each method header > starts with a token/word, a colon and then the parameter. > “extension-header” > is kind of a catch all if no other rule matched. E.g. if a client sends a > To which is > wrongly encoded it would end up with the extension-header. > > I transferred the above naively to PetitParser and end up with something > like > parsing ~500 messages a second. The main cost appears to come from the > choice parser that needs to create a PetitFailure all the time. E.g. if > you have a > line like this: > > ‘From: “Holger Freyther” <sip:[email protected]>’ > > The choice parser will start with the “Accept” rule, parse the token > (“From” and > then create a PPFailure, then … rules, then “To”, parse the token.. So we > have > parsing the same token more than once and creating PPFailures all the > time. I > ended up creating a custom parser that will peek the token, have a > horrible chain > of token = ‘XYZ’ ifTrue and then dispatch to the other rule. > > It would be nice if PetitParser could be taught to only parse the token > once and > then delegate to the param rule. E.g. a PPAnyOfParser that allows to > specify the > token to match, the parser to continue with and a fallback parser? > > > > [1] > http://smalltalkhub.com/#!/~osmocom/SIP > http://smalltalkhub.com/#!/~osmocom/MGCP > -- www.tudorgirba.com "Every thing has its own flow"
