>> But a more robust and forgiving parser is needed.
> Well at least it would be nice if it could report errors, but that's a
> pain to do with LR parsers.
SmaCC can report errors.
I guess that Citezen needs something that can silently parse across
errors: something that builds a model that is as complete as possible
from input that is not necessarily entirely valid.
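To make "parse across errors" concrete, here is a minimal Python sketch. It is not Citezen's code and not any existing library: the simplified entry syntax, the `ENTRY` and `FIELD` patterns, and the function name `parse_tolerantly` are all assumptions made up for illustration. The idea is to treat every `@` as a synchronization point, keep the entries that parse, and record the ones that do not instead of aborting.

```python
import re

# Hypothetical, simplified entry syntax: @type{key, field = {value}, ...}
ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.S)
FIELD = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")

def parse_tolerantly(text):
    """Return (entries, errors); a broken entry never stops the parse."""
    entries, errors = [], []
    # Every '@' is treated as a synchronization point between entries.
    starts = [m.start() for m in re.finditer("@", text)]
    for start, end in zip(starts, starts[1:] + [len(text)]):
        chunk = text[start:end]
        match = ENTRY.match(chunk)
        if match is None:
            # Remember where the problem was and resume at the next '@'.
            errors.append((start, chunk.strip()))
            continue
        kind, key, body = match.groups()
        entries.append({"type": kind, "key": key,
                        "fields": dict(FIELD.findall(body))})
    return entries, errors

sample = """@article{good1, title = {A}, year = {2009}}
@book{broken title = {B}
@misc{good2, note = {C}}
"""
entries, errors = parse_tolerantly(sample)
```

With this sample, the two well-formed entries end up in `entries` and the malformed `@book` entry ends up in `errors`. A real implementation would need a smarter resynchronization rule, since an `@` can also occur inside a field value.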
I suggest that you have a look at the PetitParser PEG framework
(http://source.lukas-renggli.ch/petit.html). It uses an
object-oriented approach, pure Smalltalk syntax, and comes with an
extensive test suite that covers 100% of the code. Ambiguous grammars
are supported, and you can speed them up using memoization if you want.
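For readers who have not used a PEG framework, the following Python sketch shows the general idea of parser combinators with ordered choice and optional memoization. It is not PetitParser's API; every name in it (`literal`, `char_in`, `seq`, `choice`, `plus`, `memoized`, `counted`, `build`) is invented for this illustration. The grammar deliberately tries the same `number` rule twice at the same position, so the effect of memoization can be observed by counting calls.

```python
# A parser is a function (text, pos) -> end position, or None on failure.

def literal(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def char_in(chars):
    def parse(text, pos):
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    # Ordered choice: the first alternative that succeeds wins.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def plus(parser):
    # One or more repetitions, greedy.
    def parse(text, pos):
        end = parser(text, pos)
        if end is None:
            return None
        while True:
            nxt = parser(text, end)
            if nxt is None or nxt == end:
                return end
            end = nxt
    return parse

def memoized(parser):
    # Cache results per (text, position), as in packrat parsing.
    cache = {}
    def parse(text, pos):
        key = (text, pos)
        if key not in cache:
            cache[key] = parser(text, pos)
        return cache[key]
    return parse

def counted(parser, counter):
    # Instrumentation only: count how often the wrapped parser really runs.
    def parse(text, pos):
        counter[0] += 1
        return parser(text, pos)
    return parse

def build(memoize):
    counter = [0]
    number = counted(plus(char_in("0123456789")), counter)
    if memoize:
        number = memoized(number)
    # Both alternatives start with the same rule, which forces backtracking.
    grammar = choice(seq(number, literal("a")), seq(number, literal("b")))
    return grammar, counter

plain, plain_calls = build(memoize=False)
packrat, packrat_calls = build(memoize=True)
plain_end = plain("123b", 0)
packrat_end = packrat("123b", 0)
```

Both grammars accept `"123b"` and stop at the same position, but the memoized one runs the `number` rule once instead of twice. The trade-off is the memory held by the cache, which is why memoization is best applied selectively.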
> I think I have an OMeta attempt somewhere but it will probably be
> slower than SmaCC.
Parsing all methods of Object and Morph (1390 methods in total) takes:
688ms with the hand-written RBParser,
806ms with the hand-written Squeak parser,
2518ms with the pre-compiled and heavily optimized SmaCC parser,
3700ms with the unoptimized PetitParser Smalltalk parser.
I don't know where OMeta would be in this comparison; unfortunately it
only includes a parser for Smalltalk expressions. A probably less
accurate comparison, using only the expression parsers on a single
input (the factorial function) parsed 1000 times, gives:
560ms with the hand-written RBParser,
602ms with the hand-written Squeak parser,
2564ms with the pre-compiled and heavily optimized SmaCC parser,
4867ms with the unoptimized PetitParser Smalltalk parser,
25098ms with the OMeta Smalltalk expression parser.
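To put the two sets of timings on a common scale, this small Python sketch computes each parser's slowdown relative to RBParser. The timings are the ones listed above; the dictionary names and the `slowdown` helper are just illustrative.

```python
# Timings in milliseconds, copied from the two lists above.
method_benchmark_ms = {
    "RBParser": 688,
    "Squeak parser": 806,
    "SmaCC": 2518,
    "PetitParser": 3700,
}
expression_benchmark_ms = {
    "RBParser": 560,
    "Squeak parser": 602,
    "SmaCC": 2564,
    "PetitParser": 4867,
    "OMeta": 25098,
}

def slowdown(timings, baseline="RBParser"):
    """Return each timing as a multiple of the baseline, to one decimal."""
    base = timings[baseline]
    return {name: round(ms / base, 1) for name, ms in timings.items()}

method_factors = slowdown(method_benchmark_ms)
expression_factors = slowdown(expression_benchmark_ms)

for name, factor in expression_factors.items():
    print(f"{name}: {factor}x")
```

In these numbers SmaCC and PetitParser are roughly 4 to 9 times slower than the hand-written parsers, while OMeta is about 45 times slower than RBParser on the expression benchmark.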
I have no idea why OMeta is so slow. Otherwise, I conclude that it
doesn't matter much speed-wise what kind of parser you pick.
LR parsers are probably not that much of a hype anymore ;-)
Pharo-project mailing list