"Hannes Hirzel"<[email protected]> wrote:
> If I explore the resulting object I got what is shown in the screen
> shot. I am not sure how to interpret the result.
Yes, it's not the best possible representation, but it does show the full parse
tree. The tree can be rather complicated depending on how complicated is the
grammar. That's why I suggested playing with simpler grammars. But the reason
why I even mentioned the TreeBuilder is that it's the simplest possible Actor,
it implements single method that packages the result of each successful
production as it's own node. Maybe using an array instead of associations would
create a slightly more "tree like" result.
> "Regarding the wiki example: the class PEGWikiGenerator is not included"
>
> wikiParser parse: 'Page' stream: input actor: PEGWikiGenerator new.
>
> Where can I get this class?
I've attached a fileout of the class that you can file into Squeak, except it
references VW XML classes, so it won't work. If you fix it up to use the Squeak
XML stuff you should be able to get it to work. All you really need is
XML.Element and XML.Tag equivalent. The second fileout shows how the class
variables are initialized in VW. You might need to turn the initializers into
equivalent class side initialize methods.
> As class methods of PEGParser there are 8 grammars included - CSS3,
> HttpUrl, JSON, JavaScript, PEG, Smalltalk, Wiki, XML.
>
> I think a simple complete example which actually does something would
> be helpful for me. TreeBuilder seems to do nothing and I do not see the tree.
OK, let's try something really simple. The wikipedia page shows a simple
arithmetic expression grammar. Here it is transcribed for Xtreams:
grammar := '
Number <- [0-9]+
Parenthesised <- "(" Expr ")"
Value <- Number / Parenthesised
Product <- Value (("*" / "/") Value)*
Sum <- Product (("+" / "-") Product)*
Expr <- Sum'.
I expanded the Value from the wiki grammar into separate Number and
Parethesised expression productions. It'll hopefully be clear why shortly. You
already know how to get a parser from grammar. That's always the same and for
anything more complex you'll probably want to cache the parser instead of
recreating it every time, but this will do fine for now:
parser := PEG.Parser parserPEG parse: 'Grammar' stream: grammar actor:
PEG.ParserParser new.
With this you can already parse expressions (note that the grammar as it is
written cannot handle whitespace)
parser parse: 'Expr' stream: '3+4*(10-4)' reading actor: nil
you'll get this (roughly formatted to provide at least some resemblance of the
expression tree):
#( #(OrderedCollection ($3 "16r0033") OrderedCollection ())
OrderedCollection (#
('+' #(OrderedCollection ($4 "16r0034")
OrderedCollection (#
('*' #(
'(' #( #(OrderedCollection ($1
"16r0031" $0 "16r0030") OrderedCollection ())
OrderedCollection (#
('-'
#(OrderedCollection ($4 "16r0034") OrderedCollection ())))) ')')))))))
That's obviously not very easy to take apart to work with, that's why you want
an Actor that will receive rule specific callbacks and do whatever. Let's say
we want to evaluate the expression and return the result. We'll make a subclass
of Actor called Evaluator. As I mentioned earlier, the Actor class provides
pragma based mechanism for expressing rule specific callbacks so you simply
write a method for each rule for which you want to do something and annotate it
with a pragma. Let's first try to turn the digit collections into numbers:
Evaluator>>Number: digits
<action: 'Number'>
^digits inject: 0 into: [ :total :digit | total * 10 + ('0123456789'
indexOf: digit) - 1 ]
With this you can parse numbers
parser parse: 'Expr' stream: '3456' reading actor: Evaluator new
Now let's try to add. We'll want a method for the Sum production but there's a
slightly tricky part there. Note that the rule is defined as
Sum <- Product (("+" / "-") Product)*
which means that you get a term (Product) and then zero or more pairs of + or -
and another term. What the method receives will be collection of the first term
and then pairs of the operator with its term. So parsing this:
parser parse: 'Expr' stream: '3+4+5+6+7' reading actor: Evaluator new
will give you
#(3 OrderedCollection (#('+' 4) #('+' 5) #('+' 6) #('+' 7)))
So if you add a method for Sum, that's what it will need to deal with:
Sum: terms
<action: 'Sum'>
^terms last inject: terms first into: [ :total :next |
(next first = '+') ifTrue: [ total + next last ] ifFalse: [
total - next last ] ]
The Actor pragmas provide another convenience for taking more complex input
apart. You can write the above as
Sum: first rest: pairs
<action: 'Sum' arguments: #(1 2)>
^pairs inject: first into: [ :total :pair |
(pair first = '+') ifTrue: [ total + pair last ] ifFalse: [
total - pair last ] ]
The arguments: keyword simply says invoke this method with elements 1 and 2 of
the rule input (terms) as arguments. Obviously the method must take the
corresponding number of arguments as well. So with the above rules the addition
expression should give you 25. The Product rule handling is analogous:
Product: first rest: pairs
<action: 'Product' arguments: #(1 2)>
^pairs inject: first into: [ :total :pair |
(pair first = '*') ifTrue: [ total * pair last ] ifFalse: [
total / pair last ] ]
Finally the parenthesised expression is rather simple:
Parenthesised: expression
<action: 'Parenthesised' arguments: #(2)>
^expression
It receives a three element array: the left bracket, the expression inside and
the right bracket. All we really need to do is return the expression in the
middle. The above is using the arguments: pragma again to achieve that, the
following would be equivalent:
Parenthesised: expression
<action: 'Parenthesised'>
^expression at: 2
I hope by now it's also clearer why it was useful to factor the Value
definition out into separate Number and Parenthesised rule. Otherwise we'd have
to handle both cases in a single Value: method, which would make the code
messier.
> For example besides the classes PEGParserPEGTest and
> PEGParserSmalltalkTest (6 test cases) a class PEGParserHttpUrlTest or
> PEGParserJavaScriptTest would be helpful.
Yup, the tests are seriously wanting as well. Hopefully we'll get there soon.
BTW, the parsing stuff doesn't work for me out of the box in either Squeak or
Pharo in the images that I use (which should be quite clean). The problem I'm
seeing is that the Dictionary>>addAll: in ParserParser>>Grammar: doesn't work
as expected. It's trying to add collection of associations to a dictionary but
ends up adding them as values indexed by integers. I can fix it but I wonder
what do people have loaded that makes it work for them.
Anyway, hopefully the above can get you going.
Cheers,
Martin