"Hannes Hirzel"<[email protected]> wrote:
> If I explore the resulting object I got what is shown in the screen
> shot. I am not sure how to interpret the result.

Yes, it's not the best possible representation, but it does show the full parse 
tree. The tree can be rather complicated depending on how complicated is the 
grammar. That's why I suggested playing with simpler grammars. But the reason 
why I even mentioned the TreeBuilder is that it's the simplest possible Actor, 
it implements single method that packages the result of each successful 
production as it's own node. Maybe using an array instead of associations would 
create a slightly more "tree like" result.

> "Regarding the wiki example: the class PEGWikiGenerator is not included"
>
>         wikiParser parse: 'Page' stream: input actor: PEGWikiGenerator new.
>
> Where can I get this class?

I've attached a fileout of the class that you can file into Squeak, except it 
references VW XML classes, so it won't work. If you fix it up to use the Squeak 
XML stuff you should be able to get it to work. All you really need is 
XML.Element and XML.Tag equivalent. The second fileout shows how the class 
variables are initialized in VW. You might need to turn the initializers into 
equivalent class side initialize methods.

> As class methods of PEGParser there are 8 grammars included - CSS3,
> HttpUrl, JSON, JavaScript, PEG, Smalltalk, Wiki, XML.
>
> I think a simple complete example which actually does something would
> be helpful for me. TreeBuilder  seems to do nothing and I do not see the tree.

OK, let's try something really simple. The wikipedia page shows a simple 
arithmetic expression grammar. Here it is transcribed for Xtreams:

        grammar := '
                Number <- [0-9]+
                Parenthesised <- "(" Expr ")"
                Value <- Number / Parenthesised
                Product <- Value (("*" / "/") Value)*
                Sum <- Product (("+" / "-") Product)*
                Expr <- Sum'.

I expanded the Value from the wiki grammar into separate Number and 
Parethesised expression productions. It'll hopefully be clear why shortly. You 
already know how to get a parser from grammar. That's always the same and for 
anything more complex you'll probably want to cache the parser instead of 
recreating it every time, but this will do fine for now:

        parser := PEG.Parser parserPEG parse: 'Grammar' stream: grammar actor: 
PEG.ParserParser new.

With this you can already parse expressions (note that the grammar as it is 
written cannot handle whitespace)

        parser parse: 'Expr' stream: '3+4*(10-4)' reading actor: nil

you'll get this (roughly formatted to provide at least some resemblance of the 
expression tree):

        #(      #(OrderedCollection ($3 "16r0033") OrderedCollection ())
                OrderedCollection (#
                        ('+' #(OrderedCollection ($4 "16r0034")
                        OrderedCollection (#
                                ('*' #(
                                        '(' #(  #(OrderedCollection ($1 
"16r0031" $0 "16r0030") OrderedCollection ())
                                                        OrderedCollection (#
                                                        ('-' 
#(OrderedCollection ($4 "16r0034") OrderedCollection ())))) ')')))))))

That's obviously not very easy to take apart to work with, that's why you want 
an Actor that will receive rule specific callbacks and do whatever. Let's say 
we want to evaluate the expression and return the result. We'll make a subclass 
of Actor called Evaluator. As I mentioned earlier, the Actor class provides 
pragma based mechanism for expressing rule specific callbacks so you simply 
write a method for each rule for which you want to do something and annotate it 
with a pragma. Let's first try to turn the digit collections into numbers:

Evaluator>>Number: digits
        <action: 'Number'>
        ^digits inject: 0 into: [ :total :digit | total * 10 + ('0123456789' 
indexOf: digit) - 1 ]


With this you can parse numbers

        parser parse: 'Expr' stream: '3456' reading actor: Evaluator new

Now let's try to add. We'll want a method for the Sum production but there's a 
slightly tricky part there. Note that the rule is defined as

        Sum <- Product (("+" / "-") Product)*

which means that you get a term (Product) and then zero or more pairs of + or - 
and another term. What the method receives will be collection of the first term 
and then pairs of the operator with its term. So parsing this:

        parser parse: 'Expr' stream: '3+4+5+6+7' reading actor: Evaluator new

will give you

        #(3 OrderedCollection (#('+' 4) #('+' 5) #('+' 6) #('+' 7)))

So if you add a method for Sum, that's what it will need to deal with:

Sum: terms
        <action: 'Sum'>
        ^terms last inject: terms first into: [ :total :next |
                (next first = '+') ifTrue: [ total + next last ] ifFalse: [ 
total - next last ] ]

The Actor pragmas provide another convenience for taking more complex input 
apart. You can write the above as

Sum: first rest: pairs
        <action: 'Sum' arguments: #(1 2)>
        ^pairs inject: first into: [ :total :pair |
                (pair first = '+') ifTrue: [ total + pair last ] ifFalse: [ 
total - pair last ] ]

The arguments: keyword simply says invoke this method with elements 1 and 2 of 
the rule input (terms) as arguments. Obviously the method must take the 
corresponding number of arguments as well. So with the above rules the addition 
expression should give you 25. The Product rule handling is analogous:

Product: first rest: pairs
        <action: 'Product' arguments: #(1 2)>
        ^pairs inject: first into: [ :total :pair |
                (pair first = '*') ifTrue: [ total * pair last ] ifFalse: [ 
total / pair last ] ]

Finally the parenthesised expression is rather simple:

Parenthesised: expression
        <action: 'Parenthesised' arguments: #(2)>
        ^expression

It receives a three element array: the left bracket, the expression inside and 
the right bracket. All we really need to do is return the expression in the 
middle. The above is using the arguments: pragma again to achieve that, the 
following would be equivalent:

Parenthesised: expression
        <action: 'Parenthesised'>
        ^expression at: 2

I hope by now it's also clearer why it was useful to factor the Value 
definition out into separate Number and Parenthesised rule. Otherwise we'd have 
to handle both cases in a single Value: method, which would make the code 
messier.

> For example besides the classes PEGParserPEGTest and
> PEGParserSmalltalkTest (6 test cases) a class PEGParserHttpUrlTest or
> PEGParserJavaScriptTest would be helpful.

Yup, the tests are seriously wanting as well. Hopefully we'll get there soon. 
BTW, the parsing stuff doesn't work for me out of the box in either Squeak or 
Pharo in the images that I use (which should be quite clean). The problem I'm 
seeing is that the Dictionary>>addAll: in ParserParser>>Grammar: doesn't work 
as expected. It's trying to add collection of associations to a dictionary but 
ends up adding them as values indexed by integers. I can fix it but I wonder 
what do people have loaded that makes it work for them.

Anyway, hopefully the above can get you going.

Cheers,

Martin

Reply via email to