Michael Torrie wrote:
Levi Pearson wrote:
To clarify Jacob's comment, you've got to eliminate left-recursion in
the grammar before recursive descent parsing will work *at all*.
Consider the following grammar rule:

        <foo>: <foo> + <bar>

In a recursive descent parser, you move through the productions in a
rule from left to right and call the parsers for those productions.
So, when you hit a 'foo', you call the 'foo' parser.  The first
production is 'foo', so you call the 'foo' parser, et voila, infinite
recursion.

The problem with EBNF is that it doesn't easily allow the representation
of iteration without resorting to recursion.  For example, consider the
parsing of a mathematical expression, dealing with adding and
subtracting.  As you allude to, it looks something like this:

factor: factor + term |
        factor - term |
        term

We could put this as right-recursive, which is very easy to parse.  But
sometimes this would not be correct (it would be here, however).  What
we'd really like to say is that:

factor: term followed by 0 or more "+ term" or 0 or more "- term"

This is why I find it so much easier to use syntax diagrams rather than
EBNF both to show the formal grammar and to construct the parser, since
much of what you want to express is iterative.

So for the example above, a syntax diagram could take the form:

factor ---> term ------------------------------->
        /        \                      /
       |          |--> '+' ---> term --
       |          |          /          \
       |            -> '-' -             |
        \                               /
          -----------------------------

This diagram makes it very easy, provided you have at least a few token
lookup, to construct the recursive-descent parser.

def factor():
    result = term()

    while peak_next_token in ['+','-']:
        op = next_token()
        switch op:
            case '+':
                result += term()
            case '-':
                result -= term()
            else:
                raise ParseError
    return result

This is the way we did our compiler in CS 431 at BYU, and it works well
enough.  Certainly it's easier to create syntax diagrams than form EBNF
statements, although they are certainly equivalent.  This example does
convert fairly easily to EBNF, but other examples are not so easy.  For
example the syntax diagram for an if/then/elif/else block.  Using syntax
diagrams you can represent the entire thing in one production, and quite
easily build a recursive-descent parser for that.  Converting it to
EBNF, however, requires one to create quite a few productions.  It's
much more complicated.

Many parser generators, like ANTLR, allow you to define grammars in this
way and automatically generate the parser.

In which way? They take ascii art as input? I had no trouble with the syntax diagrams in 431, but I found them hard to work with. I couldn't send one by email easily. I had to have a PDF viewer and scan through dozens of pages of diagrams (we know how much fun that can be) or lug the large book (which I printed out at Kinkos) around. I decided to write my own parser generator, and so I had to enter all those diagrams into BNF format (which seemed like a reasonable machine-readable choice), and while tedious I don't remember it being difficult. In fact, I seem to remember very quickly finding a few patterns, and converting to BNF was primarily a rote exercise.

--
Hans Fugal ; http://hans.fugal.net

There's nothing remarkable about it. All one has to do is hit the
right keys at the right time and the instrument plays itself.
    -- Johann Sebastian Bach

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/

Reply via email to