On Tue, Nov 28, 2000 at 06:58:57PM +0000, Tom Hughes wrote:
> I didn't say that having infinite lookahead was better than allowing
> backtracking. I simply said that the two were equivalent and that any
> problem that can be solved by one can be solved by the other.

Fair enough.

> That's quite a nasty example for a number of reasons. Firstly you
> might have to back up and reparse a very large amount of code as the
> subroutine definition could be a very long way away from the print
> statement.

You wouldn't have to reparse it all. You'd have to insert the new information
into the parse and see how that changes things. It'd probably only change a
very localised area, a single statement per occurrence at most.
 
> Secondly in order to know that you needed to back up you'd have to
> remember that you had had to guess that foo was a filehandle but
> that it might also be a subroutine and it raises a whole series of
> questions about what other similar things you might need to remember.
 
Parsing Perl is not easy. :) At some points, you have to say, well, heck, I
don't *know* what this token is. At the moment, perl guesses, and it guesses
reasonably well. But guessing something wrongly which you could have got right
if you'd read the next line strikes me as a little anti-DWIM. 
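
As a rough illustration of "remember the guess, then patch it", here is a
minimal sketch in Python rather than Perl. Everything in it is hypothetical:
the Parse class, the node dictionaries and the "filehandle"/"sub_call" labels
are made up for the example and say nothing about how perl itself stores
its parse.

```python
class Parse:
    """Toy parse state: a list of statements plus a record of guesses."""

    def __init__(self):
        self.nodes = []    # one dict per statement, in source order
        self.pending = {}  # bareword -> indices of statements that guessed

    def add_print(self, name):
        # We cannot know yet what the bareword is, so guess "filehandle"
        # and remember that this statement depends on the guess.
        self.nodes.append({"stmt": "print", "name": name, "kind": "filehandle"})
        self.pending.setdefault(name, []).append(len(self.nodes) - 1)

    def add_sub(self, name):
        # New information arrives: a subroutine with this name exists.
        self.nodes.append({"stmt": "sub", "name": name})
        # Revisit only the statements that guessed about this name.
        revised = self.pending.pop(name, [])
        for i in revised:
            self.nodes[i]["kind"] = "sub_call"
        return revised


p = Parse()
p.add_print("foo")
p.add_print("STDERR")
touched = p.add_sub("foo")  # only the first print statement is revisited
```

The cost of the late definition here is one dictionary lookup plus one patch
per affected statement, not a reparse of everything in between.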

In a sense, though, you're right; this is a general problem. I'm currently
trying to work out a design for a tokeniser, and it seems to me that there's
going to be a lot of communicating of "hints" between the tokeniser, the lexer
and the parser. 

The other alternative is to completely conflate the three, which would work
but I think people would lose their minds.

Take, for instance:

    ${function($value)}[$val]

Now, how on earth do I split this into tokens? Do I say:

    /${/ - and expect some stuff which will resolve to a variable name or
           array reference, followed by a }

If we go that way, we're passing lots of hints to both the lexer and the
parser.

    /${[^}]+}/ and then /\[[^]]+\]/

If we do that, we have to keep state between the two tokens so that we don't
make [$val] into a reference constructor and stuff up the parser.
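
To make the "keep state between the two tokens" point concrete, here is a
minimal sketch, again in Python so that ${ needs no special care inside the
pattern. The token names (DEREF_BLOCK, SUBSCRIPT, ANON_ARRAY, CHAR) and the
after_deref flag are assumptions made up for the illustration.

```python
import re

# The two patterns above, with $ and { escaped.
DEREF_BLOCK = re.compile(r"\$\{[^}]+\}")  # ${ ... }
BRACKETS = re.compile(r"\[[^\]]+\]")      # [ ... ]


def tokenise(src):
    """Return a list of (kind, text) pairs for a tiny subset of input."""
    tokens = []
    pos = 0
    after_deref = False  # the state carried from one token to the next
    while pos < len(src):
        m = DEREF_BLOCK.match(src, pos)
        if m:
            tokens.append(("DEREF_BLOCK", m.group()))
            pos = m.end()
            after_deref = True
            continue
        m = BRACKETS.match(src, pos)
        if m:
            # Same text, different token, depending on what came before.
            kind = "SUBSCRIPT" if after_deref else "ANON_ARRAY"
            tokens.append((kind, m.group()))
            pos = m.end()
            after_deref = False
            continue
        # Anything else is passed through one character at a time.
        tokens.append(("CHAR", src[pos]))
        pos += 1
        after_deref = False
    return tokens


with_deref = tokenise("${function($value)}[$val]")
on_its_own = tokenise("[$val]")
```

The text [$val] comes out as a SUBSCRIPT in the first call and as an
ANON_ARRAY in the second, and the only thing telling them apart is the flag.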

    /^${([^}]+)}\[([^\]]+)\]/ - Dereference $1 as an array, take value $2.

If we do *that*, then we're already being tokeniser, lexer and parser rolled
into one.
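
A minimal sketch of that third option, in Python, assuming the index group
is meant to be ([^\]]+) followed by a closing bracket. The function name and
the shape of the returned dictionary are hypothetical.

```python
import re

# One pattern that recognises the construct and splits it into its parts.
COMBINED = re.compile(r"^\$\{([^}]+)\}\[([^\]]+)\]")


def parse_element(src):
    """Return a tiny parse node for ${EXPR}[INDEX], or None if no match."""
    m = COMBINED.match(src)
    if m is None:
        return None
    # The match already decides what the construct means, which is the
    # parser's job, not just where the tokens begin and end.
    return {
        "op": "array_element",
        "array_expr": m.group(1),  # still unparsed source text
        "index_expr": m.group(2),  # still unparsed source text
    }


node = parse_element("${function($value)}[$val]")
```

Here node["array_expr"] is the raw string function($value), so the pattern
has taken over the parser's decision while leaving the real parsing of the
inner expressions undone. It also gives up on a nested brace: for
${$h{x}}[0] it returns None.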

Parsing Perl is hard. Trust me. :)

-- 
MISTAKES:
    It Could Be That The Purpose Of Your Life Is Only To Serve As
    A Warning To Others

                                                    http://www.despair.com
