On Thu, Jul 08, 2004 at 04:49:33AM -0600, Luke Palmer wrote:
: Michele Dondi writes:
: > On the wild side of things, could there be the possibility of even
: > defining new ones?
: 
: That's what I meant by:
: 
:     grammatical_category:postcircumfix
: 
: Though it wouldn't be so magical as to just know what you mean.  If you're
: mucking with the grammar, though, you should be able to insert hooks.
: After all, the writers of the perl 6 parser have to do it.
: 
:     rule prefix_op() {
:         (@(%Perl::guts::grammatical_categories«prefix»))
:         <prefix_op>
:       |
:         <term>
:     }
: 
: Or something.

I like it when someone says "or something" about the same place I'd
say "or something".  :-)

However, in the interests of dewaffling, I have a couple of quibbles.
I don't know what that @() is doing there--I presume you meant @{}.
Also, it's not clear that you want an array there, but I understand
you're indicating that the tokens have to be matched in some particular
order that is unspecified but not arbitrary (presumably longer
tokens preceding any shorter prefixes of those tokens).  As I said in
another message, though, we might want to force hashes to automatically
tokenize in a longest-token-first fashion (or at least have the option
of doing so), and using a hash would allow the keys to be the strings
and the values to be individual actions to be taken.  With an array
match, you might find yourself redispatching individual operators in a
switch statement to provide that kind of specificity.  For efficiency,
either an array or a hash would want to be preprocessed into some
other kind of trie or other data structure for fast tokenizing anyway,
so it's not like doing it with an array is buying you much unless you
really need to specify the order of matching.
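
As a rough illustration of the hash idea (in Python rather than Perl 6
rule syntax, since none of this is settled), here is a minimal sketch of
longest-token-first matching where the keys are the token strings and
the values are the actions.  The table contents and the names PREFIX_OPS
and match_longest are made up for the example; a real tokenizer would
presumably precompile the table into a trie rather than sort on every
call.

```python
# Hypothetical operator table: keys are token strings, values are the
# actions to take when that token is matched.
PREFIX_OPS = {
    "-": lambda x: -x,
    "--": lambda x: x - 1,
    "!": lambda x: not x,
}


def match_longest(table, text, pos=0):
    """Return (token, action) for the longest key found at text[pos:], else None."""
    # Try longer keys first so that "--" wins over its own prefix "-".
    for token in sorted(table, key=len, reverse=True):
        if text.startswith(token, pos):
            return token, table[token]
    return None
```

With this table, match_longest(PREFIX_OPS, "--x") picks "--" rather
than "-", and the action comes back with the token, so no separate
switch statement is needed to redispatch on the operator.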

You might think we need to specify order so that lexicalized operator
definitions can override more global ones, but I suspect we actually
have to copy the array or hash into the derived grammar in any event to
properly emulate method overriding for things that aren't really methods,
so that when we revert the grammar it reverts the user-defined operators
as well.
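
A minimal Python sketch of that copy-on-derive behavior, assuming a
grammar is nothing more than an object holding an operator table (the
Grammar class and its derive method are invented for the example, not
anything in the Perl 6 design):

```python
class Grammar:
    """Hypothetical grammar that holds only a prefix-operator table."""

    def __init__(self, prefix_ops):
        # Take a private copy so later changes never leak to the caller.
        self.prefix_ops = dict(prefix_ops)

    def derive(self, extra_ops):
        """Return a new grammar whose table is a copy plus overrides."""
        child = Grammar(self.prefix_ops)   # copied, not shared
        child.prefix_ops.update(extra_ops)  # lexical definitions win
        return child


base = Grammar({"-": "negate", "!": "not"})
scoped = base.derive({"-": "user_negate", "~~": "user_op"})
```

Leaving the lexical scope then just means going back to base, whose
table never saw the user-defined operators, so no matching order is
needed to get the override right.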

Or something...

My other quibble is that I hope this level of operator can be parsed
with operator precedence rather than rules.  Higher level rules
drop into the operator precedence parser when they see things like
<expr>, and the operator precedence parser drops into lower level
rules before returning a "term" token (or if a macro specifies a
particular followup parsing rule).  Of course, it's possible that
our tokener is just a fancy rule, in which case it would strongly
resemble what you have above, only maybe with more alternatives,
depending on where we decide to recognize the various kinds of terms.
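
To make the hand-off concrete, here is a small Python sketch using
precedence climbing as a stand-in for the operator precedence parser.
The BINARY table, its precedence numbers, and the function names are
assumptions for the example only; parse_term plays the part of the
lower level rule that returns a "term", and it calls back into
parse_expr when it sees a parenthesized <expr>.

```python
import re

# Hypothetical binary operators and precedence levels (all left-associative).
BINARY = {"+": 10, "-": 10, "*": 20, "/": 20}


def tokenize(src):
    """Split the source into integer literals, operators, and parens."""
    return re.findall(r"\d+|[-+*/()]", src)


def parse_term(tokens, i):
    """Lower-level rule: an integer or a parenthesized expression."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok == "(":
        # A higher-level construct drops back into the precedence parser.
        node, i = parse_expr(tokens, i + 1, 0)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("missing ')'")
        return node, i + 1
    if tok.isdigit():
        return int(tok), i + 1
    raise SyntaxError("unexpected token " + repr(tok))


def parse_expr(tokens, i=0, min_prec=0):
    """Precedence parser: asks parse_term for each operand."""
    left, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in BINARY and BINARY[tokens[i]] >= min_prec:
        op = tokens[i]
        # Binding tighter on the right makes the operators left-associative.
        right, i = parse_expr(tokens, i + 1, BINARY[op] + 1)
        left = (op, left, right)
    return left, i
```

For example, parse_expr(tokenize("1+2*3")) returns the tree
("+", 1, ("*", 2, 3)) along with the index just past the last token.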

Oddly, depending on how we decide to do operator precedence, we might
not do the conventional thing of treating parenthesized expressions
as terms, but just make parens into pseudo operators that jack up
the internal precedence and return the parens as individual tokens.
But maybe we should stick with the ordinary recursive definition--it
might give better error messages on missing parens, and we've already
eliminated the 20-odd recursion levels that a strict recursive descent
parser would impose on parentheses anyway.
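
One possible reading of the pseudo-operator approach, sketched in
Python under loud assumptions: here the parens are consumed rather than
returned as tokens, each open paren simply adds BUMP to the precedence
of every operator inside it, and the input is assumed to alternate
operand, operator, operand.  BASE, BUMP, lift_parens and parse_flat are
all invented names.

```python
import re

# Hypothetical base precedences; BUMP must exceed every one of them.
BASE = {"+": 1, "-": 1, "*": 2, "/": 2}
BUMP = 10


def lift_parens(src):
    """Flatten src into integers and (op, effective_precedence) pairs."""
    depth, out = 0, []
    for tok in re.findall(r"\d+|[-+*/()]", src):
        if tok == "(":
            depth += 1          # jack up the precedence of what follows
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise SyntaxError("unbalanced ')'")
        elif tok in BASE:
            out.append((tok, BASE[tok] + BUMP * depth))
        else:
            out.append(int(tok))
    if depth:
        raise SyntaxError("missing ')'")
    return out


def parse_flat(items, i=0, min_prec=0):
    """Precedence climbing over the flat list; no recursion on parens."""
    left = items[i]
    i += 1
    while i < len(items) and items[i][1] >= min_prec:
        op, prec = items[i]
        right, i = parse_flat(items, i + 1, prec + 1)
        left = (op, left, right)
    return left, i
```

In this sketch parse_flat(lift_parens("(1+2)*3")) yields
("*", ("+", 1, 2), 3) without ever treating the parenthesized part as a
term.  A missing paren only shows up as a depth count that fails to
return to zero, which gives much less to hang an error message on than
the recursive definition does.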

Or something.  :-)

Larry
