Howdy,

One of the "problems" in recursive-descent parsing is constructs that look a lot alike at the front, only to differ at the end -- a sort of end-weight pathology. The example I'm thinking of is the similarity between variable and function declarations in 'C'.


   extern int foo = 0;

   extern int bar(int foo, int mumble) ;

   static int zip(int foo, char * beezle) { return 0; }

I'm dealing with a similar problem in Close (my pet language), and I've tweaked my grammar to favor NOT having to rescan. This means I've got a single declaration rule, but it also means a bunch of <?DECL_MODE_....> predicates, and recognizing ';' at the end of the declaration (except when in parameter mode), etc.

It occurs to me that a win, of sorts, would be for the "declaration" rule to be able to return multiple alternative values - a sort of "result polymorphism".

Thus:

   rule extern_decl {
       | <variable_decl> ';'
       | <function_decl> ';'
       | <function_defn>
   }

could "magically" avoid calling three different rules that scanned mostly the same tokens by some sleight of hand:

   rule variable_decl is parsed(&super_duper_decl);
   rule function_decl is parsed(&super_duper_decl);
   rule function_defn is parsed(&super_duper_decl);

But how to indicate which result you are returning?

Alternatively, it may be the case that some kind of intermediate-level results memoization would do the trick. If the <storage_class> rule memoized its result for offset X, it could generally do the right thing, except that it would have to know when not to memoize (as when some idiot puts in a <?DECL_MODE...> state flag). So maybe

   rule storage_class is memoized {...}

is the right thing?

Anyway, I'm not proposing anything so much as wondering out loud. Surely there's a bunch of smarter bears than me who have given this some thought. Any wisdom?

=Austin

Reply via email to