summary:

it would be useful to me to be able to define a specific token
priority order for ply.lex instead of relying on lex's analysis of the
order in which functions were defined.

detail:

i wrote a wrapper module around ply that would let me keep all my
token/grammar definitions in one place instead of having them spread
out through a file.  it also helped me with some local readability
requirements (most notably, [pt]_whatever is not a valid function
name, readability-wise, and the docstrings are all jacked up).

this ply_wrap docstring snippet expresses the basic idea (and probably
contains errors, do not read it too closely):
"""
This framework defines three classes of interest:
  Grammar: Encapsulates the syntax and parsing rules for the target
    language.
  Lexer: Encapsulates the lexing of the target language.
  Parser: Encapsulates the parsing of tokens into some desired output,
    such as an AST.

Users of this framework need to subclass each of these in order to do
anything useful.  For example:

  class ShellGrammar(ply_wrap.Grammar):
    TOKENS = {'SEMI': r';',
              'COMMENT': r'\#.*',
              'SHEBANG': r'\#!.*',
              ...}
    LITERALS = '><-{}()|&'
    GRAMMAR = {'Expression': 'Expression : Command '
                             '           | Comments '
                             '           | VariableAssignment',
               'Command': 'Command : KEYWORD KeywordArgs '
                          '        | COMMAND CommandArgs',
               'Comments': 'Comments : COMMENT Comments '
                           '         | ',
               ...}
    GRAMMAR_START = 'Expression'

  class ShellLexer(ply_wrap.Lexer):
    grammar_module = ShellGrammar

    def TokenizeSEMI(self, t):
      # no actual work to do
      return t

    def TokenizeCOMMENT(self, t):
      # Handle comments and convert to SHEBANG as appropriate.
      if (t.lexer.lineno == 1 and
          re.match(self.grammar_module.TOKENS['SHEBANG'], t.value)):
        t.type = 'SHEBANG'
      return t

    # ...

  class ShellParser(ply_wrap.Parser):
    grammar_module = ShellGrammar

    def ParseRuleExpression(self, p):
      p[0] = p[1]

    def ParseRuleCommand(self, p):
      p[0] = Command(p[1], p[2:])

    # ...

  tokenizer = ShellLexer()
  tokenizer.Build()
  tokenizer.Lex(input_text)
  parser = ShellParser(tokenizer)
  parser.Build()
  ast = parser.Parse()
"""

(in the above snippet, assume that Tokenize* methods will be converted
to t_* methods with appropriate docstrings and ParseRule* methods will
be converted to p_* methods, also with appropriate docstrings.)

at the heart of this scheme is a classmethod that defines all the
required [tp]_whatever methods when a Lexer or Parser subclass is
instantiated; it works by decorating the original method with a
wrapper function and setting __doc__ and __name__.

"""
import logging

def _AnonymousDecorator(f):
  """Make a function wrapper to hang docstrings off of."""
  return lambda self, p: f(self, p)

class _PlyWrapper(object):
  # eliding a bunch of boring stuff here

  @classmethod
  def FixNames(cls, input_method_prefix, output_method_prefix, doc_dict,
               decorator=_AnonymousDecorator):
    """Convert class method names into PLY-specific names and decorate.

    Args:
      input_method_prefix: Method name prefix denoting methods that
        should be converted.
      output_method_prefix: Method name prefix to use when renaming
        methods.
      doc_dict: Dict to get docstrings from.
      decorator: Decorator to wrap around all methods being converted.
        This decorator *must* return a function for which __doc__ and
        __name__ can be manipulated.

    We look for all methods in class cls with names that start with
    input_method_prefix.  For each of these methods, decorate it with
    the specified decorator, set the docstring to whatever is in
    doc_dict with the key matching the function name minus the
    input_method_prefix, and insert the result as a method with a name
    formed by replacing input_method_prefix with output_method_prefix.

    Example:
      cls.FixNames('Tokenize', 't_', cls.grammar_module.TOKENS,
                   decorator=cls.TrackTokenPosition)
    This finds all methods with names starting with "Tokenize" (eg,
    "TokenizeSEMI"), sends TokenizeSEMI through cls.TrackTokenPosition,
    sets the docstring on the result to cls.grammar_module.TOKENS['SEMI'],
    and stores the result as cls.t_SEMI.
    """
    for method in cls._GetMethodsWithPrefix(input_method_prefix):
      method_root = method.__name__[len(input_method_prefix):]
      assert method_root in doc_dict, (
          '%s.%s* methods need properly formatted names.  See docs for %s.'
          % (cls.__name__, input_method_prefix, cls.__name__))
      decorated_method = decorator(method)
      decorated_method.__name__ = '%s%s' % (output_method_prefix,
                                            method_root)
      decorated_method.__doc__ = doc_dict[method_root]
      setattr(cls, decorated_method.__name__, decorated_method)
      logging.info('Inserted "%s"' % decorated_method.__name__)

class Lexer(_PlyWrapper):
  # defines some Tokenize* methods and calls FixNames with
  # input_method_prefix='Tokenize'

class Parser(_PlyWrapper):
  # defines some ParseRule* methods and calls FixNames with
  # input_method_prefix='ParseRule'

"""
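boiled down to a single method, the rename-and-redoc trick is just
this (a standalone toy example with invented names, not the real
wrapper code):

```python
def _AnonymousDecorator(f):
  # A fresh lambda per method, so __doc__ and __name__ can be set on it
  # without touching the original function.
  return lambda self, t: f(self, t)

class ToyLexer(object):
  TOKENS = {'SEMI': r';'}

  def TokenizeSEMI(self, t):
    return t

# What FixNames does for one method: wrap, rename, redoc, insert.
wrapped = _AnonymousDecorator(ToyLexer.TokenizeSEMI)
wrapped.__name__ = 't_SEMI'
wrapped.__doc__ = ToyLexer.TOKENS['SEMI']
setattr(ToyLexer, wrapped.__name__, wrapped)
```

after this runs, ToyLexer.t_SEMI has the regex as its docstring, which
is exactly the shape ply.lex expects from a t_* method.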

Parser subclasses work just fine because you can define the starting
point for the grammar, but Lexer subclasses do not work at all:
ply.lex ranks t_* functions by the starting line number of their code
objects, and since all of my generated wrappers come out of the same
lambda, ply.lex thinks all of the methods start on the same line of
code.  i spent a while trying to find a way to fake up the starting
line number before deciding that it would be more straightforward if i
could just tell ply.lex the token precedence directly.  if i could do
that, then i could even convert some of my methods back to class
variables.

assuming that i will at least offer to write the code to make ply.lex
understand a token_precedence variable, does this sound cool, or
should i go back to trying to fake out the code objects?
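concretely, the ply.lex change i'm offering to write would boil down
to an ordering hook something like this (token_precedence and
order_token_funcs are names i'm inventing here; nothing in this sketch
is existing ply API):

```python
def order_token_funcs(funcs, token_precedence=None):
  # If the lexer module defines token_precedence (a sequence of token
  # names, highest priority first), sort the t_* functions by it;
  # otherwise fall back to the current source-line ordering.
  if token_precedence is not None:
    rank = {name: i for i, name in enumerate(token_precedence)}
    return sorted(funcs, key=lambda f: rank[f.__name__[len('t_'):]])
  return sorted(funcs, key=lambda f: f.__code__.co_firstlineno)
```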
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"ply-hack" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/ply-hack?hl=en
-~----------~----~----~----~------~----~------~--~---