Hi, I've just read the blog post "Visualizing a Python tokenizer", and it reminded me of this:
"OMeta: an object oriented language for pattern matching" http://www.cs.ucla.edu/~awarth/papers/dls07.pdf OMeta is an extension and generalisation of the idea of PEGs*. It provides a nice way to describe a language both at the character level (tokens), the grammar itself and productions into the AST. Finally the grammars are extensible (possibly from within the language itself). The implementation is discussed in "Packrat Parsers Can Support Left Recursion" and there is some discussion of the performance there. http://www.vpri.org/pdf/packrat_TR-2007-002.pdf I wonder whether the same idea behind PyPy can be applied to the grammar. Write a program in some language (a python version of OMeta for instance) which is then transformed by the translator, compiler, or JIT into something that runs fast. What could be nice about this is bringing the tokenising and parsing closer in spirit to the heart of PyPy, writing 'nicer' code, and providing a (I think tantalising) way to try new syntax going forward. And there are things to play with on this page: http://www.cs.ucla.edu/~awarth/ometa/ometa-js/ * Parsing Expression Grammar With regard to railroad diagrams (I think that's what they're called): There used to be a script that generated them - it's mentioned at the top of the python grammar file, and here http://www.python.org/search/hypermail/python-1994q3/0294.html But I've seen discussion elsewhere that it has been lost :( How about this? http://www.informatik.uni-freiburg.de/~thiemann/haskell/ebnf2ps/README cheers, Toby _______________________________________________ [email protected] http://codespeak.net/mailman/listinfo/pypy-dev
