I am trying to implement a lexer and parser for a subset of Python using lexer and parser generators. (It doesn't matter, but I happen to be using ocamllex and ocamlyacc.) I've run into the following annoying problem and am hoping someone can tell me what I'm missing. Lexers generated by such tools return tokens one at a time in a stream as they consume the input text. But Python's indentation appears to require interrupting that stream. For example, in:

    def f(x):
        statement1
        statement2
            statement3
            statement4
    A
Between the '\n' at the end of statement4 and the A, a lexer for Python should return two DEDENT tokens (one for each block being closed). But there is no way to interject two DEDENT tokens into the token stream between the tokens for NEWLINE and A: the generated lexer has no way to freeze the input pointer and hand back extra tokens without consuming more text. Does this mean that Python lexers are all written by hand? If not, how do you do it with your favorite lexer generator?

Thanks!

Bob Muller
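P.S. One workaround I can imagine, since ocamlyacc only requires a function of type Lexing.lexbuf -> token: hand the parser a closure that drains a queue of pending tokens before calling the real ocamllex rule, and keep an explicit stack of indentation widths to decide how many INDENT/DEDENT tokens to queue. A rough sketch of what I mean (all the names here are made up, not real generated code):

    (* Hypothetical token type; names are made up for illustration. *)
    type token = NEWLINE | INDENT | DEDENT | NAME of string | EOF

    (* Assume the ocamllex rule returns either an ordinary token or,
       after consuming a newline plus leading whitespace, the width of
       that whitespace. *)
    type raw = TOK of token | LINE_BREAK of int

    let make_lexer (raw_rule : Lexing.lexbuf -> raw) =
      let pending = Queue.create () in   (* tokens not yet handed out *)
      let indents = ref [ 0 ] in         (* stack of open indent widths *)
      fun lexbuf ->
        if not (Queue.is_empty pending) then Queue.pop pending
        else
          match raw_rule lexbuf with
          | TOK t -> t
          | LINE_BREAK width ->
              (match !indents with
               | top :: _ when width > top ->
                   (* Deeper indentation opens one new block. *)
                   indents := width :: !indents;
                   Queue.push INDENT pending
               | _ ->
                   (* Shallower indentation: queue one DEDENT per
                      block being closed. *)
                   while (match !indents with
                          | top :: _ -> width < top
                          | [] -> false)
                   do
                     indents := List.tl !indents;
                     Queue.push DEDENT pending
                   done);
              NEWLINE

The parser then never knows the difference, since it just sees a flat stream of tokens (Parser.program and Lexer.raw_token are again made-up names):

    let () =
      let lexbuf = Lexing.from_channel stdin in
      let _ast = Parser.program (make_lexer Lexer.raw_token) lexbuf in
      ()

This sketch leaves out EOF handling (flushing the DEDENTs still open at end of input), blank lines, and comment-only lines. Is something like this the usual approach, or am I still missing something?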