On Mar 10, 8:31 pm, robert.mull...@gmail.com wrote:
> I am trying to implement a lexer and parser for a subset of Python
> using lexer and parser generators. (It doesn't matter, but I happen
> to be using ocamllex and ocamlyacc.) I've run into the following
> annoying problem and am hoping someone can tell me what I'm missing.
> Lexers generated by such tools return tokens in a stream as they
> consume the input text. But Python's indentation appears to require
> interruption of that stream. For example, in:
>
>     def f(x):
>         statement1;
>         statement2;
>         statement3;
>         statement4;
>     A
>
> Between the '\n' at the end of statement4 and the A, a lexer for
> Python should return 2 DEDENT tokens. But there is no way to
> interject two DEDENT tokens within the token stream between the
> tokens for NEWLINE and A. The generated lexer doesn't have any way
> to freeze the input-text pointer.
>
> Does this mean that Python lexers are all written by hand? If not,
> how do you do it using your favorite lexer generator?
>
> Thanks!
>
> Bob Muller
In pyparsing's indentedBlock expression/helper, I keep a stack of column
numbers representing the indent levels. When a line's indent column is
less than the column at the top of the stack, I emit one DEDENT for each
level I have to pop off the stack before reaching the new indent column.
If, after popping, the line's column doesn't match the column now at the
top of the stack, then I know this is an illegal indent (it doesn't line
up with any previous indent level). Also, when computing the column
number, be wary of tab handling.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
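For what it's worth, the stack technique above can be sketched in plain Python. This is not pyparsing's actual code, just a minimal illustration of the idea: keep a stack of indent columns, push and emit INDENT when a line indents deeper, and emit one DEDENT per popped level when it unindents (the names tokenize_indents, LINE, etc. are made up for the example).

```python
def tokenize_indents(lines, tabsize=8):
    """Yield ("INDENT"|"DEDENT"|"LINE", ...) tokens for a block of text.

    A sketch of the indent-stack approach, not pyparsing's implementation.
    """
    stack = [0]          # active indent columns; column 0 is the base level
    tokens = []
    for line in lines:
        if not line.strip():
            continue     # blank lines don't change the indent level
        # Expand tabs before measuring the column -- tab handling matters.
        expanded = line.expandtabs(tabsize)
        col = len(expanded) - len(expanded.lstrip(" "))
        if col > stack[-1]:
            stack.append(col)
            tokens.append(("INDENT", col))
        else:
            # Emit one DEDENT per level popped off the stack.
            while col < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", stack[-1]))
            if col != stack[-1]:
                # Doesn't line up with any previous indent level.
                raise IndentationError("unindent does not match any outer level")
        tokens.append(("LINE", expanded.strip()))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", stack[-1]))
    return tokens
```

On Bob's example, the jump from statement4 back to A's column pops two levels at once, producing the two DEDENTs he was asking about; a generator-based lexer can buffer them and hand them out one per call.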