On Mar 10, 9:38 pm, Paul McGuire <pt...@austin.rr.com> wrote:
> On Mar 10, 8:31 pm, robert.mull...@gmail.com wrote:
>
> > I am trying to implement a lexer and parser for a subset of Python
> > using lexer and parser generators. (It doesn't matter, but I happen
> > to be using ocamllex and ocamlyacc.) I've run into the following
> > annoying problem and am hoping someone can tell me what I'm missing.
> > Lexers generated by such tools return tokens in a stream as they
> > consume the input text. But Python's indentation appears to require
> > interruption of that stream. For example, in:
> >
> > def f(x):
> >     statement1;
> >     statement2;
> >         statement3;
> >         statement4;
> > A
> >
> > Between the '\n' at the end of statement4 and the A, a lexer for
> > Python should return 2 DEDENT tokens. But there is no way to
> > interject two DEDENT tokens within the token stream between the
> > tokens for NEWLINE and A. The generated lexer doesn't have any way
> > to freeze the input text pointer.
> >
> > Does this mean that Python lexers are all written by hand? If not,
> > how do you do it using your favorite lexer generator?
> >
> > Thanks!
> >
> > Bob Muller
>
> In pyparsing's indentedBlock expression/helper, I keep a stack of
> column numbers representing indent levels. When the indent level of a
> line is less than the column number at the top of the stack, I count
> one DEDENT for each level that I need to pop off the stack before I
> get the new indent column. If I get a column number less than the
> indent column, then I know that this is an illegal indent (doesn't
> line up with previous indent). Also, when computing the column
> number, be wary of tab handling.
>
> -- Paul

Thank you, Paul. I am also using the same stack, as suggested in the
documentation:

http://docs.python.org/reference/lexical_analysis.html

I understand the method, but when you say you "count one DEDENT for
each level", well, let's say you counted 3 of them. Do you have a way
to interject 3 consecutive DEDENT tokens into the token stream so that
the parser receives them before it receives the next real token?

Thanks much!

Bob Muller
--
http://mail.python.org/mailman/listinfo/python-list
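
For illustration, one common way to get several DEDENT tokens out ahead of the next real token is to put a thin wrapper between the raw lexer and the parser, and have that wrapper keep a queue of pending tokens. The Python sketch below is only that, a sketch: the class name `IndentLexer`, the `(kind, text)` token tuples, and the word-splitting stand-in for the generated lexer are all hypothetical, and it assumes space-only indentation (no tab handling).

```python
from collections import deque


class IndentLexer:
    """Hypothetical wrapper that queues INDENT/DEDENT tokens for a parser."""

    def __init__(self, lines):
        self.lines = iter(lines)
        self.indents = [0]      # stack of active indent columns
        self.pending = deque()  # tokens produced but not yet handed out

    def next_token(self):
        # Drain the queue first, so several DEDENTs are delivered one per
        # call before the next real token is produced.
        while not self.pending:
            line = next(self.lines, None)
            if line is None:
                # End of input: close every open block, then signal EOF.
                while len(self.indents) > 1:
                    self.indents.pop()
                    self.pending.append(("DEDENT", ""))
                self.pending.append(("EOF", ""))
                break
            self._scan_line(line)
        return self.pending.popleft()

    def _scan_line(self, line):
        text = line.strip()
        if not text:
            return  # blank lines do not affect indentation
        # Assumption: indentation is spaces only.
        col = len(line) - len(line.lstrip(" "))
        if col > self.indents[-1]:
            self.indents.append(col)
            self.pending.append(("INDENT", ""))
        else:
            # One DEDENT per level popped off the stack.
            while col < self.indents[-1]:
                self.indents.pop()
                self.pending.append(("DEDENT", ""))
            if col != self.indents[-1]:
                raise SyntaxError("dedent does not match any outer indent level")
        # Stand-in for the generated lexer: one NAME token per word.
        for word in text.split():
            self.pending.append(("NAME", word))
        self.pending.append(("NEWLINE", ""))


source = [
    "def f(x):",
    "    statement1;",
    "        statement4;",
    "A",
]
lexer = IndentLexer(source)
kinds = []
while True:
    kind, _ = lexer.next_token()
    kinds.append(kind)
    if kind == "EOF":
        break
print(kinds)
```

The parser only ever calls `next_token`, so it never sees the queue. With ocamllex and ocamlyacc the same idea would presumably take the form of a wrapper function of type `Lexing.lexbuf -> token` that is passed to the generated parser in place of the generated lexer function, with the indent stack and the pending queue held in mutable state outside the lexer rules.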