New submission from Duncan Findlay <du...@apache.org>:

According to the documentation for tokenize.generate_tokens:

"The generator produces 5-tuples with these members: the token type; the token string; a 2-tuple (srow, scol) of ints specifying the row and column where the token begins in the source; a 2-tuple (erow, ecol) of ints specifying the row and column where the token ends in the source; and the line on which the token was found. The line passed (the last tuple item) is the logical line; continuation lines are included."

It seems, though, that the "logical line" (the last element of the tuple) is the physical line, unless the token being returned spans beyond the end of the line.

As an example, consider a test file test.py:

    foo = """
    %s """ % 'bar'

    >>> import pprint, tokenize
    >>> pprint.pprint(list(tokenize.generate_tokens(open('test.py').readline)))
    [(1, 'foo', (1, 0), (1, 3), 'foo = """\n'),
     (51, '=', (1, 4), (1, 5), 'foo = """\n'),
     (3, '"""\n%s """', (1, 6), (2, 6), 'foo = """\n%s """ % \'bar\'\n'),
     (51, '%', (2, 7), (2, 8), '%s """ % \'bar\'\n'),
     (3, "'bar'", (2, 9), (2, 14), '%s """ % \'bar\'\n'),
     (4, '\n', (2, 14), (2, 15), '%s """ % \'bar\'\n'),
     (0, '', (3, 0), (3, 0), '')]
    >>>

Since there is only one logical line, I would expect the first 6 tokens to have the same 5th element.

----------
components: Library (Lib)
messages: 80353
nosy: duncf
severity: normal
status: open
title: tokenize.generate_tokens doesn't always return logical line
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5028>
_______________________________________
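
For anyone who wants to check this without creating test.py, here is a minimal sketch that feeds the same two-line source to tokenize.generate_tokens from a string. It is written for Python 3 rather than the 2.6 reported above, and the names SOURCE and line_fields are made up for illustration. The values printed for the last tuple item may differ between Python versions, so no expected output is shown.

```python
import io
import tokenize

# The two-line test file from the report, held in a string
# (an assumption made so the sketch needs no file on disk).
SOURCE = 'foo = """\n%s """ % \'bar\'\n'


def line_fields(source):
    """Return (token string, line item) pairs up to the first NEWLINE token.

    Hypothetical helper: index 1 is the token string and index 4 is the
    last tuple item that the documentation calls the logical line.
    """
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        pairs.append((tok[1], tok[4]))
        if tok[0] == tokenize.NEWLINE:
            break
    return pairs


pairs = line_fields(SOURCE)
for string, line in pairs:
    print(repr(string), '->', repr(line))

# More than one distinct value means the last item is not a single
# logical line shared by every token of the statement.
print(len(set(line for _, line in pairs)), 'distinct line value(s)')
```

If the last item really were the logical line, the final count would be 1 for this source, since all six tokens belong to one statement.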