New submission from Gareth Rees <g...@garethrees.org>:

The tokenize module is happy to tokenize Python source code that the real tokenizer would reject. Pretty much any instance where tokenizer.c returns ERRORTOKEN will illustrate this feature. Here are some examples:

    Python 3.3.0a0 (default:2d69900c0820, Aug 1 2011, 13:46:51)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import generate_tokens
    >>> from io import StringIO
    >>> def tokens(s):
    ...     """Return a string showing the tokens in the string s."""
    ...     return '|'.join(t[1] for t in generate_tokens(StringIO(s).readline))
    ...
    >>> # Bad exponent
    >>> print(tokens('1if 2else 3'))
    1|if|2|else|3|
    >>> 1if 2else 3
      File "<stdin>", line 1
        1if 2else 3
        ^
    SyntaxError: invalid token
    >>> # Bad hexadecimal constant.
    >>> print(tokens('0xfg'))
    0xf|g|
    >>> 0xfg
      File "<stdin>", line 1
        0xfg
        ^
    SyntaxError: invalid syntax
    >>> # Missing newline after continuation character.
    >>> print(tokens('\\pass'))
    \|pass|
    >>> \pass
      File "<stdin>", line 1
        \pass
        ^
    SyntaxError: unexpected character after line continuation character

It is surprising that the tokenize module does not yield the same tokens as Python itself, but as this limitation only affects incorrect Python code, perhaps it just needs a mention in the tokenize documentation. Something along the lines of, "The tokenize module generates the same tokens as Python's own tokenizer if it is given correct Python code. However, it may incorrectly tokenize Python code containing syntax errors that the real tokenizer would reject."

----------
components: Library (Lib)
messages: 141503
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize module happily tokenizes code with syntax errors
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12675>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
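
For anyone who wants to check the discrepancy described in this report on their own interpreter, here is a minimal sketch. The helper names tokenize_accepts and compiler_accepts are hypothetical and not part of the tokenize module. The sketch only compares whether each path raises an exception, and the results will likely differ between Python versions, so no expected output is shown.

```python
import io
import tokenize


def tokenize_accepts(source):
    """Return True if the tokenize module tokenizes `source` without raising.

    Hypothetical helper for illustration only.
    """
    try:
        # Exhaust the generator so that any tokenization error is raised.
        list(tokenize.generate_tokens(io.StringIO(source).readline))
    except (tokenize.TokenError, SyntaxError):
        return False
    return True


def compiler_accepts(source):
    """Return True if the interpreter's own compiler accepts `source`.

    Hypothetical helper for illustration only.
    """
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


# Two of the inputs from the report; print what this interpreter does with each.
for sample in ("0xfg\n", "\\pass\n"):
    print(repr(sample),
          "tokenize accepts:", tokenize_accepts(sample),
          "compiler accepts:", compiler_accepts(sample))
```

Any input for which the first result is True and the second is False is a case where tokenize is more permissive than the real tokenizer and parser.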