New submission from Gareth Rees <[email protected]>:
The tokenize module is happy to tokenize Python source code that the real
tokenizer would reject. Almost any input for which tokenizer.c returns
ERRORTOKEN illustrates this behavior. Here are some examples:
Python 3.3.0a0 (default:2d69900c0820, Aug 1 2011, 13:46:51)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import generate_tokens
>>> from io import StringIO
>>> def tokens(s):
...     """Return a string showing the tokens in the string s."""
...     return '|'.join(t[1] for t in generate_tokens(StringIO(s).readline))
...
>>> # Bad exponent.
>>> print(tokens('1if 2else 3'))
1|if|2|else|3|
>>> 1if 2else 3
File "<stdin>", line 1
1if 2else 3
^
SyntaxError: invalid token
>>> # Bad hexadecimal constant.
>>> print(tokens('0xfg'))
0xf|g|
>>> 0xfg
File "<stdin>", line 1
0xfg
^
SyntaxError: invalid syntax
>>> # Missing newline after continuation character.
>>> print(tokens('\\pass'))
\|pass|
>>> \pass
File "<stdin>", line 1
\pass
^
SyntaxError: unexpected character after line continuation character
It is surprising that the tokenize module does not yield the same tokens as
Python itself, but as this limitation only affects incorrect Python code,
perhaps it just needs a mention in the tokenize documentation. Something along
the lines of, "The tokenize module generates the same tokens as Python's own
tokenizer if it is given correct Python code. However, it may incorrectly
tokenize Python code containing syntax errors that the real tokenizer would
reject."
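
Until the documentation says so, a caller who needs tokens only for code
that Python itself accepts could check the source with the built-in compile()
first. The following is a minimal sketch, not part of the tokenize module: the
helper name tokenize_if_valid is hypothetical, and compile() parses as well as
tokenizes, so it rejects more than just the ERRORTOKEN cases shown above.
Exact behavior may also vary between Python versions.

```python
import io
import tokenize


def tokenize_if_valid(source):
    """Return the token strings for source, or None if compile() rejects it.

    Hypothetical helper: compile() is used as a stand-in for "the real
    tokenizer", which is stricter than strictly necessary because it also
    runs the parser.
    """
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        # Python itself rejects this source, so do not trust tokenize on it.
        return None
    readline = io.StringIO(source).readline
    return [tok[1] for tok in tokenize.generate_tokens(readline)]


# Source from the examples above is rejected before tokenize ever sees it.
print(tokenize_if_valid('0xfg'))
print(tokenize_if_valid('\\pass'))
# Valid source is tokenized as usual.
print(tokenize_if_valid('x = 1\n'))
```

The check costs one extra pass over the source, which seems acceptable for a
tool that would otherwise silently produce tokens for code that cannot run.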
----------
components: Library (Lib)
messages: 141503
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize module happily tokenizes code with syntax errors
type: behavior
versions: Python 3.3
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12675>
_______________________________________