New submission from Gareth Rees <g...@garethrees.org>:

The tokenize module will happily tokenize Python source code that the real 
tokenizer would reject. Almost any input for which tokenizer.c returns 
ERRORTOKEN illustrates this. Here are some examples:

    Python 3.3.0a0 (default:2d69900c0820, Aug  1 2011, 13:46:51) 
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from tokenize import generate_tokens
    >>> from io import StringIO
    >>> def tokens(s):
    ...    """Return a string showing the tokens in the string s."""
    ...    return '|'.join(t[1] for t in generate_tokens(StringIO(s).readline))
    ...
    >>> # Bad exponent.
    >>> print(tokens('1if 2else 3'))
    1|if|2|else|3|
    >>> 1if 2else 3
      File "<stdin>", line 1
        1if 2else 3
             ^
    SyntaxError: invalid token
    >>> # Bad hexadecimal constant.
    >>> print(tokens('0xfg'))
    0xf|g|
    >>> 0xfg
      File "<stdin>", line 1
        0xfg
           ^
    SyntaxError: invalid syntax
    >>> # Missing newline after continuation character.
    >>> print(tokens('\\pass'))
    \|pass|
    >>> \pass 
      File "<stdin>", line 1
        \pass
            ^
    SyntaxError: unexpected character after line continuation character

It is surprising that the tokenize module does not yield the same tokens as 
Python itself, but as this limitation only affects incorrect Python code, 
perhaps it just needs a mention in the tokenize documentation. Something along 
the lines of, "The tokenize module generates the same tokens as Python's own 
tokenizer if it is given correct Python code. However, it may incorrectly 
tokenize Python code containing syntax errors that the real tokenizer would 
reject."
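
For anyone who needs to know whether the real tokenizer and parser accept a piece of source, one possible workaround is to run compile() alongside tokenize. The following is only a sketch: check_source is a hypothetical helper name, not part of the tokenize module, and the set of exceptions caught around generate_tokens is an assumption about what tokenize may raise on various Python versions.

```python
import io
import tokenize


def check_source(source):
    """Return (tokenize_ok, compile_ok) for the string source.

    check_source is a hypothetical helper, not part of the tokenize module.
    """
    try:
        # Exhaust the generator so that any tokenization error is raised here.
        list(tokenize.generate_tokens(io.StringIO(source).readline))
        tokenize_ok = True
    except (tokenize.TokenError, SyntaxError):
        tokenize_ok = False
    try:
        # compile() runs the real tokenizer and parser without executing the code.
        compile(source, "<string>", "exec")
        compile_ok = True
    except SyntaxError:
        compile_ok = False
    return tokenize_ok, compile_ok
```

A result of (True, False) marks source that tokenize accepted but Python itself rejects, which is the situation in the examples above. The first element may differ between Python versions, so only the second should be relied on to decide whether the code is valid.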

----------
components: Library (Lib)
messages: 141503
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize module happily tokenizes code with syntax errors
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12675>
_______________________________________
