Nick Coghlan <ncogh...@gmail.com> added the comment:

There are a *lot* of characters with semantic significance that are reported by 
the tokenize module as generic "OP" tokens:

token.LPAR
token.RPAR
token.LSQB
token.RSQB
token.COLON
token.COMMA
token.SEMI
token.PLUS
token.MINUS
token.STAR
token.SLASH
token.VBAR
token.AMPER
token.LESS
token.GREATER
token.EQUAL
token.DOT
token.PERCENT
token.BACKQUOTE
token.LBRACE
token.RBRACE
token.EQEQUAL
token.NOTEQUAL
token.LESSEQUAL
token.GREATEREQUAL
token.TILDE
token.CIRCUMFLEX
token.LEFTSHIFT
token.RIGHTSHIFT
token.DOUBLESTAR
token.PLUSEQUAL
token.MINEQUAL
token.STAREQUAL
token.SLASHEQUAL
token.PERCENTEQUAL
token.AMPEREQUAL
token.VBAREQUAL
token.CIRCUMFLEXEQUAL
token.LEFTSHIFTEQUAL
token.RIGHTSHIFTEQUAL
token.DOUBLESTAREQUAL
token.DOUBLESLASH
token.DOUBLESLASHEQUAL
token.AT

However, I can't fault tokenize for deciding to treat all of those tokens the 
same way - for many source code manipulation purposes, these just need to be 
transcribed literally, and the "OP" token serves that purpose just fine.
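For example, a lossless round trip through the module never needs the specific token types at all, since the operator text survives in the "string" attribute (a sketch; the exact whitespace guarantees of untokenize depend on being given the full position-carrying tuples):

```python
import io
import tokenize

source = "result = x + y  # comment\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Every operator comes back as a generic OP token, but its text is
# preserved in the "string" attribute, so untokenize can reproduce it.
rebuilt = tokenize.untokenize(tokens)
print(rebuilt == source)
```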

As the extensive test updates in the current patch suggest, AMK is also correct 
that changing this away from always returning "OP" tokens (even for characters 
with more specialised tokens available) would be a backwards-incompatible 
change.

I think there are two parts to this problem, one documentation related 
(affecting 2.7, 3.2, 3.3) and another that would be an actual change in 3.3:

1. First, I think 3.3 should add an "exact_type" attribute to TokenInfo 
instances (without making it part of the tuple-based API). For most tokens, 
this would be the same as "type", but for OP tokens, it would provide the 
appropriate more specific token ID.
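Usage of such an attribute might look something like this (a sketch of the proposed API, not the current behaviour - "type" stays OP for compatibility, while "exact_type" carries the specific ID):

```python
import io
import tokenize

for tok in tokenize.generate_tokens(io.StringIO("x = a + b").readline):
    if tok.type == tokenize.OP:
        # The proposed exact_type would map "=" to token.EQUAL,
        # "+" to token.PLUS, and so on.
        print(tok.string, tokenize.tok_name[tok.exact_type])
```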

2. Second, the tokenize module documentation should state *explicitly* which 
tokens it collapses down into the generic "OP" token, and explain how to use 
the "string" attribute to recover the more detailed information.

----------
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python, ncoghlan
stage:  -> needs patch
title: function generate_tokens at tokenize.py yields wrong token for colon -> 
Add new attribute to TokenInfo to report specific token IDs
versions: +Python 2.7, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2134>
_______________________________________