[issue2134] Add new attribute to TokenInfo to report specific token IDs

2012-01-18 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 75baef657770 by Meador Inge in branch '2.7':
Issue #2134: Clarify token.OP handling rationale in tokenize documentation.
http://hg.python.org/cpython/rev/75baef657770

New changeset dfd74d752b0e by Meador Inge in branch '3.2':
Issue #2134: Clarify token.OP handling rationale in tokenize documentation.
http://hg.python.org/cpython/rev/dfd74d752b0e

New changeset f4976fa6e830 by Meador Inge in branch 'default':
Issue #2134: Add support for tokenize.TokenInfo.exact_type.
http://hg.python.org/cpython/rev/f4976fa6e830

--
nosy: +python-dev




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2012-01-18 Thread Meador Inge

Meador Inge mead...@gmail.com added the comment:

Fixed.  Thanks for the reviews everyone.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2012-01-15 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
stage: needs patch -> patch review




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-24 Thread Meador Inge

Meador Inge mead...@gmail.com added the comment:

> The cmdoption directive should be used with a program directive.

Ah, nice.  Thanks for the tip, Éric.

An updated patch is attached, along with a patch for the 2.7/3.2 doc update.

--
Added file: http://bugs.python.org/file24088/tokenize-exact-type-v1.patch




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-24 Thread Meador Inge

Changes by Meador Inge mead...@gmail.com:


Added file: http://bugs.python.org/file24089/tokenize-docs-2.7-3.2.patch




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-21 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

The cmdoption directive should be used with a program directive.  See 
library/trace for an example of how to use it and to see the anchors and index 
entries it generates.
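
A rough reST sketch of that pattern (modelled on the library/trace source rather
than taken from the actual patch, so the option text here is only illustrative):

    .. program:: tokenize

    .. cmdoption:: -e

       Display token names using the exact type.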

--
nosy: +eric.araujo




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-18 Thread Meador Inge

Meador Inge mead...@gmail.com added the comment:

The proposed documentation text seems too complicated and too 
language-expert-speaky to me.  We should try to link to standard definitions 
when possible to reduce the text here.  For example, I believe the Operators 
and Delimiters tokens in the Lexical Analysis section of the docs 
(http://docs.python.org/dev/reference/lexical_analysis.html#operators) are 
exactly what we are trying to describe when referencing "literal tokens" and 
"affected tokens".

I like Nick's idea to introduce a new attribute for the exact type, while 
keeping the tuple structure itself backwards compatible.  Attached is a patch 
for 3.3 that updates the docs, adds exact_type, adds new unit tests, and adds a 
new CLI option for displaying token names using the exact type.

An example of the new CLI option is:

$ echo '1+2**4' | ./python -m tokenize
1,0-1,1:    NUMBER      '1'
1,1-1,2:    OP          '+'
1,2-1,3:    NUMBER      '2'
1,3-1,5:    OP          '**'
1,5-1,6:    NUMBER      '4'
1,6-1,7:    NEWLINE     '\n'
2,0-2,0:    ENDMARKER   ''
$ echo '1+2**4' | ./python -m tokenize -e
1,0-1,1:    NUMBER      '1'
1,1-1,2:    PLUS        '+'
1,2-1,3:    NUMBER      '2'
1,3-1,5:    DOUBLESTAR  '**'
1,5-1,6:    NUMBER      '4'
1,6-1,7:    NEWLINE     '\n'
2,0-2,0:    ENDMARKER   ''
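
For comparison with the CLI output above, a minimal sketch of what the new 
attribute looks like from Python code (assuming the 3.3 behaviour described 
here: type still reports OP, while exact_type carries the specific ID):

    import io
    import token
    import tokenize

    source = b"1+2**4\n"
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        # tok.type stays OP for operators; tok.exact_type names the specific
        # operator token (PLUS, DOUBLESTAR, ...); other tokens are unchanged.
        print(token.tok_name[tok.type],
              token.tok_name[tok.exact_type],
              repr(tok.string))

Non-OP tokens (NUMBER, NEWLINE, ENDMARKER, and the ENCODING token that 
tokenize.tokenize() emits first) come through with exact_type equal to type, 
so existing tuple-based code is unaffected.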

--
Added file: http://bugs.python.org/file24045/tokenize-exact-type-v0.patch




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-18 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

Meador's patch looks good to me. The docs change for 2.7 and 3.2 would be 
similar, just with text like "Specific tokens can be distinguished by checking 
the ``string`` attribute of OP tokens for a match with the expected character 
sequence." replacing the reference to the new exact_type attribute.

--




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-15 Thread Meador Inge

Changes by Meador Inge mead...@gmail.com:


--
nosy: +meador.inge




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-15 Thread Eric Snow

Changes by Eric Snow ericsnowcurren...@gmail.com:


--
nosy: +eric.snow




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-15 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

Both the proposed text and 3.3 addition look good to me.

--




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-15 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-14 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

There are a *lot* of characters with semantic significance that are reported by 
the tokenize module as generic OP tokens:

token.LPAR
token.RPAR
token.LSQB
token.RSQB
token.COLON
token.COMMA
token.SEMI
token.PLUS
token.MINUS
token.STAR
token.SLASH
token.VBAR
token.AMPER
token.LESS
token.GREATER
token.EQUAL
token.DOT
token.PERCENT
token.BACKQUOTE
token.LBRACE
token.RBRACE
token.EQEQUAL
token.NOTEQUAL
token.LESSEQUAL
token.GREATEREQUAL
token.TILDE
token.CIRCUMFLEX
token.LEFTSHIFT
token.RIGHTSHIFT
token.DOUBLESTAR
token.PLUSEQUAL
token.MINEQUAL
token.STAREQUAL
token.SLASHEQUAL
token.PERCENTEQUAL
token.AMPEREQUAL
token.VBAREQUAL
token.CIRCUMFLEXEQUAL
token.LEFTSHIFTEQUAL
token.RIGHTSHIFTEQUAL
token.DOUBLESTAREQUAL
token.DOUBLESLASH
token.DOUBLESLASHEQUAL
token.AT

However, I can't fault tokenize for deciding to treat all of those tokens the 
same way - for many source code manipulation purposes, these just need to be 
transcribed literally, and the OP token serves that purpose just fine.

As the extensive test updates in the current patch suggest, AMK is also correct 
that changing this away from always returning OP tokens (even for characters 
with more specialised tokens available) would be a backwards incompatible 
change.

I think there are two parts to this problem, one documentation related 
(affecting 2.7, 3.2, 3.3) and another that would be an actual change in 3.3:

1. First, I think 3.3 should add an exact_type attribute to TokenInfo 
instances (without making it part of the tuple-based API). For most tokens, 
this would be the same as type, but for OP tokens, it would provide the 
appropriate more specific token ID.

2. Second, the tokenize module documentation should state *explicitly* which 
tokens it collapses down into the generic OP token, and explain how to use 
the string attribute to recover the more detailed information.
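
As a rough sketch (in Python 3 syntax) of the string-comparison approach in 
point 2 (the OP_TYPES mapping below is hand-written for illustration and only 
covers a few of the operators listed above; with the proposed exact_type 
attribute it would be unnecessary):

    import token
    import tokenize
    from io import StringIO

    # Hand-built, partial mapping from OP token strings to specific token IDs.
    OP_TYPES = {'+': token.PLUS, '**': token.DOUBLESTAR, ':': token.COLON}

    def specific_type(tok_type, tok_string):
        # Generic OP tokens are narrowed via their string; all other token
        # types are already as specific as they get.
        if tok_type == token.OP:
            return OP_TYPES.get(tok_string, token.OP)
        return tok_type

    for tok in tokenize.generate_tokens(StringIO("1+2**4\n").readline):
        print(token.tok_name[specific_type(tok[0], tok[1])], repr(tok[1]))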

--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python, ncoghlan
stage:  -> needs patch
title: function generate_tokens at tokenize.py yields wrong token for colon -> 
Add new attribute to TokenInfo to report specific token IDs
versions: +Python 2.7, Python 3.3




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-14 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

I believe that list includes all symbols and symbol combinations that are 
syntactically significant in expressions. This is the generalized meaning of 
'operator' that is being used. What do not appear are '#', which marks comments; 
'_', which is a name char; and '\', which escapes chars within strings. Other 
symbols within strings will also not be marked as OP tokens. The non-syntactic 
symbols '$' and '?' are also omitted.
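
A small sketch illustrating that distinction (not from the thread): the comment 
comes back as a COMMENT token and the quoted '+' is swallowed by a single 
STRING token, so neither is reported as OP:

    import token
    import tokenize
    from io import StringIO

    src = "x = 1 + 2  # not an OP\ns = '+'\n"
    for tok in tokenize.generate_tokens(StringIO(src).readline):
        # '=' and the bare '+' appear as OP tokens; '# not an OP' is a single
        # COMMENT token and "'+'" is a single STRING token.
        print(token.tok_name[tok.type], repr(tok.string))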

--




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-14 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

Sure, but what does that have to do with anything? tokenize isn't a 
general-purpose tokenizer; it's specifically for tokenizing Python source code.

The *problem* is that it doesn't currently fully tokenize everything, but 
doesn't explicitly say that in the module documentation.

Hence my proposed two-fold fix: document the current behaviour explicitly and 
also add a separate exact_type attribute for easy access to the detailed 
tokenization without doing your own string comparisons.

--




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-14 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

If you are responding to me, I am baffled. I gave a concise way to document the 
current behavior with respect to .OP, which you said you wanted.

--




[issue2134] Add new attribute to TokenInfo to report specific token IDs

2011-12-14 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

Ah, I didn't read it as suggested documentation at all - you moved seamlessly 
from personal commentary to a docs suggestion without separating the two, so it 
appeared to be a complete non sequitur to me.

As for the docs suggestion, I think it works as the explanation of which tokens 
are affected once the concept of the token stream simplification is introduced:
=
To simplify token stream handling, all literal tokens (':', '{', etc.) are 
returned using the generic 'OP' token type. This allows them to be simply 
handled using common code paths (e.g. for literal transcription directly from 
input to output). Specific tokens can be distinguished by checking the string 
attribute of OP tokens for a match with the expected character sequence.

The affected tokens are all symbols and symbol combinations that are 
syntactically significant in expressions (as listed in the token module). 
Anything which is not an independent token (i.e. '#' which marks comments, '_' 
which is just part of a name, '\' which is used for line continuations, the 
contents of string literals and any symbols which are not a defined part of 
Python's syntax) is completely unaffected by this difference in behaviour.
===

If exact_type is introduced in 3.3, then the first paragraph can be adjusted 
accordingly.

--
