[issue2134] Add new attribute to TokenInfo to report specific token IDs
Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 75baef657770 by Meador Inge in branch '2.7':
Issue #2134: Clarify token.OP handling rationale in tokenize documentation.
http://hg.python.org/cpython/rev/75baef657770

New changeset dfd74d752b0e by Meador Inge in branch '3.2':
Issue #2134: Clarify token.OP handling rationale in tokenize documentation.
http://hg.python.org/cpython/rev/dfd74d752b0e

New changeset f4976fa6e830 by Meador Inge in branch 'default':
Issue #2134: Add support for tokenize.TokenInfo.exact_type.
http://hg.python.org/cpython/rev/f4976fa6e830

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2134
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Meador Inge mead...@gmail.com added the comment:

Fixed. Thanks for the reviews, everyone.

--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
Changes by Ezio Melotti ezio.melo...@gmail.com:

--
stage: needs patch -> patch review
Meador Inge mead...@gmail.com added the comment:

> The cmdoption directive should be used with a program directive.

Ah, nice. Thanks for the tip, Éric. Updated patch attached, along with a patch for the 2.7/3.2 doc update.

--
Added file: http://bugs.python.org/file24088/tokenize-exact-type-v1.patch
Changes by Meador Inge mead...@gmail.com:

Added file: http://bugs.python.org/file24089/tokenize-docs-2.7-3.2.patch
Éric Araujo mer...@netwok.org added the comment:

The cmdoption directive should be used with a program directive. See library/trace for an example of how to use it and to see the anchors and index entries it generates.

--
nosy: +eric.araujo
Meador Inge mead...@gmail.com added the comment:

The proposed documentation text seems too complicated and language-expert-speaky to me. We should try to link to standard definitions when possible to reduce the text here. For example, I believe the "Operators" and "Delimiters" tokens in the Lexical Analysis section of the docs (http://docs.python.org/dev/reference/lexical_analysis.html#operators) are exactly what we are trying to describe when referencing "literal tokens" and "affected tokens".

I like Nick's idea to introduce a new attribute for the exact type, while keeping the tuple structure itself backwards compatible. Attached is a patch for 3.3 that updates the docs, adds exact_type, adds new unit tests, and adds a new CLI option for displaying token names using the exact type. An example of the new CLI option is:

$ echo '1+2**4' | ./python -m tokenize
1,0-1,1:  NUMBER      '1'
1,1-1,2:  OP          '+'
1,2-1,3:  NUMBER      '2'
1,3-1,5:  OP          '**'
1,5-1,6:  NUMBER      '4'
1,6-1,7:  NEWLINE     '\n'
2,0-2,0:  ENDMARKER   ''

$ echo '1+2**4' | ./python -m tokenize -e
1,0-1,1:  NUMBER      '1'
1,1-1,2:  PLUS        '+'
1,2-1,3:  NUMBER      '2'
1,3-1,5:  DOUBLESTAR  '**'
1,5-1,6:  NUMBER      '4'
1,6-1,7:  NEWLINE     '\n'
2,0-2,0:  ENDMARKER   ''

--
Added file: http://bugs.python.org/file24045/tokenize-exact-type-v0.patch
Nick Coghlan ncogh...@gmail.com added the comment:

Meador's patch looks good to me. The docs change for 2.7 and 3.2 would be similar, just with text like "Specific tokens can be distinguished by checking the ``string`` attribute of OP tokens for a match with the expected character sequence." replacing the reference to the new exact_type attribute.

--
Changes by Meador Inge mead...@gmail.com:

--
nosy: +meador.inge
Changes by Eric Snow ericsnowcurren...@gmail.com:

--
nosy: +eric.snow
Terry J. Reedy tjre...@udel.edu added the comment:

Both the proposed text and 3.3 addition look good to me.

--
Changes by Ezio Melotti ezio.melo...@gmail.com:

--
nosy: +ezio.melotti
Nick Coghlan ncogh...@gmail.com added the comment:

There are a *lot* of characters with semantic significance that are reported by the tokenize module as generic OP tokens:

token.LPAR token.RPAR token.LSQB token.RSQB token.COLON token.COMMA
token.SEMI token.PLUS token.MINUS token.STAR token.SLASH token.VBAR
token.AMPER token.LESS token.GREATER token.EQUAL token.DOT
token.PERCENT token.BACKQUOTE token.LBRACE token.RBRACE token.EQEQUAL
token.NOTEQUAL token.LESSEQUAL token.GREATEREQUAL token.TILDE
token.CIRCUMFLEX token.LEFTSHIFT token.RIGHTSHIFT token.DOUBLESTAR
token.PLUSEQUAL token.MINEQUAL token.STAREQUAL token.SLASHEQUAL
token.PERCENTEQUAL token.AMPEREQUAL token.VBAREQUAL
token.CIRCUMFLEXEQUAL token.LEFTSHIFTEQUAL token.RIGHTSHIFTEQUAL
token.DOUBLESTAREQUAL token.DOUBLESLASH token.DOUBLESLASHEQUAL
token.AT

However, I can't fault tokenize for deciding to treat all of those tokens the same way - for many source code manipulation purposes, these just need to be transcribed literally, and the OP token serves that purpose just fine. As the extensive test updates in the current patch suggest, AMK is also correct that changing this away from always returning OP tokens (even for characters with more specialised tokens available) would be a backwards incompatible change.

I think there are two parts to this problem, one documentation related (affecting 2.7, 3.2, 3.3) and another that would be an actual change in 3.3:

1. First, I think 3.3 should add an exact_type attribute to TokenInfo instances (without making it part of the tuple-based API). For most tokens, this would be the same as type, but for OP tokens, it would provide the appropriate more specific token ID.

2. Second, the tokenize module documentation should state *explicitly* which tokens it collapses down into the generic OP token, and explain how to use the string attribute to recover the more detailed information.
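The collapsing behaviour this comment describes can be observed directly; a small sketch (using the Python 3 tokenize API):

```python
import io
import token
import tokenize

# Every operator and delimiter in this snippet comes back with the
# generic OP type, even though specific IDs (LSQB, EQUAL, PLUS,
# DOUBLESTAR, ...) exist for each of them in the token module.
source = "d[k] = a + b ** 2"
ops = [tok.string
       for tok in tokenize.generate_tokens(io.StringIO(source).readline)
       if tok.type == token.OP]
print(ops)  # ['[', ']', '=', '+', '**']
```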
--
assignee: -> docs@python
components: +Documentation
nosy: +docs@python, ncoghlan
stage: -> needs patch
title: function generate_tokens at tokenize.py yields wrong token for colon -> Add new attribute to TokenInfo to report specific token IDs
versions: +Python 2.7, Python 3.3
Terry J. Reedy tjre...@udel.edu added the comment:

I believe that that list includes all symbols and symbol combinations that are syntactically significant in expressions. This is the generalized meaning of 'operator' that is being used. What do not appear are '#', which marks comments, '_', which is a name char, and '\', which escapes chars within strings. Other symbols within strings will also not be marked as OP tokens. The non-syntactic symbols '$' and '?' are also omitted.

--
Nick Coghlan ncogh...@gmail.com added the comment:

Sure, but what does that have to do with anything? tokenize isn't a general purpose tokenizer, it's specifically for tokenizing Python source code. The *problem* is that it doesn't currently fully tokenize everything, but doesn't explicitly say that in the module documentation.

Hence my proposed two-fold fix: document the current behaviour explicitly, and also add a separate exact_type attribute for easy access to the detailed tokenization without doing your own string comparisons.

--
Terry J. Reedy tjre...@udel.edu added the comment:

If you are responding to me, I am baffled. I gave a concise way to document the current behavior with respect to .OP, which you said you wanted.

--
Nick Coghlan ncogh...@gmail.com added the comment:

Ah, I didn't read it as suggested documentation at all - you moved seamlessly from personal commentary to a docs suggestion without separating the two, so it appeared to be a complete non sequitur to me.

As for the docs suggestion, I think it works as the explanation of which tokens are affected once the concept of the token stream simplification is introduced:

=======================================================================
To simplify token stream handling, all literal tokens (':', '{', etc.)
are returned using the generic 'OP' token type. This allows them to be
simply handled using common code paths (e.g. for literal transcription
directly from input to output). Specific tokens can be distinguished
by checking the string attribute of OP tokens for a match with the
expected character sequence.

The affected tokens are all symbols and symbol combinations that are
syntactically significant in expressions (as listed in the token
module). Anything which is not an independent token (i.e. '#' which
marks comments, '_' which is just part of a name, '\' which is used
for line continuations, the contents of string literals and any
symbols which are not a defined part of Python's syntax) is completely
unaffected by this difference in behaviour.
=======================================================================

If exact_type is introduced in 3.3, then the first paragraph can be adjusted accordingly.

--
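A sketch of the "literal transcription" use case the proposed text mentions: because every operator and delimiter shares the OP type, round-tripping source through the tokenizer needs no per-operator handling. This assumes the Python 3 tokenize.untokenize behaviour with full 5-element tokens, which reproduces the original spacing.

```python
import io
import tokenize

# Round-trip a snippet containing several different OP characters;
# untokenize transcribes them literally without caring which specific
# operator each one is.
source = "d = {1: 2, 3: 4}\nx += d[1] ** 2\n"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
rebuilt = tokenize.untokenize(tokens)
print(rebuilt == source)  # True
```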