Thomas Kluyver added the comment:
I've opened a PR for issue #12486, which would make the existing but
undocumented 'generate_tokens' function public:
https://github.com/python/cpython/pull/6957
I agree that it would be good to design a nicer API for this, but the …
Martin Panter added the comment:
I left some comments. Also, it would be nice to use the new function in the
documentation example, which currently suggests tunnelling through UTF-8 but
not adding an encoding comment. And see the patch for Issue 12486, which
highlights a couple of other …
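For context, the "tunnelling through UTF-8" pattern that the current docs example relies on looks roughly like this (a sketch of the pattern, not the exact docs text):

import io
import tokenize

source = "x = 1\n"
# Encode the str to UTF-8 bytes so that tokenize(), which requires a
# readline returning bytes, can consume it.
for tok in tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline):
    print(tok)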
Martin Panter added the comment:
Actually maybe Issue 12486 is good enough to fix this too. With the patch
proposed there, tokenize_basestring("source") would just be equivalent to
tokenize(StringIO("source").readline)
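In other words, the helper would amount to little more than this (tokenize_basestring is just the name floated in that issue; the same behaviour is already reachable today through the undocumented generate_tokens):

import io
from tokenize import generate_tokens

def tokenize_basestring(source):
    # Wrap the already-decoded str and tokenize it directly,
    # with no encoding detection step.
    return generate_tokens(io.StringIO(source).readline)

tokens = list(tokenize_basestring("1 + 1\n"))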
Changes by Serhiy Storchaka storch...@gmail.com:
--
versions: +Python 3.4 -Python 3.2, Python 3.3
Meador Inge mead...@gmail.com added the comment:
Attached is a first cut at a patch.
--
keywords: +patch
stage: needs patch -> patch review
Added file: http://bugs.python.org/file23099/issue9969.patch
Changes by Thomas Kluyver tak...@gmail.com:
--
nosy: +takluyver
Abhay Saxena a...@email.com added the comment:
If the goal is for tokenize(...) to accept a text I/O readline, we already have the
(undocumented) generate_tokens(readline).
--
nosy: +ark3
Nick Coghlan ncogh...@gmail.com added the comment:
The idea is to bring the API up a level, and also take care of wrapping the
file-like object around the source string/byte sequence.
Nick Coghlan ncogh...@gmail.com added the comment:
As per Antoine's comment on #9873, requiring a real string via
isinstance(source, str) to trigger the string IO version is likely to be
cleaner than attempting to duck-type this. Strings are an area where we make so
many assumptions about the …
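A sketch of that isinstance-based dispatch, using only the public entry points (the name tokenize_source is illustrative, not anything settled in this issue):

import io
from tokenize import tokenize, generate_tokens

def tokenize_source(source):
    if isinstance(source, str):
        # A real str: wrap it in StringIO; no encoding detection is needed.
        return generate_tokens(io.StringIO(source).readline)
    # Bytes: let tokenize() detect the encoding from a BOM or coding cookie.
    return tokenize(io.BytesIO(source).readline)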
New submission from Meador Inge mead...@gmail.com:
Currently with 'py3k' only 'bytes' objects are accepted for tokenization:
>>> import io
>>> import tokenize
>>> tokenize.tokenize(io.StringIO("1+1").readline)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File …
Changes by Meador Inge mead...@gmail.com:
--
components: +Library (Lib)
Changes by Michael Foord mich...@voidspace.org.uk:
--
nosy: +michael.foord
Michael Foord mich...@voidspace.org.uk added the comment:
Note from Nick Coghlan from the Python-dev discussion:
A very quick scan of _tokenize suggests it is designed to support
detect_encoding returning None to indicate the line iterator will
return already decoded lines. This is confirmed by …
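For reference, the public detect_encoding() is what produces that encoding on the bytes path; the None case described above is internal to _tokenize. A quick illustration:

import io
import tokenize

# detect_encoding() reads at most two lines and returns the detected
# encoding together with the lines it consumed.
encoding, lines = tokenize.detect_encoding(
    io.BytesIO(b"# -*- coding: utf-8 -*-\nx = 1\n").readline)
# encoding == "utf-8", lines == [b"# -*- coding: utf-8 -*-\n"]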
Nick Coghlan ncogh...@gmail.com added the comment:
Possible approach (untested):
def get_tokens(source):
    if hasattr(source, "encode"):
        # Already decoded, so bypass encoding detection
        return _tokenize(io.StringIO(source).readline, None)
    # Otherwise attempt to detect the encoding …
STINNER Victor victor.stin...@haypocalc.com added the comment:
See also issue #4626 which introduced PyCF_IGNORE_COOKIE and
PyPARSE_IGNORE_COOKIE flags to support unicode string for the builtin compile()
function.
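That change is why compile() already copes with decoded source: when given a str, the coding cookie is ignored rather than applied a second time. A minimal illustration:

# The cookie claims latin-1, but the source is already a decoded str,
# so compile() ignores the cookie instead of trying to re-decode it.
code = compile("# -*- coding: latin-1 -*-\nx = 'é'\n", "<string>", "exec")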
--
nosy: +haypo