Pod added the comment:
Not the OP, but I consider this message a bug because it is confusing from the
perspective of a user of the tokenize() function. If you give tokenize a
readline callable that returns a str, you get this error message, which
confusingly states that something inside tokenize must be a str and NOT bytes,
even though the user passed in a str, not bytes. It looks like an internal
bug.
Turns out it's because the contract changed from python2 to python3.
Personally, I'd been accidentally reading the python2 page for the tokenize
library instead of python3, and had been using tokenize.generate_tokens in my
python 3 code, which accepts an io.StringIO just fine. When I realised my
mistake and switched to the python3 version of the page I noticed
generate_tokens is no longer supported, even though the code I had was working,
and I noticed that the definition of tokenize had changed to match the old
generate_tokens (along with a subtle change in the definition of the acceptable
readlines function).
So when I switched from tokenize.generate_tokens to tokenize.tokenize to try
and use the library as intended, I got the same error as OP. Perhaps OP made a
similar mistake?
To actually hit the error in question:
$ cat -n temp.py
1 import tokenize
2 import io
3
4
5 byte_reader = io.BytesIO(b"test bytes generate_tokens")
6 tokens = tokenize.generate_tokens(byte_reader.readline)
7
8 byte_reader = io.BytesIO(b"test bytes tokenize")
9 tokens = tokenize.tokenize(byte_reader.readline)
10
11 byte_reader = io.StringIO("test string generate")
12 tokens = tokenize.generate_tokens(byte_reader.readline)
13
14 str_reader = io.StringIO("test string tokenize")
15 tokens = tokenize.tokenize(str_reader.readline)
16
17
$ python3 temp.py
Traceback (most recent call last):
  File "temp.py", line 15, in <module>
    tokens = tokenize.tokenize(str_reader.readline)
  File "C:\work\env\python\Python34_64\Lib\tokenize.py", line 467, in tokenize
    encoding, consumed = detect_encoding(readline)
  File "C:\work\env\python\Python34_64\Lib\tokenize.py", line 409, in detect_encoding
    if first.startswith(BOM_UTF8):
TypeError: startswith first arg must be str or a tuple of str, not bytes
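
For comparison, here is a minimal sketch of the two pairings that do work, assuming a Python 3 where tokenize.generate_tokens is still available (it was undocumented at the time of this report). The sample source string and the variable names are made up for illustration. Both results are wrapped in list() so the tokenizer actually runs, whether or not the call itself is lazy.

```python
import io
import tokenize

source = "x = 1\n"  # illustrative input, not taken from the report

# tokenize.tokenize() wants a readline that returns bytes;
# it detects the encoding itself and reports it as the first token.
byte_tokens = list(tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline))

# tokenize.generate_tokens() wants a readline that returns str.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

print(byte_tokens[0])                        # the ENCODING token
print([tok.string for tok in str_tokens])    # token strings from the str variant
```

If the source is already a str, either encode it before handing it to tokenize.tokenize, or use generate_tokens and skip the encoding-detection step.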
----------
nosy: +Pod
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue23297>
_______________________________________