Eryk Sun added the comment:
> P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS.
The issue is that the line length is limited to BUFSIZ, so a longer line is cut
at that boundary, which here ends up splitting the UTF-8 sequence
b'\xe2\x96\x91'. BUFSIZ is only 512 bytes on Windows. It's 8192 bytes on Linux,
so there you need a line that's 16 times longer in order to reproduce the
error. For example:

    $ stat -c "%s" test.py
    8194
    $ python3.9 test.py
    SyntaxError: Non-UTF-8 code starting with '\xe2' in file
    /home/someone/test.py on line 1, but no encoding declared; see
    http://python.org/dev/peps/pep-0263/ for details

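
The effect can be illustrated in pure Python. This is only a sketch of the
failure mode, not the tokenizer's C code: build_line() and
first_chunk_decodes() are hypothetical helpers, the file layout is assumed
(it is not the reporter's test.py), and the exact number of bytes the
tokenizer reads per call is an assumption too. The line is built so that the
three-byte sequence straddles the BUFSIZ boundary.

```python
BUFSIZ = 8192  # Linux value quoted in the message; 512 on Windows


def build_line(bufsiz: int) -> bytes:
    """Return one comment line whose U+2591 character straddles `bufsiz`."""
    # bufsiz - 2 ASCII bytes, so the 3-byte sequence occupies offsets
    # bufsiz - 2, bufsiz - 1 and bufsiz.
    prefix = b"#" + b"a" * (bufsiz - 3)
    return prefix + "\u2591".encode("utf-8") + b"\n"


def first_chunk_decodes(data: bytes, chunk_size: int) -> bool:
    """Report whether the first `chunk_size` bytes are valid UTF-8 on their own."""
    try:
        data[:chunk_size].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


line = build_line(BUFSIZ)
print(len(line))                             # 8194
print(first_chunk_decodes(line, BUFSIZ))     # False: b'\xe2\x96' is cut off
print(first_chunk_decodes(line, len(line)))  # True: the whole line is valid
```

Calling build_line(512) gives the shorter line that is enough to hit the
same boundary with the Windows value.
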
This has been fixed in a rewrite of the tokenizer (bpo-25643); the PR was
recently merged into the main branch for 3.10a7+.
Perhaps a minimal backport that keeps reading up to "\n" could be applied to
3.8 and 3.9.
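
As a rough illustration of the "keep reading up to \n" idea, here is a Python
analogue. It is not the proposed C patch: read_full_line() is a hypothetical
helper, and io.BytesIO.readline(size) merely stands in for a bounded,
fgets-style read. The point is that chunks are accumulated until the newline
arrives and only then decoded.

```python
import io


def read_full_line(stream, bufsiz: int) -> bytes:
    """Collect bounded reads until a newline or EOF, then return the whole line."""
    parts = []
    while True:
        chunk = stream.readline(bufsiz)  # at most bufsiz bytes, stops at b"\n"
        if not chunk:                    # EOF
            break
        parts.append(chunk)
        if chunk.endswith(b"\n"):        # the line is complete
            break
    return b"".join(parts)


# A first line longer than the buffer, with a multi-byte character at the cut.
source = b"#" + b"a" * 8189 + "\u2591".encode("utf-8") + b"\n" + b"x = 1\n"
stream = io.BytesIO(source)

first = read_full_line(stream, 8192)
print(len(first))                 # 8194
print(first.decode("utf-8")[-2])  # the U+2591 character, decoded intact
print(read_full_line(stream, 8192))  # b'x = 1\n'
```
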
----------
nosy: +eryksun
_______________________________________
Python tracker
<https://bugs.python.org/issue38755>
_______________________________________