Eryk Sun <eryk...@gmail.com> added the comment:
> P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS.

The issue is that the line length is limited to BUFSIZ bytes, which ends up splitting the UTF-8 sequence b'\xe2\x96\x91'. BUFSIZ is only 512 bytes on Windows. On Linux it's 8192 bytes, so you need a line that's 16 times longer to reproduce the error. For example:

    $ stat -c "%s" test.py
    8194
    $ python3.9 test.py
    SyntaxError: Non-UTF-8 code starting with '\xe2' in file /home/someone/test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

This has been fixed in a rewrite of the tokenizer (bpo-25643), whose PR was recently merged into the main branch for 3.10a7+. Maybe a minimal backport that keeps reading up to "\n" could be applied to 3.8 and 3.9.

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38755>
_______________________________________
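The following is a simplified Python model of the failure mode described above. It is not CPython's C tokenizer code, and the exact byte offset where the C reader splits a line may not match it. It compares validating UTF-8 separately in each BUFSIZ-sized chunk (the buggy behaviour) with reading up to "\n" and validating the whole line (the suggested fix). The helper names `make_split_line`, `check_per_chunk` and `check_whole_line` are hypothetical and were chosen for illustration only.

```python
SEQ = "\u2591".encode("utf-8")  # U+2591 LIGHT SHADE == b'\xe2\x96\x91'


def make_split_line(bufsiz: int) -> bytes:
    """Build a comment line whose 3-byte sequence starts at the last byte of the first chunk."""
    pad = (bufsiz - 2) % 3  # spaces after '#' so a sequence starts at index bufsiz - 1
    count = (bufsiz - 2 - pad) // 3 + 2  # enough sequences to cross the chunk boundary
    return b"#" + b" " * pad + SEQ * count + b"\n"


def check_per_chunk(line: bytes, bufsiz: int) -> bool:
    """Model of the bug: strictly decode each bufsiz-byte chunk on its own."""
    for start in range(0, len(line), bufsiz):
        try:
            line[start:start + bufsiz].decode("utf-8")
        except UnicodeDecodeError:
            return False  # a chunk ended in the middle of a multi-byte sequence
    return True


def check_whole_line(line: bytes) -> bool:
    """Model of the fix: keep reading up to the newline, then decode the complete line."""
    try:
        line.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for bufsiz in (512, 8192):  # BUFSIZ on Windows and on Linux, per the comment above
    line = make_split_line(bufsiz)
    print(bufsiz, len(line), check_per_chunk(line, bufsiz), check_whole_line(line))
```

With chunk validation, each platform fails only when a line is long enough to cross its own BUFSIZ boundary. For example, `check_per_chunk(make_split_line(512), 8192)` is `True`, which is consistent with a line 16 times longer being needed to trigger the error on Linux.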