[issue38755] Long unicode string causes SyntaxError: Non-UTF-8 code starting with '\xe2' in file ..., but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

2021-04-13 Thread STINNER Victor


STINNER Victor  added the comment:

In 2012, I wrote detect_truncate.patch in bpo-14811. Does someone want to 
convert it to a PR for Python 3.9?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



2021-04-13 Thread STINNER Victor


STINNER Victor  added the comment:

The bpo-14811 issue was fixed in Python 3.10 by bpo-25643, but is not fixed in 
Python 3.8 and 3.9.

--
nosy: +vstinner


2021-04-13 Thread Eryk Sun


Change by Eryk Sun :


--
stage: test needed -> needs patch
versions:  -Python 3.7


2021-04-13 Thread Eryk Sun


Eryk Sun  added the comment:

> P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS.

The issue is that the line length is limited to BUFSIZ, which ends up splitting 
the UTF-8 sequence b'\xe2\x96\x91'. BUFSIZ is only 512 bytes in Windows. It's 
8192 bytes in Linux, in which case you need a line that's 16 times longer in 
order to reproduce the error. For example:

$ stat -c "%s" test.py 
8194
$ python3.9 test.py
SyntaxError: Non-UTF-8 code starting with '\xe2' in file 
/home/someone/test.py on line 1, but no encoding declared; see 
http://python.org/dev/peps/pep-0263/ for details
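
The splitting described above can be modelled outside the interpreter. The following is a simplified sketch, not CPython's tokenizer: it assumes that a buffer of 512 bytes leaves 511 usable bytes per read (one byte reserved for a terminator, as with C fgets), and the helper names make_line and first_bad_chunk are made up for illustration. Under that assumption, "# " (2 bytes) plus 169 copies of the 3-byte character fits in one read, while the 170th copy straddles the boundary.

```python
# Simplified model, not CPython's tokenizer: hand a UTF-8 decoder one
# fixed-size piece of a line at a time and report where decoding first fails.

BLOCK = "\u2591"  # the character from the report; b'\xe2\x96\x91' in UTF-8


def make_line(repetitions: int) -> bytes:
    """Build '# ' followed by `repetitions` block characters and a newline."""
    return ("# " + BLOCK * repetitions + "\n").encode("utf-8")


def first_bad_chunk(data: bytes, usable: int):
    """Return (absolute offset, offending byte) for the first chunk that fails
    to decode on its own, or None if every chunk decodes cleanly."""
    for offset in range(0, len(data), usable):
        chunk = data[offset:offset + usable]
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError as exc:
            return offset + exc.start, chunk[exc.start:exc.start + 1]
    return None


# Assumption: a 512-byte buffer leaves 511 usable bytes per read.
print(first_bad_chunk(make_line(169), 511))  # 510 bytes in total, one chunk
print(first_bad_chunk(make_line(170), 511))  # 513 bytes, last character split
```

The offending byte reported for the 170-repetition line is the lead byte of the truncated sequence, which is consistent with the '\xe2' named in the error message.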

This has been fixed in a rewrite of the tokenizer (bpo-25643), for which the PR 
was recently merged into the main branch for 3.10a7+.

Maybe a minimal backport to keep reading up to "\n" can be applied to 3.8 and 
3.9.
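
As a rough illustration of the "keep reading up to \n" idea, here is a sketch in Python rather than C. It is not the proposed backport and says nothing about how tokenizer.c would actually be changed; read_full_line is a hypothetical helper, and the 511-byte limit is the same assumption as a 512-byte buffer minus one terminator byte. The point is only that decoding is deferred until the whole line has been collected.

```python
import io


def read_full_line(stream, usable: int) -> str:
    """Collect bounded reads until a newline (or EOF), then decode once."""
    parts = []
    while True:
        piece = stream.readline(usable)  # at most `usable` bytes, stops at "\n"
        if not piece:
            break  # end of file
        parts.append(piece)
        if piece.endswith(b"\n"):
            break  # the line is complete
    # Decoding the joined bytes avoids cutting a multi-byte sequence in half.
    return b"".join(parts).decode("utf-8")


source = ("# " + "\u2591" * 170 + "\nx = 1\n").encode("utf-8")
stream = io.BytesIO(source)
first = read_full_line(stream, 511)
second = read_full_line(stream, 511)
print(len(first), repr(second))
```

Decoding each bounded read separately would fail on the first 511 bytes of this input, because they end in the middle of the last ░.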

--
nosy: +eryksun


2021-04-13 Thread Andrew Ushakov


Andrew Ushakov  added the comment:

Just tested again:

D:\Downloads>py
Python 3.9.4 (tags/v3.9.4:1f2e308, Apr  4 2021, 13:27:16) [MSC v.1928 64 bit 
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

D:\Downloads>py tst112.py
SyntaxError: Non-UTF-8 code starting with '\xe2' in file D:\Downloads\tst112.py 
on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ 
for details

P.S. No problems with Python 3.8.5 and Ubuntu 20.04.2 LTS.

--
versions: +Python 3.7, Python 3.9


2019-11-15 Thread Andrew Ushakov


Andrew Ushakov  added the comment:

> On Windows, with 3.7, 3.8.0, and master, neither the posted comment, the one 
> in the file, nor the initial statement in #34979 gives the SyntaxError.

Just tried again on my corporate laptop with the downloaded file from this site:

Microsoft Windows [Version 10.0.16299.1451]
(c) 2017 Microsoft Corporation. All rights reserved.

D:\Downloads>py
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit 
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

D:\Downloads>py tst112.py
  File "tst112.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xe2' in file tst112.py on line 1, 
but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

d:\Downloads>py -3.7
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit 
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

d:\Downloads>py -3.7 tst112.py
  File "tst112.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xe2' in file tst112.py on line 1, 
but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

--


2019-11-15 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

I think that this should be closed as a duplicate of #34979, and this example 
should be posted there, with the OS and Python version included.

On Windows, with 3.7, 3.8.0, and master, neither the posted comment, the one in 
the file, nor the initial statement in #34979 gives the SyntaxError.

--
nosy: +terry.reedy
stage:  -> test needed
type:  -> behavior


2019-11-09 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
nosy: +serhiy.storchaka


2019-11-09 Thread Andrew Ushakov

New submission from Andrew Ushakov :

A fairly short Unicode comment, consisting of #, a space, and then 170 or more 
repetitions of the UTF-8 symbol ░ (b'\xe2\x96\x91'.decode()):

# 
░░

causes a syntax error:

SyntaxError: Non-UTF-8 code starting with '\xe2' in file tst112.py on line 1, 
but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
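
For anyone without the attachment, a file of the shape described above can be regenerated with a short script. This is a sketch under assumptions: the file name repro.py and the helpers write_repro and run_repro are invented here, and the generated file is not claimed to be byte-identical to tst112.py. With 170 repetitions the line is 2 + 170 * 3 = 512 bytes before the newline. Whether the error appears depends on the interpreter version and platform, so the script only reports what it observes.

```python
import os
import subprocess
import sys
import tempfile


def write_repro(path, repetitions=170):
    """Write '# ' plus `repetitions` copies of U+2591 and a newline, as UTF-8.

    Returns the number of bytes written.
    """
    data = ("# " + "\u2591" * repetitions + "\n").encode("utf-8")
    with open(path, "wb") as fh:  # binary mode: no newline translation
        fh.write(data)
    return len(data)


def run_repro(path):
    """Run the file with the current interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        errors="replace",
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "repro.py")  # hypothetical file name
        size = write_repro(path)
        result = run_repro(path)
        print(f"{size} bytes, exit code {result.returncode}")
        print(result.stderr.strip())
```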

The Python file is attached. The second example is similar, but here a Unicode 
string of similar length is passed as an argument to the print function.

print('\n')

A similar issue, Issue34979, was submitted one year ago...

--
components: Interpreter Core
files: tst112.py
messages: 356298
nosy: Andrew Ushakov
priority: normal
severity: normal
status: open
title: Long unicode string causes SyntaxError: Non-UTF-8 code starting with 
'\xe2' in file ..., but no encoding declared; see 
http://python.org/dev/peps/pep-0263/ for details
versions: Python 3.8
Added file: https://bugs.python.org/file48703/tst112.py
