[issue34979] Python throws “SyntaxError: Non-UTF-8 code start with \xe8...” when parse source file

2018-10-14 Thread Lu jaymin
Lu jaymin added the comment: Thanks for your suggestions. I will make a PR on GitHub. The buffer is resizable now; please see cpython/Parser/tokenizer.c#L1043 <https://github.com/python/cpython/blob/master/Parser/tokenizer.c#L1043> for d

[issue34979] Python throws “SyntaxError: Non-UTF-8 code start with \xe8...” when parse source file

2018-10-14 Thread Lu jaymin
Lu jaymin added the comment: I think these two issues are the same, and the following is a patch I wrote; I hope it helps. ``` diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c index 1af27bf..ba6fb3a 100644 --- a/Parser/tokenizer.c +++ b/Parser/tokenizer.c @@ -617,32

[issue34979] Python throws “SyntaxError: Non-UTF-8 code start with \xe8...” when parse source file

2018-10-14 Thread Lu jaymin
Lu jaymin added the comment: If you declare the encoding at the top of the file, then everything is fine, because in this case Python will use `io.open` to open the file and use `stream.readline` to read the code one line at a time. Please see the function `fp_setreadl` in `cpython/Parser/tokenizer.c` for
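
The comment above describes the path taken when an encoding is declared: the file is opened through `io.open` and decoded one line at a time with `readline`. A minimal Python-level sketch of that idea follows. It is only an analogy, not the C code in `fp_setreadl`, and it does not reproduce the bug. The helper names `read_source_lines` and `demo` are hypothetical, and `tokenize.detect_encoding` is used here as an assumed stand-in for the tokenizer's own cookie detection.

```python
import io
import os
import tempfile
import tokenize


def read_source_lines(path):
    """Detect the declared encoding, then decode the file line by line."""
    # Read the coding cookie (or BOM) from the raw bytes.
    with open(path, "rb") as raw:
        encoding, _ = tokenize.detect_encoding(raw.readline)

    # Reopen as a text stream so each readline() call returns a whole,
    # already-decoded line; no multi-byte character is split mid-sequence.
    lines = []
    with io.open(path, "r", encoding=encoding) as stream:
        while True:
            line = stream.readline()
            if not line:
                break
            lines.append(line)
    return encoding, lines


def demo():
    """Write a file with a declared encoding and one long non-ASCII line."""
    # Hypothetical test input, loosely modeled on the demo.py in the report.
    source = "# -*- coding: utf-8 -*-\ns = '" + "测试" * 200 + "'\n"
    with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as f:
        f.write(source.encode("utf-8"))
        path = f.name
    try:
        encoding, lines = read_source_lines(path)
        namespace = {}
        exec(compile("".join(lines), path, "exec"), namespace)
        return encoding, len(lines), len(namespace["s"])
    finally:
        os.remove(path)
```

Calling `demo()` returns the detected encoding, the number of lines read, and the length of the decoded string `s`. Because decoding happens on a text stream, the length of the line in bytes does not matter in this sketch.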

[issue34979] Python throws “SyntaxError: Non-UTF-8 code start with \xe8...” when parse source file

2018-10-13 Thread Lu jaymin
New submission from Lu jaymin : ``` # demo.py s = '测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测