[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2021-04-13 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: Ok, let's continue the discussion on https://bugs.python.org/issue38755

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2021-04-13 Thread Eryk Sun
Eryk Sun added the comment: > So that means we can close the issue, no? This is a bug in 3.8 and 3.9, which need the fix to keep reading until "\n" is seen on the line. I arrived at this issue via bpo-38755 if you think it should be addressed there, but it's the same bug that's reported
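
The fix described above (keep reading until "\n" is seen) can be illustrated with a minimal Python sketch. This is not the tokenizer's C code: `read_full_line` and its `bufsize` parameter are hypothetical names, and `io.BytesIO` stands in for the source file.

```python
import io

def read_full_line(stream, bufsize=512):
    """Collect chunks of at most bufsize - 1 bytes until a newline or EOF.

    Hypothetical helper: it mimics calling fgets() repeatedly instead of
    treating the first partial chunk as a complete line.
    """
    parts = []
    while True:
        chunk = stream.readline(bufsize - 1)  # never returns more than bufsize - 1 bytes
        if not chunk:                          # EOF
            break
        parts.append(chunk)
        if chunk.endswith(b"\n"):              # the line is complete
            break
    return b"".join(parts)

# A long first line of 2-byte UTF-8 characters, followed by a short line.
source = io.BytesIO(("x = '" + "é" * 1000 + "'\n" + "y = 1\n").encode("utf-8"))
first = read_full_line(source)
second = read_full_line(source)
print(len(first), second)
```

Because decoding happens only after the whole line has been joined, a chunk boundary that lands inside a multibyte character does no harm.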

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2021-04-13 Thread STINNER Victor
STINNER Victor added the comment: With https://bugs.python.org/issue14811#msg160706 I get a SyntaxError on Python 3.7, 3.8, 3.9 and 3.10.0a6. But I don't get an error on the master branch (Python 3.10.0a7+). Eryk: > The latest alpha release, 3.10a7, includes your rewrite of the tokenizer,

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2021-04-13 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: > no longer fails in Windows. So that means we can close the issue, no?

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2021-04-13 Thread Eryk Sun
Eryk Sun added the comment: > I don't get any error executing the t33a.py script The second line in t33a.py is 1618 bytes. The standard I/O BUFSIZ in Linux is 8192 bytes, but it's only 512 bytes in Windows. The latest alpha release, 3.10a7, includes your rewrite of the tokenizer, and in
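
As a rough worked example of the figures quoted above, the sketch below counts how many fgets()-style reads a 1618-byte line needs for each buffer size. The helper name `reads_needed` is hypothetical, and the assumption that each read stores at most BUFSIZ - 1 bytes follows the usual fgets() contract.

```python
import math

LINE_BYTES = 1618  # length of the second line of t33a.py, per the comment

def reads_needed(line_bytes, bufsiz):
    """Number of reads needed when each read stores at most bufsiz - 1 bytes."""
    return math.ceil(line_bytes / (bufsiz - 1))

for platform, bufsiz in (("Linux", 8192), ("Windows", 512)):
    print(platform, bufsiz, reads_needed(LINE_BYTES, bufsiz))
```

With these numbers the line fits in a single read on Linux but needs four on Windows, which is consistent with the script failing only on Windows.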

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2021-04-13 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I don't get any error executing the t33a.py script

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2021-04-13 Thread STINNER Victor
Change by STINNER Victor: nosy: +BTaskaya, lys.nikolaou, pablogsal

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2021-04-13 Thread Eryk Sun
Change by Eryk Sun: versions: +Python 3.8, Python 3.9 -Python 2.7, Python 3.2, Python 3.3, Python 3.4

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2012-11-04 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: stage: -> needs patch; versions: +Python 3.4

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2012-08-01 Thread STINNER Victor
STINNER Victor added the comment: > Are we going to fix this before 3.3? Any objections to Victor's patch? detect_truncate.patch is now raising an error if a line is longer than BUFSIZ, whereas Python supports lines longer than BUFSIZ bytes (it's just that the encoding cookie is ignored if the

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2012-07-19 Thread Hynek Schlawack
Hynek Schlawack h...@ox.cx added the comment: Are we going to fix this before 3.3? Any objections to Victor's patch?

[issue14811] decoding_fgets() truncates long lines and fails with a SyntaxError("Non-UTF-8 code starting with...")

2012-05-16 Thread STINNER Victor
STINNER Victor victor.stin...@gmail.com added the comment: The function decoding_fgets (Parser/tokenizer.c) reads a line into a fixed-size buffer of 8192 bytes (truncating the line to 8191 bytes) and then fails because the line is cut in the middle of a multibyte UTF-8 character. It looks like BUFSIZ is much smaller
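
The failure mode described in this report can be reproduced in miniature with plain Python bytes. The sketch below does not call the tokenizer. `read_truncated` is a hypothetical stand-in for a single fixed-size read, and the 8192 figure is taken from the report.

```python
BUFSIZE = 8192  # buffer size quoted in the report

def read_truncated(data, bufsize):
    """Hypothetical stand-in for one fgets() call: keep at most bufsize - 1 bytes."""
    return data[:bufsize - 1]

# One long line: a 2-byte ASCII prefix followed by 2-byte UTF-8 characters,
# arranged so that the cut at byte 8191 falls inside a character.
line = ("# " + "é" * 5000 + "\n").encode("utf-8")
chunk = read_truncated(line, BUFSIZE)

try:
    chunk.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("truncated chunk is not valid UTF-8:", exc.reason)

# The complete line decodes without error; only the truncated chunk fails.
full_text = line.decode("utf-8")
```

The full line is valid UTF-8, so the error comes purely from where the buffer boundary falls, not from the file's contents.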