Thomas Wouters added the comment:
Py_CompileString() in Python 3.9 and later, using the PEG parser, no longer
appears to honour source encoding cookies. A reduced test case:
#include "Python.h"
#include <stdio.h>

const char *src = (
    "# -*- coding: Latin-1 -*-\n"
    "'''\xc3'''\n");

int main(int argc, char **argv)
{
    Py_Initialize();
    PyObject *res = Py_CompileString(src, "some_path", Py_file_input);
    if (res) {
        fprintf(stderr, "Compile succeeded.\n");
        return 0;
    } else {
        fprintf(stderr, "Compile failed.\n");
        PyErr_Print();
        return 1;
    }
}
Compiling and running the resulting binary with Python 3.8 (or earlier):
% ./encoding_bug
Compile succeeded.
With 3.9 and PYTHONOLDPARSER=1:
% PYTHONOLDPARSER=1 ./encoding_bug
Compile succeeded.
With 3.9 (without the env var) or 3.10:
% ./encoding_bug
Compile failed.
File "some_path", line 2
'''�'''
^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xc3 in
position 0: unexpected end of data
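The error text is consistent with the string body being decoded as UTF-8 rather than as the Latin-1 declared by the cookie. A minimal Python sketch of that difference follows; it is an illustration only, not part of the reduced test case, and the variable names are mine:

```python
# The single byte inside the triple-quoted string of the test case.
body = b"\xc3"

# Latin-1 maps every byte to the code point with the same value.
as_latin1 = body.decode("latin-1")

# UTF-8 treats 0xc3 as the lead byte of a two-byte sequence, so decoding
# fails when no continuation byte follows it.
try:
    body.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = str(exc)

print(as_latin1)
print(utf8_error)
```

The second line printed carries the same codec message as the SyntaxError above.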
Writing the same bytes to a file and making python3.9 or python3.10 import them
works fine, as does passing the bytes to compile():
Python 3.10.0+ (heads/3.10-dirty:7bac598819, Nov 16 2021, 20:35:12) [GCC
8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('encoding_bug.py', 'rb').read()
>>> b
b"# -*- coding: Latin-1 -*-\n'''\xc3'''\n"
>>> import encoding_bug
>>> encoding_bug.__doc__
'Ã'
>>> co = compile(b, 'some_path', 'exec')
>>> co
<code object <module> at 0x7f447e1b0c90, file "some_path", line 1>
>>> co.co_consts[0]
'Ã'
It's just Py_CompileString() that fails. I don't understand why, and I do
believe it's a regression.
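As a cross-check from the Python side, the sketch below reads the cookie with tokenize.detect_encoding(), decodes the bytes accordingly, and compiles the resulting text. The helper name decode_with_cookie is hypothetical, and this only shows that the bytes themselves are well-formed for the declared encoding; it says nothing about where Py_CompileString() goes wrong:

```python
import io
import tokenize

# Same bytes as the C test case.
SRC = b"# -*- coding: Latin-1 -*-\n'''\xc3'''\n"


def decode_with_cookie(source: bytes) -> str:
    """Decode source bytes using their PEP 263 cookie (hypothetical helper)."""
    # detect_encoding() inspects at most the first two lines for a BOM or cookie.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return source.decode(encoding)


text = decode_with_cookie(SRC)

# compile() ignores the cookie when it is given str rather than bytes.
code = compile(text, "some_path", "exec")
namespace = {}
exec(code, namespace)
print(repr(namespace["__doc__"]))
```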
----------
nosy: +gregory.p.smith
_______________________________________
Python tracker
<https://bugs.python.org/issue45822>
_______________________________________