On 27.12.2010 21:12, victor.stinner wrote:
> Author: victor.stinner
> Date: Mon Dec 27 21:12:13 2010
> New Revision: 87518
>
> Log:
> Issue #10778: decoding_fgets() decodes the filename from the filesystem
> encoding instead of UTF-8.
>
>
> Modified:
>    python/branches/py3k/Parser/tokenizer.c
>
> Modified: python/branches/py3k/Parser/tokenizer.c
> ==============================================================================
> --- python/branches/py3k/Parser/tokenizer.c    (original)
> +++ python/branches/py3k/Parser/tokenizer.c    Mon Dec 27 21:12:13 2010
> @@ -545,6 +545,7 @@
>  {
>      char *line = NULL;
>      int badchar = 0;
> +    PyObject *filename;
>      for (;;) {
>          if (tok->decoding_state == STATE_NORMAL) {
>              /* We already have a codec associated with
> @@ -585,12 +586,16 @@
>      if (badchar) {
>          /* Need to add 1 to the line number, since this line
>             has not been counted, yet.  */
> -        PyErr_Format(PyExc_SyntaxError,
> -            "Non-UTF-8 code starting with '\\x%.2x' "
> -            "in file %.200s on line %i, "
> -            "but no encoding declared; "
> -            "see http://python.org/dev/peps/pep-0263/ for details",
> -            badchar, tok->filename, tok->lineno + 1);
> +        filename = PyUnicode_DecodeFSDefault(tok->filename);
> +        if (filename != NULL) {
> +            PyErr_Format(PyExc_SyntaxError,
> +                    "Non-UTF-8 code starting with '\\x%.2x' "
> +                    "in file %.200U on line %i, "
> +                    "but no encoding declared; "
> +                    "see http://python.org/dev/peps/pep-0263/ for details",
> +                    badchar, filename, tok->lineno + 1);
> +            Py_DECREF(filename);
> +        }
Hmm, and in case decoding fails, we return a Unicode error (without context) instead of a syntax error? That doesn't seem like a good trade-off when the file name is just displayed in a message.

Georg

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
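To make the trade-off concrete, here is a minimal standalone sketch (plain C, not CPython code) of the alternative: when the filename cannot be decoded, still produce the syntax error message and only degrade how the name is displayed. Both `decode_filename()` and `format_badchar_message()` are hypothetical helpers; the ASCII-only decoder is an assumption standing in for a filesystem encoding that rejects the name, mirroring only the "returns NULL on failure" behaviour of `PyUnicode_DecodeFSDefault()`.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyUnicode_DecodeFSDefault(): a strict decoder
 * that only accepts ASCII.  Returns NULL on failure; on success the caller
 * must free() the result. */
static char *decode_filename(const char *raw)
{
    size_t len = strlen(raw);
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)raw[i] >= 0x80)
            return NULL;            /* undecodable byte */
    }
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, raw, len + 1);
    return copy;
}

/* Build the syntax error text for a non-UTF-8 byte.  If the filename cannot
 * be decoded, fall back to the raw bytes with non-ASCII bytes escaped as
 * \xNN instead of giving up, so a syntax error is still reported. */
static void format_badchar_message(char *buf, size_t size, int badchar,
                                   const char *raw_filename, int lineno)
{
    char shown[201];                /* at most 200 bytes, like %.200s */
    char *decoded = decode_filename(raw_filename);

    if (decoded != NULL) {
        snprintf(shown, sizeof shown, "%s", decoded);
        free(decoded);
    } else {
        size_t j = 0;
        /* j + 4 < sizeof shown leaves room for one "\xNN" plus the NUL */
        for (const unsigned char *p = (const unsigned char *)raw_filename;
             *p != '\0' && j + 4 < sizeof shown; p++) {
            if (*p < 0x80)
                shown[j++] = (char)*p;
            else
                j += (size_t)snprintf(shown + j, sizeof shown - j,
                                      "\\x%02x", (unsigned)*p);
        }
        shown[j] = '\0';
    }

    snprintf(buf, size,
             "Non-UTF-8 code starting with '\\x%.2x' "
             "in file %s on line %i, "
             "but no encoding declared; "
             "see http://python.org/dev/peps/pep-0263/ for details",
             (unsigned)badchar, shown, lineno);
}
```

Called with the raw filename bytes `sp\xe4m.py`, the message still names the file, as `in file sp\xe4m.py on line 3`, rather than being replaced by a decoding error.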