[issue19424] _warnings: patch to avoid conversions from/to UTF-8

STINNER Victor Tue, 29 Oct 2013 12:19:36 -0700

STINNER Victor added the comment:

> I don't see a benefit from this patch.


Oh, sorry, I forgot to explain the motivation. Performance of the warnings 
module is not critical. The motivation here is correctness: the patch avoids 
encoding strings to UTF-8. For example, _PyUnicode_AsString(filename) 
fails if the filename contains a surrogate character.

>>> warnings.warn_explicit("text", RuntimeError, "filename", 5)
filename:5: RuntimeError: text
>>> warnings.warn_explicit("text", RuntimeError, "filename\udc80", 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 8: surrogates not allowed
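
The same failure can be reproduced in pure Python, without going through the 
warnings module. This is only a small sketch: try_utf8 is a hypothetical 
helper written for this illustration, not something from the patch. It shows 
that strict UTF-8 encoding rejects a lone surrogate, while the 
"surrogateescape" error handler maps it back to the original byte.

```python
def try_utf8(text):
    """Return the strict UTF-8 encoding of text, or None if it cannot be encoded."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


plain = "filename"
# Lone surrogate, like the one surrogateescape produces for the undecodable byte 0x80.
undecodable = "filename\udc80"

strict_plain = try_utf8(plain)        # encodes normally
strict_bad = try_utf8(undecodable)    # strict encoding refuses the surrogate

# With the surrogateescape handler the surrogate is turned back into byte 0x80.
escaped = undecodable.encode("utf-8", "surrogateescape")
roundtrip = escaped.decode("utf-8", "surrogateescape")
```

Any code path that encodes such a string with the strict handler hits the 
error shown in the traceback above.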


Another example, where a string is encoded to UTF-8 and then decoded from 
UTF-8 a few instructions later:

PyObject *to_str = PyObject_Str(item);
err_str = _PyUnicode_AsString(to_str);
...
PyErr_Format(PyExc_RuntimeError,  "...%s", err_str);

Using "%R" avoids any encoding conversion.
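
As a rough Python analogy of the two approaches (the real change is in C, and 
format_via_utf8 and format_direct are hypothetical names used only for this 
sketch): the first function mimics the encode-then-decode round trip, the 
second formats the object directly with no intermediate bytes.

```python
def format_via_utf8(item):
    """Mimic the C pattern: str -> UTF-8 bytes -> str again for the message."""
    encoded = str(item).encode("utf-8")  # raises UnicodeEncodeError on a lone surrogate
    return "item: %s" % encoded.decode("utf-8")


def format_direct(item):
    """Format the object directly; no encoding step is involved."""
    return "item: %r" % (item,)


ok = format_via_utf8("abc")

try:
    format_via_utf8("abc\udc80")
    via_utf8_failed = False
except UnicodeEncodeError:
    via_utf8_failed = True

# repr() escapes the surrogate, so formatting succeeds.
direct = format_direct("abc\udc80")
```

The direct version never needs the string to be encodable, so it cannot fail 
on surrogate characters.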

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue19424>
_______________________________________