On 28 Aug 2014, at 19:54, Glenn Linderman wrote:
On 8/28/2014 10:41 AM, R. David Murray wrote:
On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman
<v+pyt...@g.nevcal.com> wrote:
[...]
Also for cases where the data stream is *supposed* to be in a given
encoding, but contains undecodable bytes. Showing the stuff that
incorrectly decodes as whatever it decodes to is generally what you
want in that case.
Sure, people can learn to recognize mojibake for what it is, and maybe
even learn to recognize it for what it was intended to be, in limited
domains. But suppressing/replacing the surrogates doesn't help with
that... would it not be better to replace the surrogates with an
escape sequence that shows the original, undecodable, byte value?
Like \xNN ?
For that we could extend the "backslashreplace" codec error callback, so
that it can be used for decoding too, not just for encoding. I.e.

    b"a\xffb".decode("utf-8", "backslashreplace")

would return

    "a\\xffb"
Servus,
Walter
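
A minimal sketch of the decoding behaviour proposed above, assuming it is
written as a custom error callback registered through
codecs.register_error. The handler name "backslashreplace_decode" is
hypothetical, chosen so that it does not clash with the built-in
"backslashreplace" handler; it is an illustration, not the
implementation discussed in this thread.

```python
import codecs


def backslashreplace_decode(exc):
    """Replace each undecodable byte with a \\xNN escape (decoding only)."""
    # This sketch only covers decoding; other errors are re-raised as-is.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # exc.object is the input bytes; start/end delimit the bad span.
    bad_bytes = exc.object[exc.start:exc.end]
    replacement = "".join("\\x%02x" % byte for byte in bad_bytes)
    # Return the replacement text and the position at which to resume.
    return replacement, exc.end


# "backslashreplace_decode" is a hypothetical name used for illustration.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

result = b"a\xffb".decode("utf-8", "backslashreplace_decode")
print(result)  # a\xffb
```

The undecodable byte stays visible as a literal backslash escape, so the
reader can see its original value, while valid input decodes unchanged.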
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev