Re: [Python-Dev] Bytes path related questions for Guido

Walter Dörwald Fri, 29 Aug 2014 03:36:45 -0700

On 28 Aug 2014, at 19:54, Glenn Linderman wrote:

On 8/28/2014 10:41 AM, R. David Murray wrote:
On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman wrote:
[...]
Also for cases where the data stream is *supposed* to be in a given
encoding, but contains undecodable bytes. Showing the stuff that
incorrectly decodes as whatever it decodes to is generally what you
want in that case.

Sure, people can learn to recognize mojibake for what it is, and maybe
even learn to recognize it for what it was intended to be, in limited
domains. But suppressing/replacing the surrogates doesn't help with
that... would it not be better to replace the surrogates with an
escape sequence that shows the original, undecodable, byte value?
Like \xNN ?

For that we could extend the "backslashreplace" codec error callback so
that it can be used for decoding too, not just for encoding. I.e.


   b"a\xffb".decode("utf-8", "backslashreplace")

would return

   "a\\xffb"
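
Until the built-in callback supports decoding, roughly the same effect
could be had with a custom error callback registered through
codecs.register_error(). What follows is only a minimal sketch: the
callback name "backslashreplace_decode" is made up for illustration and
is not an existing codec error handler.

```python
import codecs


def backslashreplace_decode(exc):
    """Replace each undecodable byte with a \\xNN escape (decoding only)."""
    if isinstance(exc, UnicodeDecodeError):
        # exc.object is the bytes being decoded; start/end delimit the bad span.
        bad = exc.object[exc.start:exc.end]
        replacement = "".join("\\x%02x" % byte for byte in bad)
        # Return the replacement text and the position at which to resume.
        return replacement, exc.end
    # Anything else (e.g. encoding errors) is not handled by this sketch.
    raise exc


# "backslashreplace_decode" is a hypothetical name chosen for this example.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

result = b"a\xffb".decode("utf-8", "backslashreplace_decode")
print(result)
```

The callback returns the replacement string together with the offset at
which decoding should resume, which is the contract codecs.register_error()
expects from an error callback.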

Servus,
   Walter
