Re: [Python-Dev] Bytes path related questions for Guido

Antoine Pitrou Sun, 24 Aug 2014 07:26:35 -0700

On 24/08/2014 09:04, Nick Coghlan wrote:

On 24 August 2014 14:44, Nick Coghlan <[email protected]> wrote:

2. Should we add some additional helpers to the string module for
dealing with surrogate escaped bytes and other techniques for
smuggling arbitrary binary data as text?


My proposal [3] is to add:

* string.escaped_surrogates (constant with the 128 escaped code points)
* string.clean(s): replaces surrogates with '\ufffd' or another
specified code point
* string.redecode(s, encoding): encodes a string back to bytes and
then decodes it again using the specified encoding (the old encoding
defaults to 'latin-1' to match the assumptions in WSGI)
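
For illustration only, the three proposed helpers could be sketched roughly as below. None of these names exist in the standard library. The module-level spellings, the `replacement` and `old_encoding` parameter names, and the use of the surrogateescape error handler on both steps of `redecode` are assumptions made for this sketch, not part of the proposal text.

```python
# Hypothetical sketch of the proposed helpers; not an existing stdlib API.

# surrogateescape maps each undecodable byte 0x80..0xFF to a lone
# surrogate in the range U+DC80..U+DCFF, which gives 128 code points.
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))


def clean(s, replacement="\ufffd"):
    """Replace every escaped surrogate in s with replacement (lossy)."""
    table = dict.fromkeys(range(0xDC80, 0xDD00), replacement)
    return s.translate(table)


def redecode(s, encoding, old_encoding="latin-1", errors="surrogateescape"):
    """Encode s back to bytes with old_encoding, then decode with encoding."""
    return s.encode(old_encoding, errors).decode(encoding, errors)
```

With these definitions, text that was wrongly decoded as latin-1 can be reinterpreted with something like `redecode(text, "utf-8")`, while `clean()` discards the original byte values for good.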



Serhiy & Ezio convinced me to scale this one back to a proposal for
"codecs.clean_surrogate_escapes(s)", which replaces surrogates that
may be produced by surrogateescape (that's what string.clean() above
was supposed to be, but my description was not correct, and the name
was too vague for that error to be obvious to the reader)

"clean" conveys the wrong meaning. It should use a scary word such as "trap". "Cleaning" surrogates is unlikely to be the right procedure when dealing with surrogates produced by undecodable byte sequences.
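
A small sketch of the concern, assuming a file name that arrives as Latin-1 bytes and is decoded as UTF-8 with surrogateescape (the file name itself is made up for the example). The escaped surrogate is what allows the original bytes to be recovered; replacing it throws that information away, whereas a strict encode "traps" the problem instead.

```python
# Hypothetical input: Latin-1 encoded bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# surrogateescape smuggles the undecodable byte 0xE9 through as U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# The round trip is lossless as long as the surrogate is left alone.
round_tripped = name.encode("utf-8", "surrogateescape")

# "Cleaning" replaces the surrogate, so the original byte is gone.
cleaned = name.replace("\udce9", "\ufffd")
lossy = cleaned.encode("utf-8")

# A strict encode refuses the lone surrogate rather than hiding it.
try:
    name.encode("utf-8")
    trapped = False
except UnicodeEncodeError:
    trapped = True
```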


Regards

Antoine.


