Le 24/08/2014 09:04, Nick Coghlan a écrit :
On 24 August 2014 14:44, Nick Coghlan <ncogh...@gmail.com> wrote:
2. Should we add some additional helpers to the string module for
dealing with surrogate escaped bytes and other techniques for
smuggling arbitrary binary data as text?

My proposal [3] is to add:

* string.escaped_surrogates (constant with the 128 escaped code points)
* string.clean(s): replaces surrogates with '\ufffd' or another
specified code point
* string.redecode(s, encoding): encodes a string back to bytes and
then decodes it again using the specified encoding (the old encoding
defaults to 'latin-1' to match the assumptions in WSGI)


Serhiy & Ezio convinced me to scale this one back to a proposal for
"codecs.clean_surrogate_escapes(s)", which replaces surrogates that
may be produced by surrogateescape (that's what string.clean() above
was supposed to be, but my description was not correct, and the name
was too vague for that error to be obvious to the reader)

"clean" conveys the wrong meaning. It should use a scary word such as "trap". "Cleaning" surrogates is unlikely to be the right procedure when dealing with surrogates produced by undecodable byte sequences.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to