On 25 August 2014 00:23, Antoine Pitrou <anto...@python.org> wrote:
> On 24/08/2014 09:04, Nick Coghlan wrote:
>> Serhiy & Ezio convinced me to scale this one back to a proposal for
>> "codecs.clean_surrogate_escapes(s)", which replaces surrogates that
>> may be produced by surrogateescape (that's what string.clean() above
>> was supposed to be, but my description was not correct, and the name
>> was too vague for that error to be obvious to the reader)
>
> "clean" conveys the wrong meaning. It should use a scary word such as
> "trap". "Cleaning" surrogates is unlikely to be the right procedure when
> dealing with surrogates produced by undecodable byte sequences.

"purge_surrogate_escapes" was the other term that occurred to me.

Either way, my use case is to filter them out when I *don't* want to
pass them along to other software, but would prefer the Unicode
replacement character to the ASCII question mark created by using the
"replace" error handler when encoding.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
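
To make the use case above concrete, here is a minimal sketch of the
difference between encoding with "replace" and substituting U+FFFD first.
The helper name purge_surrogate_escapes is hypothetical (it is only the
name floated in this thread, not an existing codecs function), and the
regex approach is just one possible way to do it, assuming only the
U+DC80..U+DCFF range that surrogateescape produces should be touched.

```python
import re

# surrogateescape maps each undecodable byte 0x80..0xFF to a lone
# surrogate in the range U+DC80..U+DCFF.
_SURROGATE_ESCAPES = re.compile("[\udc80-\udcff]")


def purge_surrogate_escapes(s):
    """Hypothetical helper: swap surrogate escapes for U+FFFD."""
    return _SURROGATE_ESCAPES.sub("\ufffd", s)


raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8
text = raw.decode("utf-8", "surrogateescape")  # 'caf\udce9'

# Encoding with "replace" yields an ASCII question mark.
with_replace = text.encode("utf-8", "replace")  # b'caf?'

# Purging first yields the UTF-8 encoding of U+FFFD instead.
with_purge = purge_surrogate_escapes(text).encode("utf-8")  # b'caf\xef\xbf\xbd'
```

Note that the original byte is lost either way; the only difference is
which placeholder the downstream software gets to see.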