On 26 Aug 2014 21:34, "MRAB" <pyt...@mrabarnett.plus.com> wrote:
>
> On 2014-08-26 03:11, Stephen J. Turnbull wrote:
>>
>> Nick Coghlan writes:
>>
>>  > "purge_surrogate_escapes" was the other term that occurred to me.
>>
>> "purge" suggests removal, not replacement.  That may be useful too.
>>
>> neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD')
>>
> How about:
>
>     replace_surrogate_escapes(s, replacement='\uFFFD')
>
> If you want them removed, just pass an empty string as the replacement.

The current proposal on the issue tracker is to instead take advantage of
the existing error handlers:

    def convert_surrogateescape(data, errors='replace'):
        return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)

That code is short, but semantically dense - it took a few iterations to
come up with that version. (Added bonus: once you're alerted to the
possibility, it's trivial to write your own version for existing Python 3
versions. The standard name just makes it easier to look up when you come
across it in a piece of code, and provides the option of optimising it
later if it ever seems worth the extra work.)

I also filed a separate RFE to make backslashreplace usable on input, since
that allows the option of separating the replacement operation from the
encoding operation.

Cheers,
Nick.
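
For anyone who wants to try the helper on an existing Python 3 version, here is a minimal sketch of how it behaves. The function body is the one quoted above; the sample bytes and the variable names (raw, text) are made up purely for illustration and are not taken from the tracker issue.

```python
def convert_surrogateescape(data, errors='replace'):
    # Round-trip through bytes: 'surrogateescape' turns each escaped
    # surrogate back into the original undecodable byte, and the second
    # decode then applies whichever error handler the caller asked for.
    return data.encode('utf-8', 'surrogateescape').decode('utf-8', errors)


# Illustrative input: Latin-1 encoded bytes that are not valid UTF-8.
raw = b'caf\xe9 ok'

# Decoding with surrogateescape smuggles the bad byte 0xE9 through as the
# lone surrogate U+DCE9 instead of raising UnicodeDecodeError.
text = raw.decode('utf-8', 'surrogateescape')

replaced = convert_surrogateescape(text)            # default: U+FFFD
removed = convert_surrogateescape(text, 'ignore')   # drop the bad byte

print(ascii(text))      # 'caf\udce9 ok'
print(ascii(replaced))  # 'caf\ufffd ok'
print(ascii(removed))   # 'caf ok'
```

Passing 'ignore' as the error handler covers the "remove" case discussed earlier in the thread, so no separate remove flag or empty replacement string seems to be needed with this approach.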