Glenn Linderman writes:
> On 8/26/2014 4:31 AM, MRAB wrote:
> > On 2014-08-26 03:11, Stephen J. Turnbull wrote:
> >> Nick Coghlan writes:
> > How about:
> >
> > replace_surrogate_escapes(s, replacement='\uFFFD')
> >
> > If you want them removed, just pass an empty string as the
> > replacement.

That seems better to me (I had too much C for breakfast, I think).

> And further, replacement could be a vector of 128 characters, to do
> immediate transcoding,

Using what encoding? If you knew that much, why didn't you use (write, if necessary) an appropriate codec? I can't envision this being useful.

OTOH, I could see using replace_surrogate_escapes(s, replacement='&#xFFFD;') in HTML. (Actually, probably not: if it makes sense to use Unicode features, you're probably using Unicode as the external encoding, so a character entity is silly. But there might be contexts where a multicharacter replacement is useful.)

> or a single character to do wholesale replacement with some
> gibberish character, or None to remove (or an empty string).

Not None; that means default (which should be the Unicode standard REPLACEMENT CHARACTER U+FFFD).

Steve
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
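For illustration, here is a minimal Python sketch of the `replace_surrogate_escapes(s, replacement=...)` signature discussed in this thread. It is hypothetical: the function was only proposed here and is not a standard-library API. The sketch assumes the escapes come from the `surrogateescape` error handler, which decodes each undecodable byte 0x80 to 0xFF as a lone surrogate in U+DC80..U+DCFF. It also adopts the suggestion that `None` selects the default U+FFFD, while an empty string removes the escapes.

```python
# Hypothetical helper: replace_surrogate_escapes() was only proposed on
# python-dev; it is NOT part of the Python standard library.

# The 'surrogateescape' error handler turns each undecodable byte 0x80-0xFF
# into a lone surrogate in the range U+DC80..U+DCFF.
_ESCAPE_RANGE = range(0xDC80, 0xDD00)


def replace_surrogate_escapes(s, replacement=None):
    """Replace every surrogate-escaped byte in s with `replacement`.

    None selects the default, U+FFFD REPLACEMENT CHARACTER; an empty
    string removes the escapes entirely.
    """
    if replacement is None:
        replacement = '\ufffd'
    # str.translate maps code points to strings of any length, so
    # multicharacter replacements such as '&#xFFFD;' also work.
    table = {cp: replacement for cp in _ESCAPE_RANGE}
    return s.translate(table)


# Usage: a byte that is not valid UTF-8 survives decoding as a lone surrogate.
text = b'caf\xe9 ok'.decode('utf-8', 'surrogateescape')  # 'caf\udce9 ok'
default = replace_surrogate_escapes(text)               # 'caf\ufffd ok'
removed = replace_surrogate_escapes(text, '')           # 'caf ok'
html = replace_surrogate_escapes(text, '&#xFFFD;')      # 'caf&#xFFFD; ok'
```

Using `str.translate` instead of a regular expression avoids having to escape backslashes or group references in the replacement string. The same table-based approach would also accept a per-byte mapping, but this sketch deliberately uses a single replacement string.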