Glenn Linderman writes:
 > On 8/26/2014 4:31 AM, MRAB wrote:
 > > On 2014-08-26 03:11, Stephen J. Turnbull wrote:
 > >> Nick Coghlan writes:

 > > How about:
 > >
 > >     replace_surrogate_escapes(s, replacement='\uFFFD')
 > >
 > > If you want them removed, just pass an empty string as the
 > > replacement.

That seems better to me (I had too much C for breakfast, I think).

 > And further, replacement could be a vector of 128 characters, to do
 > immediate transcoding,

Using what encoding?  If you knew that much, why didn't you use
(write, if necessary) an appropriate codec?  I can't envision this
being useful.
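
For comparison, a minimal sketch of the codec route, assuming the
string came from a UTF-8 decode with errors='surrogateescape' and that
the stray bytes later turn out to be Latin-1 (both are assumptions
made for illustration; the helper name is hypothetical):

```python
def redecode(s, real_encoding, assumed_encoding='utf-8'):
    """Recover the original bytes, then decode them with the right codec.

    Hypothetical helper: it assumes `s` was produced by decoding with
    `assumed_encoding` and errors='surrogateescape'.
    """
    raw = s.encode(assumed_encoding, 'surrogateescape')
    return raw.decode(real_encoding)


# The byte 0xE9 is not valid UTF-8 here, so it is smuggled through as
# the lone surrogate U+DCE9.
escaped = b'caf\xe9'.decode('utf-8', 'surrogateescape')
fixed = redecode(escaped, 'latin-1')
```

This round-trips through bytes rather than mapping the 128 escapes by
hand, so multibyte encodings would work as well.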

OTOH, I could see using

    replace_surrogate_escapes(s, replacement='&#xFFFD;')

in HTML.  (Actually, probably not; if it makes sense to use Unicode
features you're probably using Unicode as the external encoding, so a
character entity is silly.  But there might be contexts with useful
multicharacter replacements.)

 > or a single character to do wholesale replacement with some
 > gibberish character, or None to remove (or an empty string).

Not None, that means default (which should be the Unicode standard
REPLACEMENT CHARACTER U+FFFD).
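
A rough sketch of the semantics discussed so far, not an existing
stdlib function: the name and signature follow MRAB's proposal, and
the regular-expression implementation is only an assumption for
illustration.

```python
import re

# The 'surrogateescape' error handler maps undecodable bytes 0x80..0xFF
# to the lone surrogates U+DC80..U+DCFF.
_SURROGATE_ESCAPES = re.compile('[\udc80-\udcff]')


def replace_surrogate_escapes(s, replacement=None):
    """Replace each surrogate escape in `s` with `replacement`.

    None means "use the default", U+FFFD REPLACEMENT CHARACTER.
    An empty string removes the escapes; a longer string is allowed.
    """
    if replacement is None:
        replacement = '\ufffd'
    # A function is used so that the replacement is taken literally and
    # backslashes in it are not treated as regex escapes.
    return _SURROGATE_ESCAPES.sub(lambda match: replacement, s)


s = b'abc\xff\x80'.decode('utf-8', 'surrogateescape')
default = replace_surrogate_escapes(s)
removed = replace_surrogate_escapes(s, '')
```

Here `default` has one U+FFFD per escaped byte, and `removed` has the
escapes dropped entirely.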

Steve