[issue18814] Add tools for "cleaning" surrogate escaped strings

Ezio Melotti Sun, 24 Aug 2014 00:58:47 -0700

Ezio Melotti added the comment:

I think similar functions should be added in the unicodedata module rather than 
the string module or as str methods.  If I'm not mistaken this was already 
proposed in another issue.
In C we already added macros like IS_{HIGH|LOW|}_SURROGATE and possibly others 
to help deal with surrogates, but AFAIK there's no Python equivalent yet.
As for the specific constants/functions/methods you propose, IMHO the name 
escaped_surrogates is not too clear.  If it's a string of lone surrogates I 
would just call it unicodedata.surrogates (and 
.high_surrogates/.low_surrogates).  These can also be used to build one-liners 
to check if a string contains surrogates and/or to remove them.
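A rough sketch of what I mean, assuming the constants are plain strings of 
lone surrogates.  None of these names exist in unicodedata today; they are 
defined locally here only for illustration:

```python
# Hypothetical constants: unicodedata does not provide these names.
high_surrogates = ''.join(map(chr, range(0xD800, 0xDC00)))
low_surrogates = ''.join(map(chr, range(0xDC00, 0xE000)))
surrogates = high_surrogates + low_surrogates


def has_surrogates(s):
    # True if s contains at least one lone surrogate code point.
    return any(c in surrogates for c in s)


def remove_surrogates(s):
    # str.translate() deletes every code point that is mapped to None.
    return s.translate(dict.fromkeys(map(ord, surrogates)))


# An undecodable byte becomes a lone low surrogate under surrogateescape.
escaped = b'abc\xff'.decode('utf-8', 'surrogateescape')
print(has_surrogates(escaped))
print(ascii(remove_surrogates(escaped)))
```

Both helper bodies are one-liners, so with the constants available they 
could just as well be written inline.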
clean has a very generic name with no hints about surrogates, and its purpose 
is quite specific.
I'm also not a big fan of redecode.  The equivalent calls to encode/decode are 
not much longer and are more explicit.  Also, having to redecode often 
indicates an earlier bug that should be fixed instead (if possible).
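For illustration, the explicit spelling looks roughly like this.  The 
encodings and the sample bytes are assumptions picked for the example, not 
taken from the proposal:

```python
# Bytes that are really Latin-1 but were decoded as ASCII with the
# surrogateescape error handler (sample data made up for this example).
raw = b'caf\xe9'
wrong = raw.decode('ascii', 'surrogateescape')

# Explicit "redecode": encode back with the same codec and error handler
# to recover the original bytes, then decode with the correct codec.
fixed = wrong.encode('ascii', 'surrogateescape').decode('latin-1')

print(ascii(wrong))
print(ascii(fixed))
```

The round trip is lossless because surrogateescape maps each undecodable 
byte to a lone surrogate in the U+DC80..U+DCFF range and maps it back to the 
same byte on encoding.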


----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18814>
_______________________________________