Ezio Melotti added the comment:
I think similar functions should be added in the unicodedata module rather than
the string module or as str methods. If I'm not mistaken this was already
proposed in another issue.
In C we already added macros like IS_{HIGH|LOW|}_SURROGATE and possibly others
to help deal with surrogates, but AFAIK there's no Python equivalent yet.
As for the specific constants/functions/methods you propose, IMHO the name
escaped_surrogates is not too clear. If it's a string of lone surrogates I
would just call it unicodedata.surrogates (and
.high_surrogates/.low_surrogates). These can also be used to build one-liners
that check whether a string contains surrogates and/or remove them.
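To illustrate, here is a rough sketch of what such one-liners might look like.
The three constants are hypothetical stand-ins built by hand for the proposed
unicodedata attributes (nothing like them exists in the stdlib today), and the
helper names has_surrogates/remove_surrogates are made up for the example.

```python
# Hypothetical stand-ins for the proposed unicodedata.high_surrogates,
# unicodedata.low_surrogates and unicodedata.surrogates constants.
# They are built by hand here because the stdlib does not provide them.
high_surrogates = ''.join(map(chr, range(0xD800, 0xDC00)))
low_surrogates = ''.join(map(chr, range(0xDC00, 0xE000)))
surrogates = high_surrogates + low_surrogates


def has_surrogates(s):
    """Return True if s contains at least one surrogate code point."""
    return any(c in surrogates for c in s)


def remove_surrogates(s):
    """Return s with every surrogate code point deleted."""
    # Mapping an ordinal to None tells str.translate to drop that character.
    return s.translate(dict.fromkeys(map(ord, surrogates)))
```

With constants like these available, both operations stay short enough that
dedicated functions may not be needed at all.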
clean has a very generic name with no hints about surrogates, and its purpose
is quite specific.
I'm also not a big fan of redecode. The equivalent calls to encode/decode are
not much longer and are more explicit. Also, having to redecode often indicates
that there's an earlier bug that should be fixed instead (if possible).
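For comparison, this is roughly what the explicit encode/decode spelling looks
like. It assumes redecode is meant to undo a decode done with the wrong codec
and the surrogateescape error handler; the sample data and the variable names
are invented for the example.

```python
# UTF-8 bytes for 'café'.
raw = 'caf\u00e9'.encode('utf-8')

# Simulate the earlier bug: the bytes were decoded as ASCII with
# surrogateescape, so the two non-ASCII bytes became lone surrogates.
wrong = raw.decode('ascii', 'surrogateescape')

# The explicit "redecode": encode back with the same codec and error
# handler to recover the original bytes, then decode with the right codec.
fixed = wrong.encode('ascii', 'surrogateescape').decode('utf-8')
```

Spelled out this way, both codecs and the error handler are visible at the
call site, which also makes it easier to spot where the wrong decode happened.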
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18814>
_______________________________________