Nick Coghlan added the comment:
I'd wondered about that with respect to rehandle_surrogatepass.
The current implementation looks like it processes *all* surrogates (even valid
surrogate pairs), so "handle_surrogates" might be a suitable name.
If the intent is for it to be "handle_lone_surrogates", I'm not sure the
current implementation achieves that, as a valid surrogate pair will match
re.compile('[\ud800-\udfff]+').
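A quick way to see the difference is sketched below. It uses the full surrogate block U+D800 to U+DFFF. LONE_SURROGATES is a hypothetical pattern, not taken from the patch: it matches a high surrogate (U+D800 to U+DBFF) not followed by a low one, or a low surrogate (U+DC00 to U+DFFF) not preceded by a high one. A simple character-class pattern matches a valid pair; the lookaround version does not.

```python
import re

# Matches any run of code points from the UTF-16 surrogate block (U+D800..U+DFFF),
# including both halves of a valid surrogate pair.
ALL_SURROGATES = re.compile('[\ud800-\udfff]+')

# Hypothetical "lone surrogates only" pattern (an assumption, not the patch's code):
# a high surrogate NOT followed by a low one, or a low surrogate NOT preceded by a high one.
LONE_SURROGATES = re.compile(
    '[\ud800-\udbff](?![\udc00-\udfff])'
    '|(?<![\ud800-\udbff])[\udc00-\udfff]'
)

pair = '\ud83d\ude00'  # two code points: the surrogate pair that encodes U+1F600
lone = 'a\udc80b'      # a lone low surrogate, as surrogateescape would produce

print(bool(ALL_SURROGATES.search(pair)))   # True: the valid pair matches
print(bool(LONE_SURROGATES.search(pair)))  # False: a valid pair is left alone
print(LONE_SURROGATES.findall(lone))       # ['\udc80']
```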
The rest looks OK to me, including the decompose_astrals() and
compose_surrogate_pairs() functions. Regardless of any practical utility, those
two seem useful for *educational* purposes when it comes to Unicode, since they
make it clear how to switch between the single code point and surrogate pair
representations of astral characters.
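The patch's own code for these two functions isn't quoted here, so the following is only a sketch of what functions with those names could look like, assuming they apply the standard UTF-16 arithmetic. Subtract 0x10000 from the astral code point. The top 10 bits of the remaining 20 go into the high surrogate (0xD800 + bits) and the bottom 10 bits into the low surrogate (0xDC00 + bits). Composing reverses this.

```python
# Hypothetical implementations using the names mentioned above; NOT the patch's code.

def decompose_astrals(text):
    """Replace each astral code point (> U+FFFF) with its UTF-16 surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000                           # 20 bits remain
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate: top 10 bits
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate: bottom 10 bits
        else:
            out.append(ch)
    return ''.join(out)


def compose_surrogate_pairs(text):
    """Replace each valid surrogate pair with the single astral code point it encodes."""
    out = []
    i = 0
    while i < len(text):
        hi = ord(text[i])
        if 0xD800 <= hi <= 0xDBFF and i + 1 < len(text):
            lo = ord(text[i + 1])
            if 0xDC00 <= lo <= 0xDFFF:
                out.append(chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)))
                i += 2
                continue
        out.append(text[i])  # BMP character or lone surrogate: left unchanged
        i += 1
    return ''.join(out)


s = 'x\U0001F600'
print(decompose_astrals(s) == 'x\ud83d\ude00')             # True
print(compose_surrogate_pairs(decompose_astrals(s)) == s)  # True
```

The split form produces the same UTF-16 code units as the original string. For example, `decompose_astrals(s).encode('utf-16-le', 'surrogatepass')` equals `s.encode('utf-16-le')`, which is exactly the dual representation these helpers make visible.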
----------
_______________________________________
Python tracker
<http://bugs.python.org/issue18814>
_______________________________________