Guido van Rossum wrote: > Yes but why? What does this invariant do for him?
I don't know about this person, but there are a few things that don't work properly in UTF-16 mode: - the Unicode character database fails to lookup things. u"\U0001D670".isupper() gives false, but should give true (since it denotes MATHEMATICAL MONOSPACE CAPITAL A). It gives true in UCS-4 mode - As a result, normalization on these doesn't work, either. It should normalize to "LATIN CAPITAL LETTER A" under NFKC, but doesn't. - regular expressions only have limited support. In particular, adding non-BMP characters to character classes is not possible. [\U0001D670] will match any character that is either \uD835 or \uDE70, whereas it only matches MATHEMATICAL MONOSPACE CAPITAL A in UCS-4 mode. There might be more limitations, but those are the ones that come to mind easily. While I could imagine fixing the first two with some effort, the third one is really tricky (unless you would accept a "wide" representation of a character class even if the Unicode representation is only narrow). Regards, Martin _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com