Nicholas Bastin wrote:
> On May 6, 2005, at 3:42 PM, James Y Knight wrote:
>>It means all the string operations treat strings as if they were
>>UCS-2, but that in actuality, they are UTF-16. Same as the case in the
>>windows APIs and Java. That is, all string operations are essentially
>>broken, because they're operating on encoded bytes, not characters,
>>but claim to be operating on characters.
>
> Well, this is a completely separate issue/problem. The internal
> representation is UTF-16, and should be stated as such. If the
> built-in methods actually don't work with surrogate pairs, then that
> should be fixed.

Wait... are you saying a Py_UNICODE array contains either UTF-16 or
UTF-32 characters, but never UCS-2? That's a big surprise to me. I may
need to change my PyXPCOM patch to fit this new understanding. I tried
hard not to care how Python encodes Unicode characters, but details
like this are important when combining two frameworks with different
Unicode APIs.

Shane
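
For reference, the UCS-2 versus UTF-16 distinction in this thread can be
sketched as follows. This is only an illustration, written for a current
Python 3 where str is indexed by code point, so the 16-bit code units are
obtained by encoding explicitly; it is not how a Py_UNICODE buffer is
accessed from C. The helper names utf16_code_units and is_valid_ucs2 are
made up for this example.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_valid_ucs2(units):
    """True if no unit is a surrogate, so the data is also plain UCS-2."""
    return not any(0xD800 <= u <= 0xDFFF for u in units)


bmp_char = "\u00e9"         # inside the Basic Multilingual Plane
astral_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

bmp_units = utf16_code_units(bmp_char)
astral_units = utf16_code_units(astral_char)

print([hex(u) for u in bmp_units])     # one code unit
print([hex(u) for u in astral_units])  # two code units: a surrogate pair
print(is_valid_ucs2(bmp_units), is_valid_ucs2(astral_units))
```

A buffer that holds the second string as two 16-bit units is UTF-16, not
UCS-2, so code that hands it to another framework has to count code units
and code points separately.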