Nicholas Bastin wrote:
>
> On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:
>> On a related note, it would be helpful if the documentation provided a
>> little more background on unicode encoding.  Specifically, that UCS-2 is
>> not the same as UTF-16, even though they're both two bytes wide and most
>> of the characters are the same.  UTF-16 can encode 4 byte characters,
>> while UCS-2 can't.  A Py_UNICODE is either UCS-2 or UCS-4.  It took me
>
> I'm not sure the Python documentation is the place to teach someone
> about unicode.  The ISO 10646 pretty clearly defines UCS-2 as only
> containing characters in the BMP (plane zero).  On the other hand, I
> don't know why python lets you choose UCS-2 anyhow, since it's almost
> always not what you want.

Then something in the Python docs ought to say why UCS-2 is not what you
want.  I still don't know; I've heard differing opinions on the subject.
Some say you'll never need more than what UCS-2 provides.  Is that
incorrect?

More generally, how should a non-unicode-expert writing Python extension
code find out the minimum they need to know about unicode to use the
Python unicode API?  The API reference [1] ought to at least have a list
of background links.  I had to hunt everywhere.

.. [1] http://docs.python.org/api/unicodeObjects.html

Shane
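
For anyone following the thread, here is a rough sketch of the UCS-2
versus UTF-16 difference discussed above.  It is only an illustration,
not anything from the Python docs: the helper names `fits_in_ucs2` and
`utf16_code_units` are made up for this example, and it assumes a
modern Python 3 (3.3 or later), where `ord()` on a one-character string
returns the full code point.  It does not show the 2005-era Py_UNICODE
build options themselves.

```python
def fits_in_ucs2(ch):
    """True if the character lies in the BMP (plane zero).

    UCS-2 can only represent code points below 0x10000.
    (Hypothetical helper, for illustration only.)
    """
    return ord(ch) < 0x10000


def utf16_code_units(ch):
    """Return the 16-bit UTF-16 code units for a single character.

    BMP characters take one unit (2 bytes), the same value UCS-2 uses.
    Characters above the BMP take a surrogate pair (2 units, 4 bytes),
    which UCS-2 has no way to express.
    (Hypothetical helper; ignores the case where ch is itself a
    lone surrogate.)
    """
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (cp & 0x3FF)   # bottom 10 bits -> low surrogate
    return [high, low]


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001D11E"):  # 'A', euro sign, musical G clef
        units = [hex(u) for u in utf16_code_units(ch)]
        print(hex(ord(ch)), "fits in UCS-2:", fits_in_ucs2(ch), "UTF-16:", units)
```

The first two characters come out identical in UCS-2 and UTF-16, which
is why the two are easy to confuse.  Only the third, which lies outside
the BMP, needs the surrogate pair that UTF-16 provides and UCS-2 lacks.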