On May 4, 2005, at 6:20 PM, Shane Hathaway wrote:

> Martin v. Löwis wrote:
>> Nicholas Bastin wrote:
>>
>>> "This type represents the storage type which is used by Python
>>> internally as the basis for holding Unicode ordinals. Extension module
>>> developers should make no assumptions about the size of this type on
>>> any given platform."
>>
>> But people want to know "Is Python's Unicode 16-bit or 32-bit?"
>> So the documentation should explicitly say "it depends".
>
> On a related note, it would be helpful if the documentation provided a
> little more background on unicode encoding. Specifically, that UCS-2 is
> not the same as UTF-16, even though they're both two bytes wide and most
> of the characters are the same. UTF-16 can encode 4 byte characters,
> while UCS-2 can't. A Py_UNICODE is either UCS-2 or UCS-4. It took me

I'm not sure the Python documentation is the place to teach someone about Unicode. ISO 10646 pretty clearly defines UCS-2 as containing only characters in the BMP (plane zero). On the other hand, I don't know why Python lets you choose UCS-2 anyhow, since it's almost always not what you want.

--
Nick
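
For anyone who wants to see the UCS-2 / UTF-16 distinction concretely, here is a minimal sketch. It assumes Python 3.3 or later, where a str is indexed by code point regardless of how the interpreter was built; the helper names utf16_code_units and fits_in_ucs2 are made up for illustration and are not part of any Python API. It shows that a BMP character takes one 16-bit code unit, while a character outside the BMP needs a surrogate pair in UTF-16 and has no UCS-2 representation at all.

```python
import struct


def utf16_code_units(ch):
    """Return the 16-bit code units UTF-16 uses for a single character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def fits_in_ucs2(ch):
    """True if the code point lies in the BMP (plane zero), U+0000..U+FFFF.

    Simplification: this only checks the range and does not single out
    the surrogate code points U+D800..U+DFFF.
    """
    return ord(ch) <= 0xFFFF


bmp_char = "\u20ac"         # EURO SIGN, inside the BMP
astral_char = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP

print([hex(u) for u in utf16_code_units(bmp_char)])     # one code unit
print([hex(u) for u in utf16_code_units(astral_char)])  # two code units (surrogate pair)
print(fits_in_ucs2(bmp_char), fits_in_ucs2(astral_char))
```

The second character is the interesting one: UTF-16 spends four bytes on it, which is exactly the case UCS-2 cannot express.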