Nicholas Bastin wrote:
> The documentation for Py_UNICODE states the following:
>
> "This type represents a 16-bit unsigned storage type which is used by
> Python internally as basis for holding Unicode ordinals. On platforms
> where wchar_t is available and also has 16-bits, Py_UNICODE is a
> typedef alias for wchar_t to enhance native platform compatibility. On
> all other platforms, Py_UNICODE is a typedef alias for unsigned
> short."
>
> However, we have found this not to be true on at least certain RedHat
> versions (maybe all, but I'm not willing to say that at this point).
> pyconfig.h on these systems reports that PY_UNICODE_TYPE is wchar_t,
> and PY_UNICODE_SIZE is 4. Needless to say, this isn't consistent with
> the docs. It also creates quite a few problems when attempting to
> interface Python with other libraries which produce unicode data.
>
> Is this a bug, or is this behaviour intended?

It's a documentation bug. The above was true in Python 2.0 and
still is for standard Python builds.

The optional 32-bit support was added later on (in Python 2.1
IIRC) and is only used if Python is compiled with
--enable-unicode=ucs4. Unfortunately, RedHat and others have
made the UCS4 build their default, which caused and is still
causing lots of problems with Python extensions shipped as
binaries, e.g. RPMs or other packages.

> It turns out that at some point in the past, this created problems for
> tkinter as well, so someone just changed the internal unicode
> representation in tkinter to be 4 bytes as well, rather than tracking
> down the real source of the problem.
>
> Is PY_UNICODE_TYPE always going to be guaranteed to be 16 bits, or is
> it dependent on your platform? (in which case we can give up now on
> Python unicode compatibility with any other libraries).

Depends on the way Python was compiled.

> At the very
> least, if we can't guarantee the internal representation, then the
> PyUnicode_FromUnicode API needs to go away, and be replaced with
> something capable of transcoding various unicode inputs into the
> internal python representation.

We have PyUnicode_Decode() for that. PyUnicode_FromUnicode is
useful and meant for working directly on Py_UNICODE buffers.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 04 2005)
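
P.S. For anyone who wants to check which kind of build they are running, here is a minimal sketch at the Python level. It assumes that sys.maxunicode reflects the build width (0xFFFF on a narrow build, 0x10FFFF on a UCS4 build; interpreters from Python 3.3 on always report 0x10FFFF). The helper names unicode_build_width and decode_external_utf16 are made up for illustration, and the second one is only the Python-level counterpart of going through a codec (as PyUnicode_Decode() does in C) rather than copying code units into a Py_UNICODE buffer.

```python
import ctypes
import sys


def unicode_build_width():
    """Return the code unit size in bytes implied by sys.maxunicode."""
    # Narrow (UCS2) builds report 0xFFFF; UCS4 builds report 0x10FFFF.
    # Python 3.3 and later always report 0x10FFFF.
    return 2 if sys.maxunicode == 0xFFFF else 4


def decode_external_utf16(raw):
    """Decode little-endian UTF-16 bytes from a (hypothetical) external library.

    Naming the codec explicitly keeps the result independent of how the
    interpreter stores text internally.
    """
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    print("code unit size implied by sys.maxunicode:", unicode_build_width())
    # The platform wchar_t size is a separate question from the build width.
    print("sizeof(wchar_t):", ctypes.sizeof(ctypes.c_wchar))
    print(decode_external_utf16(b"h\x00i\x00"))
```

The point of the second helper is that data coming from another library should be passed through a codec with a known encoding, so the extension does not need to care whether Py_UNICODE is 2 or 4 bytes wide.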