Marc-Andre Lemburg <m...@egenix.com> added the comment:

David Coles wrote:
>
> David Coles <coles.da...@gmail.com> added the comment:
>
> After doing some more investigation it appears that Android's wchar_t support
> before android-9 is totally broken (see
> http://android.git.kernel.org/?p=platform/ndk.git;a=blob_plain;f=docs/STANDALONE-TOOLCHAIN.html;hb=HEAD).
> With android-9 you get 4 byte wchar_t and working wide character functions.
>
> Possibly of more interest for Python is that it's no longer buildable without
> wchar_t support. While unicodeobject is pretty good at checking HAVE_WCHAR_H,
> a number of modules and even pythonrun.c directly use wchar_t or functions
> like PyUnicode_FromWideChar without providing a fallback. Does Python 3 now
> require wchar_t or are these bugs? (either option seems sensible).
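The two wchar_t properties this thread turns on, its size and whether it is signed, can be inspected directly from C. The following is a minimal sketch, not CPython's actual configure test: the helper name `wchar_usable_for_py_unicode` is hypothetical, and it simply encodes the criterion Lemburg gives for HAVE_USABLE_WCHAR_T (wchar_t must be unsigned and exactly 2 or 4 bytes wide). It relies only on the standard `WCHAR_MIN` macro from `<wchar.h>`.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not CPython's configure check): returns 1 if
   wchar_t is unsigned and exactly 2 or 4 bytes wide, i.e. the criterion
   described for using wchar_t as Py_UNICODE, and 0 otherwise. */
static int wchar_usable_for_py_unicode(void)
{
    /* WCHAR_MIN is 0 for an unsigned wchar_t and negative for a signed one. */
    int is_unsigned = (WCHAR_MIN == 0);
    size_t size = sizeof(wchar_t);

    return is_unsigned && (size == 2 || size == 4);
}
```

For instance, glibc on x86-64 defines wchar_t as a signed 32-bit type, so this helper returns 0 there, and a 1-byte wchar_t fails the size test regardless of signedness.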
wchar_t should be fairly portable these days. I think the main problem is that we never assumed sizeof(wchar_t) == 1 to be a possibility. On Windows, wchar_t was 16 bits, and glibc started out with 32 bits.

> A few other notes:
> HAVE_USABLE_WCHAR_T looks like it was a check for unsigned/>16 bits wchar_t
> that would allow them to be directly memcpy'd. The code in unicodeobject.c
> seems not to really use this anymore except (it's happy with signed or
> unsigned) and it looks like the check is only used for Windows now.

Note that HAVE_USABLE_WCHAR_T is only used to check whether Python can use wchar_t as an alias for Py_UNICODE. Python's Unicode implementation needs Py_UNICODE to be an unsigned type of either 2 or 4 bytes. If wchar_t has neither of these sizes, or is a signed type, Python cannot use it for Py_UNICODE and must instead use "unsigned short".

If the configure script does not detect this case, then a patch would be helpful.

The other wchar_t C lib functions should still remain usable, though.

> To properly support wchar_t of size 1 you would basically implement multibyte
> character storage either with UTF-8 or just packing two wchar_t's with
> UTF-16. At least in Android the distinction doesn't seem to matter as
> Android's internationalization/localization policy seems to be "use Java".

Python should not use wchar_t for Py_UNICODE on such platforms and should instead use "unsigned short". I would assume that the wchar_t C lib routines use UTF-8 when sizeof(wchar_t) == 1, so the PyUnicode_*WideChar*() APIs would need to be adjusted to work more or less like the UTF-8 codecs.

----------
nosy: +lemburg

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12010>
_______________________________________
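Lemburg's last point, that the PyUnicode_*WideChar*() APIs would have to behave like a UTF-8 decoder when sizeof(wchar_t) == 1, can be illustrated with a small sketch. It assumes, as he does, that each wchar_t element then holds one UTF-8 byte; `decode_utf8_wide` is a hypothetical name, not a CPython API. It rejects malformed input (invalid lead or continuation bytes, truncated, overlong, surrogate, or out-of-range sequences) instead of reproducing the error handlers of CPython's real UTF-8 codec.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: decode a wchar_t buffer whose elements each hold
   one UTF-8 byte (the assumed layout when sizeof(wchar_t) == 1) into
   Unicode code points. Writes at most out_cap code points to out and
   returns how many were written, or -1 on malformed input or when out
   is too small. */
static long decode_utf8_wide(const wchar_t *in, size_t len,
                             uint32_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        /* Assumption: every element carries a single byte value. */
        unsigned char b0 = (unsigned char)in[i];
        uint32_t cp, min;
        size_t extra;

        if (b0 < 0x80) {                 /* 1-byte sequence (ASCII) */
            cp = b0; extra = 0; min = 0;
        } else if ((b0 & 0xE0) == 0xC0) { /* 2-byte sequence */
            cp = b0 & 0x1F; extra = 1; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) { /* 3-byte sequence */
            cp = b0 & 0x0F; extra = 2; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) { /* 4-byte sequence */
            cp = b0 & 0x07; extra = 3; min = 0x10000;
        } else {
            return -1;  /* stray continuation byte or invalid lead byte */
        }

        if (extra > len - i - 1)
            return -1;  /* sequence truncated by end of input */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char b = (unsigned char)in[i + k];
            if ((b & 0xC0) != 0x80)
                return -1;  /* expected a continuation byte 10xxxxxx */
            cp = (cp << 6) | (b & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (n == out_cap)
            return -1;  /* output buffer full */
        out[n++] = cp;
        i += extra + 1;
    }
    return (long)n;
}
```

For example, the three elements 0xE2, 0x82, 0xAC decode to the single code point U+20AC. An actual adaptation of PyUnicode_FromWideChar() would presumably reuse CPython's existing UTF-8 decoder rather than duplicate it; the standalone loop only shows what the conversion involves.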