Marc-Andre Lemburg <m...@egenix.com> added the comment:

David Coles wrote:
> 
> David Coles <coles.da...@gmail.com> added the comment:
> 
> After doing some more investigation it appears that Android's wchar_t support 
> before android-9 is totally broken (see 
> http://android.git.kernel.org/?p=platform/ndk.git;a=blob_plain;f=docs/STANDALONE-TOOLCHAIN.html;hb=HEAD).
>  With android-9 you get 4 byte wchar_t and working wide character functions.
>
> Possibly of more interest for Python is that it's no longer buildable without 
> wchar_t support. While unicodeobject is pretty good at checking HAVE_WCHAR_H, 
> a number of modules and even pythonrun.c directly use wchar_t or functions 
> like PyUnicode_FromWideChar without providing a fallback. Does Python 3 now 
> require wchar_t or are these bugs? (either option seems sensible).

wchar_t should be fairly portable these days. I think the main
problem is that we never considered sizeof(wchar_t) == 1 to be a
possibility. On Windows, wchar_t is 16 bits, and glibc started
out with 32 bits.

> A few other notes:
> HAVE_USABLE_WCHAR_T looks like it was a check for unsigned/>16 bits wchar_t 
> that would allow them to be directly memcpy'd. The code in unicodeobject.c 
> no longer seems to really use this (it's happy with signed or unsigned), 
> and it looks like the check is only used for Windows now.

Note that HAVE_USABLE_WCHAR_T is only used to check whether
Python can use wchar_t as an alias for Py_UNICODE. Python's Unicode
implementation needs Py_UNICODE to be an unsigned type with
either 2 bytes or 4 bytes. If wchar_t does not provide these
sizes or is a signed type, Python cannot use it for Py_UNICODE
and must instead use "unsigned short".
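
As a rough illustration of that rule, here is a minimal sketch in C. The
helper names can_alias_py_unicode() and wchar_t_usable() are made up for
this example; this is not the actual configure test, only the condition
described above written out.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not CPython's configure test: returns 1 when a type
   with the given size and signedness meets the stated Py_UNICODE rule
   (unsigned, 2 or 4 bytes), and 0 otherwise. */
static int can_alias_py_unicode(size_t size, int is_signed)
{
    assert(size > 0);            /* sizeof never yields 0 */
    return !is_signed && (size == 2 || size == 4);
}

/* Apply the rule to this platform's wchar_t. Converting -1 to an unsigned
   type wraps to its maximum value, so the comparison is true only for
   unsigned types. */
static int wchar_t_usable(void)
{
    int is_signed = !((wchar_t)-1 > (wchar_t)0);
    return can_alias_py_unicode(sizeof(wchar_t), is_signed);
}
```

With a 1 byte wchar_t, or with a signed 4 byte wchar_t, wchar_t_usable()
would return 0, which is the case where Py_UNICODE has to fall back to
"unsigned short".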

If the configure script does not detect this case, then a patch
would be helpful.

The other wchar_t C lib functions should still remain usable,
though.

> To properly support wchar_t of size 1 you would basically have to implement 
> multibyte character storage either with UTF-8 or just packing two wchar_t's 
> with UTF-16. At least in Android the distinction doesn't seem to matter as 
> Android's internationalization/localization policy seems to be "use Java".

Python should not use wchar_t for Py_UNICODE on such platforms
and instead go with "unsigned short".

I would assume that the wchar_t C lib routines work based on UTF-8
with sizeof(wchar_t) == 1, so the PyUnicode_*WideChar*() APIs would
need to be adjusted to work more or less like the UTF-8 codecs.
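
To make that a bit more concrete, here is a hedged sketch of what "work
like the UTF-8 codecs" could mean, assuming the 1 byte wchar_t buffer
really does hold UTF-8 (which is only my assumption above). The names
u16_unit and utf8_to_utf16() are hypothetical and not part of the Python
C API; u16_unit stands in for an "unsigned short" Py_UNICODE.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short u16_unit;  /* hypothetical stand-in for Py_UNICODE */

/* Decode UTF-8 bytes src[0..len) into UTF-16 code units in out[0..cap).
   Returns the number of units written, or (size_t)-1 on malformed or
   truncated input, or when out is too small. Simplified: overlong
   encodings are not rejected. */
static size_t utf8_to_utf16(const unsigned char *src, size_t len,
                            u16_unit *out, size_t cap)
{
    size_t i = 0, n = 0;
    assert(src != NULL && out != NULL);
    while (i < len) {
        unsigned long cp;
        size_t extra, k;
        unsigned char b = src[i++];
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;          /* invalid lead byte */
        if (extra > len - i)
            return (size_t)-1;           /* truncated sequence */
        for (k = 0; k < extra; k++) {
            unsigned char c = src[i++];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;       /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;           /* outside the Unicode range */
        if (cp < 0x10000) {
            if (n + 1 > cap)
                return (size_t)-1;
            out[n++] = (u16_unit)cp;
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (n + 2 > cap)
                return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (u16_unit)(0xD800 | (cp >> 10));
            out[n++] = (u16_unit)(0xDC00 | (cp & 0x3FF));
        }
    }
    return n;
}
```

The point of the sketch is only that the conversion becomes a decoding
step rather than a memcpy or an element-by-element copy, since one
character may span several 1 byte wchar_t elements.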

----------
nosy: +lemburg

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12010>
_______________________________________
