Re: [Python-Dev] Reject characters bigger than U+10FFFF and Solaris issues

Victor Stinner Thu, 08 Dec 2011 04:48:02 -0800

On 08/12/2011 10:17, Stefan Krah wrote:

I think that b'\xA0' is a valid thousands separator.

I agree, but that's not the point: the problem is that b'\xA0' is decoded to a strange U+30000020 character by mbstowcs().

Currently I have this horrible function to deal with the problem:

...
         n = mbstowcs(buf, s, 2);
...
         tmp = PyUnicode_FromWideChar(buf, n);
         if (tmp == NULL) {
                 return NULL;
         }
         utf8 = PyUnicode_AsUTF8String(tmp);
         Py_DECREF(tmp);
         return utf8;


It would not help with this specific issue: b'\xA0' is not decodable from UTF-8.
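
To illustrate the point about decoding, here is a minimal Python sketch (not part of the original message). The helper name try_decode is hypothetical, and the choice of ISO-8859-2 as the encoding of the hu_HU locale is an assumption; the actual encoding depends on the system.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if data is not valid in encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


sep = b"\xA0"

# 0xA0 on its own is a UTF-8 continuation byte, so it cannot be decoded.
print(try_decode(sep, "utf-8"))

# Assumption: the locale encoding is ISO-8859-2, where 0xA0 is the
# no-break space U+00A0.
print(ascii(try_decode(sep, "iso-8859-2")))
```

So a byte string that is a perfectly valid separator in the locale encoding can still be rejected by a UTF-8 decoder.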

I'm not sure why the b'\xA0' problem only occurs on Solaris. Many systems
have this thousands separator.

The problem is not directly in the C localeconv() function, but in mbstowcs() with the hu_HU locale.


You can try my test program for this issue:
http://bugs.python.org/file23876/localeconv_wchar.c

My test may not be correct, because it only sets LC_ALL, which is a little different from what the Python tests do (see below).


--

I don't remember on which buildbot the issue occurred :-(

- "sparc solaris10 gcc 3.x" has the "LANG=C" and "TZ=Europe/Berlin" environment variables
- "x86 OpenIndiana 3.x" and "AMD64 OpenIndiana 3.x" have "TZ=Europe/London" and no locale variable!?

The issue occurred for example in test_lc_numeric_basic() of test__locale, which sets the LC_NUMERIC and LC_CTYPE locales (but not LC_ALL). LC_ALL and LC_NUMERIC are different in this test, but LC_NUMERIC and LC_CTYPE are the same.
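
The locale setup described above can be sketched in Python roughly as follows. This is an illustration only, not the code of test__locale: the helper name thousands_sep_for is hypothetical, and the locale name to pass (for example "hu_HU" with or without an encoding suffix) depends on what the system provides.

```python
import locale


def thousands_sep_for(locale_name):
    """Set LC_NUMERIC and LC_CTYPE (but not LC_ALL) to locale_name and
    return localeconv()['thousands_sep'], or None if the locale is not
    available on this system."""
    # Remember the current settings so they can be restored afterwards.
    saved_numeric = locale.setlocale(locale.LC_NUMERIC)
    saved_ctype = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.localeconv()["thousands_sep"]
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved_numeric)
        locale.setlocale(locale.LC_CTYPE, saved_ctype)


# The "C" locale defines no thousands separator.
print(repr(thousands_sep_for("C")))
```

Calling thousands_sep_for("hu_HU") on an affected machine would be one way to check whether the separator comes back as a sensible character; what it returns depends on the platform.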

--

Stefan: would you accept that locale.localeconv() and locale.strxfrm() stop working (instead of returning invalid data) on Solaris in certain cases (it looks like the issue depends on the locale and the OS version)? It could be a motivation to fix the root of the issue ;-)
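
As a rough sketch of what "stop working instead of returning invalid data" could mean, the following Python function rejects any value above U+10FFFF before building a string. It is only an illustration of the idea: the name wide_chars_to_str and the list-of-integers input (standing in for a wchar_t buffer filled by mbstowcs()) are assumptions, not CPython code.

```python
MAX_CODE_POINT = 0x10FFFF  # largest code point defined by Unicode


def wide_chars_to_str(code_points):
    """Build a str from integer code points, raising ValueError on any
    value outside the Unicode range instead of returning invalid data."""
    for cp in code_points:
        if not 0 <= cp <= MAX_CODE_POINT:
            raise ValueError(
                "character U+%X is outside the Unicode range" % cp)
    return "".join(map(chr, code_points))


print(ascii(wide_chars_to_str([0xA0])))  # a valid no-break space

try:
    wide_chars_to_str([0x30000020])      # the value seen on Solaris
except ValueError as exc:
    print("rejected:", exc)
```

With such a check, callers see an error for the bogus U+30000020 value rather than a string that cannot be encoded later.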


Victor