On 08/12/2011 10:17, Stefan Krah wrote:
I think that b'\xA0' is a valid thousands separator.
I agree, but that's not the point: the problem is that b'\xA0' is decoded
to a strange U+30000020 character (far above U+10FFFF, the largest Unicode
code point) by mbstowcs().
Currently I have this horrible function to deal with the problem:
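...
/* Decode at most 2 wide characters from the locale-encoded bytes s */
n = mbstowcs(buf, s, 2);
...
/* Build a str object from the decoded wide characters */
tmp = PyUnicode_FromWideChar(buf, n);
if (tmp == NULL) {
    return NULL;
}
/* Re-encode the str object as UTF-8 bytes */
utf8 = PyUnicode_AsUTF8String(tmp);
Py_DECREF(tmp);
return utf8;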
It would not help with this specific issue: b'\xA0' is not decodable from UTF-8.
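To make the UTF-8 point concrete: assuming hu_HU uses ISO-8859-2, the byte 0xA0 there is U+00A0 NO-BREAK SPACE, whose UTF-8 encoding is the two bytes C2 A0. On its own, 0xA0 is a UTF-8 continuation byte (range 0x80 to 0xBF) and can never start a character. A minimal validator sketch shows this; the name is_valid_utf8 is hypothetical and not a CPython function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the n bytes at s are well-formed
   UTF-8. Rejects stray continuation bytes such as 0xA0, overlong forms,
   UTF-16 surrogates and code points above U+10FFFF. */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (c < 0x80) {                      /* ASCII: one byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            len = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
            else if (c == 0xED) hi = 0x9F;   /* reject surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
            else if (c == 0xF4) hi = 0x8F;   /* reject > U+10FFFF */
        } else {
            /* 0x80..0xC1 (continuation or overlong lead), 0xF5..0xFF */
            return false;
        }
        if (n - i < len)
            return false;                    /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k < len; k++) {
            if (s[i + k] < 0x80 || s[i + k] > 0xBF)
                return false;                /* not a continuation byte */
        }
        i += len;
    }
    return true;
}
```

With this sketch, the lone byte "\xA0" is rejected, while the two-byte sequence "\xC2\xA0" (U+00A0 in UTF-8) is accepted.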
I'm not sure why the b'\xA0' problem only occurs on Solaris. Many systems
have this thousands separator.
The problem is not directly in the C localeconv() function, but in
mbstowcs() with the hu_HU locale.
You can try my test program for this issue:
http://bugs.python.org/file23876/localeconv_wchar.c
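The linked program is not reproduced here. As a rough, hypothetical sketch of the same kind of check (set LC_ALL, read thousands_sep from localeconv(), decode it with mbstowcs()), one could write the following; the function name is illustrative only.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not the linked program's source: set LC_ALL,
   read the thousands separator and decode it with mbstowcs().
   Returns the number of wide characters written to out, or -1 if the
   locale is unavailable or the bytes cannot be decoded. */
int decode_thousands_sep_all(const char *locale_name,
                             wchar_t *out, size_t outlen)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;                       /* locale not installed */
    const char *sep = localeconv()->thousands_sep;
    size_t n = mbstowcs(out, sep, outlen);
    if (n == (size_t)-1)
        return -1;                       /* invalid multibyte sequence */
    return (int)n;
}
```

On an affected machine one might call decode_thousands_sep_all("hu_HU.ISO8859-2", buf, 4) and print each value as (unsigned long)buf[i]; that locale name is an assumption and differs between systems. In the "C" locale thousands_sep is empty, so the function returns 0.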
My test may not be correct, because it only sets LC_ALL, which is a
little different from what the Python tests do (see below).
--
I don't remember on which buildbot the issue occurred :-(
- "sparc solaris10 gcc 3.x" has the "LANG=C" and "TZ=Europe/Berlin"
environment variables
- "x86 OpenIndiana 3.x" and "AMD64 OpenIndiana 3.x" have
"TZ=Europe/London" and no locale variable!?
The issue occurred for example in test_lc_numeric_basic() of
test__locale which sets LC_NUMERIC and LC_CTYPE locales (but not
LC_ALL). LC_ALL and LC_NUMERIC are different in this test, but
LC_NUMERIC and LC_CTYPE are the same.
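The distinction matters because localeconv() takes thousands_sep from the LC_NUMERIC category, while mbstowcs() decodes according to LC_CTYPE. A hypothetical C sketch of setting only those two categories, in the way test_lc_numeric_basic() is described above, plus a check that they agree (both function names are illustrative):

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: set LC_NUMERIC and LC_CTYPE to the same locale,
   leaving the other categories untouched (so LC_ALL may become a mix).
   Returns 0 on success, -1 if either category cannot be set.
   Note: if LC_CTYPE fails, LC_NUMERIC has already been changed. */
int set_numeric_and_ctype(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return -1;
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return 0;
}

/* Returns 1 if LC_NUMERIC and LC_CTYPE currently name the same locale. */
int numeric_matches_ctype(void)
{
    char numeric[256];
    const char *cur = setlocale(LC_NUMERIC, NULL);
    if (cur == NULL || strlen(cur) >= sizeof numeric)
        return 0;
    strcpy(numeric, cur);   /* copy: the next setlocale() may overwrite it */
    cur = setlocale(LC_CTYPE, NULL);
    return cur != NULL && strcmp(numeric, cur) == 0;
}
```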
--
Stefan: would you accept that locale.localeconv() and locale.strxfrm()
stop working (instead of returning invalid data) on Solaris in certain
cases (it looks like the issue depends on the locale and the OS
version)? It could be a motivation to fix the root cause of the issue ;-)
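At the C level, "stop working instead of returning invalid data" could look roughly like a strict decoding step that refuses values outside the Unicode range, such as U+30000020. This is a hypothetical sketch (decode_strict is not an existing CPython function), and it assumes a 32-bit wchar_t holding whole code points, as on Linux, rather than UTF-16 units as on Windows.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical strict variant of the mbstowcs() step: report failure when
   decoding fails, when the output may be truncated, or when a decoded
   value is not a Unicode scalar value (above U+10FFFF, or a surrogate).
   Assumes a 32-bit wchar_t. Returns the number of wide characters
   decoded, or -1 on error. */
int decode_strict(const char *s, wchar_t *out, size_t outlen)
{
    size_t n = mbstowcs(out, s, outlen);
    if (n == (size_t)-1)
        return -1;                 /* invalid multibyte sequence */
    if (n == outlen)
        return -1;                 /* no room for L'\0': possibly truncated */
    for (size_t i = 0; i < n; i++) {
        /* negative values of a signed wchar_t become huge here and fail */
        unsigned long cp = (unsigned long)out[i];
        if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
            return -1;             /* not a valid Unicode character */
    }
    return (int)n;
}
```

A caller such as the localeconv() wrapper could then raise an exception when decode_strict() returns -1, instead of building a str from bogus code points.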
Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev