Victor Stinner <victor.stin...@haypocalc.com> wrote:
> For localeconv(), it is the b'\xA0' byte string decoded from an encoding
> looking like ISO-8859-?? (b'\xA0' is not decodable from UTF-8). It looks like
> a bug in the decoder. It also looks like OpenIndiana doesn't use ISO-8859
> locale anymore, only UTF-8 locales (which is much better!). I'm unable to
> reproduce the issue on my OpenIndiana VM.

I think that b'\xA0' is a valid thousands separator. The 'fi_FI' locale also
uses it. Decimal.__format__() has to handle the 'n' specifier, which takes
the thousands separator directly from localeconv(). Currently I have this
horrible function to deal with the problem:

/* Convert decimal_point or thousands_sep, which may be multibyte or in
   the range [128, 255], to a UTF8 string. */
static PyObject *
dotsep_as_utf8(const char *s)
{
    PyObject *utf8;
    PyObject *tmp;
    wchar_t buf[2];
    size_t n;

    n = mbstowcs(buf, s, 2);
    if (n != 1) { /* Issue #7442 */
        PyErr_SetString(PyExc_ValueError,
            "invalid decimal point or unsupported "
            "combination of LC_CTYPE and LC_NUMERIC");
        return NULL;
    }
    tmp = PyUnicode_FromWideChar(buf, n);
    if (tmp == NULL) {
        return NULL;
    }
    utf8 = PyUnicode_AsUTF8String(tmp);
    Py_DECREF(tmp);
    return utf8;
}

The main issue is that there is no portable function mbst_to_utf8() that
uses the current locale. If possible, it would be great to have such a thing
in the C-API.

I'm not sure why the b'\xA0' problem only occurs on Solaris. Many systems
have this thousands separator.

Stefan Krah
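
For illustration only, here is a minimal sketch of what a locale-aware
conversion along the lines of the mbst_to_utf8() wished for above might look
like without the Python C-API. The names codepoint_to_utf8() and
locale_sep_to_utf8() are hypothetical and do not exist in CPython or libc.
The sketch also assumes that wchar_t values are Unicode code points; the C
standard does not guarantee this, and it may not hold on every platform or
in every locale, which is part of why a portable version is hard to write.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Returns the number of bytes written (1..4), or 0 for surrogates
   and values above U+10FFFF. */
static size_t
codepoint_to_utf8(unsigned long cp, char out[4])
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0; /* surrogates are not valid scalar values */
        }
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: convert a single-character separator, encoded in
   the current LC_CTYPE locale, to a NUL-terminated UTF-8 string in `out`.
   Returns the UTF-8 length in bytes, or -1 on failure.
   ASSUMPTION: wchar_t holds Unicode code points on this platform. */
static int
locale_sep_to_utf8(const char *s, char *out, size_t outsize)
{
    wchar_t buf[2];
    char tmp[4];
    size_t n, len;

    assert(s != NULL && out != NULL);

    /* Same check as dotsep_as_utf8(): exactly one wide character. */
    n = mbstowcs(buf, s, 2);
    if (n != 1) {
        return -1;
    }
    len = codepoint_to_utf8((unsigned long)buf[0], tmp);
    if (len == 0 || len + 1 > outsize) {
        return -1;
    }
    memcpy(out, tmp, len);
    out[len] = '\0';
    return (int)len;
}
```

A caller would run setlocale(LC_ALL, "") first and then pass
localeconv()->thousands_sep. If the separator decodes to U+00A0, the result
is the two bytes 0xC2 0xA0. As in dotsep_as_utf8(), the conversion fails when
LC_CTYPE and LC_NUMERIC disagree about the encoding.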