[issue43667] Solaris: Fix broken Unicode encoding in non-UTF locales

Jakub Kulik Tue, 30 Mar 2021 03:12:09 -0700


New submission from Jakub Kulik <kulik...@gmail.com>:


On Linux, wchar_t values correspond to Unicode code points (UTF-32); however, 
the C standard does not require this and allows any arbitrary representation 
to be used, which is the case on Solaris.

In Oracle Solaris, the internal form of wchar_t is specific to a locale; in the 
Unicode locales, wchar_t has the UTF-32 Unicode encoding form, and other 
locales have different representations [1].

This is an issue because Python expects wchar_t values to be Unicode code 
points. On Oracle Solaris with a non-UTF locale, this results either in errors 
(values are outside the Unicode range) or in output with the wrong symbols.
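
The two failure modes can be illustrated with a minimal Python sketch. It does 
not reproduce the actual Solaris wchar_t layout: the helper 
`naive_from_wchar`, the assumption that a wide value equals the ISO-8859-2 
byte of the character, and the out-of-range value are all hypothetical and 
chosen only for illustration.

```python
def naive_from_wchar(values):
    """Treat every wchar_t value as a Unicode code point.

    This mirrors the assumption described above; it is only correct
    when the platform's wchar_t really holds Unicode code points.
    """
    return "".join(chr(v) for v in values)


# Hypothetical wide value: assume the locale stores the ISO-8859-2 byte.
wide_value = 0xB1

# What the user meant: byte 0xB1 in ISO-8859-2 is U+0105.
intended = bytes([wide_value]).decode("iso8859-2")

# What the naive interpretation yields: U+00B1, a different symbol.
naive = naive_from_wchar([wide_value])

# Hypothetical wide value above U+10FFFF: the naive interpretation fails.
try:
    naive_from_wchar([0x30001234])
    out_of_range_error = None
except ValueError as exc:
    out_of_range_error = exc
```

Here `intended` and `naive` differ even though they come from the same wide 
value, and the second call raises instead of returning a string.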

Unicode locales work as expected, but they are not an acceptable workaround for 
some Oracle Solaris users who cannot use a Unicode encoding for various reasons.


Because of that, we fixed it a few months ago with a patch to 
`PyUnicode_FromWideChar`, which handles conversion to Unicode (attached in PR). 
It has been tested over the last half a year, and we haven't seen any related 
issues since.
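
As a rough illustration of the general shape such a conversion could take 
(this is not the attached patch, and it is written in Python rather than C), 
the sketch below branches on whether the locale's wchar_t holds Unicode code 
points. The function `from_wide_chars` and its parameters `wchar_is_unicode`, 
`to_multibyte`, and `encoding` are hypothetical names; `to_multibyte` stands 
in for whatever platform routine turns wide values into the locale's 
multibyte bytes.

```python
def from_wide_chars(values, wchar_is_unicode, to_multibyte, encoding):
    """Build a str from wide-character values.

    values           -- iterable of integer wchar_t values
    wchar_is_unicode -- True if the values are already Unicode code points
    to_multibyte     -- hypothetical callable: wide values -> locale bytes
    encoding         -- name of the locale's multibyte encoding
    """
    if wchar_is_unicode:
        # Unicode locale: the values can be used as code points directly.
        return "".join(chr(v) for v in values)
    # Non-Unicode locale: go through the locale's byte form, then decode.
    return to_multibyte(values).decode(encoding)


# Stand-in converter for the demo: assume each wide value is one byte.
unicode_result = from_wide_chars([0x105], True, bytes, "utf-8")
legacy_result = from_wide_chars([0xB1], False, bytes, "iso8859-2")
```

With these stand-ins, both calls produce the same character, U+0105, from 
two different wide values.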

Is something like this acceptable, or should it be fixed in a different place 
or in a different way? All comments are appreciated.

[1] https://docs.oracle.com/cd/E36784_01/html/E39536/gmwkm.html

----------
components: Unicode
messages: 389813
nosy: ezio.melotti, kulikjak, vstinner
priority: normal
severity: normal
status: open
title: Solaris: Fix broken Unicode encoding in non-UTF locales
versions: Python 3.10, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43667>
_______________________________________
