[issue19846] Python 3 raises Unicode errors with the C locale

Toshio Kuratomi Tue, 10 Dec 2013 11:29:21 -0800

Toshio Kuratomi added the comment:

Looking at the glib code, it looks like the SO post is closer to the truth.  
The API documentation for g_filename_to_utf8() is over-simplified to the point 
of confusion.  This section of the glib API documentation is closer to what the 
code is doing: 
https://developer.gnome.org/glib/stable/glib-Character-Set-Conversion.html#file-name-encodings


* When encoding matters, glib and gtk functions assume that the char*'s you 
pass to them point to strings encoded in UTF-8.
* When a char* is not UTF-8, you are responsible for converting it to UTF-8 
before it is used by the glib functions (if encoding matters).
* glib provides g_filename_to_utf8() for the special case of transforming 
filenames into the encoding that glib expects.  (Presumably because glib and 
gtk deal with non-UTF-8 filenames more often than with the equivalent 
environment variables, command line switches, etc.)
* Contrary to the API docs for g_filename_to_utf8(), g_filename_to_utf8() will 
simply return a copy of the byte string it was passed unless 
G_FILENAME_ENCODING or G_BROKEN_FILENAMES is set.  If either is set, then 
either the value of G_FILENAME_ENCODING or the encoding specified in the 
user's locale might be used to attempt to decode the filename.
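
To illustrate the second point (the caller is responsible for converting 
non-UTF-8 strings before handing them to glib), here is a minimal sketch in 
Python rather than C.  The helper name to_utf8() and the ISO-8859-1 example 
are my own assumptions for illustration; this is not glib code.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a byte string from a known source encoding to UTF-8.

    Hypothetical helper: it stands in for the conversion the caller has
    to do before passing a char* to a function that expects UTF-8.
    """
    # Decode with the encoding the bytes are actually in, then encode
    # the resulting text as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


# "café" encoded as ISO-8859-1; the é is the single byte 0xE9.
latin1_name = b"caf\xe9"
utf8_name = to_utf8(latin1_name, "iso-8859-1")
```

The caller has to know the source encoding; nothing in the bytes themselves 
says what it is.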

@haypo, I'm pretty sure from reading the code for g_get_filename_charsets() 
that you have the conditionals reversed.  What I'm seeing is:

if G_FILENAME_ENCODING:
    charset = the first charset listed in G_FILENAME_ENCODING
    if charset == '@locale':
        charset = charset of user's locale
elif G_BROKEN_FILENAMES:
    charset = charset of user's locale
else:
    charset = 'UTF-8'
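
The same logic as a runnable Python sketch, in case it helps to experiment 
with it.  This only mirrors my reading of the pseudocode above, not the actual 
C implementation: the function name filename_charset(), passing the 
environment in as a dict, and passing the locale charset in as a parameter are 
all assumptions made for illustration.  It also assumes that 
G_FILENAME_ENCODING is a comma-separated list and that G_BROKEN_FILENAMES 
counts as set whenever it is present in the environment.

```python
def filename_charset(env, locale_charset):
    """Pick the charset used for filenames, following the pseudocode.

    env            -- mapping of environment variables (e.g. os.environ)
    locale_charset -- charset of the user's locale, supplied by the caller
    """
    encodings = env.get("G_FILENAME_ENCODING")
    if encodings:
        # Only the first listed charset is considered in this sketch.
        charset = encodings.split(",")[0]
        if charset == "@locale":
            charset = locale_charset
    elif "G_BROKEN_FILENAMES" in env:
        charset = locale_charset
    else:
        # Neither variable is set: filenames are assumed to be UTF-8.
        charset = "UTF-8"
    return charset


# With neither variable set, the locale is never consulted, so even
# under the C locale (ASCII charset) the result is UTF-8.
default_charset = filename_charset({}, "ANSI_X3.4-1968")
```

For example, filename_charset({"G_BROKEN_FILENAMES": "1"}, "ISO-8859-1") 
falls through to the second branch and returns the locale charset.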

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue19846>
_______________________________________