[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames
Roundup Robot added the comment:

New changeset c838c9b117f1 by Victor Stinner in branch '3.2':
Issue #16416: On Mac OS X, operating system data are now always
http://hg.python.org/cpython/rev/c838c9b117f1

New changeset 26c4748351cb by Victor Stinner in branch '3.3':
(Merge 3.2) Issue #16416: On Mac OS X, operating system data are now always
http://hg.python.org/cpython/rev/26c4748351cb

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16416>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Roundup Robot added the comment:

New changeset af6fd3ca6de9 by Victor Stinner in branch '3.2':
Issue #16416: Fix compilation error
http://hg.python.org/cpython/rev/af6fd3ca6de9

STINNER Victor added the comment:

The issue should now be fixed in Python 3.2, 3.3 and 3.4.

--
resolution:  -> fixed
status: open -> closed
versions: +Python 3.2, Python 3.3

Antoine Pitrou added the comment:

Ping.

Antoine Pitrou added the comment:

Victor, could you please backport to 3.3?

--
assignee: ronaldoussoren -> haypo
nosy: +pitrou

STINNER Victor added the comment:

macosx-2.patch patches the _Py_wchar2char() and _Py_char2wchar() functions to use UTF-8/surrogateescape, so the change applies to any function using the locale encoding, not only the file-related functions of fileutils.h.

The patch also simplifies the code: no more specific #ifdef __APPLE__ in python.c:

    -#ifdef __APPLE__
    -argv_copy[i] = _Py_DecodeUTF8_surrogateescape(argv[i], strlen(argv[i]));
    -#else
     argv_copy[i] = _Py_char2wchar(argv[i], NULL);
    -#endif

--
Added file: http://bugs.python.org/file27969/macosx-2.patch

diff -r 6a6ad09faad2 Modules/python.c
--- a/Modules/python.c  Mon Nov 12 01:23:51 2012 +0100
+++ b/Modules/python.c  Mon Nov 12 14:29:44 2012 +0100
@@ -15,10 +15,6 @@ wmain(int argc, wchar_t **argv)
 }
 #else
 
-#ifdef __APPLE__
-extern wchar_t* _Py_DecodeUTF8_surrogateescape(const char *s, Py_ssize_t size);
-#endif
-
 int
 main(int argc, char **argv)
 {
@@ -45,11 +41,7 @@ main(int argc, char **argv)
     oldloc = strdup(setlocale(LC_ALL, NULL));
     setlocale(LC_ALL, "");
     for (i = 0; i < argc; i++) {
-#ifdef __APPLE__
-        argv_copy[i] = _Py_DecodeUTF8_surrogateescape(argv[i], strlen(argv[i]));
-#else
         argv_copy[i] = _Py_char2wchar(argv[i], NULL);
-#endif
         if (!argv_copy[i]) {
             free(oldloc);
             fprintf(stderr, "Fatal Python error: "
diff -r 6a6ad09faad2 Python/fileutils.c
--- a/Python/fileutils.c  Mon Nov 12 01:23:51 2012 +0100
+++ b/Python/fileutils.c  Mon Nov 12 14:29:44 2012 +0100
@@ -7,6 +7,10 @@
 #include <langinfo.h>
 #endif
 
+#ifdef __APPLE__
+extern wchar_t* _Py_DecodeUTF8_surrogateescape(const char *s, Py_ssize_t size);
+#endif
+
 PyObject *
 _Py_device_encoding(int fd)
 {
@@ -59,6 +63,15 @@ PyObject *
 wchar_t*
 _Py_char2wchar(const char* arg, size_t *size)
 {
+#ifdef __APPLE__
+    wchar_t *wstr;
+    wstr = _Py_DecodeUTF8_surrogateescape(arg, strlen(arg));
+    if (wstr == NULL)
+        return NULL;
+    if (size != NULL)
+        *size = wcslen(wstr);
+    return wstr;
+#else
     wchar_t *res;
 #ifdef HAVE_BROKEN_MBSTOWCS
     /* Some platforms have a broken implementation of
@@ -144,7 +157,7 @@ wchar_t*
         argsize -= converted;
         out++;
     }
-#else
+#else /* HAVE_MBRTOWC */
     /* Cannot use C locale for escaping; manually escape as if charset
        is ASCII (i.e. escape all bytes > 128. This will still roundtrip
        correctly in the locale's charset, which must be an ASCII superset. */
@@ -159,7 +172,7 @@ wchar_t*
         else
             *out++ = 0xdc00 + *in++;
     *out = 0;
-#endif
+#endif /* HAVE_MBRTOWC */
     if (size != NULL)
         *size = out - res;
     return res;
@@ -167,6 +180,7 @@ oom:
     if (size != NULL)
         *size = (size_t)-1;
     return NULL;
+#endif /* __APPLE__ */
 }
 
 /* Encode a (wide) character string to the locale encoding with the
@@ -183,6 +197,34 @@ oom:
 char*
 _Py_wchar2char(const wchar_t *text, size_t *error_pos)
 {
+#ifdef __APPLE__
+    Py_ssize_t len;
+    PyObject *unicode, *bytes = NULL;
+    char *cpath;
+
+    unicode = PyUnicode_FromWideChar(text, wcslen(text));
+    if (unicode == NULL) {
+        Py_DECREF(unicode);
+        return NULL;
+    }
+
+    bytes = _PyUnicode_AsUTF8String(unicode, "surrogateescape");
+    Py_DECREF(unicode);
+    if (bytes == NULL) {
+        PyErr_Clear();
+        return NULL;
+    }
+
+    len = PyBytes_GET_SIZE(bytes);
+    cpath = PyMem_Malloc(len+1);
+    if (cpath == NULL) {
+        Py_DECREF(bytes);
+        return NULL;
+    }
+    memcpy(cpath, PyBytes_AsString(bytes), len + 1);
+    Py_DECREF(bytes);
+    return cpath;
+#else /* __APPLE__ */
     const size_t len = wcslen(text);
     char *result = NULL, *bytes = NULL;
     size_t i, size, converted;
@@ -242,6 +284,7 @@ char*
         bytes = result;
     }
     return result;
+#endif /* __APPLE__ */
 }
 
 /* In principle, this should use HAVE__WSTAT, and _wstat

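For readers less familiar with the error handler named in this message, here is a minimal Python-level sketch of what UTF-8/surrogateescape does to operating system data. It is not the C implementation from the patch; the helper names decode_os_data and encode_os_data and the sample bytes are hypothetical and exist only for illustration.

```python
def decode_os_data(data: bytes) -> str:
    """Decode bytes from the OS as UTF-8; undecodable bytes become lone surrogates."""
    return data.decode("utf-8", "surrogateescape")


def encode_os_data(text: str) -> bytes:
    """Inverse of decode_os_data: lone surrogates turn back into the original bytes."""
    return text.encode("utf-8", "surrogateescape")


# Valid UTF-8 ("café-") followed by a byte (0xff) that is not valid UTF-8.
raw = b"caf\xc3\xa9-\xff.txt"
text = decode_os_data(raw)

# The invalid byte 0xff is mapped to U+DC00 + 0xff = U+DCFF.
escaped = text[5]

# Encoding with the same handler restores the original bytes exactly.
roundtrip = encode_os_data(text)
```

The point of the handler is that decoding never fails and never loses information, so a name with undecodable bytes can still be passed back to the OS unchanged.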
Roundup Robot added the comment:

New changeset 48fbdaf3a849 by Victor Stinner in branch 'default':
Issue #16416: OS data are now always encoded/decoded to/from
http://hg.python.org/cpython/rev/48fbdaf3a849

--
nosy: +python-dev

Roundup Robot added the comment:

New changeset f3e512b5ffb3 by Victor Stinner in branch 'default':
Issue #16416: Fix error handling in _Py_wchar2char() _Py_char2wchar() functions
http://hg.python.org/cpython/rev/f3e512b5ffb3

Roundup Robot added the comment:

New changeset 1b97cc71a05e by Victor Stinner in branch 'default':
Issue #16416: Fix Misc/NEWS entry, mention Mac OS X
http://hg.python.org/cpython/rev/1b97cc71a05e

STINNER Victor added the comment:

@Serhiy: Thanks for your review, I missed it before my first commit.

Serhiy Storchaka added the comment:

Victor, are you going to backport this to 3.3?

STINNER Victor added the comment:

> Victor, are you going to backport this to 3.3?

I'm waiting for the buildbot results, and maybe also for the fix for issue #16455 (which affects the tests on undecodable bytes).

Changes by Andrew Svetlov <andrew.svet...@gmail.com>:

--
nosy: +asvetlov

New submission from STINNER Victor:

Since changeset 45079ad1e260 (issue #4388), command line arguments are decoded from UTF-8 instead of the locale encoding. The functions of Python/fileutils.c still use the locale encoding, which does not work: see issue #16218.

On Mac OS X, in the command line "python script.py", the filename script.py is decoded from UTF-8 (by _Py_DecodeUTF8_surrogateescape) but is then passed to _Py_fopen(), which encodes the filename to the locale encoding (e.g. ISO-8859-1 if the $LANG, $LC_CTYPE and $LC_ALL environment variables are not set). The result is mojibake and Python fails to open the script.

The attached patch modifies the functions of Python/fileutils.c to use UTF-8 instead of the locale encoding to encode and decode filenames on Mac OS X.

I don't know yet if Modules/getpath.c should also decode paths from UTF-8 instead of the locale encoding on Mac OS X. We may expose _Py_decode_filename().

--
assignee: ronaldoussoren
components: Macintosh
files: macosx.patch
keywords: patch
messages: 174943
nosy: haypo, ronaldoussoren
priority: normal
severity: normal
status: open
title: Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames
versions: Python 3.4

Added file: http://bugs.python.org/file27903/macosx.patch

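The mismatch described in this report can be reproduced at the Python level. The sketch below is only an illustration under stated assumptions: the filename "é.py" and the helper reencode are hypothetical, and ISO-8859-1 stands in for the locale encoding mentioned above. It decodes the name from UTF-8, as the command line is, and then encodes it again with a given encoding, as a function of fileutils.c would.

```python
def reencode(raw: bytes, filesystem_encoding: str) -> bytes:
    """Decode a filename from UTF-8, then encode it again with another encoding."""
    name = raw.decode("utf-8", "surrogateescape")
    return name.encode(filesystem_encoding, "surrogateescape")


# Bytes of a hypothetical script name as stored by the OS (UTF-8).
raw_name = "é.py".encode("utf-8")

# Before the patch: the name is re-encoded with the locale encoding,
# so the bytes handed back to the OS no longer match the real filename.
with_locale = reencode(raw_name, "iso-8859-1")

# After the patch: both directions use UTF-8, so the bytes are unchanged.
with_utf8 = reencode(raw_name, "utf-8")
```

Here with_locale is b"\xe9.py" while the file on disk is named b"\xc3\xa9.py", which is why the script cannot be opened. A name containing characters outside ISO-8859-1 would fail even earlier, with a UnicodeEncodeError.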
Changes by STINNER Victor <victor.stin...@gmail.com>:

--
nosy: +serhiy.storchaka