[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-12-03 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c838c9b117f1 by Victor Stinner in branch '3.2':
Issue #16416: On Mac OS X, operating system data are now always
http://hg.python.org/cpython/rev/c838c9b117f1

New changeset 26c4748351cb by Victor Stinner in branch '3.3':
(Merge 3.2) Issue #16416: On Mac OS X, operating system data are now always
http://hg.python.org/cpython/rev/26c4748351cb

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16416
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-12-03 Thread Roundup Robot

Roundup Robot added the comment:

New changeset af6fd3ca6de9 by Victor Stinner in branch '3.2':
Issue #16416: Fix compilation error
http://hg.python.org/cpython/rev/af6fd3ca6de9

--




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-12-03 Thread STINNER Victor

STINNER Victor added the comment:

The issue should now be fixed in Python 3.2, 3.3 and 3.4.

--
resolution:  -> fixed
status: open -> closed
versions: +Python 3.2, Python 3.3




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-12-02 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Ping.

--




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Victor, could you please backport to 3.3?

--
assignee: ronaldoussoren -> haypo
nosy: +pitrou




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-12 Thread STINNER Victor

STINNER Victor added the comment:

macosx-2.patch patches the _Py_wchar2char() and _Py_char2wchar() functions to
use UTF-8/surrogateescape for any function using the locale encoding, not
only the file-related functions of fileutils.h. The patch also simplifies
the code: there is no longer a specific #ifdef __APPLE__ in python.c:

-#ifdef __APPLE__
-argv_copy[i] = _Py_DecodeUTF8_surrogateescape(argv[i], strlen(argv[i]));
-#else
 argv_copy[i] = _Py_char2wchar(argv[i], NULL);
-#endif
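
For readers unfamiliar with the surrogateescape error handler mentioned above, here is a minimal Python sketch of the round trip it provides. It does not call the patched C functions; it only uses the equivalent str/bytes codecs, and the file name is a made-up example.

```python
# Hypothetical argument containing a byte (0xff) that is not valid UTF-8.
raw = b"script-\xff.py"

# With surrogateescape, each undecodable byte 0xNN is mapped to the
# lone surrogate U+DC00 + 0xNN instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns the surrogate back into the
# original byte, so the bytes survive the round trip unchanged.
roundtrip = text.encode("utf-8", "surrogateescape")

print(ascii(text))       # the escaped byte shows up as \udcff
print(roundtrip == raw)
```

This is why the same codec has to be used in both directions: the escaped bytes are only restored if the encoder matches the decoder.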

--
Added file: http://bugs.python.org/file27969/macosx-2.patch

diff -r 6a6ad09faad2 Modules/python.c
--- a/Modules/python.c  Mon Nov 12 01:23:51 2012 +0100
+++ b/Modules/python.c  Mon Nov 12 14:29:44 2012 +0100
@@ -15,10 +15,6 @@ wmain(int argc, wchar_t **argv)
 }
 #else
 
-#ifdef __APPLE__
-extern wchar_t* _Py_DecodeUTF8_surrogateescape(const char *s, Py_ssize_t size);
-#endif
-
 int
 main(int argc, char **argv)
 {
@@ -45,11 +41,7 @@ main(int argc, char **argv)
 oldloc = strdup(setlocale(LC_ALL, NULL));
 setlocale(LC_ALL, "");
 for (i = 0; i < argc; i++) {
-#ifdef __APPLE__
-argv_copy[i] = _Py_DecodeUTF8_surrogateescape(argv[i], strlen(argv[i]));
-#else
 argv_copy[i] = _Py_char2wchar(argv[i], NULL);
-#endif
 if (!argv_copy[i]) {
 free(oldloc);
 fprintf(stderr, "Fatal Python error: "
diff -r 6a6ad09faad2 Python/fileutils.c
--- a/Python/fileutils.c  Mon Nov 12 01:23:51 2012 +0100
+++ b/Python/fileutils.c  Mon Nov 12 14:29:44 2012 +0100
@@ -7,6 +7,10 @@
 #include <langinfo.h>
 #endif
 
+#ifdef __APPLE__
+extern wchar_t* _Py_DecodeUTF8_surrogateescape(const char *s, Py_ssize_t size);
+#endif
+
 PyObject *
 _Py_device_encoding(int fd)
 {
@@ -59,6 +63,15 @@ PyObject *
 wchar_t*
 _Py_char2wchar(const char* arg, size_t *size)
 {
+#ifdef __APPLE__
+wchar_t *wstr;
+wstr = _Py_DecodeUTF8_surrogateescape(arg, strlen(arg));
+if (wstr == NULL)
+return NULL;
+if (size != NULL)
+*size = wcslen(wstr);
+return wstr;
+#else
 wchar_t *res;
 #ifdef HAVE_BROKEN_MBSTOWCS
 /* Some platforms have a broken implementation of
@@ -144,7 +157,7 @@ wchar_t*
 argsize -= converted;
 out++;
 }
-#else
+#else   /* HAVE_MBRTOWC */
 /* Cannot use C locale for escaping; manually escape as if charset
is ASCII (i.e. escape all bytes > 128. This will still roundtrip
correctly in the locale's charset, which must be an ASCII superset. */
@@ -159,7 +172,7 @@ wchar_t*
 else
 *out++ = 0xdc00 + *in++;
 *out = 0;
-#endif
+#endif   /* HAVE_MBRTOWC */
 if (size != NULL)
 *size = out - res;
 return res;
@@ -167,6 +180,7 @@ oom:
 if (size != NULL)
 *size = (size_t)-1;
 return NULL;
+#endif   /* __APPLE__ */
 }
 
 /* Encode a (wide) character string to the locale encoding with the
@@ -183,6 +197,34 @@ oom:
 char*
 _Py_wchar2char(const wchar_t *text, size_t *error_pos)
 {
+#ifdef __APPLE__
+Py_ssize_t len;
+PyObject *unicode, *bytes = NULL;
+char *cpath;
+
+unicode = PyUnicode_FromWideChar(text, wcslen(text));
+if (unicode == NULL) {
+Py_DECREF(unicode);
+return NULL;
+}
+
+bytes = _PyUnicode_AsUTF8String(unicode, "surrogateescape");
+Py_DECREF(unicode);
+if (bytes == NULL) {
+PyErr_Clear();
+return NULL;
+}
+
+len = PyBytes_GET_SIZE(bytes);
+cpath = PyMem_Malloc(len+1);
+if (cpath == NULL) {
+Py_DECREF(bytes);
+return NULL;
+}
+memcpy(cpath, PyBytes_AsString(bytes), len + 1);
+Py_DECREF(bytes);
+return cpath;
+#else   /* __APPLE__ */
 const size_t len = wcslen(text);
 char *result = NULL, *bytes = NULL;
 size_t i, size, converted;
@@ -242,6 +284,7 @@ char*
 bytes = result;
 }
 return result;
+#endif   /* __APPLE__ */
 }
 
 /* In principle, this should use HAVE__WSTAT, and _wstat



[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 48fbdaf3a849 by Victor Stinner in branch 'default':
Issue #16416: OS data are now always encoded/decoded to/from
http://hg.python.org/cpython/rev/48fbdaf3a849

--
nosy: +python-dev




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset f3e512b5ffb3 by Victor Stinner in branch 'default':
Issue #16416: Fix error handling in _Py_wchar2char() _Py_char2wchar() functions
http://hg.python.org/cpython/rev/f3e512b5ffb3

--




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1b97cc71a05e by Victor Stinner in branch 'default':
Issue #16416: Fix Misc/NEWS entry, mention Mac OS X
http://hg.python.org/cpython/rev/1b97cc71a05e

--




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-12 Thread STINNER Victor

STINNER Victor added the comment:

@Serhiy: Thanks for your review, I missed it before my first commit.

--




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Victor, are you going to backport this to 3.3?

--




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-12 Thread STINNER Victor

STINNER Victor added the comment:

> Victor, are you going to backport this to 3.3?

I'm waiting for the buildbot results, and maybe also for the fix for
issue #16455 (which affects the tests on undecodable bytes).

--




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-07 Thread Andrew Svetlov

Changes by Andrew Svetlov andrew.svet...@gmail.com:


--
nosy: +asvetlov




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-05 Thread STINNER Victor

New submission from STINNER Victor:

Since changeset 45079ad1e260 (issue #4388), command line arguments on Mac OS X
are decoded from UTF-8 instead of the locale encoding. The functions of
Python/fileutils.c are still using the locale encoding.

This does not work: see issue #16218. On Mac OS X, with the command line
"python script.py", the filename "script.py" is decoded from UTF-8 (by
_Py_DecodeUTF8_surrogateescape), but it is then passed to _Py_fopen(), which
encodes the filename to the locale encoding (e.g. ISO-8859-1 if the $LANG,
$LC_CTYPE and $LC_ALL environment variables are not set). The result is
mojibake, and Python fails to open the script.

The attached patch modifies the functions of Python/fileutils.c to use UTF-8,
instead of the locale encoding, to encode and decode filenames on Mac OS X.

I don't know yet if Modules/getpath.c should also decode paths from UTF-8
instead of the locale encoding on Mac OS X. We may expose _Py_decode_filename().
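
To make the failure mode concrete, here is a minimal Python sketch of the mismatch described above. It does not call the C functions; it only mimics the two steps with the str/bytes codecs. The script name and the ISO-8859-1 locale are assumptions chosen for illustration.

```python
# Hypothetical script name with a non-ASCII character (e with acute accent).
on_disk = "caf\u00e9.py".encode("utf-8")   # bytes of the name as passed in argv

# Step 1: the command line argument is decoded from UTF-8.
decoded = on_disk.decode("utf-8", "surrogateescape")

# Step 2: the name is re-encoded to the locale encoding (assumed here
# to be ISO-8859-1) before the file is opened.
reencoded = decoded.encode("iso-8859-1", "surrogateescape")

# The re-encoded bytes differ from the name on disk, so opening fails.
mismatch = reencoded != on_disk

# Encoding with UTF-8 instead gives back the original bytes.
fixed = decoded.encode("utf-8", "surrogateescape")

print(on_disk, reencoded, mismatch, fixed == on_disk)
```

Using one encoding for both steps removes the mismatch, which is what the patch does for Mac OS X.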

--
assignee: ronaldoussoren
components: Macintosh
files: macosx.patch
keywords: patch
messages: 174943
nosy: haypo, ronaldoussoren
priority: normal
severity: normal
status: open
title: Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames
versions: Python 3.4
Added file: http://bugs.python.org/file27903/macosx.patch




[issue16416] Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames

2012-11-05 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
nosy: +serhiy.storchaka
