[issue16455] sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set

2012-12-03 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c25635b137cc by Victor Stinner in branch 'default':
Issue #16455: On FreeBSD and Solaris, if the locale is C, the
http://hg.python.org/cpython/rev/c25635b137cc

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16455
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16455] sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set

2012-11-26 Thread Jesús Cea Avión

Jesús Cea Avión added the comment:

Victor, any progress on this?

--




[issue16455] sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set

2012-11-26 Thread STINNER Victor

STINNER Victor added the comment:

> Victor, any progress on this?

We have two options, and I don't know which one is better (safer). Does
the terminal handle non-ASCII characters with a C locale on FreeBSD or
Solaris?

--




[issue16455] sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set

2012-11-12 Thread STINNER Victor

STINNER Victor added the comment:

Hijacking locale.getpreferredencoding() may be dangerous. I attached a
new patch, force_ascii.patch, which uses a different approach: be stricter
than mbstowcs() and force the ASCII encoding when all of the following hold:
 - the LC_CTYPE locale is C
 - nl_langinfo(CODESET) is ASCII or an alias of ASCII
 - mbstowcs() is able to decode non-ASCII characters
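
The three conditions above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the C code in force_ascii.patch: should_force_ascii() and the two simulated decoders are hypothetical names, the decode_byte argument stands in for a one-byte mbstowcs() call, and the sketch reads the third condition as "at least one byte in 0x80..0xff decodes". The last lines show why forcing ASCII still preserves the original bytes: with the surrogateescape error handler, undecodable bytes survive a decode/encode cycle.

```python
def should_force_ascii(lc_ctype, codeset, decode_byte):
    """Return True if the ASCII codec should replace mbstowcs()/wcstombs().

    lc_ctype    -- result of setlocale(LC_CTYPE, NULL), e.g. "C"
    codeset     -- normalized nl_langinfo(CODESET) value, e.g. "ascii"
    decode_byte -- hypothetical stand-in for a one-byte mbstowcs() call:
                   returns a str on success, None on failure
    """
    if lc_ctype != "C":
        return False
    if codeset != "ascii":
        return False
    # Assumption: one decodable non-ASCII byte is enough to show the mismatch.
    return any(decode_byte(bytes([i])) is not None for i in range(0x80, 0x100))


def latin1_libc(data):
    """Simulates a C library that decodes every byte as ISO-8859-1."""
    return data.decode("iso8859-1")


def strict_ascii_libc(data):
    """Simulates a C library whose C locale really is ASCII-only."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None


force_on_affected = should_force_ascii("C", "ascii", latin1_libc)
force_on_strict = should_force_ascii("C", "ascii", strict_ascii_libc)

# ASCII + surrogateescape round-trips arbitrary bytes.
raw = b"abc\xff"
text = raw.decode("ascii", "surrogateescape")
round_trip = text.encode("ascii", "surrogateescape")
print(force_on_affected, force_on_strict, ascii(text), round_trip == raw)
```

The simulated decoders make the decision testable on any platform; on a real system the answer depends on what the C library's mbstowcs() does in the C locale.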

--
Added file: http://bugs.python.org/file27970/force_ascii.patch

diff -r 6a6ad09faad2 Python/fileutils.c
--- a/Python/fileutils.c	Mon Nov 12 01:23:51 2012 +0100
+++ b/Python/fileutils.c	Mon Nov 12 15:33:24 2012 +0100
@@ -4,6 +4,7 @@
 #endif
 
 #ifdef HAVE_LANGINFO_H
+#include <locale.h>
 #include <langinfo.h>
 #endif
 
@@ -39,6 +40,104 @@ PyObject *
 
 #ifdef HAVE_STAT
 
+/* Workaround FreeBSD and OpenIndiana locale encoding issue. On these
+   operating systems, nl_langinfo(CODESET) announces an alias of the ASCII
+   encoding, whereas mbstowcs() and wcstombs() functions use the ISO-8859-1
+   encoding. The problem is that os.fsencode() and os.fsdecode() use the
+   Python codec ASCII. For example, if command line arguments are decoded
+   by mbstowcs() and encoded by os.fsencode(), we get a UnicodeEncodeError
+   instead of retrieving the original byte string.
+
+   The workaround is enabled if setlocale(LC_CTYPE, NULL) returns "C" and
+   nl_langinfo(CODESET) returns "ascii". The workaround is not used if
+   setlocale(LC_CTYPE, NULL) failed, or if nl_langinfo() or CODESET is not
+   available.
+
+   Values of locale_force_ascii:
+
+   1: the workaround is used, the ASCII codec is used instead of mbstowcs()
+      and wcstombs() functions
+   0: the workaround is not used
+  -1: unknown, need to call check_locale_force_ascii() to know the value
+*/
+static int locale_force_ascii = -1;
+
+extern char* _Py_GetLocaleEncoding(void);
+
+static int
+check_locale_force_ascii(void)
+{
+#ifdef MS_WINDOWS
+    return 0;
+#else
+    char *encoding, *loc;
+    int i;
+    unsigned char ch;
+    wchar_t wch;
+    size_t res;
+
+    return 1;
+
+    loc = setlocale(LC_CTYPE, NULL);
+    if (loc == NULL || strcmp(loc, "C") != 0) {
+        /* Failed to get the LC_CTYPE locale or it is different than C:
+         * don't use the workaround. */
+        return 0;
+    }
+
+    encoding = _Py_GetLocaleEncoding();
+    if (encoding == NULL) {
+        /* unknown encoding: consider that the encoding is not ASCII */
+        PyErr_Clear();
+        return 0;
+    }
+
+    if (strcmp(encoding, "ascii") != 0) {
+        free(encoding);
+        return 0;
+    }
+    free(encoding);
+
+    /* the locale is not set and nl_langinfo(CODESET) returns ASCII
+       (or an alias of the ASCII encoding). Check if the locale encoding
+       is really ASCII. */
+    for (i=0x80; i<0xff; i++) {
+        ch = (unsigned char)i;
+        res = mbstowcs(&wch, (char*)&ch, 1);
+        if (res == (size_t)-1) {
+            /* decoding a non-ASCII character from the locale encoding failed:
+               the encoding is really ASCII */
+            return 0;
+        }
+    }
+    return 1;
+#endif

[issue16455] sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set

2012-11-12 Thread Jesús Cea Avión

Changes by Jesús Cea Avión j...@jcea.es:


--
nosy: +jcea




[issue16455] sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set

2012-11-11 Thread STINNER Victor

New submission from STINNER Victor:

On FreeBSD and OpenIndiana, sys.getfilesystemencoding() is 'ascii' when the
locale is not set, whereas the locale encoding is ISO-8859-1.

This inconsistency causes various issues. For example,
os.fsencode(sys.argv[1]) fails if the argument is not ASCII, because sys.argv
is decoded from the locale encoding (by _Py_char2wchar()).

sys.getfilesystemencoding() is 'ascii' because nl_langinfo(CODESET) is used
to get the locale encoding, and nl_langinfo(CODESET) announces ASCII (or an
alias of this encoding).

Python should detect this case and set sys.getfilesystemencoding() to
'iso8859-1' if the locale encoding is 'iso8859-1' whereas nl_langinfo(CODESET)
announces ASCII. We can, for example, decode b'\xe9' with mbstowcs() and check
whether it fails or the result is U+00E9.
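
The failure described above can be reproduced on any platform by simulating the two code paths in pure Python. This is a sketch, not what CPython runs: the bytes value and the fsencode_ascii() helper are made up for illustration, ISO-8859-1 decoding stands in for mbstowcs(), and ASCII with the surrogateescape handler stands in for os.fsencode() when the filesystem encoding is 'ascii'.

```python
# Bytes received on the command line (hypothetical example value).
raw_arg = b"caf\xe9"

# Step 1: simulate mbstowcs() on the affected systems, which behaves like
# ISO-8859-1 in the C locale even though nl_langinfo(CODESET) says ASCII.
arg = raw_arg.decode("iso8859-1")


def fsencode_ascii(text):
    """Simulate os.fsencode() when sys.getfilesystemencoding() is 'ascii'."""
    return text.encode("ascii", "surrogateescape")


# Step 2: encoding the decoded argument back fails, because U+00E9 is
# neither ASCII nor an escaped surrogate.
try:
    fsencode_ascii(arg)
except UnicodeEncodeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)
```

If both steps used the same codec (either ISO-8859-1 on both sides, or ASCII with surrogateescape on both sides), the original bytes would be recovered instead.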

--
components: Unicode
messages: 175401
nosy: ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: sys.getfilesystemencoding() is not the locale encoding on FreeBSD and 
OpenSolaris when the locale is not set
versions: Python 3.4




[issue16455] sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set

2012-11-11 Thread STINNER Victor

STINNER Victor added the comment:

The attached patch works around the CODESET issue on OpenIndiana and FreeBSD.
If the LC_CTYPE locale is C and nl_langinfo(CODESET) returns ASCII (or an
alias of this encoding), b'\xE9' is decoded from the locale encoding: if the
result is U+00E9, the patched Python uses ISO-8859-1. (If decoding fails, the
locale encoding is really ASCII and the workaround is not used.)

If the result is different (b'\xe9' is not decoded from the locale encoding to
U+00E9), a ValueError is raised. I wrote this test to detect bugs. I hope that
our buildbots will validate the code. We may choose a different behaviour
(e.g. keep ASCII).
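
The decision described above could be sketched like this. It is an illustration only, not the code in workaround_codeset.patch: choose_fs_encoding() and the simulated decoders are hypothetical, and decode_byte stands in for decoding one byte with mbstowcs() (returning None on failure).

```python
def choose_fs_encoding(lc_ctype, codeset, decode_byte):
    """Pick the filesystem encoding following the workaround's three outcomes."""
    if lc_ctype != "C" or codeset != "ascii":
        # The workaround only applies to the C locale announcing ASCII.
        return codeset
    decoded = decode_byte(b"\xe9")
    if decoded is None:
        # Decoding failed: the locale encoding is really ASCII.
        return "ascii"
    if decoded == "\u00e9":
        # The C library behaves like ISO-8859-1.
        return "iso8859-1"
    raise ValueError("unexpected locale encoding: b'\\xe9' decoded to %a" % decoded)


def latin1_libc(data):
    """Simulates the affected platforms."""
    return data.decode("iso8859-1")


def strict_ascii_libc(data):
    """Simulates a C library that rejects non-ASCII bytes."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None


def cp1251_libc(data):
    """Simulates an unexpected single-byte encoding (assumption for the demo)."""
    return data.decode("cp1251")


affected = choose_fs_encoding("C", "ascii", latin1_libc)
strict = choose_fs_encoding("C", "ascii", strict_ascii_libc)
try:
    choose_fs_encoding("C", "ascii", cp1251_libc)
except ValueError as exc:
    unexpected = exc
else:
    unexpected = None

print(affected, strict, type(unexpected).__name__)
```

Raising ValueError in the third case mirrors the "detect bugs" intent stated above; returning "ascii" there instead would be the "keep ASCII" alternative.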

Example on FreeBSD 8.2, original Python 3.4:

$ ./python
>>> import sys, locale
>>> sys.getfilesystemencoding()
'ascii'
>>> locale.getpreferredencoding()
'US-ASCII'

Example on FreeBSD 8.2, patched Python 3.4:

$ ./python
>>> import sys, locale
>>> sys.getfilesystemencoding()
'iso8859-1'
>>> locale.getpreferredencoding()
'iso8859-1'

--
keywords: +patch
Added file: http://bugs.python.org/file27965/workaround_codeset.patch




[issue16455] sys.getfilesystemencoding() is not the locale encoding on FreeBSD and OpenSolaris when the locale is not set

2012-11-11 Thread STINNER Victor

STINNER Victor added the comment:

Some tests are failing with the patch:

======================================================================
FAIL: test_undecodable_env (test.test_subprocess.POSIXProcessTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/haypo/prog/python/default/Lib/test/test_subprocess.py", line 1606, in test_undecodable_env
    self.assertEqual(stdout.decode('ascii'), ascii(value))
AssertionError: 'abc\\xff' != 'abc\\udcff'
- 'abc\xff'
?  ^
+ 'abc\udcff'
?  ^^^

======================================================================
FAIL: test_strcoll_with_diacritic (test.test_locale.TestEnUSCollation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/haypo/prog/python/default/Lib/test/test_locale.py", line 364, in test_strcoll_with_diacritic
    self.assertLess(locale.strcoll('\xe0', 'b'), 0)
AssertionError: 126 not less than 0

======================================================================
FAIL: test_strxfrm_with_diacritic (test.test_locale.TestEnUSCollation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/haypo/prog/python/default/Lib/test/test_locale.py", line 367, in test_strxfrm_with_diacritic
    self.assertLess(locale.strxfrm('\xe0'), locale.strxfrm('b'))
AssertionError: '\xe0' not less than 'b'

--
