[issue8622] Add PYTHONFSENCODING environment variable

2010-08-25 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> test_sys is still failing on my system where LC_CTYPE
> only is set to utf-8

Oh yes, test_sys fails if LC_ALL or LC_CTYPE is a locale using a different 
encoding than ascii (eg. LC_ALL=fr_FR.utf8). Fixed by r84314.

--
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8622
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8622] Add PYTHONFSENCODING environment variable

2010-08-24 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

r84308 should fix the last problems on Mac OS X, FreeBSD and Solaris.

The last failure on test_sys is on Windows with test_undecodable_code 
(TypeError: Type str doesn't support the buffer API), which is unrelated.

Reopen the issue if you see new failures.

--
status: open -> closed




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-24 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

test_sys is still failing on my system where only LC_CTYPE is set to utf-8.
Victor, do you want me to apply the LANG -> LC_ALL change to the test?

--
status: closed -> open




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-22 Thread Florent Xicluna

Florent Xicluna florent.xicl...@gmail.com added the comment:

This is still an issue on some buildbots:
 - since r84224 on OS X (PPC Leopard, x86 Tiger)
 - since r84182 on sparc solaris10 gcc, x86 FreeBSD, x86 FreeBSD 7.2

The issue was fixed in r84201, r84202, r84203 for OS X buildbots only, but 
since r84224 it is failing again.

--
keywords: +buildbot
nosy: +flox
status: closed -> open
type:  -> behavior




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-22 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I'm working on a fix for test_sys failure. test_os should not fail anymore.

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-22 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

In an up to date checkout of py3k on Gentoo linux with LC_CTYPE=en_US.UTF-8, I 
get a failure in test_sys:

======================================================================
FAIL: test_pythonfsencoding (test.test_sys.SysModuleTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rdmurray/python/py3k/Lib/test/test_sys.py", line 605, in test_pythonfsencoding
    self.check_fsencoding(get_fsencoding(env), 'ascii')
  File "/home/rdmurray/python/py3k/Lib/test/test_sys.py", line 573, in check_fsencoding
    self.assertEqual(fs_encoding, expected)
AssertionError: 'utf-8' != 'ascii'
- utf-8
+ ascii

--
nosy: +r.david.murray




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-22 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Setting LC_ALL instead of LANG in the test fixes the problem.
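
A small sketch of why this works, assuming the usual POSIX lookup order for locale variables (LC_ALL first, then LC_CTYPE, then LANG, with empty values treated as unset). The helper name effective_ctype_locale is made up for illustration and is not part of test_sys:

```python
def effective_ctype_locale(env):
    """Return the locale name that decides LC_CTYPE for the given environment.

    Assumed POSIX precedence: LC_ALL, then LC_CTYPE, then LANG. An empty
    value counts as unset, and the default is the "C" locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# Overriding only LANG does not help: an inherited LC_CTYPE still wins.
print(effective_ctype_locale({"LANG": "C", "LC_CTYPE": "en_US.UTF-8"}))

# Overriding LC_ALL beats every other locale variable.
print(effective_ctype_locale({"LC_ALL": "C", "LC_CTYPE": "en_US.UTF-8"}))
```

With only LANG=C in the test environment, a user's LC_CTYPE=en_US.UTF-8 is still the value that counts, which matches the 'utf-8' != 'ascii' failure reported earlier.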

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-20 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

On Thursday 19 August 2010 22:40:53, you wrote:
> Just please make sure that on other platforms such as BSD, Solaris,
> AIX, etc. that don't have this special Python support
> the env vars are honored.

I added many more tests on the filesystem encoding:
 - (test_os) FSEncodingTests.test_encodings() tests different encoding values 
and checks for some known values
 - (test_sys) SysModuleTest.test_pythonfsencoding() tests Python with the C 
locale and checks that the FS encoding is ascii, and tests that setting 
PYTHONFSENCODING is understood by Python (it runs python with "import sys; 
print(sys.getfilesystemencoding())" and compares the output)

These tests are skipped on Windows and Mac OS X. I also patched the doc 
(what's new / cmdline) to explain that PYTHONFSENCODING is not available 
(ignored) on these OSes.
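
The subprocess approach described above could look roughly like the following. This is a sketch, not the code from the patch: get_fsencoding mirrors the helper name visible in the test_sys traceback, but its body here is an assumption, and c_locale_env is a made-up helper.

```python
import os
import subprocess
import sys


def get_fsencoding(env):
    """Run a child interpreter with `env` and return its filesystem encoding."""
    code = "import sys; print(sys.getfilesystemencoding())"
    output = subprocess.check_output([sys.executable, "-c", code], env=env)
    return output.decode("ascii").strip()


def c_locale_env():
    """Copy the current environment and force the C locale through LC_ALL."""
    env = dict(os.environ)
    for name in ("LANG", "LC_CTYPE"):
        env.pop(name, None)
    env["LC_ALL"] = "C"
    return env


if __name__ == "__main__":
    # The printed value depends on the platform and the Python version,
    # so no expected output is given here.
    print(get_fsencoding(c_locale_env()))
```

Starting from a copy of os.environ keeps variables the child needs to start at all (PATH, SYSTEMROOT on Windows, and so on) while still controlling the locale.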

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-19 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Oh, I realized that PYTHONFSENCODING is ignored on Windows and Mac OS X. r84201 
and r84202 fix test_sys, and r84203 fixes the documentation and Python usage 
(hide PYTHONFSENCODING variable in Python help on Windows and Mac OS X).

We could allow overriding the filesystem encoding on Windows, but I don't 
think that it is a good idea because third party libraries will use the mbcs 
encoding anyway.

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
 
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
> Oh, I realized that PYTHONFSENCODING is ignored on Windows and Mac OS X.
> r84201 and r84202 fix test_sys, and r84203 fixes the documentation and Python
> usage (hide PYTHONFSENCODING variable in Python help on Windows and Mac OS X).

This has to be changed: The env var needs to be respected on all
platforms.

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-19 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

>> Oh, I realized that PYTHONFSENCODING is ignored on Windows and Mac OS X.
>> r84201 and r84202 fix test_sys, and r84203 fixes the documentation and
>> Python usage (hide PYTHONFSENCODING variable in Python help on Windows
>> and Mac OS X).
>
> This has to be changed: The env var needs to be respected on all
> platforms.

I don't think so.

On Mac OS X, you cannot create a file with an invalid utf-8 name. The VFS uses 
utf-8:
http://developer.apple.com/mac/library/qa/qa2001/qa1173.html

Using a different encoding will raise an error for the first non-ascii filename.
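
To illustrate the point in plain Python (a sketch only; it checks byte strings and does not touch a real Mac OS X volume, and is_valid_utf8 is a made-up helper name):

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "caf\u00e9.txt"  # a non-ASCII file name

# Encoded as UTF-8, the name is acceptable to a UTF-8 only filesystem.
print(is_valid_utf8(name.encode("utf-8")))

# Encoded as Latin-1, the same name contains the lone byte 0xE9,
# which is not a valid UTF-8 sequence.
print(is_valid_utf8(name.encode("latin-1")))
```

Pure ASCII names are identical in both encodings, which is why the problem only shows up with the first non-ascii filename.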

--

About Windows: Python3 uses the wide character API of Windows, except in some 
functions using third party libraries that only provide a bytes API (eg. 
openssl). Filenames are stored as unicode, even on removable media like CD-ROMs 
or USB keys. I don't get the use case here. Why would you like to change the 
filesystem encoding on Windows?

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
 
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
>>> Oh, I realized that PYTHONFSENCODING is ignored on Windows and Mac OS X.
>>> r84201 and r84202 fix test_sys, and r84203 fixes the documentation and
>>> Python usage (hide PYTHONFSENCODING variable in Python help on Windows
>>> and Mac OS X).
>>
>> This has to be changed: The env var needs to be respected on all
>> platforms.
>
> I don't think so.
>
> On Mac OS X, you cannot create a file with an invalid utf-8 name. The VFS
> uses utf-8:
> http://developer.apple.com/mac/library/qa/qa2001/qa1173.html
>
> Use a different encoding will raise error for the first non-ascii filename.
>
> --
>
> About Windows, Python3 uses the wide character API of Windows, except in some
> functions using third party libraries only providing a bytes API (eg.
> openssl). filenames are stored as unicode, even on removable media like
> CD-Rom or USB keys. I don't get the usecase here. Why would you like to
> change the filesystem encoding on Windows?

Ok, point taken.

Just please make sure that on other platforms such as BSD, Solaris,
AIX, etc. that don't have this special Python support
the env vars are honored.

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-18 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
 
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
> Here you have a patch. It adds tests in test_sys.
>
> The tests are skipped on a non-ascii Python executable path because of #8611
> (see #9425).

Thanks for the patch.

A couple of notes:

 * The command line -h explanation is missing from the patch.

 * The documentation should mention that the env var is only
   read once; subsequent changes to the env var are not seen
   by Python

 * If the codec lookup fails, Python should issue a warning
   and then ignore the env var (using the get_codeset() API).

 * Unrelated to the env var, but still important: if get_codeset()
   does not return a known codec, Python should issue a warning
   before falling back to the default setting. Otherwise, a
   Python user will never know that there's an issue and this
   makes debugging a lot harder.

We should also add a new sys.setfilesystemencoding()
function to make changes possible after Python startup. This
would have to go on a separate ticket, though. Or is there
some concept preventing this?

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-18 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> The command line -h explanation is missing from the patch.

done

> The documentation should mention that the env var is only
> read once; subsequent changes to the env var are not seen
> by Python

I copied the PYTHONIOENCODING doc which doesn't mention that. Does Python 
re-read other environment variables at runtime? Anyway, I changed the doc to:

+   If this is set before running the interpreter, it overrides the encoding used
+   for the filesystem encoding (see :func:`sys.getfilesystemencoding`).

I also changed PYTHONIOENCODING doc. Is it better?
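
The "read once at startup" behaviour can be checked with PYTHONIOENCODING, which follows the same pattern. The following is a sketch under that assumption; the helper name stdout_encoding_before_and_after and the choice of latin-1 are arbitrary:

```python
import os
import subprocess
import sys

# Code run in the child: record the stdout encoding, change the variable
# at runtime, then report the encoding again.
CHILD_CODE = (
    "import os, sys\n"
    "before = sys.stdout.encoding\n"
    "os.environ['PYTHONIOENCODING'] = 'utf-8'\n"
    "print(before, sys.stdout.encoding)\n"
)


def stdout_encoding_before_and_after():
    """Start a child with PYTHONIOENCODING=latin-1 and return both readings."""
    env = dict(os.environ, PYTHONIOENCODING="latin-1")
    output = subprocess.check_output([sys.executable, "-c", CHILD_CODE], env=env)
    before, after = output.decode("ascii").split()
    return before, after


if __name__ == "__main__":
    print(stdout_encoding_before_and_after())
```

Both readings are the same: the assignment to os.environ inside the child comes after the interpreter has already configured its standard streams.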

> If the codec lookup fails, Python should issue a warning

Ok, done. I patched also get_codeset() and get_codec_name() to always set a 
Python error.

> ... and then ignore the env var (using the get_codeset() API).

Good idea, done.

> Unrelated to the env var, but still important: if get_codeset()
> does not return a known codec, Python should issue a warning
> before falling back to the default setting. Otherwise, a
> Python user will never know that there's an issue and this
> makes debugging a lot harder.

It does already write a message to stderr, but it doesn't explain why it failed.

I changed initfsencoding() to display two messages on get_codeset() error. 
First explain why get_codeset() failed (with the Python error) and then say 
that we fall back to utf-8.

Full example (PYTHONFSENCODING error and simulated get_codeset() error):
---
PYTHONFSENCODING is not a valid encoding:
LookupError: unknown encoding: xxx
Unable to get the locale encoding:
ValueError: CODESET is not set or empty
Unable to get the filesystem encoding: fallback to utf-8
---
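
The fallback order behind these messages can be sketched in Python as follows. The real logic lives in C (initfsencoding() and get_codeset()); the function choose_fs_encoding and its two parameters are hypothetical stand-ins for the environment variable value and the locale's CODESET.

```python
import codecs


def choose_fs_encoding(env_value, locale_codeset):
    """Return (encoding, messages) following the fallback order described above.

    env_value stands in for PYTHONFSENCODING and locale_codeset for the
    value reported by the locale; both are plain strings (or None/empty).
    """
    messages = []

    # 1. Try the environment variable, if set.
    if env_value:
        try:
            return codecs.lookup(env_value).name, messages
        except LookupError as exc:
            messages.append("PYTHONFSENCODING is not a valid encoding:")
            messages.append("LookupError: %s" % exc)

    # 2. Try the locale encoding.
    try:
        if not locale_codeset:
            raise ValueError("CODESET is not set or empty")
        return codecs.lookup(locale_codeset).name, messages
    except (LookupError, ValueError) as exc:
        messages.append("Unable to get the locale encoding:")
        messages.append("%s: %s" % (type(exc).__name__, exc))

    # 3. Last resort.
    messages.append("Unable to get the filesystem encoding: fallback to utf-8")
    return "utf-8", messages


if __name__ == "__main__":
    encoding, messages = choose_fs_encoding("xxx", "")
    print("\n".join(messages))
    print(encoding)
```

Calling it with an unknown encoding name and an empty codeset walks through all three steps, which corresponds to the simulated double failure in the example.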

> We should also add a new sys.setfilesystemencoding() ...

No, I plan to REMOVE this function. sys.setfilesystemencoding() is dangerous 
because it introduces a lot of inconsistencies: this function is unable to 
reencode all filenames in all objects (eg. Python is unable to find filenames 
in user objects or 3rd party libraries). Eg. if you change the filesystem 
encoding from utf8 to ascii, it will not be possible to use existing non-ascii 
(unicode) filenames: they will raise UnicodeEncodeError. Like 
sys.setdefaultencoding() in Python2, I think that sys.setfilesystemencoding() 
is the root of evil :-)
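
The inconsistency can be reproduced without touching the interpreter's real setting. In this sketch, reencode is a made-up helper that decodes a file name with one encoding and encodes it back with another, using the surrogateescape error handler from PEP 383 in both directions:

```python
def reencode(raw_name, decode_with, encode_with):
    """Decode bytes with one encoding, then encode the result with another."""
    text = raw_name.decode(decode_with, "surrogateescape")
    return text.encode(encode_with, "surrogateescape")


raw = b"caf\xc3\xa9"  # a file name as stored on disk, in UTF-8

# Same encoding both ways: the bytes survive the round trip.
print(reencode(raw, "utf-8", "utf-8") == raw)

# Name decoded while the encoding was utf-8, encoding then switched to
# ascii: the already decoded string can no longer be encoded.
try:
    reencode(raw, "utf-8", "ascii")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)
```

Any str file name created before the switch is affected, and nothing can find and fix all of them after the fact.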

At startup, initfsencoding() sets the filesystem encoding using the locale 
encoding. Even for the startup process (with very few objects), it's very hard 
to find all filenames:
 - sys.path
 - sys.meta_path
 - sys.modules
 - sys.executable
 - all code objects
 - and I'm not sure that the list is complete

See #9630 for the details.
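
As a rough idea of how scattered these file names are, the sketch below collects the ones that are easy to reach from Python code. The function known_filenames is hypothetical, and the result is certainly incomplete (nested code objects, objects held by extension modules and user objects are not visited):

```python
import sys
import types


def known_filenames():
    """Collect file names reachable from sys.path, sys.executable and sys.modules."""
    names = set()

    # Import path entries and the interpreter's own path.
    names.update(entry for entry in sys.path if isinstance(entry, str))
    if sys.executable:
        names.add(sys.executable)

    # Module files and the code objects of module-level functions.
    for module in list(sys.modules.values()):
        filename = getattr(module, "__file__", None)
        if isinstance(filename, str):
            names.add(filename)
        namespace = getattr(module, "__dict__", None)
        if not isinstance(namespace, dict):
            continue
        for obj in list(namespace.values()):
            if isinstance(obj, types.FunctionType):
                names.add(obj.__code__.co_filename)
    return names


if __name__ == "__main__":
    print(len(known_filenames()), "file names found")
```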

To remove sys.setfilesystemencoding(), I already patched PEP 383 tests (r84170) 
and I will open a new issue. But it's maybe better to commit both changes 
(remove the function and PYTHONFSENCODING) at the same time.

--
Added file: http://bugs.python.org/file18564/pythonfsencoding-2.patch




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-18 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file18562/pythonfsencoding.patch




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-18 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> To remove sys.setfilesystemencoding(), ... I will open a new issue

done, issue #9632

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-18 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
 
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
>> The command line -h explanation is missing from the patch.
>
> done
>
>> The documentation should mention that the env var is only
>> read once; subsequent changes to the env var are not seen
>> by Python
>
> I copied the PYTHONIOENCODING doc which doesn't mention that. Does Python
> re-read other environment variables at runtime? Anyway, I changed the doc to:
>
> +   If this is set before running the interpreter, it overrides the encoding used
> +   for the filesystem encoding (see :func:`sys.getfilesystemencoding`).
>
> I also changed PYTHONIOENCODING doc. Is it better?

Yes, thanks.

>> If the codec lookup fails, Python should issue a warning
>
> Ok, done. I patched also get_codeset() and get_codec_name() to always set a
> Python error.
>
>> ... and then ignore the env var (using the get_codeset() API).
>
> Good idea, done.
>
>> Unrelated to the env var, but still important: if get_codeset()
>> does not return a known codec, Python should issue a warning
>> before falling back to the default setting. Otherwise, a
>> Python user will never know that there's an issue and this
>> makes debugging a lot harder.
>
> It does already write a message to stderr, but it doesn't explain why it
> failed.
>
> I changed initfsencoding() to display two messages on get_codeset() error.
> First explain why get_codeset() failed (with the Python error) and then say
> that we fall back to utf-8.
>
> Full example (PYTHONFSENCODING error and simulated get_codeset() error):
> ---
> PYTHONFSENCODING is not a valid encoding:
> LookupError: unknown encoding: xxx
> Unable to get the locale encoding:
> ValueError: CODESET is not set or empty
> Unable to get the filesystem encoding: fallback to utf-8
> ---

Looks good !

>> We should also add a new sys.setfilesystemencoding() ...
>
> No, I plan to REMOVE this function. sys.setfilesystemencoding() is dangerous
> because it introduces a lot of inconsistencies: this function is unable to
> reencode all filenames in all objects (eg. Python is unable to find filenames
> in user objects or 3rd party libraries). Eg. if you change the filesystem
> from utf8 to ascii, it will not be possible to use existing non-ascii
> (unicode) filenames: they will raise UnicodeEncodeError. As
> sys.setdefaultencoding() in Python2, I think that sys.setfilesystemencoding()
> is the root of evil :-)

Sorry, I wasn't aware we had such a function (and was looking at the
wrong file so didn't find it).

> At startup, initfsencoding() sets the filesystem encoding using the locale
> encoding. Even for the startup process (with very few objects), it's very
> hard to find all filenames:
>  - sys.path
>  - sys.meta_path
>  - sys.modules
>  - sys.executable
>  - all code objects
>  - and I'm not sure that the list is complete
>
> See #9630 for the details.
>
> To remove sys.setfilesystemencoding(), I already patched PEP 383 tests
> (r84170) and I will open a new issue. But it's maybe better to commit both
> changes (remove the function and PYTHONFSENCODING) at the same time.

--




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-18 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Committed to 3.2 as r84182.

--
resolution:  -> fixed
status: open -> closed




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-17 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Here you have a patch. It adds tests in test_sys.

The tests are skipped on a non-ascii Python executable path because of #8611 
(see #9425).

--
keywords: +patch
nosy: +pitrou
Added file: http://bugs.python.org/file18562/pythonfsencoding.patch




[issue8622] Add PYTHONFSENCODING environment variable

2010-08-02 Thread Georg Brandl

Changes by Georg Brandl ge...@python.org:


--
assignee:  -> haypo




[issue8622] Add PYTHONFSENCODING environment variable

2010-05-05 Thread Marc-Andre Lemburg

New submission from Marc-Andre Lemburg m...@egenix.com:

As discussed on issue8610, we need a way to override the automatic detection of 
the file system encoding - for much the same reasons we also do for the I/O 
encoding: the detection mechanism isn't fail-safe.

We should add a new environment variable with the same functionality as 
PYTHONIOENCODING:

PYTHONFSENCODING: Encoding[:errors] used for file system.
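
For illustration, a value in the Encoding[:errors] format could be split as sketched below. The helper parse_encoding_spec and its default values are assumptions made for this example, not part of any proposed patch:

```python
def parse_encoding_spec(value, default_encoding="utf-8", default_errors="strict"):
    """Split an "encoding[:errors]" string into an (encoding, errors) pair.

    Either part may be left empty, in which case the (hypothetical)
    default is used instead.
    """
    encoding, _, errors = value.partition(":")
    return (encoding or default_encoding, errors or default_errors)


print(parse_encoding_spec("utf-8:surrogateescape"))
print(parse_encoding_spec("latin-1"))
print(parse_encoding_spec(":replace"))
```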

--
components: Interpreter Core
messages: 105030
nosy: haypo, lemburg
priority: normal
severity: normal
status: open
title: Add PYTHONFSENCODING environment variable
versions: Python 3.2




[issue8622] Add PYTHONFSENCODING environment variable

2010-05-05 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever
