[Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread Victor Stinner
Hi,

A PYTHONFSENCODING environment variable was added to Python 3.2: issue #8622.
This variable introduces an inconsistency because the filesystem and locale
encodings can now be different.

There are (at least) four issues related to this problem. We have two choices
to fix them:

 (a) use the same encoding to encode and decode values (the encoding may differ
for each issue)

 (b) remove the PYTHONFSENCODING variable and raise an error if the locale and
filesystem encodings are different (ensuring that both encodings are the same)
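Choice (b) amounts to a startup check. A minimal sketch of such a check follows; check_encodings_match() is a hypothetical helper, not an existing Python API. It compares sys.getfilesystemencoding() with locale.getpreferredencoding(False) by default, and normalises names through codecs.lookup() so that aliases such as "UTF8" and "utf-8" compare equal.

```python
import codecs
import locale
import sys
from typing import Optional


def check_encodings_match(fs_encoding: Optional[str] = None,
                          locale_encoding: Optional[str] = None) -> str:
    """Choice (b): raise if the filesystem and locale encodings differ.

    Hypothetical helper. With no arguments it inspects the running
    interpreter; explicit arguments make it easy to try other setups.
    """
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    if locale_encoding is None:
        locale_encoding = locale.getpreferredencoding(False)
    # codecs.lookup() maps aliases to one canonical codec name.
    fs = codecs.lookup(fs_encoding).name
    loc = codecs.lookup(locale_encoding).name
    if fs != loc:
        raise RuntimeError(
            f"filesystem encoding {fs!r} differs from locale encoding {loc!r}")
    return fs


print(check_encodings_match("UTF8", "utf-8"))  # utf-8
```

Python reports UTF-8 as the filesystem encoding on Mac OS X whatever the locale is, so a check like this would refuse to run there under a non-UTF-8 locale. That is the Mac OS X objection to choice (b).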

Although choice (a) is not easy to implement, it is feasible, and I have already
written some patches.

I don't understand how Python is supposed to interact with other programs that
ignore the PYTHONFSENCODING environment variable. It's as if Python used its
own locale.

Choice (b) looks easy to implement, but... there is the problem of Mac OS X.
Mac OS X uses UTF-8 as the filesystem encoding (not the locale encoding),
whereas the locale encoding appears to be used for the command-line
arguments. See issue #4388 for more information.

There may also be a useful use case for PYTHONFSENCODING, but I don't
remember which one :-)


Issues
------

sys.argv:
 - decoded from the locale encoding
 - subprocess encodes process arguments to the filesystem encoding
= issue #9992
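The sys.argv mismatch can be simulated without starting a process. A minimal sketch, assuming a KOI8-R locale and a UTF-8 filesystem encoding (as if PYTHONFSENCODING=utf-8 were set); argv_for_child() is a hypothetical helper that only models the decode and encode steps listed above, and compares them with choice (a).

```python
from typing import List, Tuple


def argv_for_child(raw_args: List[bytes], locale_enc: str,
                   fs_enc: str) -> Tuple[List[bytes], List[bytes]]:
    """Return (mismatched, consistent): the bytes a child process would get.

    mismatched: arguments re-encoded with the filesystem encoding.
    consistent: arguments re-encoded with the locale encoding that
    decoded them, i.e. choice (a).
    """
    # sys.argv is decoded from the locale encoding.
    decoded = [arg.decode(locale_enc, "surrogateescape") for arg in raw_args]
    # subprocess encodes the str arguments with the filesystem encoding.
    mismatched = [arg.encode(fs_enc, "surrogateescape") for arg in decoded]
    # Choice (a): encode with the same codec that was used to decode.
    consistent = [arg.encode(locale_enc, "surrogateescape") for arg in decoded]
    return mismatched, consistent


raw = ["файл.txt".encode("koi8_r")]
mismatched, consistent = argv_for_child(raw, "koi8_r", "utf-8")
print(mismatched == raw, consistent == raw)  # False True
```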

sys.path:
 - decoded from the locale encoding
 - import encodes paths to the filesystem encoding
= issue #10014
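For sys.path the mismatch can fail outright instead of silently changing bytes. A minimal sketch, assuming a UTF-8 locale and an ASCII filesystem encoding (for example PYTHONFSENCODING=ascii); path_for_import() is a hypothetical helper, not part of the import machinery. The surrogateescape handler only round-trips bytes that failed to decode, so a real non-ASCII character decoded by the locale codec cannot be encoded to ASCII.

```python
def path_for_import(raw_path: bytes, locale_enc: str, fs_enc: str) -> bytes:
    """Model a sys.path entry: decoded with the locale encoding, then
    encoded with the filesystem encoding when import opens it."""
    entry = raw_path.decode(locale_enc, "surrogateescape")  # filling sys.path
    return entry.encode(fs_enc, "surrogateescape")          # import encodes it


raw = "/home/user/café".encode("utf-8")
try:
    path_for_import(raw, "utf-8", "ascii")
except UnicodeEncodeError as exc:
    # 'é' is a real character, not an escaped byte, so ASCII cannot hold it.
    print("import cannot encode the path:", exc.reason)
```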

The script name, read from the command line (e.g. python script.py), is decoded
using the locale encoding, but it is also used to fill sys.path[0] (without any
encoding conversion), and import encodes paths to the filesystem encoding.
= issue #10039

PYTHONWARNINGS environment variable:
 - decoded from the locale encoding
 - subprocess encodes environment variables to the filesystem encoding
= issue #9988
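One way to sidestep the decode/re-encode round trip for environment variables is to keep them as bytes end to end. A minimal sketch, assuming a POSIX system where os.environb exists and subprocess accepts bytes keys and values in env; child_sees() and DEMO_VAR are hypothetical names used only for the demonstration.

```python
import os
import subprocess
import sys


def child_sees(value: bytes) -> bytes:
    """Pass `value` to a child process as a raw bytes environment variable
    and return the bytes the child reads back (POSIX only)."""
    env = dict(os.environb)      # bytes keys and values: nothing is decoded
    env[b"DEMO_VAR"] = value     # hypothetical variable name
    child_code = (
        "import os, sys; "
        "sys.stdout.buffer.write(os.environb[b'DEMO_VAR'])"
    )
    result = subprocess.run([sys.executable, "-c", child_code],
                            env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout


if os.name == "posix":
    koi8_value = "файл".encode("koi8_r")          # not valid UTF-8
    print(child_sees(koi8_value) == koi8_value)   # True
```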

-- 
Victor Stinner
http://www.haypocalc.com/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread M.-A. Lemburg
Victor Stinner wrote:
 Choice (b) looks easy to implement, but... there is the problem of Mac OS X.
 Mac OS X uses UTF-8 as the filesystem encoding (not the locale encoding),
 whereas the locale encoding appears to be used for the command-line
 arguments. See issue #4388 for more information.
 
 There may also be a useful use case for PYTHONFSENCODING, but I don't
 remember which one :-)

You have to differentiate between the meaning of a file system
encoding and the locale:

A file system encoding defines how the applications interact
with the file system.

A locale defines how the user expects to interact with the
application.

It is quite possible for the two to be different. Mac OS X is
just one example. Another common example is a Unix
account using the C locale (=ASCII) while working on a UTF-8
file system.
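The C-locale case can be reproduced without changing the locale. A minimal sketch, assuming PEP 383's surrogateescape error handler, which Python 3 uses to carry undecodable file name bytes through str; the file name is made up for the example. ascii() is used for printing so that the output itself does not depend on the terminal encoding.

```python
# Bytes of a file name stored on a UTF-8 file system.
name = "café.txt".encode("utf-8")

# Decoded with the C locale's ASCII codec: the non-ASCII bytes become lone
# surrogates. Unreadable for the user, but no information is lost.
c_locale_view = name.decode("ascii", "surrogateescape")

# Decoded with the file system's own encoding: the user sees the real name.
fs_view = name.decode("utf-8", "surrogateescape")

print(ascii(c_locale_view))  # 'caf\udcc3\udca9.txt'
print(ascii(fs_view))        # 'caf\xe9.txt'

# Encoding back with the same codec restores the original bytes.
assert c_locale_view.encode("ascii", "surrogateescape") == name
```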

BTW: We added it because, just as with the I/O encoding, you need to be
able to override the setting Python determines via locale
introspection, which may be wrong. The env var is only meant
as a way to solve encoding problems in special situations where
the locale cannot be used to determine the file system or
input/output encoding.
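The comparison with the I/O encoding can be observed through the real PYTHONIOENCODING variable. A minimal sketch: child_stdout_encoding() is a hypothetical helper that starts a child interpreter with the variable set and asks which stdout encoding it picked. The answer is normalised with codecs.lookup() because the exact spelling Python reports may vary between versions.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding: str) -> str:
    """Run a child interpreter with PYTHONIOENCODING overridden and return
    the canonical name of the stdout encoding it reports."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, stdout=subprocess.PIPE, check=True, universal_newlines=True,
    ).stdout.strip()
    return codecs.lookup(out).name


print(child_stdout_encoding("latin-1"))  # iso8859-1
```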

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 07 2010)
 Python/Zope Consulting and Support ... http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread Oleg Broytman
On Thu, Oct 07, 2010 at 06:35:09PM +0200, M.-A. Lemburg wrote:
 It is quite possible for the two to be different. Mac OS X is
 just one example. Another common example is a Unix
 account using the C locale (=ASCII) while working on a UTF-8
 file system.

   My filesystems are always koi8-r, but sometimes I work with programs in a
utf-8 locale. Just an example...

Oleg.
-- 
 Oleg Broytman    http://phd.pp.ru/    p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Inconsistencies if locale and filesystem encodings are different

2010-10-07 Thread Oleg Broytman
On Thu, Oct 07, 2010 at 09:12:13PM +0200, Victor Stinner wrote:
 On Thursday 07 October 2010 18:44:19, Oleg Broytman wrote:
 My filesystems are always koi8-r, but sometimes I work with programs in a
  utf-8 locale. Just an example...
 
 Are programs able to correctly display non-ASCII filenames if your locale
 encoding is different from your filesystem encoding?

   Most of them don't because, as you say, most programs assume the fs
encoding to be the same as the stdio locale. But some programs are more clever;
for example, one can set the G_FILENAME_ENCODING env var to guide GTK2/GLib
programs; it can be a fixed encoding or the special value @locale. On the
other side, there are programs that ignore the locale completely and read/write
filenames using their own fixed encoding; for example, the Transmission
BitTorrent client reads/writes files in the encoding defined in the .torrent
metafile.
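The GLib approach can be sketched in Python. filename_encoding() is a hypothetical helper that only approximates the rule as we understand GLib's documentation of G_FILENAME_ENCODING: the first entry of a comma-separated list wins, "@locale" stands for the locale encoding, and an unset variable means UTF-8. G_BROKEN_FILENAMES and other details are deliberately ignored.

```python
import locale
from typing import Optional


def filename_encoding(g_filename_encoding: Optional[str]) -> str:
    """Choose the codec used to decode file names (simplified GLib rule)."""
    if not g_filename_encoding:
        return "utf-8"                      # GLib's default assumption
    first = g_filename_encoding.split(",")[0].strip()
    if first == "@locale":
        return locale.getpreferredencoding(False)
    return first


# A KOI8-R file name, as on the koi8-r file systems mentioned above.
raw_name = "файл.txt".encode("koi8_r")
print(raw_name.decode(filename_encoding("KOI8-R")) == "файл.txt")  # True
```

With the variable unset, the same bytes are not valid UTF-8 and strict decoding fails, which is why such a per-program override is needed when the locale and the file system disagree.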

Oleg.