Patches item #1552880, was opened at 2006-09-05 20:11
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1552880&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Submitted By: Kristján Valur (krisvale)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode Imports

Initial Comment:
This patch modifies the import mechanism to fully
support unicode pathnames on Windows.  It does this by
first converting each member of sys.path to utf-8.
Unicode entries are encoded directly; plain byte
strings are interpreted using the current locale.
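A minimal sketch of this first step, written in Python 3
rather than in the C of the actual patch. The helper name
path_entry_to_utf8 is hypothetical, and using
locale.getpreferredencoding(False) to stand in for "the
current locale" is an assumption:

```python
import locale


def path_entry_to_utf8(entry):
    """Hypothetical helper: normalize one sys.path entry to UTF-8 bytes.

    Assumption: a byte-string entry is in the current locale's encoding,
    approximated here by locale.getpreferredencoding(False).
    """
    if isinstance(entry, bytes):
        # Byte string: decode it using the current locale first.
        entry = entry.decode(locale.getpreferredencoding(False))
    # Unicode (str) entry: encode it as UTF-8 for the rest of the import logic.
    return entry.encode("utf-8")


# Usage: a unicode path containing Japanese characters.
print(path_entry_to_utf8("C:\\EVE\\\u30b9\u30af\u30ea\u30d7\u30c8"))
```

Keeping everything as UTF-8 bytes internally is what lets the
existing byte-string import logic run unchanged.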

The whole of the import logic is then unchanged and
works on the utf-8 strings as though they were regular
ascii strings in the current locale.

Only when file operations such as stat() and open()
are performed do we convert from utf-8 back to unicode
and use the Windows unicode APIs for the job.  This is
also done when initializing module objects.
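The reverse conversion at file-operation time might look
like the following Python 3 sketch. The wrapper names are
hypothetical; passing a str path to os.stat() or open() is
what makes CPython use the wide-character (unicode) file
APIs on Windows, which is assumed here to be analogous to
what the patch does in C:

```python
import os


def utf8_to_unicode(path):
    # Decode the internal UTF-8 byte path back into a unicode (str) path.
    return path.decode("utf-8")


def stat_utf8(path):
    """Hypothetical wrapper: stat() a file named by a UTF-8 byte path.

    On Windows, CPython routes str paths to the wide-character (unicode)
    file APIs, standing in here for the patch's C-level conversion.
    """
    return os.stat(utf8_to_unicode(path))


def open_utf8(path, mode="rb"):
    """Hypothetical wrapper: open() a file named by a UTF-8 byte path."""
    return open(utf8_to_unicode(path), mode)


# Usage: stat the current directory via its UTF-8 encoded name.
print(stat_utf8(b".").st_mode)
```

The conversion happens once per file operation, which is
where the overhead mentioned below comes from.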

This approach has the benefit of having a low impact
on the import logic, and is thus easy to verify.
There is, however, some overhead from the
conversions.

At CCP Games we used this approach, backported to
Python 2.3, to get unicode imports working for our
game, EVE Online, thereby solving installation
issues in the Far East.


This patch is submitted as demonstration code to the
python community.  I would like to see unicode fully
supported in 2.6.

Cheers,
Kristján

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2006-09-09 14:31

Message:
Logged In: YES 
user_id=21627

First: Do you want to continue to work on this, or do you
consider this just "demonstration code" (i.e. not
contributed for inclusion in Python), hoping that somebody
else implements this feature?

I think the behavior of __file__ must be more consistent
across platforms, and the selected behaviour must be
documented somewhere. Several definitions of "consistent
behavior" come to mind:
1. __file__ is always a Unicode string
2. __file__ is a byte string if it's ASCII, else Unicode
3. __file__ is a byte string if it's in the system encoding,
else Unicode
4. __file__ is a byte string if it's in the file system
encoding, else Unicode.
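To make the four options concrete, here is a hypothetical
Python 3 helper that picks the type of __file__ from a
unicode path under each policy. Python 3's bytes stands in
for a byte string and str for Unicode; reading "system
encoding" as sys.getdefaultencoding() is an assumption, and
both encodings can be overridden for testing:

```python
import sys


def file_attr(path, policy, system_encoding=None, fs_encoding=None):
    """Hypothetical helper: return __file__ for a unicode path under
    one of the four policies listed above (1-4)."""
    if policy == 1:
        # Option 1: always Unicode.
        return path
    if policy == 2:
        encoding = "ascii"
    elif policy == 3:
        # Assumption: "system encoding" means sys.getdefaultencoding().
        encoding = system_encoding or sys.getdefaultencoding()
    elif policy == 4:
        encoding = fs_encoding or sys.getfilesystemencoding()
    else:
        raise ValueError("policy must be 1, 2, 3 or 4")
    try:
        # Byte string if the path is representable in that encoding...
        return path.encode(encoding)
    except UnicodeEncodeError:
        # ...otherwise fall back to Unicode.
        return path


print(file_attr("m\u00f6d.py", 2))
print(file_attr("m\u00f6d.py", 4, fs_encoding="latin-1"))
```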

The documentation needs to be updated in several places,
e.g. also for inspect.getfile.

I would expect that pydoc would also need to be updated.

Selecting from the options above: I believe 4 is most
compatible with previous versions; 1 and 2 are most
convenient to work with in applications like pydoc which
have to generate HTML (1 is easier to work with, 2 is more
compatible with previous versions).


----------------------------------------------------------------------

Comment By: Kristján Valur (krisvale)
Date: 2006-09-09 13:38

Message:
Logged In: YES 
user_id=1262199

Off the top of my head, it is now unicode.  I considered
trying to convert it back to the default encoding, but
decided not to, to keep the patch brief.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-09-08 23:03

Message:
Logged In: YES 
user_id=21627

What is the value of the __file__ attribute of a module when
this patch is used?

----------------------------------------------------------------------

_______________________________________________
Patches mailing list
Patches@python.org
http://mail.python.org/mailman/listinfo/patches
