Patches item #1552880, was opened at 2006-09-05 18:11
Message generated for change (Comment added) made by krisvale
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1552880&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Submitted By: Kristján Valur (krisvale)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode Imports

Initial Comment:
This patch modifies the import mechanism to fully support unicode pathnames on Windows. It does this by first converting each member of sys.path to utf-8. Strings are encoded using the current locale. The whole of the import logic is then unchanged and works on the utf-8 strings as though they were regular ascii strings in the current locale. Only when file operations are done, such as stat() and open(), do we then convert from utf-8 back to unicode and use the Windows unicode APIs for the job. This is also done when initializing Module objects.

This approach has the benefit of having a low impact on the importing logic, and is thus easy to verify. There is, however, some overhead from the conversions.

At CCP Games we used this approach, backported to Python 2.3, to get unicode imports working for our game, EVE Online, which solved installation issues in the Far East.

This patch is submitted as demonstration code to the Python community. I would like to see unicode fully supported in 2.6.

Cheers,
Kristján

----------------------------------------------------------------------

>Comment By: Kristján Valur (krisvale)
Date: 2006-09-12 09:38

Message:
Logged In: YES
user_id=1262199

I submitted this mostly as a demonstration. I don't think the approach is necessarily suitable for a final implementation, because of the use of utf-8 as an intermediate representation and the cost of the conversions that keep happening. But perhaps this is the way to go, if we consider utf-8 to be a stage-1 default file system encoding for win32.

I also agree that 4 is probably the most sensible approach.
What about discrepancies between e.g. Linux and Windows then, when importing from a non-trivial path? On Linux we would get utf-8, on Windows unicode?

Option 1 would actually make a lot of sense, but in my experience this tends to lead to a kind of unicode-hell, since a program touched by one unicode object tends to have it percolating down into every corner.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-09-09 12:31

Message:
Logged In: YES
user_id=21627

First: Do you want to continue to work on this, or do you consider this just "demonstration code" (i.e. not contributed for inclusion in Python), hoping that somebody else implements this feature?

I think the behavior of __file__ must be more consistent across platforms, and the selected behavior must be documented somewhere. Several definitions of "consistent behavior" come to mind:

1. __file__ is always a Unicode string
2. __file__ is a byte string if it's ASCII, else Unicode
3. __file__ is a byte string if it's in the system encoding, else Unicode
4. __file__ is a byte string if it's in the file system encoding, else Unicode.

The documentation needs to be updated in several places, e.g. also for inspect.getfile. I would expect that pydoc would also need to be updated.

Selecting from the options above: I believe 4 is most compatible with previous versions; 1 and 2 are most convenient to work with in applications like pydoc which have to generate HTML (1 is easier to work with, 2 is more compatible with previous versions).

----------------------------------------------------------------------

Comment By: Kristján Valur (krisvale)
Date: 2006-09-09 11:38

Message:
Logged In: YES
user_id=1262199

Off the top of my head, it is now unicode. I considered trying to convert it back to the default encoding, but decided not to, to keep the patch brief.

----------------------------------------------------------------------

Comment By: Martin v.
Löwis (loewis)
Date: 2006-09-08 21:03

Message:
Logged In: YES
user_id=21627

What is the value of the __file__ attribute of a module when this patch is used?

----------------------------------------------------------------------
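For readers following the thread, here is a minimal Python 3 sketch of the two ideas discussed in it: carrying sys.path entries as utf-8 byte strings and converting back to unicode only at the file-operation boundary, and choosing the type of __file__ according to an encoding. The helper names (to_utf8, from_utf8, file_attribute) and the explicit encoding parameters are illustrative assumptions; none of this is taken from the patch itself, which is C code in the import machinery.

```python
def to_utf8(entry, locale_encoding):
    """Normalise one sys.path entry to utf-8 bytes (hypothetical helper).

    Assumption: byte-string entries are encoded in the current locale.
    """
    if isinstance(entry, bytes):
        entry = entry.decode(locale_encoding)
    return entry.encode("utf-8")


def from_utf8(utf8_path):
    """Recover the unicode path just before a stat() or open() call."""
    return utf8_path.decode("utf-8")


def file_attribute(path, encoding):
    """Return a byte string if `path` fits `encoding`, else the unicode path.

    With the file system encoding this corresponds to option 4 in the thread.
    """
    try:
        return path.encode(encoding)
    except UnicodeEncodeError:
        return path


# The import logic would only ever see utf-8 bytes...
directory = to_utf8("C:\\caf\u00e9", "latin-1")
candidate = directory + b"\\mod.py"
# ...and the unicode form is rebuilt only for the file operation.
unicode_candidate = from_utf8(candidate)
```

Passing "ascii" as the encoding to file_attribute would correspond to option 2, sys.getdefaultencoding() to option 3, and sys.getfilesystemencoding() to option 4. In Python 2 terms the byte string is a str and the fallback is a unicode object.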
_______________________________________________
Patches mailing list
Patches@python.org
http://mail.python.org/mailman/listinfo/patches