Re: [Python-Dev] Unicode Imports
Nick Coghlan wrote:

So this is taking something that *already works properly on POSIX systems* and making it work on Windows as well.

I doubt it does without side effects. For example, an application that would go through sys.path, and encode everything with sys.getfilesystemencoding() currently works, but will break if the patch is applied and non-mbcs strings are put on sys.path.

Also, what will be the effect on __file__? What value will it have if the module originates from a sys.path entry that is a non-mbcs unicode string? I haven't tested the patch, but it looks like __file__ becomes a unicode string on Windows, and remains a byte string encoded with the file system encoding elsewhere. That's also a change in behavior.

Regards, Martin

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Unicode Imports
Martin v. Löwis wrote:

Nick Coghlan wrote:

So this is taking something that *already works properly on POSIX systems* and making it work on Windows as well.

I doubt it does without side effects. For example, an application that would go through sys.path, and encode everything with sys.getfilesystemencoding() currently works, but will break if the patch is applied and non-mbcs strings are put on sys.path.

Also, what will be the effect on __file__? What value will it have if the module originates from a sys.path entry that is a non-mbcs unicode string? I haven't tested the patch, but it looks like __file__ becomes a unicode string on Windows, and remains a byte string encoded with the file system encoding elsewhere. That's also a change in behavior.

Just to summarise my feeling having read the words of those more familiar with the issues than me: it looks like this should be a 2.6 enhancement if it's included at all. I'd like to see it go in, but there do seem to be problems ensuring consistent behaviour across inconsistent platforms.

regards Steve

-- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden
Re: [Python-Dev] Unicode Imports
Martin v. Löwis wrote:

Nick Coghlan wrote:

So this is taking something that *already works properly on POSIX systems* and making it work on Windows as well.

I doubt it does without side effects. For example, an application that would go through sys.path, and encode everything with sys.getfilesystemencoding() currently works, but will break if the patch is applied and non-mbcs strings are put on sys.path.

Huh? It won't break on any path for which it is not already broken. You seem to be saying "Paths with non-mbcs strings shouldn't work on Windows, because they haven't worked in the past."

-- David Hopwood [EMAIL PROTECTED]
Re: [Python-Dev] Unicode Imports
David Hopwood wrote:

I doubt it does without side effects. For example, an application that would go through sys.path, and encode everything with sys.getfilesystemencoding() currently works, but will break if the patch is applied and non-mbcs strings are put on sys.path.

Huh? It won't break on any path for which it is not already broken. You seem to be saying "Paths with non-mbcs strings shouldn't work on Windows, because they haven't worked in the past."

That's not what I'm saying. I'm saying that it shouldn't work in 2.5.x, because it didn't in 2.5.0. Changing it in 2.6 is fine, along with the incompatibilities it causes.

Regards, Martin
Re: [Python-Dev] Unicode Imports
David Hopwood wrote:

Martin v. Löwis wrote:

Nick Coghlan wrote:

So this is taking something that *already works properly on POSIX systems* and making it work on Windows as well.

I doubt it does without side effects. For example, an application that would go through sys.path, and encode everything with sys.getfilesystemencoding() currently works, but will break if the patch is applied and non-mbcs strings are put on sys.path.

Huh? It won't break on any path for which it is not already broken. You seem to be saying "Paths with non-mbcs strings shouldn't work on Windows, because they haven't worked in the past."

I think MvL is looking at it from the point of view of consumers of the list of strings in sys.path, such as PEP 302 importer and loader objects, and tools like module_finder. Currently, the list of values in sys.path is limited to:

1. 8-bit strings
2. Unicode strings containing only characters which can be encoded using the default file system encoding

For PEP 302 loaders, it is currently correct for them to take the 8-bit string they receive and do path.decode(sys.getfilesystemencoding()).

Kristján's patch works nicely for his application because he doesn't have to worry about compatibility with existing loaders and utilities. The core doesn't have that luxury.

We *might* be able to find a backwards compatible way to do it that could be put into 2.5.x, but that is effort that could more profitably be spent elsewhere, particularly since the state of the import system in Py3k will be for it to be based entirely on Unicode (as GvR pointed out last time this topic came up [1]).

Cheers, Nick.

[1] http://mail.python.org/pipermail/python-dev/2006-June/066225.html

-- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org
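The compatibility concern discussed in this message can be made concrete with a minimal sketch. It is not code from the thread: `encode_path_entries` is a hypothetical helper, the example is written in present-day Python 3 syntax, and "cp1252" is only an assumed stand-in for the Windows "mbcs" codec (which exists only on Windows). It mimics a tool that walks a sys.path-like list and encodes every text entry with the file system encoding.

```python
def encode_path_entries(entries, fs_encoding):
    """Encode each text entry as an existing tool might; bytes pass through.

    Hypothetical helper for illustration; fs_encoding stands in for
    the value of sys.getfilesystemencoding().
    """
    encoded = []
    for entry in entries:
        if isinstance(entry, bytes):
            encoded.append(entry)
        else:
            # Raises UnicodeEncodeError if a character has no mapping.
            encoded.append(entry.encode(fs_encoding))
    return encoded


# Entries whose characters all exist in the code page encode without error.
safe_entries = ["C:\\Python25\\lib", "C:\\caf\u00e9"]
print(encode_path_entries(safe_entries, "cp1252"))

# An entry outside the code page makes the same tool fail.
try:
    encode_path_entries(safe_entries + ["C:\\\u4e2d\u6587"], "cp1252")
except UnicodeEncodeError as exc:
    print("tool breaks on entry:", exc.object)
```

Under these assumptions, the tool only starts failing once an entry outside the code page appears on the list, which is the situation the patch would make possible.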
Re: [Python-Dev] Unicode Imports
Nick Coghlan wrote:

I think MvL is looking at it from the point of view of consumers of the list of strings in sys.path, such as PEP 302 importer and loader objects, and tools like module_finder. Currently, the list of values in sys.path is limited to:

That, and all kinds of inspection tools. For example, when __file__ of a module object changes to be a Unicode string (which it does under the proposed patch), then these tools break. They currently don't break in that way because putting arbitrary Unicode strings on sys.path doesn't work in the first place.

Regards, Martin
Re: [Python-Dev] Unicode Imports
Nick Coghlan wrote:

David Hopwood wrote:

Martin v. Löwis wrote:

Nick Coghlan wrote:

So this is taking something that *already works properly on POSIX systems* and making it work on Windows as well.

I doubt it does without side effects. For example, an application that would go through sys.path, and encode everything with sys.getfilesystemencoding() currently works, but will break if the patch is applied and non-mbcs strings are put on sys.path.

Huh? It won't break on any path for which it is not already broken. You seem to be saying "Paths with non-mbcs strings shouldn't work on Windows, because they haven't worked in the past."

I think MvL is looking at it from the point of view of consumers of the list of strings in sys.path, such as PEP 302 importer and loader objects, and tools like module_finder. Currently, the list of values in sys.path is limited to:

1. 8-bit strings
2. Unicode strings containing only characters which can be encoded using the default file system encoding

On Windows, file system pathnames can contain arbitrary Unicode characters (well, almost). Despite the existence of ANSI filesystem APIs, and regardless of what 'sys.getfilesystemencoding()' returns, the underlying file system encoding for NTFS and FAT filesystems is UTF-16LE. Thus, either:

- the fact that sys.getfilesystemencoding() returns a non-Unicode encoding on Windows is a bug, or
- any program that relies on sys.getfilesystemencoding() being able to encode arbitrary Windows pathnames has a bug.

We need to decide which of these is the case.

-- David Hopwood [EMAIL PROTECTED]
Re: [Python-Dev] Unicode Imports
David Hopwood wrote:

On Windows, file system pathnames can contain arbitrary Unicode characters (well, almost). Despite the existence of ANSI filesystem APIs, and regardless of what 'sys.getfilesystemencoding()' returns, the underlying file system encoding for NTFS and FAT filesystems is UTF-16LE. Thus, either:

- the fact that sys.getfilesystemencoding() returns a non-Unicode encoding on Windows is a bug, or
- any program that relies on sys.getfilesystemencoding() being able to encode arbitrary Windows pathnames has a bug.

We need to decide which of these is the case.

There is a third option:

- the operating system has a bug

It is actually this option that rules out the other two. sys.getfilesystemencoding() returns mbcs on Windows, which means CP_ACP. The file system encoding is an encoding that converts a file name into a byte string. Unfortunately, on Windows, there are file names which cannot be converted into a byte string in a standard manner. This is an operating system bug (or mis-design; they should have chosen UTF-8 as the byte encoding of file names, instead of making it depend on the system locale, but they of course did so for backwards compatibility with Windows 3.1 and 9x).

As a side note: every encoding in Python is a Unicode encoding; so there aren't any non-Unicode encodings.

Programs that rely on sys.getfilesystemencoding() being able to represent arbitrary file names on Windows might have a bug; programs that rely on sys.getfilesystemencoding() being able to encode all elements of sys.path do not (at least not for Python 2.5 and earlier).

Regards, Martin
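As an illustration of the point that a locale-dependent byte encoding cannot represent every file name, the following minimal sketch checks whether a name survives a round trip through a given codec. It is not from the thread: `representable` is a hypothetical helper, the syntax is present-day Python 3, and "cp1252" is an assumed example of what CP_ACP might be on a Western-locale system.

```python
def representable(name, codec):
    """Return True if name survives an encode/decode round trip in codec.

    Hypothetical helper for illustration only.
    """
    try:
        return name.encode(codec).decode(codec) == name
    except UnicodeEncodeError:
        return False


names = ["readme.txt", "caf\u00e9.py", "\u4e2d\u6587.py"]

# Compare a locale code page with two encodings that cover all of Unicode.
for codec in ("cp1252", "utf-8", "utf-16-le"):
    print(codec, [representable(n, codec) for n in names])
```

In this sketch the single-byte code page rejects the third name, while UTF-8 and UTF-16LE accept all three, which is the gap between the byte-oriented file system encoding and the names the file system itself can store.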
Re: [Python-Dev] Unicode Imports
Martin v. Löwis wrote:

David Hopwood wrote:

On Windows, file system pathnames can contain arbitrary Unicode characters (well, almost). Despite the existence of ANSI filesystem APIs, and regardless of what 'sys.getfilesystemencoding()' returns, the underlying file system encoding for NTFS and FAT filesystems is UTF-16LE. Thus, either:

- the fact that sys.getfilesystemencoding() returns a non-Unicode encoding on Windows is a bug, or
- any program that relies on sys.getfilesystemencoding() being able to encode arbitrary Windows pathnames has a bug.

We need to decide which of these is the case.

There is a third option:

- the operating system has a bug

This behaviour is by design. If it is a bug, then it is a "won't ever fix -- no way, no how" bug, that Python must accommodate if it is to properly support Unicode on Windows.

It is actually this option that rules out the other two. sys.getfilesystemencoding() returns mbcs on Windows, which means CP_ACP. The file system encoding is an encoding that converts a file name into a byte string. Unfortunately, on Windows, there are file names which cannot be converted into a byte string in a standard manner. This is an operating system bug (or mis-design; they should have chosen UTF-8 as the byte encoding of file names, instead of making it depend on the system locale, but they of course did so for backwards compatibility with Windows 3.1 and 9x).

Although UTF-8 was invented (in September 1992) technically before the release of the first version of NT supporting NTFS (NT 3.1 in July 1993), it had not been invented before the decision to use Unicode in NTFS, or in Windows NT's file APIs, had been made. (I believe OS/2 HPFS had not supported Unicode, even though NTFS was otherwise almost identical to it.) At that time, the decision to use Unicode at all was quite forward-looking; the final version of Unicode 1.0 had only been published in June 1992 (although it had been approved earlier; see http://www.unicode.org/history/). UTF-8 was only officially added to the Unicode standard in an appendix of Unicode 2.0 (published July 1996), and only given essentially equal status to UTF-16 and UTF-32 in Unicode 3.0 (September 1999).

As a side note: every encoding in Python is a Unicode encoding; so there aren't any non-Unicode encodings.

It was clear from context that I meant "encoding capable of representing all Unicode characters".

Programs that rely on sys.getfilesystemencoding() being able to represent arbitrary file names on Windows might have a bug; programs that rely on sys.getfilesystemencoding() being able to encode all elements of sys.path do not (at least not for Python 2.5 and earlier).

Elements of sys.path can be Unicode strings in Python 2.5, and should be pathnames supported by the underlying OS. Where is it documented that there is any further restriction on them? And why should there be any further restriction on them?

-- David Hopwood [EMAIL PROTECTED]
Re: [Python-Dev] Unicode Imports
David Hopwood wrote:

Elements of sys.path can be Unicode strings in Python 2.5, and should be pathnames supported by the underlying OS. Where is it documented that there is any further restriction on them? And why should there be any further restriction on them?

It's not documented in that detail; if people think it should be documented more thoroughly, that should be done (contributions are welcome).

Changing the import machinery to deal with Unicode strings differently cannot be done for Python 2.5, though: it cannot be done for 2.5.0 as the release candidate has already been published, and there is no acceptable patch available at this moment. It cannot be added to 2.5.x as it may reasonably break existing applications.

Regards, Martin
Re: [Python-Dev] Unicode Imports
David Hopwood wrote:

Martin v. Löwis wrote:

Programs that rely on sys.getfilesystemencoding() being able to represent arbitrary file names on Windows might have a bug; programs that rely on sys.getfilesystemencoding() being able to encode all elements of sys.path do not (at least not for Python 2.5 and earlier).

Elements of sys.path can be Unicode strings in Python 2.5, and should be pathnames supported by the underlying OS. Where is it documented that there is any further restriction on them? And why should there be any further restriction on them?

There's no suggestion that this limitation shouldn't be fixed - merely that fixing it is likely to break some applications which rely on sys.path for importing or introspection purposes. A 2.5.x maintenance release typically shouldn't break anything that worked correctly on 2.5.0, hence fixing this becomes a project for either 2.6 or 3.0.

To put it another way: fixing this is likely to require changes to more than just the interpreter core. It will also potentially require changes to all applications which currently expect to be able to use 's.encode(sys.getfilesystemencoding())' to convert any Unicode path entry or __file__ attribute to an 8-bit string. Doing that qualifies as correcting a language design error or limitation, but it would require a real stretch of the definition to qualify as a bug fix.

Cheers, Nick.

-- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org
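The kind of application change described in this message might look like the following minimal sketch. It is not from the thread and is only one possible approach: `fs_bytes_or_text` is a hypothetical name, the syntax is present-day Python 3, and falling back to the unchanged text is an assumption about what a tolerant tool would want rather than a recommendation made by anyone in the discussion.

```python
import sys


def fs_bytes_or_text(value, encoding=None):
    """Return bytes when value fits the file system encoding, else the text.

    Hypothetical helper: replaces a bare
    s.encode(sys.getfilesystemencoding()) that would raise on
    entries the encoding cannot represent.
    """
    if isinstance(value, bytes):
        return value
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        # Keep the text form rather than failing or losing characters.
        return value


# "ascii" is used here only to force the fallback path deterministically.
print(fs_bytes_or_text("lib", "ascii"))
print(fs_bytes_or_text("\u4e2d\u6587", "ascii"))
```

Every caller of such a helper then has to cope with two possible result types, which illustrates why the change reaches beyond the interpreter core.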
Re: [Python-Dev] Unicode Imports
Anthony Baxter wrote:

On Friday 08 September 2006 02:56, Kristján V. Jónsson wrote:

Hello All. I just added patch 1552880 to sourceforge. It is a patch for 2.6 (and 2.5) which allows unicode paths in sys.path and uses the unicode file api on windows. This is tried and tested on 2.5, and backported to 2.3 and is currently running on clients in china and elsewhere. It is minimally intrusive to the importing mechanism, at the cost of some string conversion overhead (to utf8 and then back to unicode).

As this can't be considered a bugfix (that I can see), I'd be against it being checked into 2.5.

Are you suggesting that Python's inability to correctly handle Unicode path elements isn't a bug? Or simply that this inability isn't currently described in a bug report on Sourceforge?

I agree it's a relatively large patch for a release candidate but if prudence suggests deferring it, it should be a *definite* for 2.5.1 and subsequent releases.

regards Steve

-- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden
Re: [Python-Dev] Unicode Imports
On Friday 08 September 2006 18:24, Steve Holden wrote:

As this can't be considered a bugfix (that I can see), I'd be against it being checked into 2.5.

Are you suggesting that Python's inability to correctly handle Unicode path elements isn't a bug? Or simply that this inability isn't currently described in a bug report on Sourceforge?

I'm suggesting that adding the ability to handle unicode paths is a *new* *feature*.

If people actually want to see 2.5 final ever released, they're going to have to accept that "oh, but just this _one_ _more_ _thing_" is not going to fly. We're _well_ past beta1, where new features should have been added. At this point, we have to cut another release candidate. This is far too much to add during the release candidate stage.

I agree it's a relatively large patch for a release candidate but if prudence suggests deferring it, it should be a *definite* for 2.5.1 and subsequent releases.

Possibly. I remain unconvinced.

-- Anthony Baxter [EMAIL PROTECTED] It's never too late to have a happy childhood.
Re: [Python-Dev] Unicode Imports
Anthony Baxter wrote:

On Friday 08 September 2006 18:24, Steve Holden wrote:

As this can't be considered a bugfix (that I can see), I'd be against it being checked into 2.5.

Are you suggesting that Python's inability to correctly handle Unicode path elements isn't a bug? Or simply that this inability isn't currently described in a bug report on Sourceforge?

I'm suggesting that adding the ability to handle unicode paths is a *new* *feature*.

That's certainly true.

If people actually want to see 2.5 final ever released, they're going to have to accept that "oh, but just this _one_ _more_ _thing_" is not going to fly. We're _well_ past beta1, where new features should have been added. At this point, we have to cut another release candidate. This is far too much to add during the release candidate stage.

Right. I couldn't argue for putting this in to 2.5 - it would certainly represent unwarranted feature creep at the rc2 stage.

I agree it's a relatively large patch for a release candidate but if prudence suggests deferring it, it should be a *definite* for 2.5.1 and subsequent releases.

Possibly. I remain unconvinced.

But it *is* a desirable, albeit new, feature, so I'm surprised that you don't appear to perceive it as such for a downstream release.

regards Steve

-- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden
Re: [Python-Dev] Unicode Imports
Steve Holden wrote:

Anthony Baxter wrote:

On Friday 08 September 2006 18:24, Steve Holden wrote:

I agree it's a relatively large patch for a release candidate but if prudence suggests deferring it, it should be a *definite* for 2.5.1 and subsequent releases.

Possibly. I remain unconvinced.

But it *is* a desirable, albeit new, feature, so I'm surprised that you don't appear to perceive it as such for a downstream release.

And unlike 2.2's True/False problem, it is an *environmental* feature, rather than a programmatic one. So while it's a new feature, it would merely mean that 2.5.1 works correctly in more environments than 2.5.

Cheers, Nick.

-- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org
Re: [Python-Dev] Unicode Imports
On Friday 08 September 2006 19:19, Steve Holden wrote:

But it *is* a desirable, albeit new, feature, so I'm surprised that you don't appear to perceive it as such for a downstream release.

Point releases (2.x.1 and suchlike) are absolutely not for new features. They're for bugfixes, only. It's possible that this could be considered a bugfix, but as I said right now I'm dubious.

Anthony

-- Anthony Baxter [EMAIL PROTECTED] It's never too late to have a happy childhood.
Re: [Python-Dev] Unicode Imports
Anthony Baxter wrote:

On Friday 08 September 2006 19:19, Steve Holden wrote:

But it *is* a desirable, albeit new, feature, so I'm surprised that you don't appear to perceive it as such for a downstream release.

Point releases (2.x.1 and suchlike) are absolutely not for new features. They're for bugfixes, only. It's possible that this could be considered a bugfix, but as I said right now I'm dubious.

OK, in that case I'm going to argue that the current behaviour is buggy.

I suppose your point is that, assuming the patch is correct (and it seems the authors are relying on it for production purposes in tens of thousands of installations), it doesn't change the behaviour of the interpreter in existing cases, and therefore it is providing a new feature.

I don't regard this as the provision of a new feature but as the removal of an unnecessary restriction (which I would prefer to call a bug). If it was *documented* somewhere that Unicode paths aren't legal I would find your arguments more convincing. As things stand new Python users would, IMHO, be within their rights to assume that arbitrary directories could be added to the path without breakage.

Ultimately, your call, I guess. Would it help if I added "inability to import from Unicode directories" as a bug? Or would you prefer to change the documentation to state that some directories can't be used as path elements <0.3 wink>?

regards Steve

-- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden
Re: [Python-Dev] Unicode Imports
On 9/8/06, Steve Holden [EMAIL PROTECTED] wrote:

Anthony Baxter wrote:

On Friday 08 September 2006 19:19, Steve Holden wrote:

But it *is* a desirable, albeit new, feature, so I'm surprised that you don't appear to perceive it as such for a downstream release.

Point releases (2.x.1 and suchlike) are absolutely not for new features. They're for bugfixes, only. It's possible that this could be considered a bugfix, but as I said right now I'm dubious.

OK, in that case I'm going to argue that the current behaviour is buggy.

I suppose your point is that, assuming the patch is correct (and it seems the authors are relying on it for production purposes in tens of thousands of installations), it doesn't change the behaviour of the interpreter in existing cases, and therefore it is providing a new feature.

I don't regard this as the provision of a new feature but as the removal of an unnecessary restriction (which I would prefer to call a bug). If it was *documented* somewhere that Unicode paths aren't legal I would find your arguments more convincing. As things stand new Python users would, IMHO, be within their rights to assume that arbitrary directories could be added to the path without breakage.

Ultimately, your call, I guess. Would it help if I added "inability to import from Unicode directories" as a bug? Or would you prefer to change the documentation to state that some directories can't be used as path elements <0.3 wink>?

We've all heard the arguments for both sides enough times I think.

IMO it's the call of the release managers. Board members ought to trust the release managers and not apply undue pressure.

-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Unicode Imports
Guido wrote:

IMO it's the call of the release managers. Board members ought to trust the release managers and not apply undue pressure.

Indeed. Let's not go whacking people with boards. The Perl people would just laugh at us...

Skip
Re: [Python-Dev] Unicode Imports
Guido van Rossum [EMAIL PROTECTED] wrote:

IMO it's the call of the release managers. Board members ought to trust the release managers and not apply undue pressure.

+1, but I would love to see a more formal definition of what a "bugfix" is, which would reduce the ambiguous cases, and thus reduce the number of times the release managers are called to pronounce.

Other projects, for instance, describe point releases as "open for regression fixes only", which means that a patch, to be eligible for a point release, must fix a regression (something which used to work before, and doesn't anymore).

Regressions are important because they affect people wanting to upgrade Python. If something never worked before (like this unicode path thingie), surely existing Python users are not affected by the bug (or they already have workarounds in place), so that NOT having the bug fixed in a point release is not a problem.

Anyway, I'm not pushing for this specific policy (even if I like it): I'm just suggesting that the Release Managers more formally define what should and what should not go in a point release.

Giovanni Bajo
Re: [Python-Dev] Unicode Imports
Giovanni Bajo wrote:

+1, but I would love to see a more formal definition of what a "bugfix" is, which would reduce the ambiguous cases, and thus reduce the number of times the release managers are called to pronounce.

Sorry, that is just a pipe-dream. To some degree, all bug-fixes are new features in that there is some behavioral difference, something will now work that wouldn't work before. While some cases are clear-cut (such as API changes), the ones that are interesting will defy definition and need a human judgment call as to whether a given change will help more than it hurts. The RMs are also strongly biased against extensive patches that haven't had a chance to go through a beta-cycle -- they don't want their releases mucked-up.

Raymond
Re: [Python-Dev] Unicode Imports
Kristján V. Jónsson wrote:

Hello All. I just added patch 1552880 to sourceforge. It is a patch for 2.6 (and 2.5) which allows unicode paths in sys.path and uses the unicode file api on windows. This is tried and tested on 2.5, and backported to 2.3 and is currently running on clients in china and elsewhere. It is minimally intrusive to the importing mechanism, at the cost of some string conversion overhead (to utf8 and then back to unicode).

+1 on adding it to Python 2.6.

-0 for Python 2.5.x: Applications/modules written for Python 2.4 and 2.5 won't be expecting Unicode strings in sys.path with all the consequences that go with it, so this is a true change in semantics, not just a "nice to have" additional feature or bug fix.

OTOH, those applications will just break in a different place with the patch applied :-)

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 08 2006) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
Re: [Python-Dev] Unicode Imports
Steve Holden wrote:

As this can't be considered a bugfix (that I can see), I'd be against it being checked into 2.5.

Are you suggesting that Python's inability to correctly handle Unicode path elements isn't a bug?

Not sure whether Anthony suggests it, but I do.

Or simply that this inability isn't currently described in a bug report on Sourceforge?

No: sys.path is specified (originally) as containing a list of byte strings; it was extended to also support path importers (or whatever that PEP calls them). It was never extended to support Unicode strings. That other PEP e

I agree it's a relatively large patch for a release candidate but if prudence suggests deferring it, it should be a *definite* for 2.5.1 and subsequent releases.

I'm not so sure it should. It *is* a new feature: it makes applications possible which aren't possible today, and the documentation does not ever suggest that these applications should have been possible. In fact, it is common knowledge that this currently isn't supported.

Regards, Martin
Re: [Python-Dev] Unicode Imports
Steve Holden wrote:

I agree it's a relatively large patch for a release candidate but if prudence suggests deferring it, it should be a *definite* for 2.5.1 and subsequent releases.

Possibly. I remain unconvinced.

But it *is* a desirable, albeit new, feature, so I'm surprised that you don't appear to perceive it as such for a downstream release.

Because 2.5.1 shouldn't include any new features. If it is a new feature (which it is), it should go into 2.6.

Regards, Martin
Re: [Python-Dev] Unicode Imports
Nick Coghlan wrote: But it *is* a desirable, albeit new, feature, so I'm surprised that you don't appear to perceive it as such for a downstream release. And unlike 2.2's True/False problem, it is an *environmental* feature, rather than a programmatic one. Not sure what you mean by that; if you mean that existing applications therefore cannot break: this is not true. In fact, it seems that some applications are extremely sensitive to the types of objects on sys.path. Some applications apparently know exactly what you can and cannot find on sys.path; changing that might break them. Regards, Martin
Re: [Python-Dev] Unicode Imports
Steve Holden wrote: I don't regard this as the provision of a new feature but as the removal of an unnecessary restriction (which I would prefer to call a bug). You got the definition of bug wrong. Primarily, a bug is a deviation from the specification. Extending the domain of an argument to an existing function is a new feature. Regards, Martin
Re: [Python-Dev] Unicode Imports
Giovanni Bajo wrote: +1, but I would love to see a more formal definition of what a bugfix is, which would reduce the ambiguous cases, and thus reduce the number of times the release managers are called to pronounce. Other projects, for instance, describe point releases as open for regression fixes only, which means that a patch, to be eligible for a point release, must fix a regression (something which used to work before, and doesn't anymore). In Python, the tradition has accepted bug fixes beyond that. For example, fixing a memory leak would also count as a bug fix. In general, I think a bug is a deviation from the specification (it might be necessary to interpret the specification first to find out whether the implementation deviates). A bug fix is then a behavior change so that the new behavior follows the specification, or a specification change so that it correctly describes the behavior. Regards, Martin
Re: [Python-Dev] Unicode Imports
Martin v. Löwis wrote: Steve Holden schrieb: Or simply that this inability isn't currently described in a bug report on Sourceforge? No: sys.path is specified (originally) as containing a list of byte strings; it was extended to also support path importers (or whatever that PEP calls them). It was never extended to support Unicode strings. That other PEP e That other PEP being PEP 302. That said, Unicode strings *are* permitted on sys.path - the import system will automatically encode them to an 8-bit string using the default filesystem encoding as part of the import process. This works fine on Unix systems that use UTF-8 encoded strings to handle Unicode paths at the C API level, but is screwed on Windows because the default mbcs filesystem encoding can't handle the full range of possible Unicode path names (such as the Chinese directories that originally gave Kristján grief). To get Unicode path names to work on Windows, you have to use the Windows-specific wide character API instead of the normal C API, and the import machinery doesn't do that. So this is taking something that *already works properly on POSIX systems* and making it work on Windows as well. I agree it's a relatively large patch for a release candidate but if prudence suggests deferring it, it should be a *definite* for 2.5.1 and subsequent releases. I'm not so sure it should. It *is* a new feature: it makes applications possible which aren't possible today, and the documentation does not ever suggest that these applications should have been possible. In fact, it is common knowledge that this currently isn't supported. It should already work fine on POSIX filesystems that use the default filesystem encoding for path names. As far as I am aware, it is only Windows where it doesn't work. Cheers, Nick. 
-- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org
[Python-Dev] Unicode Imports
Hello All. I just added patch 1552880 to SourceForge. It is a patch for 2.6 (and 2.5) which allows unicode paths in sys.path and uses the unicode file API on Windows. This is tried and tested on 2.5, and backported to 2.3, and is currently running on clients in China and elsewhere. It is minimally intrusive to the importing mechanism, at the cost of some string conversion overhead (to utf8 and then back to unicode). Cheers, Kristján
Re: [Python-Dev] Unicode Imports
On Friday 08 September 2006 02:56, Kristján V. Jónsson wrote: Hello All. I just added patch 1552880 to SourceForge. It is a patch for 2.6 (and 2.5) which allows unicode paths in sys.path and uses the unicode file API on Windows. This is tried and tested on 2.5, and backported to 2.3, and is currently running on clients in China and elsewhere. It is minimally intrusive to the importing mechanism, at the cost of some string conversion overhead (to utf8 and then back to unicode). As this can't be considered a bugfix (that I can see), I'd be against it being checked into 2.5.
Re: [Python-Dev] unicode imports
Martin v. Löwis wrote: Thomas Heller wrote: It should be noted that I once started to convert the import machinery to be fully unicode aware. As far as I can tell, a *lot* has to be changed to make this work. Is that code available somewhere still? Does it still work? Available as patch 1093253; I have not tried whether it still works. I started with refactoring Python/import.c, but nobody responded to the question whether such a refactoring patch would be accepted or not. I would like to see minimal changes only. I don't see why massive refactoring would be necessary: the structure of the code should persist - only the data types should change from char* to PyObject*. Calls like stat() and open() should be generalized to accept PyObject*, and otherwise keep their interface. To be really useful, wide char versions of other things must also be made available: command line arguments, environment variables (PYTHONPATH), and maybe other stuff. Thomas
Re: [Python-Dev] unicode imports
Thomas Heller wrote: Is that code available somewhere still? Does it still work? Available as patch 1093253; I have not tried whether it still works. I see. It's quite a huge change, which is probably why nobody has found the time to review it yet. To be really useful, wide char versions of other things must also be made available: command line arguments, environment variables (PYTHONPATH), and maybe other stuff. While I think these things should eventually be done, I don't think they are that related to import.c. If W9x support gets dropped, we can rewrite PC/getpathp.c to use the Unicode API throughout; that would allow putting non-ANSI path names onto PYTHONPATH. Making os.environ support Unicode is an entirely different issue. I would like to see os.environ return Unicode if the key is Unicode; another option would be to introduce os.uenviron. Regards, Martin
Re: [Python-Dev] unicode imports
Ideally, I would like for python to simply work. It seems to me that it is mostly a question of time when all modern platforms offer unicode filesystems and hence unicode APIs. IMHO, stuff like the importer should really be written in native unicode and revert to ASCII only as a fallback for unsupporting platforms. is WITH_UNICODE ever left undefined these days? And sure, module names need to be python identifiers (thus ASCII), although I wouldn't be surprised if that restriction were lifted in a not too distant future :) After all, we support the utf-8 encoding of source files, but I cannot write kristján = 1. But that's for a future PEP. Kristján -Original Message- From: Nick Coghlan [mailto:[EMAIL PROTECTED] Sent: 16. júní 2006 15:30 To: Kristján V. Jónsson Cc: Python Dev Subject: Re: [Python-Dev] unicode imports Kristján V. Jónsson wrote: A cursory glance at import.c shows that the import mechanism is fairly complicated, and riddled with char *path thingies, and manual string arithmetic. Do you have any suggestions on a clean way to unicodify the import mechanism? Can you install a PEP 302 path hook and importer/loader that can handle path entries that are Unicode strings? (I think this would end up being the parallel implementation you were talking about, though) If the code that traverses sys.path and sys.path_hooks is itself unicode-unaware (I don't remember if it is or isn't), then you might be able to trick it by poking a Unicode-savvy importer directly into the path_importer_cache for affected Unicode paths. One issue is that the package and file names still have to be valid Python identifiers, which means ASCII. Unicode would be, at best, permitted only in the path entries. Cheers, Nick. 
-- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org
Re: [Python-Dev] unicode imports
Okay, here are specifics which demonstrate the problem. I have a directory, C:\tmp\腌. In it, there is a file, doo.py.

    >>> import os, sys
    >>> d = os.listdir(u"c:/tmp")[-1]
    >>> d
    u'\u814c'
    >>> d2 = os.listdir(u"c:/tmp/"+d)
    >>> d2
    [u'doo.py']
    >>> p = u"c:/tmp/"+d
    >>> p
    u'c:/tmp/\u814c'
    >>> sys.path.append(p)
    >>> import doo
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named doo
    >>> p.encode("mbcs")
    'c:/tmp/?'
    >>> p.encode("gb2312")
    'c:/tmp/\xeb\xe7'

Running your example test code gives:

    Prefixes: C:\PyDev25 C:\PyDev25
    Path: ['c:\\tmp', 'c:\\documents and settings\\kristjan\\my documents\\python', 'C:\\PyDev25\\PCbuild8\\python25.zip', 'C:\\PyDev25\\DLLs', 'C:\\PyDev25\\lib', 'C:\\PyDev25\\lib\\plat-win', 'C:\\PyDev25\\lib\\lib-tk', 'C:\\PyDev25\\PCbuild8 ', 'C:\\PyDev25', 'C:\\PyDev25\\lib\\site-packages']
    Default encoding: ascii
    Input encoding: cp850 Output encodings: cp850 cp850

-Original Message- From: Nick Coghlan [mailto:[EMAIL PROTECTED] Sent: 17 June 2006 04:17 To: Phillip J. Eby Cc: Kristján V. Jónsson; Python Dev Subject: Re: [Python-Dev] unicode imports

Phillip J. Eby wrote: Actually, you would want to put it in sys.path_hooks, and then instances would be placed in path_importer_cache automatically. If you are adding it to the path_hooks after the fact, you should simply clear the path_importer_cache. Simply poking stuff into the path_importer_cache is not a recommended approach. Oh, I agree - poking it in directly was a desperation measure if the path_hooks machinery didn't like Unicode either. I've since gone and looked, and you may be screwed either way - the standard import paths appear to be always put on the system path as encoded 8-bit strings, not as Unicode objects. That said, it also appears that the existing machinery *should* be able to handle non-ASCII path items, so long as 'Py_FileSystemDefaultEncoding' is set correctly. If it isn't handling it, then there's something else going wrong. 
Modules/getpath.c and friends don't encode the results returned by the platform APIs, so the strings in

Kristján, can you provide more details on the fault you get when trying to import from the path containing the Chinese characters? Specifically: What is the actual file system path? What do sys.prefix, sys.exec_prefix and sys.path contain? What does sys.getdefaultencoding() return? What do sys.stdin.encoding, sys.stdout.encoding and sys.stderr.encoding say? What does python -v show? Does adding the standard lib directories manually to sys.path make any difference? Does setting PYTHONHOME to the appropriate settings make any difference? Running something like the following would be good:

    import sys
    print "Prefixes:", sys.prefix, sys.exec_prefix
    print "Path:", sys.path
    print "Default encoding:", sys.getdefaultencoding()
    print "Input encoding:", sys.stdin.encoding,
    print "Output encodings:", sys.stdout.encoding, sys.stderr.encoding
    try:
        import string # Make python -v do something interesting
    except ImportError:
        print "Could not find string module"
    # Replace the placeholder with the real stdlib directory name
    sys.path.append(u"stdlib directory name")
    try:
        import string # Make python -v do something interesting
    except ImportError:
        print "Could not find string module"

-- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org
Re: [Python-Dev] unicode imports
Well, my particular test uses u'c:/tmp/\u814c' If that cannot be encoded in mbcs, then mbcs isn't useful. Note that this is both an issue of python being able to run from an arbitrary install position, and also the ability of users to import and run scripts from any other arbitrary directory. Kristján -Original Message- From: Neil Hodgson [mailto:[EMAIL PROTECTED] Sent: 17. júní 2006 04:53 To: Kristján V. Jónsson Cc: Python Dev Subject: Re: [Python-Dev] unicode imports Kristján V. Jónsson: Although python has had full unicode support for filenames for a long time on selected platforms (e.g. Windows), there is one glaring deficiency: It cannot import from paths containing unicode. I´ve tried creating folders with chinese characters and adding them to path, to no avail. The standard install path in chinese distributions can be with a non-ANSI path, and installing an embedded python application there will break it. It should be unusual for a Chinese installation to use an install path that can not be represented in MBCS. Try encoding the install directory into MBCS before adding it to sys.path. Neil ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode imports
I don't have specific information on the machines. We didn't try very hard to get things to work with 2.3 since we simply assumed it would work automatically when we upgraded to a more mature 2.4. I could try to get more info, but it would be 2.3 specific. Have there been any changes since then? Note that it may not go into program files at all. Someone may want to install his modules in a folder named in the honour of his mother. Also, I really would like to see a general solution that doesn't assume that the path name can somehow be transmuted to an ascii name. Users are unpredictable. When you have a wide distribution, you come up against all kinds of problems. (Currently we have around 500,000 users in China.) Also, relying on some locale settings is not acceptable. My machine here has the Icelandic locale. Yet, I need to be able to set up and use a Chinese install. Likewise, many machines in China will have an English locale. A default encoding and locale is essentially an evil hack in our increasingly global environment. We have converted more or less our entire code base to unicode because keeping track of encoded strings is simply unworkable in a large project. Funny that no other platforms could benefit from a unicode import path. Does that mean that windows will reign supreme? Please explain. Cheers, Kristján -Original Message- From: Martin v. Löwis [mailto:[EMAIL PROTECTED] Sent: 17 June 2006 08:42 To: Kristján V. Jónsson Cc: Python Dev Subject: Re: [Python-Dev] unicode imports Kristján V. Jónsson wrote: The standard install path in chinese distributions can be with a non-ANSI path, and installing an embedded python application there will break it. I very much doubt this. On a Chinese system, the Program Files folder likely has a non-*ASCII* name, but it will have a fine *ANSI* name, as the ANSI code page on that system should be either 936 (simplified chinese) or 950 (traditional chinese) - unless the system is misconfigured. 
Can you please report what the path is, what the precise name of the operating system is, and what the system locale and the system code page are? A completely parallel implementation on the sys.path[i] level? You should also take a look at what the 8.3 name of the path is. I really cannot believe that the path is unaccessible to DOS programs. Are there other platforms beside Windows that would profit from this? No. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode imports
It should be noted that I once started to convert the import machinery to be fully unicode aware. As far as I can tell, a *lot* has to be changed to make this work. I started with refactoring Python/import.c, but nobody responded to the question whether such a refactoring patch would be accepted or not. Thomas
Re: [Python-Dev] unicode imports
Thomas Heller wrote: It should be noted that I once started to convert the import machinery to be fully unicode aware. As far as I can tell, a *lot* has to be changed to make this work. I started with refactoring Python/import.c, but nobody responded to the question whether such a refactoring patch would be accepted or not. Perhaps someone should start a PEP on this subject ?! (not me, though :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 19 2006) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode imports
Kristján V. Jónsson wrote: Funny that no other platforms could benefit from a unicode import path. Does that mean that windows will reign supreme? Please explain. As near as I can tell, other platforms use encoded strings with the normal (byte-based) posix file API, so the Python interpreter and the file system simply need to agree on the encoding (typically utf-8) in order for both filesystem access and importing from non-ASCII paths to work. On Windows, though, most of the file system interaction code has had to be updated to use the wide-character API where possible. import.c is one of the few holdouts that relies entirely on the byte-based posix API. If I had to put money on what's currently happening on your test machine, it's that import.c is trying to do u'c:/tmp/\u814c'.encode('mbcs'), getting 'c:/tmp/?' and proceeding to do nothing useful with that path entry. Checking the result of sys.getfilesystemencoding() should be able to confirm that. So it looks like it ain't really gonna work properly on Windows unless import.c is rewritten to use the Unicode-aware platform independent IO implementation in posixmodule.c. Until that happens (hopefully by Python 2.6), I like MvL's suggestion - look at the 8.3 DOS name on the command prompt and put that into sys.path. ctypes and/or pywin32 should let you get at that information programmatically. Cheers, Nick. -- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
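As a rough illustration of the 8.3 workaround mentioned above, here is a minimal sketch that asks Windows for the short form of a path through ctypes. The helper name short_path_name is made up for this example, the code is written for a current Python 3 rather than the 2.x interpreters discussed in the thread, and it assumes short-name generation is enabled on the volume (when it is not, Windows may simply hand back the long name). On non-Windows platforms the helper returns its argument unchanged.

```python
import ctypes
import sys


def short_path_name(path):
    """Return the 8.3 short form of an existing path on Windows.

    Hypothetical helper for illustration; on other platforms the
    path is returned unchanged.
    """
    if sys.platform != "win32":
        return path
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_short = kernel32.GetShortPathNameW
    get_short.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    get_short.restype = ctypes.c_uint32
    # First call: ask how large the buffer must be (including the NUL).
    size = get_short(path, None, 0)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_unicode_buffer(size)
    # Second call: fetch the short name into the buffer.
    if get_short(path, buf, size) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value
```

The short name, if one exists, contains only characters the byte-based API can handle, so it could be appended to sys.path in place of the original entry.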
Re: [Python-Dev] unicode imports
Wouldn´t it be possible then to emulate the unix way? Simply encode any unicode paths to utf-8, process them as normal, and then decode them just prior to the actual windows io call? It would make sense to just use the utf-8 encoding all the way for all platforms (since it is easy to work with), and then convert to most appropriate encoding for the platform in question right at the end, e.g. unicode for windows, mbcs for windows without unicode (win98) (which relies on the LC_LOCALE setting) and whatever 8 bit encoding is appropriate for the particular unix platform. Of course, once there, why not do it unicode all the way up to that last point? Unless there are platforms without wchar_t that would make sense. At any rate, I am trying to find a coding path of least resistance here. Regardless of the timeline or acceptance in mainstream python for this feature, it is something I will have to patch in for our application. Cheers, Kristján -Original Message- From: Nick Coghlan [mailto:[EMAIL PROTECTED] Sent: 19. júní 2006 13:46 To: Kristján V. Jónsson Cc: Martin v. Löwis; Python Dev Subject: Re: [Python-Dev] unicode imports Kristján V. Jónsson wrote: Funny that no other platforms could benefit from a unicode import path. Does that mean that windows will reign supreme? Please explain. As near as I can tell, other platforms use encoded strings with the normal (byte-based) posix file API, so the Python interpreter and the file system simply need to agree on the encoding (typically utf-8) in order for both filesystem access and importing from non-ASCII paths to work. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode imports
On 6/16/06, Kristján V. Jónsson [EMAIL PROTECTED] wrote: Although python has had full unicode support for filenames for a long time on selected platforms (e.g. Windows), there is one glaring deficiency: It cannot import from paths containing unicode. I've tried creating folders with chinese characters and adding them to path, to no avail. I don't know exactly where this discussion is heading at this point, but I think it's clear that there's a real (though -- yet -- rare) problem, for which currently only ugly work-arounds exist. I'm not convinced that it occurs on other platforms than Windows -- everyone else seems to use UTF-8 for pathnames, while Windows is stuck with code pages and other crap, and the only reasonable way to access Unicode pathnames is via the Windows-specific Unicode API (which is why import is the last place where this isn't easily solved, as the import machinery is completely 8-bit-based). Has it been determined yet whether the DOS 8+3 filename cannot be used as a workaround? Perhaps it would be good enough to wait for Py3k? That will have pure Unicode strings and the import machinery will be completely rewritten anyway. (And I wouldn't be surprised if that rewrite were to use pure Python code.) Py3k will be released later than Python 2.6, but most likely before 2.7. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] unicode imports
Kristján V. Jónsson wrote: I don't have specific information on the machines. We didn't try very hard to get things to work with 2.3 since we simply assumed it would work automatically when we upgraded to a more mature 2.4. I could try to get more info, but it would be 2.3 specific. Have there been any changes since then? Not in that respect, no. Note that it may not go into program files at all. Someone may want to install his modules in a folder named in the honour of his mother. It's certainly possible to set this up in a way that it won't work, on any localized version: just use a path name that isn't supported in the ANSI code page. However, that should rarely happen: the name of his mother should still be expressible in the ANSI code page, if the system is set up correctly. Also, I really would like to see a general solution that doesn't assume that the path name can somehow be transmuted to an ascii name. (Please don't say ASCII here. Windows *A APIs are named that way because Microsoft Windows has the notion of an ANSI code page, which, in turn, is just a code page indirection to some selected code page meant to support the characters of the user's locale.) Users are unpredictable. When you have a wide distribution, you come up against all kinds of problems. (Currently we have around 500,000 users in China.) Also, relying on some locale settings is not acceptable. Sure, but stating that doesn't really help. Code contributions would help, but that part of Python has been left out of using the *W API, because it is particularly messy to fix. Funny that no other platforms could benefit from a unicode import path. Does that mean that windows will reign supreme? That is the case, more or less. Or, more precisely: - On Linux, Solaris, and most other Unices, file names are bytes on the system API, and are expected to be encoded in the user's locale. So if your locale does not support a character, you can't name a file that way, on Unix. 
There is a trend towards using UTF-8 locales, so that the locale contains all Unicode characters. - On Mac OS X, all file names are UTF-8, always (unless the user managed to mess it up), so you can have arbitrary Unicode file names That means that the approach of converting a Unicode sys.path element to the file system encoding will always do the right thing on Linux and OS X: the file system encoding will be the locale's encoding on Linux, and will be UTF-8 on OS X. It's only Windows which has valid file names that cannot be represented in the current locale. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
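The point about locale encodings can be tried out with a small sketch. It is written for Python 3, and because the mbcs codec only exists on Windows, it uses cp1252 as a stand-in for the ANSI code page of an English-locale system (an assumption made for this example, matching the code page 1252 mentioned elsewhere in the thread). The helper names are invented here.

```python
# The directory name from the report earlier in the thread.
PATH = "c:/tmp/\u814c"


def encode_lossy(path, encoding):
    """Encode a path, replacing unmappable characters with '?'."""
    return path.encode(encoding, errors="replace")


def survives(path, encoding):
    """Return True if the path round-trips through the encoding."""
    try:
        return path.encode(encoding).decode(encoding) == path
    except UnicodeEncodeError:
        return False


for enc in ("cp1252", "gb2312", "utf-8"):
    # cp1252 stands in for a Western ANSI code page; the other two
    # can represent the Chinese character in the path.
    print(enc, survives(PATH, enc), encode_lossy(PATH, enc))
```

Only the Western code page loses the directory name, which is the situation described for English-locale Windows machines with Chinese directory names.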
Re: [Python-Dev] unicode imports
Kristján V. Jónsson wrote: Wouldn´t it be possible then to emulate the unix way? Simply encode any unicode paths to utf-8, process them as normal, and then decode them just prior to the actual windows io call? That won't work. People also put path names from the ANSI code page onto sys.path and expect that to work - it always worked, and is a nearly-complete work-around to put directories with funny characters onto sys.path. sys.path is a list, so we have little control over what gets put onto it. Of course, once there, why not do it unicode all the way up to that last point? Unless there are platforms without wchar_t that would make sense. Again, we can't really control that. Also, most platforms have no wchar_t API for file IO. We would have to encode each sys.path element for each stat() call, which would be quite expensive At any rate, I am trying to find a coding path of least resistance here. Regardless of the timeline or acceptance in mainstream python for this feature, it is something I will have to patch in for our application. The path with least resistance should be usage of 8.3 directory names. The one to implement in future Python versions should be the rewrite of import.c, to operate on PyObject* instead of char*, and perform conversion to the native API only just before calling the native API. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
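The approach described in the last paragraph (keep path objects as they are and convert only just before the native call) might look roughly like this in Python terms. This is a sketch under assumptions, not the design of import.c: the names to_native and find_module_file are hypothetical, and os.fsencode is a Python 3 function that did not exist when this thread was written.

```python
import os
import sys


def to_native(path):
    """Convert a text path to what the platform's file API expects."""
    if sys.platform == "win32":
        return path           # wide-character API: pass text through
    return os.fsencode(path)  # byte API: use the filesystem encoding


def find_module_file(name, search_path):
    """Return the first '<entry>/<name>.py' that exists, or None."""
    for entry in search_path:
        candidate = os.path.join(entry, name + ".py")  # still text here
        if os.path.isfile(to_native(candidate)):       # converted at the call
            return candidate
    return None
```

All path arithmetic stays in text, so the search logic is the same on every platform and only the final check depends on the native API.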
Re: [Python-Dev] unicode imports
Thomas Heller wrote: It should be noted that I once started to convert the import machinery to be fully unicode aware. As far as I can tell, a *lot* has to be changed to make this work. Is that code available somewhere still? Does it still work? I started with refactoring Python/import.c, but nobody responded to the question whether such a refactoring patch would be accepted or not. I would like to see minimal changes only. I don't see why massive refactoring would be necessary: the structure of the code should persist - only the data types should change from char* to PyObject*. Calls like stat() and open() should be generalized to accept PyObject*, and otherwise keep their interface. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode imports
Kristján V. Jónsson wrote: The standard install path in chinese distributions can be with a non-ANSI path, and installing an embedded python application there will break it. I very much doubt this. On a Chinese system, the Program Files folder likely has a non-*ASCII* name, but it will have a fine *ANSI* name, as the ANSI code page on that system should be either 936 (simplified chinese) or 950 (traditional chinese) - unless the system is misconfigured. Can you please report what the path is, what the precise name of the operating system is, and what the system locale and the system code page are? A completely parallel implementation on the sys.path[i] level? You should also take a look at what the 8.3 name of the path is. I really cannot believe that the path is unaccessible to DOS programs. Are there other platforms beside Windows that would profit from this? No. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode imports
Neil Hodgson wrote: It should be unusual for a Chinese installation to use an install path that can not be represented in MBCS. Try encoding the install directory into MBCS before adding it to sys.path. Indeed. Unfortunately, people apparently install an English version (because they can get that without paying any license fee), and then create directory names that can't be represented in the ANSI code page (which would then be 1252). Still, on such a system, the target folder for programs should be Program Files. If people do that, they *should* change the system locale to some Chinese locale, but being non-admin people, they often don't. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode imports
On 17-jun-2006, at 6:44, Nick Coghlan wrote: Bob Ippolito wrote: There's a similar issue in that if sys.prefix contains a colon, Python is also busted: http://python.org/sf/1507224 Of course, that's not a Windows issue, but it is everywhere else. The offending code in that case is Modules/getpath.c, Since it has to do with the definition of Py_GetPath as returning a single string that is really a DELIM separated list of strings, where DELIM is defined by the current platform (';' on Windows, ':' everywhere else), this seems more like a platform problem than a Python problem, though - you can't have directories containing a colon as an entry in PATH or PYTHONPATH either. It's not really Python's fault that the platform defines a legal filename character as the delimiter for path entries. On unix-y systems any character except the NUL byte can be used in a legal filesystem path, which leaves very few characters to use as a delimiter without risking issues like the one in the bug Bob mentioned. The only real alternative I can see is to normalise Py_GetPath to always return a ';' delimited list of strings, regardless of platform, and update PySys_SetPath accordingly. That'd cause potential compatibility problems for embedded interpreters, though. That wouldn't help, ';' is also a valid character in filenames on Unix. Except for accepting the status quo (which is a perfectly fine alternative) there seem to be two valid ways to solve this problem. You can either define Py_GetPath2 that returns a python list or tuple, or introduce some way of quoting the delimiter. Both would be backward incompatible. Ronald
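The delimiter problem is easy to reproduce outside the interpreter. The following sketch first shows how a plain ':'-joined string loses an entry that contains a colon, then shows one possible quoting scheme. The backslash escaping is purely an assumption chosen for illustration; it is not something Python or the thread specifies, and the function names are invented.

```python
DELIM = ":"  # the separator used on Unix-like platforms


def quote(entry):
    """Escape backslashes and the delimiter (hypothetical scheme)."""
    return entry.replace("\\", "\\\\").replace(DELIM, "\\" + DELIM)


def split_quoted(joined):
    """Split a delimited string, honouring backslash escapes."""
    entries, current, i = [], [], 0
    while i < len(joined):
        ch = joined[i]
        if ch == "\\" and i + 1 < len(joined):
            current.append(joined[i + 1])  # take the escaped character
            i += 2
        elif ch == DELIM:
            entries.append("".join(current))  # end of one entry
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    entries.append("".join(current))
    return entries


entries = ["/opt/py:thon/lib", "/usr/lib/python"]
naive = DELIM.join(entries).split(DELIM)
quoted = split_quoted(DELIM.join(quote(e) for e in entries))
print(naive)   # the first entry is cut in two
print(quoted)  # the original entries are recovered
```

Returning a real list, as in the Py_GetPath2 idea, avoids the need for any quoting convention at all.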
[Python-Dev] unicode imports
Greetings!

Although Python has had full Unicode support for filenames for a long time on selected platforms (e.g. Windows), there is one glaring deficiency: it cannot import from paths containing Unicode. I've tried creating folders with Chinese characters and adding them to the path, to no avail.

The standard install path in Chinese distributions can be a non-ANSI path, and installing an embedded Python application there will break it. At the moment this is hindering the installation of EVE in Chinese internet cafés.

A cursory glance at import.c shows that the import mechanism is fairly complicated, and riddled with "char *path" thingies, and manual string arithmetic. Do you have any suggestions on a clean way to unicodify the import mechanism? A completely parallel implementation on the sys.path[i] level? Are there other platforms besides Windows that would profit from this?

Cheers,
Kristján
Re: [Python-Dev] unicode imports
Kristján V. Jónsson wrote:

> A cursory glance at import.c shows that the import mechanism is fairly complicated, and riddled with "char *path" thingies, and manual string arithmetic. Do you have any suggestions on a clean way to unicodify the import mechanism?

Can you install a PEP 302 path hook and importer/loader that can handle path entries that are Unicode strings? (I think this would end up being the parallel implementation you were talking about, though)

If the code that traverses sys.path and sys.path_hooks is itself unicode-unaware (I don't remember if it is or isn't), then you might be able to trick it by poking a Unicode-savvy importer directly into the path_importer_cache for affected Unicode paths.

One issue is that the package and file names still have to be valid Python identifiers, which means ASCII. Unicode would be, at best, permitted only in the path entries.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] unicode imports
At 01:29 AM 6/17/2006 +1000, Nick Coghlan wrote:

> Kristján V. Jónsson wrote:
>> A cursory glance at import.c shows that the import mechanism is fairly complicated, and riddled with "char *path" thingies, and manual string arithmetic. Do you have any suggestions on a clean way to unicodify the import mechanism?
>
> Can you install a PEP 302 path hook and importer/loader that can handle path entries that are Unicode strings? (I think this would end up being the parallel implementation you were talking about, though)
>
> If the code that traverses sys.path and sys.path_hooks is itself unicode-unaware (I don't remember if it is or isn't), then you might be able to trick it by poking a Unicode-savvy importer directly into the path_importer_cache for affected Unicode paths.

Actually, you would want to put it in sys.path_hooks, and then instances would be placed in path_importer_cache automatically. If you are adding it to the path_hooks after the fact, you should simply clear the path_importer_cache. Simply poking stuff into the path_importer_cache is not a recommended approach.

> One issue is that the package and file names still have to be valid Python identifiers, which means ASCII. Unicode would be, at best, permitted only in the path entries.

If I understand the problem correctly, the issue is that if you install Python itself to a Unicode directory, you'll be unable to import anything from the standard library. This isn't about module names, it's about the places on the path where that stuff goes.

However, if the issue is that the program works, but it puts unicode entries on sys.path, I would suggest simply encoding them to strings using the platform-appropriate codec.
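The registration order described above (add the hook to sys.path_hooks, then clear sys.path_importer_cache instead of poking entries into it) can be sketched with the importlib machinery of a modern Python 3. This is not the import.c code under discussion, and in Python 3 path entries are already Unicode strings, so the hook is purely illustrative. The names make_hook, claimed_entries and demo_mod, and the policy of claiming exactly one directory, are assumptions made for the example.

```python
import importlib.machinery
import os
import sys
import tempfile

claimed_entries = []  # records which path entries our hook accepted


def make_hook(special_dir):
    """Return a path hook that claims only `special_dir` (hypothetical policy)."""
    details = (importlib.machinery.SourceFileLoader,
               importlib.machinery.SOURCE_SUFFIXES)

    def hook(path_entry):
        if path_entry != special_dir:
            # Declining with ImportError lets the remaining hooks try the entry.
            raise ImportError("entry not handled by this hook")
        claimed_entries.append(path_entry)
        return importlib.machinery.FileFinder(path_entry, details)

    return hook


# A throwaway directory whose name contains non-ASCII characters.
special_dir = tempfile.mkdtemp(suffix="_\u7a0b\u5e8f")
with open(os.path.join(special_dir, "demo_mod.py"), "w", encoding="utf-8") as f:
    f.write("VALUE = 42\n")

sys.path_hooks.insert(0, make_hook(special_dir))  # register the hook first
sys.path_importer_cache.clear()                   # then drop stale cached finders
sys.path.append(special_dir)

import demo_mod  # located through the finder our hook returned

print(demo_mod.VALUE)
```

After the import, the import system itself has stored the finder in sys.path_importer_cache under the directory name, which is the "placed in path_importer_cache automatically" behaviour mentioned above.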
Re: [Python-Dev] unicode imports
On Jun 16, 2006, at 9:02 AM, Phillip J. Eby wrote:

> At 01:29 AM 6/17/2006 +1000, Nick Coghlan wrote:
>> Kristján V. Jónsson wrote:
>>> A cursory glance at import.c shows that the import mechanism is fairly complicated, and riddled with "char *path" thingies, and manual string arithmetic. Do you have any suggestions on a clean way to unicodify the import mechanism?
>>
>> Can you install a PEP 302 path hook and importer/loader that can handle path entries that are Unicode strings? (I think this would end up being the parallel implementation you were talking about, though)
>>
>> If the code that traverses sys.path and sys.path_hooks is itself unicode-unaware (I don't remember if it is or isn't), then you might be able to trick it by poking a Unicode-savvy importer directly into the path_importer_cache for affected Unicode paths.
>
> Actually, you would want to put it in sys.path_hooks, and then instances would be placed in path_importer_cache automatically. If you are adding it to the path_hooks after the fact, you should simply clear the path_importer_cache. Simply poking stuff into the path_importer_cache is not a recommended approach.
>
>> One issue is that the package and file names still have to be valid Python identifiers, which means ASCII. Unicode would be, at best, permitted only in the path entries.
>
> If I understand the problem correctly, the issue is that if you install Python itself to a Unicode directory, you'll be unable to import anything from the standard library. This isn't about module names, it's about the places on the path where that stuff goes.

There's a similar issue in that if sys.prefix contains a colon, Python is also busted: http://python.org/sf/1507224

Of course, that's not a Windows issue, but it is everywhere else. The offending code in that case is Modules/getpath.c, which probably also has to change in order to make unicode directories work on Win32 (though I think there may be a separate win32 implementation of getpath).

-bob
Re: [Python-Dev] unicode imports
Phillip J. Eby wrote:

> Actually, you would want to put it in sys.path_hooks, and then instances would be placed in path_importer_cache automatically. If you are adding it to the path_hooks after the fact, you should simply clear the path_importer_cache. Simply poking stuff into the path_importer_cache is not a recommended approach.

Oh, I agree - poking it in directly was a desperation measure if the path_hooks machinery didn't like Unicode either.

I've since gone and looked, and you may be screwed either way - the standard import paths appear to be always put on the system path as encoded 8-bit strings, not as Unicode objects.

That said, it also appears that the existing machinery *should* be able to handle non-ASCII path items, so long as 'Py_FileSystemDefaultEncoding' is set correctly. If it isn't handling it, then there's something else going wrong. Modules/getpath.c and friends don't encode the results returned by the platform APIs, so the strings in

Kristján, can you provide more details on the fault you get when trying to import from the path containing the Chinese characters? Specifically:

    What is the actual file system path?
    What do sys.prefix, sys.exec_prefix and sys.path contain?
    What does sys.getdefaultencoding() return?
    What do sys.stdin.encoding, sys.stdout.encoding and sys.stderr.encoding say?
    What does python -v show?
    Does adding the standard lib directories manually to sys.path make any difference?
    Does setting PYTHONHOME to the appropriate settings make any difference?

Running something like the following would be good:

    import sys
    print "Prefixes:", sys.prefix, sys.exec_prefix
    print "Path:", sys.path
    print "Default encoding:", sys.getdefaultencoding()
    print "Input encoding:", sys.stdin.encoding,
    print "Output encodings:", sys.stdout.encoding, sys.stderr.encoding
    try:
        import string  # Make python -v do something interesting
    except ImportError:
        print "Could not find string module"
    # Retry with the standard library directory added as a Unicode string
    sys.path.append(u"stdlib directory name")
    try:
        import string  # Make python -v do something interesting
    except ImportError:
        print "Could not find string module"

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] unicode imports
Bob Ippolito wrote:

> There's a similar issue in that if sys.prefix contains a colon, Python is also busted: http://python.org/sf/1507224 Of course, that's not a Windows issue, but it is everywhere else. The offending code in that case is Modules/getpath.c,

Since it has to do with the definition of Py_GetPath as returning a single string that is really a DELIM separated list of strings, where DELIM is defined by the current platform (';' on Windows, ':' everywhere else), this seems more like a platform problem than a Python problem, though - you can't have directories containing a colon as an entry in PATH or PYTHONPATH either. It's not really Python's fault that the platform defines a legal filename character as the delimiter for path entries.

The only real alternative I can see is to normalise Py_GetPath to always return a ';' delimited list of strings, regardless of platform, and update PySys_SetPath accordingly. That'd cause potential compatibility problems for embedded interpreters, though.

I guess we could create a Py_GetPathEx and a PySys_SetPathEx that accepted the delimiters as arguments, and change the call in pythonrun.c from:

    PySys_SetPath(Py_GetPath())

to:

    PySys_SetPathEx(Py_GetPathEx(';'), ';')

(still an incompatible change, but an easier to manage one since you can easily provide different behavior for earlier versions of Python)

> which probably also has to change in order to make unicode directories work on Win32 (though I think there may be a separate win32 implementation of getpath).

There is - PC/getpathp.c

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
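The delimiter problem, and the effect of passing the delimiter explicitly, can be illustrated with a small Python model. The two functions below are hypothetical stand-ins for the proposed Py_GetPathEx and PySys_SetPathEx, which are only ideas floated in this message and not real C API functions; the directory names are made up.

```python
def get_path_ex(entries, delim):
    """Hypothetical stand-in for the proposed Py_GetPathEx: join entries with delim."""
    return delim.join(entries)


def set_path_ex(joined, delim):
    """Hypothetical stand-in for the proposed PySys_SetPathEx: split on delim."""
    return joined.split(delim)


# Made-up directories; the first contains a colon, as in the sys.prefix bug.
entries = ["/opt/my:python/lib", "/usr/lib/python2.4"]

for delim in (":", ";"):
    recovered = set_path_ex(get_path_ex(entries, delim), delim)
    print(repr(delim), "round-trips:", recovered == entries)
```

With ':' the first entry is cut in two, which is the failure in the bug report. With ';' this particular list survives, but a directory containing ';' would be split in exactly the same way, so a fixed delimiter only moves the problem.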