Neil Hodgson <[EMAIL PROTECTED]> writes:

> Thomas Heller:
>
>> OTOH, I once had a bug report from a py2exe user who complained that the
>> program didn't start when installed in a path with japanese characters
>> on it. I tried this out, the bug existed (and still exists), but I was
>> astonished how many programs behaved the same: On a PC with english
>> language settings, you cannot start WinZip or Acrobat Reader (to give
>> just some examples) on a .zip or .pdf file contained in such a
>> directory.
>
> Much of the time these sorts of bugs don't make themselves too hard
> to live with because most non-ASCII names that any user encounters
> are still in the user's locale and so get mapped by Windows.
>
> It can be a lot of work supporting wide file names. I have just added
> wide file name support to my editor, SciTE, for the second time and am
> about to rip it out again as it complicates too much code for too few
> beneficiaries. (I want one executable for both Windows NT+ and 9x, so
> wide file names has to be a runtime choice leading to maybe 50 new
> branches in the code).

In Python, the basic support for unicode file and pathnames is already
there. It is no problem to open a file named
u'\u5b66\u6821\u30c7\u30fc\\blah.py' on WinXP with a German locale. But
adding u'\u5b66\u6821\u30c7\u30fc' to sys.path won't allow importing
this file as a module.

Internally, Python\import.c converts everything to strings. I started to
refactor import.c to work with PyStringObjects instead of char buffers
as a first step - PyUnicodeObjects could have been added later - but I
gave up because there seems to be absolutely zero interest in it.

OK - it makes no sense to have Python modules in directories with these
filenames, but Python itself (especially when frozen or py2exe'd) could
easily live in such a directory.

> If returning a mixture of unicode and narrow strings from
> os.listdir is the right thing to do then maybe it better for sys.argv
> and os.environ to also be mixtures. In patch #1231336 I added parallel
> attributes, sys.argvu and os.environu to hold unicode versions of this
> information. The alternative, placing unicode items in the existing
> attributes minimises API size.
>
> One question here is whether unicode items should be added only
> when the element is outside the user's locale (the CP_ACP code page)
> or whenever the item is outside ASCII. The former is more similar to
> existing behaviour but the latter is safer as it makes it harder to
> implicitly treat the data as being in an incorrect encoding.

I can't judge this - but it's easy to experiment with, even in current
Python releases, since sys.argvu and os.environu can also be provided by
extension modules.
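The difference between opening a file and importing it, as described above, can be probed with a short script. This is only a sketch, written for a current Python 3 interpreter. That is an assumption: the behaviour reported in this mail is for the Python releases of the time, and a newer interpreter may well import the module without trouble. The directory name and the module name blah come from the example above; the helper probe_unicode_import and the constant VALUE are made up for illustration. The directory is created under a temporary location so that no real path is touched.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Directory name taken from the example in this mail.
DIR_NAME = u'\u5b66\u6821\u30c7\u30fc'


def probe_unicode_import(module_name='blah'):
    """Return (opened_ok, imported_ok) for a module in a non-ASCII directory.

    Hypothetical helper: it only reports what this interpreter does and
    makes no claim about any other Python version or platform.
    """
    base = tempfile.mkdtemp()
    target = os.path.join(base, DIR_NAME)
    opened_ok = False
    imported_ok = False
    try:
        os.mkdir(target)
        path = os.path.join(target, module_name + '.py')
        with open(path, 'w') as f:
            f.write('VALUE = 42\n')
        # Step 1: plain file access through the non-ASCII path.
        with open(path) as f:
            opened_ok = f.read().startswith('VALUE')
        # Step 2: import through sys.path, the step reported to fail above.
        sys.path.insert(0, target)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            imported_ok = getattr(module, 'VALUE', None) == 42
        except ImportError:
            imported_ok = False
        finally:
            # Leave sys.path and sys.modules as we found them.
            sys.path.remove(target)
            sys.modules.pop(module_name, None)
    except (OSError, UnicodeError):
        # The file system or locale refused the name; report what we have.
        pass
    finally:
        shutil.rmtree(base, ignore_errors=True)
    return opened_ok, imported_ok


if __name__ == '__main__':
    print(probe_unicode_import())
```

A result where the first value is True and the second is False would correspond to the situation described in this mail: the file can be opened, but not imported.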
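The two policies in the quoted question (unicode only outside the user's code page, or unicode for anything outside ASCII) can be compared in a few lines. This is a rough sketch, assuming Python 3, where the narrow string is bytes and the unicode string is str, and using cp1252 as a stand-in for the CP_ACP code page. The function mixed_item and its parameters are hypothetical and not part of patch #1231336.

```python
def mixed_item(text, policy='locale', locale_encoding='cp1252'):
    """Return a narrow (bytes) item where the policy allows it, else unicode.

    policy='locale': stay narrow whenever the text fits the code page
                     (cp1252 here, standing in for CP_ACP).
    policy='ascii':  stay narrow only for pure ASCII text.
    """
    encoding = 'ascii' if policy == 'ascii' else locale_encoding
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # The text does not fit the narrow encoding: keep it as unicode.
        return text


if __name__ == '__main__':
    for name in (u'readme', u'caf\u00e9', u'\u5b66\u6821'):
        print(repr(mixed_item(name, 'locale')), repr(mixed_item(name, 'ascii')))
```

Under the 'locale' policy an item such as u'caf\u00e9' comes back narrow and encoded in the code page, so a consumer has to know which code page that was. Under the 'ascii' policy it stays unicode, which is the safety argument made in the quoted text.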

But thanks for caring about this stuff - I'm a little bit worried
because all the other folks seem to think everything's OK (?).

Thomas