On Wed, Jan 26, 2011 at 11:24:54AM +0900, Stephen J. Turnbull wrote:
> Toshio Kuratomi writes:
>
>  > On Linux there's no defined encoding that will work; file names are just
>  > bytes to the Linux kernel so based on people's argument that the convention
>  > is and should be that filenames are utf-8 and anything else is
>  > a misconfigured system -- python should mandate that its module filenames on
>  > Linux are utf-8 rather than using the user's locale settings.
>
> This isn't going to work where I live (Tsukuba).  At the national
> university alone there are hundreds of pre-existing *nix systems whose
> filesystems were often configured a decade or more ago.  Even if the
> hardware and OS have been upgraded, the filesystems are usually
> migrated as-is, with OS configuration tweaks to accommodate them.  Many
> of them use EUC-JP (and servers often Shift JIS).  That means that you
> won't be able to read module names with ls, and that will make Python
> unacceptable for this purpose.  I imagine that in Russia the same is
> true for the various Cyrillic encodings.
>
Sure ... but with these systems, neither read-modules-as-locale nor
read-modules-as-utf-8 is a solution that works, correct?  Especially if the
OS does get upgraded but the filesystems with user data (and user-created
modules) are migrated as-is, you'll run into situations where
system-installed modules are in utf-8 and user-created modules are in
shift-jis, so something will always be broken.
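
To make the mixed-encoding problem concrete, here is a minimal sketch.  The
module name and the try_decode() helper are made up for illustration; only
the codec names come from the standard library.  It shows that the same
module name is stored as different bytes under each encoding, and that bytes
written under one encoding do not come back as the same name under another:

```python
# Hypothetical non-ASCII module name, chosen only for illustration.
name = "日本語"

# The same name as it would be stored on disk under each encoding.
utf8_bytes = name.encode("utf-8")
sjis_bytes = name.encode("shift_jis")
eucjp_bytes = name.encode("euc_jp")


def try_decode(raw, encoding):
    """Return raw decoded with encoding, or None if the bytes are invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# A reader that assumes one encoding for every file name.
for label, raw in [("utf-8", utf8_bytes),
                   ("shift_jis", sjis_bytes),
                   ("euc_jp", eucjp_bytes)]:
    decoded = try_decode(raw, "utf-8")
    status = "ok" if decoded == name else "broken"
    print("written as %-9s read as utf-8: %s" % (label, status))
```

Whichever single encoding the reader assumes, the names written under the
other encodings fail to round-trip, which is the situation on a machine
whose user filesystems were migrated as-is.
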
The only way to make sure that modules work is to restrict them to
ASCII-only on the filesystem.  But because unicode module names are seen as
a necessary feature, the question is which way forward is going to lead to
the least brokenness.  That could be locale... but from the python2
locale-related bugs that I get to look at, I doubt it.

> I really don't think there is anything that can be done here except to
> warn people that "Kids, these stunts are performed by highly-trained
> professionals.  Don't try this at home!"  Of course they will anyway,
> but at least they will have been warned in sufficiently strong terms
> that they might pay attention and be able to recover when they run
> into bizarre import exceptions.
>
So on the subject of warnings...  I think it's better to pick an encoding
for the platform/filesystem rather than to use locale, because then people
will get an error or a warning at the appropriate time -- the first time
they attempt to create and import a module with a filename that's not
encoded in the correct encoding for the platform.  It's all very well to
say: "We wrote in the documentation on
http://docs.python.org/distutils/introduction.html#Choosing-a-name that
only ASCII names should be used when distributing python modules" but if the
interpreter doesn't complain when people use a non-ASCII filename we all
know that they aren't going to look in the documentation; they'll try it and
if it works they'll learn that habit.

-Toshio
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com