On Thu, Jan 20, 2011 at 03:27:08PM -0500, Glyph Lefkowitz wrote:
>
> On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote:
>
> > Same here. *Most* code will never be shared, or will only be shared
> > between users in the same community. When it goes wrong it's also a
> > learning opportunity. :-)
>
> Despite my usual proclivity for being contrarian, I find myself in agreement
> here. Linux users with locales that don't specify UTF-8 frankly _should_ have
> to deal with all kinds of nastiness until they can transcode their
> filesystems.
> MacOS and Windows both have a "right" answer here and your third-party tools
> shouldn't create mojibake in your filenames.
>
However, if this is the consensus, it makes a lot more sense to pick utf-8 as
*the* encoding for python module filenames on Linux.
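To make the stakes concrete, here is a minimal sketch of what happens when a
filename written under one encoding is read back under another. On Linux the
filename on disk is just bytes, so the interpreter has to pick an encoding to
turn it into a module name. The module name and the decode_filename helper are
hypothetical, made up for illustration; this is not how the import machinery
is actually implemented.

```python
# Hypothetical module name ("cafe" with an acute accent on the e).
module_name = "caf\u00e9"

# The same module name produces different bytes on disk depending on
# which encoding the author's system used when creating the file.
utf8_bytes = (module_name + ".py").encode("utf-8")
latin1_bytes = (module_name + ".py").encode("latin-1")


def decode_filename(raw, encoding):
    """Decode raw filename bytes, or return None if they are not valid
    in the given encoding. (Illustrative helper, not a real importer API.)"""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# A latin-1 filename seen from a utf-8 locale cannot be decoded at all.
print(decode_filename(latin1_bytes, "utf-8"))

# A utf-8 filename seen from a latin-1 locale decodes "successfully",
# but to the wrong name (mojibake), so "import" would not find it.
print(decode_filename(utf8_bytes, "latin-1"))

# Only when reader and writer agree does the name round-trip.
print(decode_filename(utf8_bytes, "utf-8") == module_name + ".py")
```

If the encoding is fixed at utf-8, the third case is the only one that can
occur for a correctly named file, and the first case becomes a detectable
error rather than a silent mismatch.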
Why UTF-8:

* UTF-8 can cover the whole range of unicode whereas most (all?) other
  locale friendly encodings cannot.
* UTF-8 is becoming a standard for Linux distributions whether or not Linux
  users are adopting it.
* Third party tools are gaining support for UTF-8 even when they aren't
  gaining support for generic encodings (If I read the spec on zip correctly,
  this is actually what's happening there).

Why not locale:

* Relying on locale is simply not portable. If nothing prevents people from
  distributing a unicode filename then they will go ahead and do so. If the
  result works (say, because it's utf-8 and 80% of the Linux userbase is
  using utf-8) then it will get packaged and distributed and people won't
  know that it's a problem until someone with a non-utf-8 locale decides to
  use it.
* Mixing of modules from different locales won't work. Suppose that the
  system python installs the previous module. The local site has other
  modules that it has installed using a different filename encoding. The
  users at the site will find that one or the other of the two modules
  won't work.
* Because of the portability problems you have no choice but to tell people
  not to distribute python modules with non-ASCII names. This makes the use
  of unicode names second class indefinitely (until the kernel devs decide
  that they're wrong to not enforce a filesystem encoding or Linux becomes
  irrelevant as a platform).
* If you can pick a set of encodings that are valid (utf-8 for Linux and
  MacOS, wide unicode for windows [I get the feeling from other parts of the
  conversation that Windows won't be so lucky, though]) tools to convert
  python names become easier to write. If you restrict it far enough, you
  could even write tools/importers that automatically do the detection.

PS: Sorry for not replying immediately, the team I'm on is dealing with an
issue at my work and I'm also preparing for a conference later this week.

-Toshio
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com