On 2014-01-10, 17:34 GMT, you wrote:
> From my experience, the concept of a default locale is deeply
> flawed. What if I log into a (Linux) machine using an old
> latin-1 putty from the Windows XP era, have most file names
> and contents in UTF-8 encoding, except for one directory where
> people from eastern Europe upload files via FTP in whatever
> encoding they choose. What should the "default" encoding be
> now?
I know this stuff is really hard, if only because I had to fight with it for years. Being Czech, I am not blessed by Latin-1 covering my language (actually, no living encoding supports it completely, but that’s mostly a theoretical issue). Latin-2 used to work for us, and now everybody with a civilized OS uses UTF-8, of course; I’m not sure what the current state of MS Windows is.

It seems to me that you have muddled several fundamental principles together:

a) The locale should always be set for the particular system. I.e., in your example above you have only two variables: the locale of your Windows XP machine and the locale of the Linux box.

b) I know for a fact that PuTTY (even on Windows XP) CAN translate from UTF-8 on the server to whatever Windows has to offer. So there is no such thing as a “latin-1 putty”.

c) Responsibility for filenames on the system lies with whatever actually saves the file. So in this test case it is a matter of setting up the FTP server correctly (see for example http://rhn.redhat.com/errata/RHBA-2012-0187.html and https://bugzilla.redhat.com/show_bug.cgi?id=638873, which seem to indicate that vsftpd, and what else would you use?, should support UTF-8 filenames). If the server locale supports Eastern European filenames and vsftpd supports translation to this encoding (hint, hint: UTF-8 does), then you are all set.

> That's why I make it a principle to always unset all LC_* and
> LANG variables, except when working locally, which happens
> rather rarely.

That’s a bad idea. Even with those variables unset, the locale ALWAYS has some value (perhaps a default, which tends to be something like en_US.ASCII, which is not what you want; fortunately, on most Unices these days it would be en_US.UTF8). The command locale(1) always gives some result.
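To make point c) and the last point a bit more concrete, here is a minimal Python sketch. The filename and the helper names `show`, `transcode` and `current_ctype` are purely hypothetical, made up for illustration. It shows that the same Czech filename produces different bytes depending on which encoding the uploading program used, that only the matching encoding decodes them back, that converting Latin-2 bytes to UTF-8 is the kind of translation a correctly configured server would have to do, and that querying LC_CTYPE always returns some value.

```python
import locale

# Hypothetical example filename, as uploaded via FTP by a Czech user.
NAME = "Matěj.txt"

# The bytes stored on disk depend on the encoding of whoever saved the file.
UTF8_BYTES = NAME.encode("utf-8")         # b'Mat\xc4\x9bj.txt'
LATIN2_BYTES = NAME.encode("iso-8859-2")  # b'Mat\xecj.txt'


def show(raw: bytes, encoding: str) -> str:
    """Decode raw filename bytes, substituting U+FFFD for undecodable bytes."""
    return raw.decode(encoding, errors="replace")


def transcode(raw: bytes, src: str, dst: str) -> bytes:
    """Re-encode filename bytes from src to dst.

    Hypothetical helper mimicking the conversion a correctly
    configured server would perform before saving the file.
    """
    return raw.decode(src).encode(dst)


def current_ctype() -> str:
    """Query (without changing) the LC_CTYPE locale of this process.

    There is always some value here, even with LANG and all LC_* unset:
    then it is the C/POSIX locale (recent Python 3 versions may coerce
    it to a UTF-8 variant such as C.UTF-8 at startup).
    """
    return locale.setlocale(locale.LC_CTYPE)
```

For example, `show(LATIN2_BYTES, "utf-8")` gives `'Mat\ufffdj.txt'` because the Latin-2 byte for “ě” is not valid UTF-8, while `show(LATIN2_BYTES, "iso-8859-2")` recovers the original name. No single “default” guess fixes that; the program that writes the file has to know which encoding it is using.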
Matěj

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com