On Thu, 25 Dec 2003, Jarkko Hietaniemi wrote:

> >> Whoa! It's the other way round here. Nick is using a locale that
> >> suits him for other reasons (e.g. getting time and data formats in
> >> proper British ways), but why should he be constrained not to use for his
> >> filenames whatever he wants?
> >
> > Then, he should switch to en_GB.UTF-8.
>
> That will work if there's en_GB.UTF-8 available for him in his
> particular Unixes and assuming using UTF-8 locales won't break other
> things.
IIRC, he explicitly mentioned 'Linux' in his message. Besides, Solaris, Compaq Tru64, AIX, and HP/UX [1] have all supported UTF-8 locales for a 'long' time (some of them far, far longer than Linux/glibc has). In the past, not all locales came free, but these days they all come at no extra charge, so whether they're available depends on the 'will'/'policy' of the system administrators.

Sure, there are a number of other Unixes, old and new, and many old ones don't support UTF-8 locales. I do want to respect people's wish to create files with UTF-8 names on their file systems even if their version of Unix doesn't support UTF-8 locales. Otherwise, I wouldn't have come up with a set of 'options' Perl can offer them. However, people doing so should be aware that there's a price to pay. For instance, in their shell, file names would not be shown correctly (e.g. 'ls' would show garbled characters), and they can't use the usual set of Unix tools (e.g. 'find' wouldn't work as intended).

> > ISO-8859-1, which is why I wrote about mixing up two encodings
> > in a single file system _under_ his control.
>
> I think we are here talking past each other :-) I'm assuming the
> not all file systems (like Samba mounts) are not necessarily under
> his control, you are assuming they

Well, I think that's a different story. He explicitly wrote why he still uses en_GB.ISO-8859-1 (e.g. some old programs breaking under a UTF-8 locale).

> > Moreover, why would you think that en_GB.UTF-8 locale gives him the
> > time and date format NOT suitable for him?
>
> I'm not thinking that. What I think his point is is that plain
> en_GB.iso88591 is _enough_ for him to get time/date formats etc
> working right, but en_GB.UTF-8 brings in _too much_ (such as some
> programs not yet being UTF-8 aware enough,

What you had in parentheses was what he wrote in his original message, but what you wrote didn't sound like that to me. At least, you picked a bad example in time/date formats.
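To make the 'ls' and 'find' point concrete, here is a minimal sketch (in Python, purely illustrative and not part of Perl; the file name is hypothetical). On Unix a file name is just a byte sequence, so the same name becomes different bytes under UTF-8 and ISO-8859-1, and a tool that decodes UTF-8 bytes as Latin-1 shows garbage.

```python
# Hypothetical file name "cafe" with an acute accent (last character U+00E9).
name = "caf\u00e9"

# On disk, a Unix file name is just bytes; which bytes depends on the encoding.
utf8_bytes = name.encode("utf-8")         # b'caf\xc3\xa9' (5 bytes)
latin1_bytes = name.encode("iso-8859-1")  # b'caf\xe9'     (4 bytes)

# A tool such as 'ls' running under an ISO-8859-1 locale treats every byte
# as one Latin-1 character, so the UTF-8 name is displayed garbled:
# the result is "caf\u00c3\u00a9", i.e. 'caf' followed by A-tilde and a copyright sign.
shown_under_latin1 = utf8_bytes.decode("iso-8859-1")

# A name typed under the ISO-8859-1 locale never byte-matches the UTF-8 name
# on disk, which is why a plain 'find -name ...' misses the file.
names_match = latin1_bytes == utf8_bytes  # False
```

Nothing here is lost or corrupted on disk; the mismatch is purely between the bytes stored and the encoding the viewing tool assumes.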
> or him wanting to use iso8859-1 file names in some directories, but in
> some directories not).

Yes, that's what I meant. He made a conscious decision to mix up two encodings (read his message: 'If I want Unicode characters in file names, I'd just use UTF-8' or something like that), for which he has to pay whatever price he has to pay. If Perl offers a set of options as I outlined in my previous message, he has to be careful when opening files in different directories: for some directories he has to use one option, while for other directories he has to use another.

> > You're making a mistake of binding locale and encoding.
>
> I'm not-- many UNIX vendors do, and I have to with that fact. If Linux
> and glibc are doing the Right Thing, that's marvelous, but not all the
> world is Linux and glibc.

I never implied that, let alone said it. (I always prefer to say Unix in place of Linux. To me, Linux is just one of many Unixes.) And please check out recent commercial Unixes. They DO offer UTF-8 locales, as I wrote above. Solaris and AIX offered solid UTF-8 locales years before Linux/glibc did; actually, that was back when Linux/glibc 1.x had almost __zero__ locale support, UTF-8 or not. Whether they're installed by the system admin is a different story. Anyway, it is exactly because UTF-8 locales may be unavailable, for whatever reason, that we've been discussing this issue (converting Perl's internal Unicode to and from the 'native' encoding in file I/O).

> > The fact that it is on Unix is just an artifact of Unix file system
>
> Not quite. UNIX doesn't care. In traditional UNIX filenames are just
> bytes.

You're absolutely right. I didn't mean to say 'file system' there, as I corrected in my subsequent email.

> >> PLEASE, PEOPLE: stop thinking of this in terms of an environment
> >> controlled solely by one user.
> >
> > Before writing that, please read the man page of 'smbmount' and
> > 'mount' if Linux system is available to you. They're not environment
> > variables.
>
> Please read my sentence again to see that I had no "variable" in it :-)
> Just environment.

OK. Sorry for misreading it. Anyway, Perl can't resolve that problem for people. It can only offer a set of flexible options (as I listed a few messages ago) that help people solve the problem for themselves.

Jungshik

[1] SGI Irix seems to lag behind in this area. FreeBSD was slow, but seems to have caught up recently.
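As a rough illustration of the "one option for some directories, another option for other directories" situation described above, here is a sketch in Python. It assumes a hypothetical table mapping directory prefixes to file-name encodings (the names `DIR_ENCODINGS`, `encoding_for` and `native_path` and the paths are made up for illustration); it is not the set of Perl options referred to in the thread, only the general idea of choosing the on-disk encoding per directory.

```python
import posixpath

# Hypothetical per-directory table: the encoding used for file names below
# each directory. NOT an existing Perl option; illustration only.
DIR_ENCODINGS = {
    "/home/nick/old-files": "iso-8859-1",
    "/mnt/samba-share": "iso-8859-1",
}
DEFAULT_ENCODING = "utf-8"


def encoding_for(directory: str) -> str:
    """Return the encoding of the longest configured prefix of directory."""
    directory = posixpath.normpath(directory)
    best, best_len = DEFAULT_ENCODING, -1
    for prefix, enc in DIR_ENCODINGS.items():
        prefix = posixpath.normpath(prefix)
        inside = directory == prefix or directory.startswith(prefix + "/")
        if inside and len(prefix) > best_len:
            best, best_len = enc, len(prefix)
    return best


def native_path(directory: str, name: str) -> bytes:
    """Build the byte path for the OS; assumes the directory part is ASCII."""
    enc = encoding_for(directory)
    # Raises UnicodeEncodeError if name cannot be represented in enc.
    return posixpath.join(directory, name).encode(enc)
```

For example, `native_path("/mnt/samba-share", "caf\u00e9")` yields `b"/mnt/samba-share/caf\xe9"`, while the same name under `/home/nick/new-files` yields `b"/home/nick/new-files/caf\xc3\xa9"`; on POSIX systems either byte string can be passed straight to `open()`. A name such as U+263A simply cannot be stored in a Latin-1 directory, which is part of the price of mixing encodings.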