On Thu, 25 Dec 2003, Jarkko Hietaniemi wrote:

> >> Whoa!  It's the other way round here.  Nick is using a locale that
> >> suits him for other reasons (e.g. getting time and date formats in
> >> proper British ways), but why should he be constrained not to use for his
> >> filenames whatever he wants?
> >
> >   Then, he should switch to en_GB.UTF-8.
>
> That will work if there's en_GB.UTF-8 available for him in his
> particular Unixes and assuming using UTF-8 locales won't break other
> things.

 IIRC, he explicitly mentioned 'Linux' in his message. Besides,
Solaris, Compaq Tru64, AIX, and HP-UX [1] have all supported UTF-8 locales
for a 'long' time (some of them far longer than Linux/glibc has). In
the past, not all locales came free, but these days they all come at
no extra charge, so whether a UTF-8 locale is available depends on the
'will'/'policy' of the system administrators. Sure, there are a number
of other Unixes, old and new, and many of the old ones don't support
UTF-8 locales.

I do want to respect people's wish to use UTF-8 file names on their file
systems even if their version of Unix doesn't support UTF-8 locales.
Otherwise, I wouldn't have come up with a set of 'options' Perl can
offer to them.  However, people doing so should be aware that there's
a price to pay.  For instance, in their shell, file names would not be
shown correctly (i.e. 'ls' would show garbled characters), and they
can't use the usual set of Unix tools (e.g. 'find' wouldn't work as
intended).
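
To make the garbling concrete, here is a minimal sketch in Python (not
Perl, and not any of the proposed options). The file name and the helper
displayed_as() are made up for illustration. It only shows what happens
when the bytes of a UTF-8 name are interpreted under an ISO-8859-1 locale.

```python
# Hypothetical file name, chosen only for illustration.
NAME = "caf\u00e9.txt"  # 'cafe' with an acute accent on the e


def displayed_as(raw: bytes, encoding: str) -> str:
    """Return the text a tool would show if it assumed `encoding` for `raw`."""
    return raw.decode(encoding)


# Bytes as stored on disk when the name is written in UTF-8.
raw_name = NAME.encode("utf-8")

# A tool running under a UTF-8 locale recovers the original name.
seen_in_utf8_locale = displayed_as(raw_name, "utf-8")

# A tool running under en_GB.ISO-8859-1 sees two characters for the one
# accented letter, because UTF-8 encodes it as two bytes.
seen_in_latin1_locale = displayed_as(raw_name, "iso-8859-1")

# ascii() keeps the output printable on any terminal.
print(ascii(raw_name))
print(ascii(seen_in_utf8_locale))
print(ascii(seen_in_latin1_locale))
```

The same mismatch is why pattern matching on names (as with 'find') stops
behaving as intended: the pattern and the stored bytes no longer agree.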

> > ISO-8859-1, which is why I wrote about mixing up two encodings
> > in a single file system _under_ his control.
>
> I think we are here talking past each other :-)  I'm assuming that
> not all file systems (like Samba mounts) are necessarily under
> his control, you are assuming they

 Well, I think that's a different story. He explicitly wrote why
he still uses en_GB.ISO-8859-1 (like some old programs breaking under
a UTF-8 locale).

> >   Moreover, why would you think that en_GB.UTF-8 locale gives him the
> > time and date format NOT suitable for him?

> I'm not thinking that.  What I think his point is is that plain
> en_GB.iso88591 is _enough_ for him to get time/date formats etc
> working right, but  en_GB.UTF-8 brings in _too much_ (such as some
> programs not yet being UTF-8 aware enough,

 What you had in parentheses was what he wrote in his original message,
but what you wrote didn't sound like that to me. At least, the
time/date format was a bad example to pick.

> or him wanting to use iso8859-1 file names in some directories, but in
> some directories not).

  Yes, that's what I meant. He made a conscious decision to
mix up two encodings (read his message. 'If I want Unicode characters
in file names, I'd just use UTF-8' or something like that), for which
he has to pay whatever price he has to pay.  If Perl offers a set of
options as I outlined in my previous message, he has to be careful when
opening files in different directories.  For some directories, he has to
use one option while for other directories, he has to use another option.
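
As a rough sketch of what 'one option per directory' means in practice,
here is a small Python example. It is not the set of Perl options from my
previous message. The directory paths, the DIR_ENCODINGS table, and the
helpers encoding_for() and encoded_path() are all hypothetical, and
directory names are assumed to be plain ASCII.

```python
import os

# Hypothetical table: which encoding the names inside each directory use.
DIR_ENCODINGS = {
    "/home/nick/legacy": "iso-8859-1",
    "/home/nick/unicode": "utf-8",
}

# Assumed fallback, matching an en_GB.ISO-8859-1 locale.
DEFAULT_ENCODING = "iso-8859-1"


def encoding_for(directory: str) -> str:
    """Pick the encoding of the longest matching directory prefix."""
    best = ""
    for prefix in DIR_ENCODINGS:
        inside = directory == prefix or directory.startswith(prefix + "/")
        if inside and len(prefix) > len(best):
            best = prefix
    return DIR_ENCODINGS.get(best, DEFAULT_ENCODING)


def encoded_path(directory: str, name: str) -> bytes:
    """Build the byte string to hand to the OS for `name` in `directory`."""
    encoding = encoding_for(directory)
    return os.fsencode(directory) + b"/" + name.encode(encoding)


# The same Unicode name becomes different bytes in different directories.
print(encoded_path("/home/nick/legacy", "caf\u00e9"))
print(encoded_path("/home/nick/unicode", "caf\u00e9"))
```

The caller has to get the table right for every directory, which is
exactly the care I mean above.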


> > You're making a mistake of binding locale and encoding.
>
> I'm not-- many UNIX vendors do, and I have to deal with that fact.  If Linux
> and glibc are doing the Right Thing, that's marvelous, but not all the
> world is Linux and glibc.

 I never implied that, let alone said it. (I always prefer to say
Unix in place of Linux. To me, Linux is just one of many Unixes.) And
please check out recent commercial Unixes. They DO offer UTF-8 locales, as
I wrote above (Solaris and AIX offered solid UTF-8 locales years before
Linux/glibc did; in fact, back when Linux/glibc 1.x had almost __zero__
locale support, UTF-8 or not). Whether they're installed by the system
admin is a different story. Anyway, it is exactly because UTF-8 locales
may be unavailable, for whatever reason, that we've been discussing this
issue (converting Perl's internal Unicode to and from the 'native'
encoding in file I/O).

> > The fact that it is on Unix is just an artifact of Unix file system
>
> Not quite.  UNIX doesn't care.  In traditional UNIX filenames are just
> bytes.

 You're absolutely right. I didn't mean to say 'file system' there,
as I corrected in my subsequent email.


> >> PLEASE, PEOPLE: stop thinking of this in terms of an environment
> >> controlled solely by one user.
> >
> >   Before writing that, please read the man pages of 'smbmount' and
> > 'mount' if a Linux system is available to you. They're not environment
> > variables.
>
> Please read my sentence again to see that I had no "variable" in it :-)
> Just environment.

 OK. Sorry for misreading it. Anyway, Perl can't help resolve that problem.
It can only offer a set of flexible options (as I listed a few
messages ago) that help people solve the problem for themselves.

  Jungshik

[1] SGI Irix seems to lag behind in this area. FreeBSD was slow, but
seems to have caught up recently.
