Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
I've spent a bit of time building a locale framework around the CLDR data, but it's not ready yet. The CLDR stuff is definitely useful, but patchy and downright wrong in parts - it's the best freely available data out there ATM though. That's why extra eyeballs are always welcome. Contrariwise t

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Rich
Jarkko Hietaniemi wrote: > Incidentally, if anyone is interested in helping in getting a new locale > standard (one can never have too many :-), the CLDR project can always > use extra eyeballs. CLDR? Common Locale Data Repository: > http://oss.software.ibm.com/cvs/icu/~checkout~/locale/CLDR_sta

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Nick Ing-Simmons
Ed Batutis <[EMAIL PROTECTED]> writes: > >The point I'm trying to make (agreeing with most perl 5 porters I suspect) >is that supporting Shift-JIS in Perl5 is hopeless. I seem to recall my Japanese colleagues at TI using it years ago... just treating it as octets and with a 'jperl' which did a lit

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
But those OSes also support older file systems (e.g. floppies), and shares where things are not as clear (at least to me). In cases of floppy (FAT), I guess we're just back to old days :-) In case of CIFS, I really have to check. Then, even Windows supports (although not free) NFS and other file

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
The point I'm trying to make (agreeing with most perl 5 porters I suspect) is that supporting Shift-JIS in Perl5 is hopeless. Curious. I could have sworn people like Dan Kogai are pretty happy. But I guess you refer to the Unicode <-> filename boundary. made to work at least for core features lik

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Ed Batutis
"Nick Ing-Simmons" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > If there is a bug which prevents you passing what your system requires > then set this out clearly as a bug report, via perlbug or some other > mechanism which gives us details of your perl (perl -V etc.) > The point

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
Locale is per-user - file systems on Unix are multi-user and there is no meta-data to say which locale a user was in when they wrote the file. Thanks for underlining this, I meant to mention it last night but forgot. Many locale() C libraries don't give access to what the encoding _is_ - and w

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
I don't see how introducing a new LC_* would help here. Whether Limit the mess of CTYPE controlling Yet Another Feature. it's LC_CTYPE or LC_FILENAME, the problem is still there. to and from the codeset returned by 'nl_langinfo(CODESET)'. Don't get me started how suckily and brokenly nl_langinfo
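A minimal Python sketch of the conversion mentioned in this message: ask the C library for the locale's codeset through nl_langinfo(CODESET), then encode a Unicode filename into it. The helper name filename_bytes and the sample filenames are hypothetical and not from the thread. locale.nl_langinfo exists only on Unix-like systems, and the string it returns differs between platforms, so treat this as an illustration under those assumptions and not as a portable solution.

```python
import locale


def filename_bytes(name, codeset=None):
    """Encode a Unicode filename as bytes (hypothetical helper).

    If no codeset is given, use the one implied by the user's LC_CTYPE.
    """
    if codeset is None:
        # Adopt the environment's LC_CTYPE, then ask libc which codeset it implies.
        # Assumption: a Unix-like system where nl_langinfo is available.
        locale.setlocale(locale.LC_CTYPE, "")
        codeset = locale.nl_langinfo(locale.CODESET)
    # Raises UnicodeEncodeError if the name has no representation in the codeset.
    return name.encode(codeset)


if __name__ == "__main__":
    # Explicit codesets: the same name produces different bytes.
    print(filename_bytes("\u00e9.txt", "iso-8859-1"))
    print(filename_bytes("\u00e9.txt", "utf-8"))
```

Passing the codeset explicitly bypasses the locale lookup, which is useful when the locale of whoever created the file is unknown.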

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jungshik Shin
On Tue, 23 Dec 2003, Nick Ing-Simmons wrote: > Jungshik Shin <[EMAIL PROTECTED]> writes: > >On Mon, 22 Dec 2003, Jarkko Hietaniemi wrote: > > > >> (AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for > >> filenames, > >> but because of backward compatibility reasons using 8-bit cod

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jungshik Shin
On Tue, 23 Dec 2003, Jarkko Hietaniemi wrote: > > It works because it relies > > on iconv(3) to convert between the current locale codeset and UTF-16 > > (used internally by Mozilla) if/wherever possible. 'wc*to*mb/mb*to*wc' > > is used only where iconv(3) is not available. Anyway, yes, that

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Nick Ing-Simmons
Jungshik Shin <[EMAIL PROTECTED]> writes: >On Mon, 22 Dec 2003, Jarkko Hietaniemi wrote: > >> (AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for >> filenames, >> but because of backward compatibility reasons using 8-bit codepages is >> much >> more likely. > > No. _Both_ NTFS (on

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Nick Ing-Simmons
Ed Batutis <[EMAIL PROTECTED]> writes: >"Jarkko Hietaniemi" <[EMAIL PROTECTED]> wrote in message >news:[EMAIL PROTECTED] > >> You do know that ... >Yes. > >If wctomb or mbtowc are to be used, then Perl's Unicode must be converted >either to the locale's wide char or to its multibyte. Locale is pe

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
> I just mentioned it because even on Mac OS X, you have to do > things differently (before 10.2 and after 10.2). After 10.2(?), you > can rely on OS APIs while before that you have to roll your own. What I think would be useful would be to have a small _multiplatform_ library that would do what

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jungshik Shin
On Tue, 23 Dec 2003, Jarkko Hietaniemi wrote: > >> (AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for > >> filenames, > >> but because of backward compatibility reasons using 8-bit codepages is > >> much > >> more likely. > > > > No. _Both_ NTFS (only supported by Win 2k/XP) an

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
'wchar_t' is not only locale dependent (i.e. run-time dependency) on a single platform but also compiler-dependent. Yup. Isn't i18n fun? It works because it relies on iconv(3) to convert between the current locale codeset and UTF-16 (used internally by Mozilla) if/wherever possible. 'wc*to*mb/m

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Jarkko Hietaniemi
(AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for filenames, but because of backward compatibility reasons using 8-bit codepages is much more likely. No. _Both_ NTFS (only supported by Win 2k/XP) and VFAT (supported by Win 2k/XP and Win 9x/ME) use UTF-16LE **exclusively**. In t
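To make the UTF-16LE versus 8-bit code page distinction concrete, here is a small Python sketch that encodes one filename three ways and prints the resulting bytes. The sample name and the choice of cp932 (a Shift-JIS code page) are illustrative assumptions, not taken from the thread; the sketch says nothing about what any particular file system actually stores.

```python
# Hypothetical sample filename: three kanji followed by an ASCII extension.
name = "\u65e5\u672c\u8a9e.txt"

# Encode the same name in UTF-16LE, a Shift-JIS code page, and UTF-8.
encodings = ("utf-16-le", "cp932", "utf-8")
encoded = {enc: name.encode(enc) for enc in encodings}

for enc, raw in encoded.items():
    # Show the byte length and a hex dump of each encoded form.
    print(f"{enc:10} {len(raw):2d} bytes  {raw.hex(' ')}")
```

The three byte strings differ, so a program handed only the bytes has to know which encoding produced them before it can recover the name.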