Re: Unicode::Collate question

2003-12-08 Thread Eric Cholet
On 6 Dec 03, at 09:20, SADAHIRO Tomoyuki wrote: The syntax of collation customization (tailoring) in ICU (http://oss.software.ibm.com/icu/userguide/Collate_Customization.html) is character-based and may be more intuitive. For French: "[backwards 2]&A << \u00e6/e <<< \u00c6/E"
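
For comparison, the "[backwards 2]" part of that ICU rule has a rough counterpart in Unicode::Collate's `backwards` constructor parameter. The following is only a minimal sketch, assuming a Unicode::Collate version that ships with its default collation table; it does not attempt the æ/Æ tailoring from the rule above, and the word list is just the one used elsewhere in this thread.

```perl
use strict;
use warnings;
use utf8;                      # the source contains literal accented characters
use Unicode::Collate;

binmode(STDOUT, ":encoding(UTF-8)");

my @words = qw(côté coté côte cote);

# Default UCA ordering: level-2 (accent) differences are compared left to right.
my @forward = Unicode::Collate->new->sort(@words);

# French-style ordering: level-2 weights are compared starting from the end
# of the string, which is what "backwards 2" asks for.
my @backward = Unicode::Collate->new(backwards => 2)->sort(@words);

print "forward:  @forward\n";
print "backward: @backward\n";
```

With forward accents the list should come out as "cote coté côte côté"; with `backwards => 2` it should come out as "cote côte coté côté", which is the order French dictionaries use.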

Re: Unicode::Collate question

2003-12-06 Thread SADAHIRO Tomoyuki
> Has anyone had a look at the OpenI18N/ICU locale data? > > The locales there are all UTF-8 and have java rule based collation data, so > they *might* be useful for creating a more comprehensive (and accurate) set > of sort modules? The downside is this data is pretty rough ATM but does > seem t

Re: Unicode::Collate question

2003-12-04 Thread Jarkko Hietaniemi
Has anyone had a look at the OpenI18N/ICU locale data? The locales there are all UTF-8 and have java rule based collation data, so they *might* be useful for creating a more comprehensive (and accurate) set of sort modules? The downside is this data is pretty rough ATM but does seem to be improv

Re: Unicode::Collate question

2003-12-04 Thread Rich
Sadahiro Tomoyuki wrote: > >> So I guess I need a Lingua::XX::Sort module for each language I operate >> on, >> in my original posting I was misled to believe that Unicode::Collate >> would >> be the tool to use. >> >> Thanks to all for the useful links provided in this thread. > > As far as I fo

Re: Unicode::Collate question

2003-12-02 Thread SADAHIRO Tomoyuki
> So I guess I need a Lingua::XX::Sort module for each language I operate > on, > in my original posting I was misled to believe that Unicode::Collate > would > be the tool to use. > > Thanks to all for the useful links provided in this thread. As far as I found, CPAN provides at least five modu

Re: Unicode::Collate question

2003-12-02 Thread Eric Cholet
On 1 Dec 03, at 18:33, Jarkko Hietaniemi wrote: % perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)' cote coté côte côté Is this the famous French "backwards accents" rule in action? (http://www-clips.imag.fr/geta/gilles.serasset/tri-du-francais.html) (no,

Re: Unicode::Collate question

2003-12-02 Thread Rafael Garcia-Suarez
Eric Cholet wrote in perl.unicode: > > So is it just by chance that these French words are accurately sorted? > > % perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort > qw(côte côté cote coté)' > cote coté côte côté Until recently, Spanish dictionaries used to treat 'll' vowel as a

Re: Unicode::Collate question

2003-12-01 Thread Jarkko Hietaniemi
OK, this is in line with how I understood this paragraph in perluniintro: The short answer is that by default, Perl compares strings ("lt", "le", "cmp", "ge", "gt") based only on the code points of the characters. In the above case, the answer is "aft
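
To make the code point comparison described in that perluniintro paragraph concrete, here is a small sketch. The word list is made up for illustration and no locale pragma is assumed to be in effect.

```perl
use strict;
use warnings;
use utf8;                      # the source contains a literal accented character

binmode(STDOUT, ":encoding(UTF-8)");

my @words = qw(apple Zebra école zoo);

# The built-in sort compares with cmp, i.e. purely by code point:
# "Z" (U+005A) < "a" (U+0061) < "z" (U+007A) < "é" (U+00E9).
my @sorted = sort @words;

print "@sorted\n";
```

This should print "Zebra apple zoo école": uppercase sorts before lowercase and accented letters sort after "z", which is rarely what a dictionary user expects.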

Re: Unicode::Collate question

2003-12-01 Thread Eric Cholet
On 1 Dec 03, at 16:46, Jarkko Hietaniemi wrote: Thank you both for your replies. What about sorting words in one particular language, is Perl's sort() good enough? I'm wondering, since language isn't one of sort()'s arguments. First we need to define "good enough"... again, if you are sorting

Re: Unicode::Collate question

2003-12-01 Thread Jarkko Hietaniemi
Thank you both for your replies. What about sorting words in one particular language, is Perl's sort() good enough? I'm wondering, since language isn't one of sort()'s arguments. First we need to define "good enough"... again, if you are sorting "simple" English or Hawaiian, you are probably fine

Re: Unicode::Collate question

2003-12-01 Thread Eric Cholet
On 29 Nov 03, at 16:30, Jarkko.Hietaniemi wrote: I want to correctly sort words in a variety of languages, currently French, English, Spanish, Portuguese, German and Arabic. I am using Perl 5.8.1 and Unicode. I think I need Unicode::Collate to have *correct* sorting. Is this correct? In additio

RE: Unicode::Collate question

2003-11-30 Thread Edward Batutis
> -Original Message- > From: Jarkko.Hietaniemi [mailto:[EMAIL PROTECTED] ... > the UCA is not "correct" for any particular language ... Not by design, no, but it is fine for English and Italian, for example. > I think it is worth pointing out that trying to sort multilingual > data is pra

Re: Unicode::Collate question

2003-11-29 Thread Jarkko . Hietaniemi
I want to correctly sort words in a variety of languages, currently French, English, Spanish, Portuguese, German and Arabic. I am using Perl 5.8.1 and Unicode. I think I need Unicode::Collate to have *correct* sorting. Is this correct? In addition to the problems listed by Sadahiro (most importantl

Re: Unicode::Collate question

2003-11-29 Thread SADAHIRO Tomoyuki
[Excuse me, I sent a cc to [EMAIL PROTECTED]; I expect some help and/or suggestions may be given there] > Greetings, > > I hope you won't mind a few questions related to your module > Unicode::Collate. > > I want to correctly sort words in a variety of languages, currently > French, English, Span