[Excuse me, I have cc'd this to [EMAIL PROTECTED]; I expect some help and/or suggestions may be offered there.]
> Greetings,
>
> I hope you won't mind a few questions related to your module
> Unicode::Collate.
>
> I want to correctly sort words in a variety of languages, currently
> French, English, Spanish, Portuguese, German and Arabic. I am using
> Perl 5.8.1 and unicode. I think I need Unicode::Collate to have
> *correct* sorting. Is this correct?

Sorry, but by default I think the answer is 'no'. The "DUCET", the default
collation table provided by unicode.org, sorts characters across the many
scripts in Unicode, but it does not do any language-specific collation.

> Assuming it is, how can I find the correct settings for each of
> the languages I'm interested in? I've read U::Collate's doc carefully,
> but it is fairly complex, I'm not sure I could get it right given that
> I'm neither a Unicode specialist, nor am I fluent in all the languages
> I need to implement. What's the way, are there any wrappers available
> or standard set of parameters per language?

If proper collation tables in the "UCA" format (the file format for
collation tables specified by Unicode Technical Standard #10) are provided,
that can be achieved. However, such collation table files should not be
included in the Unicode::Collate package, since the package should be kept
as small as possible.

In formats other than UCA, some sources of collation data are available.
Here are the ones I know of:

http://oss.software.ibm.com/cvs/icu/locale/
http://std.dkuug.dk/i18n/locales/

I once attempted to analyze the data at std.dkuug.dk, but I could not find
a way to "unicodify" them.
http://std.dkuug.dk/i18n/locales/mnemonic.ds seems to include non-Unicode
characters. Since I don't know their meaning or usage, my attempt made no
progress.

> Thanks in advance for any insight or pointers you can contribute.
> Regards,
> --
> Eric Cholet

Regards,
SADAHIRO Tomoyuki
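
[Illustration, not part of the original message.] The point above that DUCET does no language-specific collation can be seen with a small sketch. It assumes the `backwards` constructor parameter described in the Unicode::Collate documentation, which reverses the comparison direction at a given level; setting it to 2 is one commonly cited piece of French tailoring, not a complete French collation. The word list is purely illustrative.

```perl
use strict;
use warnings;
use utf8;                       # the word list below is UTF-8 source text
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative words that differ only in their accents.
my @words = ('côté', 'cote', 'coté', 'côte');

# Default collator: plain DUCET order, no language-specific rules.
my $ducet = Unicode::Collate->new;

# Assumption: approximate French accent ordering by comparing
# level-2 (diacritic) weights from the end of the string.
my $french = Unicode::Collate->new(backwards => 2);

my @default_order = $ducet->sort(@words);
my @french_order  = $french->sort(@words);

print "DUCET:  @default_order\n";
print "French: @french_order\n";
```

The two collators agree on everything except the position of "coté" and "côte", which is exactly the kind of difference a per-language table or parameter set has to supply.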