[Excuse me, I have CC'd [EMAIL PROTECTED];
I hope some help and/or suggestions may be offered there.]

> Greetings,
> 
> I hope you won't mind a few questions related to your module
> Unicode::Collate.
> 
> I want to correctly sort words in a variety of languages, currently
> French, English, Spanish, Portuguese, German and Arabic. I am using
> Perl 5.8.1 and Unicode. I think I need Unicode::Collate to get
> *correct* sorting. Is that right?

Sorry, I think the answer is 'no', at least by default.
The "DUCET" (Default Unicode Collation Element Table) provided by
unicode.org defines an ordering across the many scripts in Unicode,
but it does not perform any language-specific collation.
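To make the difference concrete, here is a toy sketch in Python (not
Unicode::Collate, and not real DUCET weights). A DUCET-like default
treats Spanish "ñ" as "n" plus a tilde, so the accent only breaks
ties; a Spanish tailoring treats "ñ" as a separate letter after "n".
The helper names `default_key` and `spanish_key` are hypothetical,
and the two-level key is a simplification of the real algorithm.

```python
import unicodedata

def _split(word):
    """Split a word into base letters and the combining marks attached to each."""
    bases, marks = [], []
    # NFD turns a precomposed letter such as "ñ" into "n" + U+0303 COMBINING TILDE
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and bases:
            marks[-1] += ch
        else:
            bases.append(ch)
            marks.append("")
    return bases, marks

def default_key(word):
    """Toy DUCET-like key: compare base letters first, accents only to break ties."""
    bases, marks = _split(word)
    return (tuple(bases), tuple(marks))

def spanish_key(word):
    """Toy Spanish tailoring: treat "ñ" as its own letter sorting after "n"."""
    bases, marks = _split(word)
    primary, secondary = [], []
    for base, mark in zip(bases, marks):
        if base == "n" and "\u0303" in mark:
            primary.append("n\uffff")  # sorts after "n" but before "o"
            secondary.append(mark.replace("\u0303", ""))
        else:
            primary.append(base)
            secondary.append(mark)
    return (tuple(primary), tuple(secondary))

words = ["ñandú", "nuez", "nube"]
print(sorted(words, key=default_key))  # ['ñandú', 'nube', 'nuez']
print(sorted(words, key=spanish_key))  # ['nube', 'nuez', 'ñandú']
```

The default order puts "ñandú" first, which a Spanish reader would
consider wrong; only the tailored key yields the Spanish order.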

> Assuming it is, how can I find the correct settings for each of
> the languages I'm interested in? I've read U::Collate's doc carefully,
> but it is fairly complex, I'm not sure I could get it right given that
> I'm neither a Unicode specialist, nor am I fluent in all the languages
> I need to implement. What's the best approach? Are there any wrappers
> available, or a standard set of parameters per language?

It may be achievable if proper collation tables in the "UCA" format
(the file format used by the Unicode Collation Algorithm, specified in
Unicode Technical Standard #10) are provided for each language.
However, such table files should not be bundled in the
Unicode::Collate package itself, since the package should be kept
as small as possible.
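For reference, each line of a UCA-format table (such as allkeys.txt)
maps one or more code points to a sequence of collation elements, each
written as "[.weights]" or "[*weights]" for variable elements. As I
understand its documentation, Unicode::Collate's `table` option names
such a file and its `entry` option accepts lines of the same shape, so
a per-language tailoring would be a set of such lines. Below is a
minimal Python sketch of a parser for one line; `parse_entry` is a
hypothetical helper, and the weights in the sample are made up, not
taken from any particular DUCET version.

```python
import re

# One collation element: "[" + "." (normal) or "*" (variable) + dot-separated hex weights + "]"
ELEMENT = re.compile(r"\[([.*])([0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)*)\]")

def parse_entry(line):
    """Parse one allkeys.txt-style line into (characters, collation elements).

    Returns None for blank lines, comments and directives such as "@version".
    """
    line = line.split("#", 1)[0].strip()  # drop the trailing comment
    if ";" not in line or line.startswith("@"):
        return None
    chars_part, ce_part = line.split(";", 1)
    chars = "".join(chr(int(cp, 16)) for cp in chars_part.split())
    elements = [
        (marker == "*", tuple(int(w, 16) for w in weights.split(".")))
        for marker, weights in ELEMENT.findall(ce_part)
    ]
    return chars, elements

# Illustrative weights only
sample = "00F1 ; [.1D97.0020.0002.006E][.0000.0024.0002.0303] # LATIN SMALL LETTER N WITH TILDE"
print(parse_entry(sample))
```

Each element comes back as (is_variable, weights), with the primary
weight first, so a tailoring amounts to choosing different weights.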

In formats other than UCA, some collation sources are available.
Here are the ones I know of:

http://oss.software.ibm.com/cvs/icu/locale/
http://std.dkuug.dk/i18n/locales/

I once attempted to analyze the data at std.dkuug.dk,
but I found no way to "unicodify" it (convert it to Unicode).

http://std.dkuug.dk/i18n/locales/mnemonic.ds seems to include
non-Unicode characters. Since I don't know their meaning or usage,
my attempt made no progress.

> Thanks in advance for any insight or pointers you can contribute.
> Regards,
> --
> Eric Cholet

Regards,
SADAHIRO Tomoyuki
