Re: How to use Unicode::Collate in multilingual apps?

Rich Wed, 31 Mar 2004 00:47:51 -0800

Sadahiro Tomoyuki wrote:

> On Mon, 29 Mar 2004 23:44:00 +0100
> Rich <[EMAIL PROTECTED]> wrote:
> 
>> Using the multi-lingual server scenario I was initially discussing, would
>> one of the following usages be correct (yes, it's just pseudocode and
>> exists in a world where no errors ever occur!):
> 
> Though I have not worked on any multitasking application,
> I suppose a possible snag is the size of the DUCET (the file named
> allkeys.txt), which would make constructing a collator slow and
> use a lot of memory for storage.


Yes, the size of allkeys.txt is an issue: I did a data dump of a
Unicode::Collate instance and it's pretty big!

>> 1)
>> 
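>>  use Unicode::Collate::Locale;
>> 
>>  my %collators;    # language tag => collator, built on first use
>> 
>>  while ( $server_running )    # placeholder: one pass per request
>>  {
>>    my $lang_tag = Server->requested_lang_tag;
>> 
>>    # Reuse the cached collator for this tag, or build and cache one.
>>    my $collator   = $collators{$lang_tag}
>>                 ||= Unicode::Collate::Locale->new(locale => $lang_tag);
>> 
>>    ...
>>  }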
> 
> Version 1) creates a new collator whenever the $lang_tag value is new.
> Say the old tag was 'en' (English) and the new one is 'it' (Italian):
> Unicode::Collate::Locale->new will return a default collator each
> time, so $collators{en} and $collators{it} behave the same, but their
> memory is not shared.

Good point!

> When Unicode::Collate->new is called, all the data generated by parsing
> a table file are stored in the collator, which is a blessed hash.
> I did it this way because, if (part of) the newly created data were
> stored elsewhere, say in a cache in the package namespace (e.g.
> something like %Unicode::Collate::Cache), it might cause problems for
> users outside the package who need to manage the memory in that cache.
> 
> I think perhaps it should be possible for a user to determine
> whether two (or more) $lang_tag values create the same collator.
> 
>     my $lang_tag  = Server->requested_lang_tag;
>     my $canonical = Unicode::Collate::Locale::canonical_name($lang_tag);
> 
>     # If $canonical is the same as an earlier one, the collator for it
>     # should be the same. Only once $canonical is known to be new does
>     # a collator need to be created. (The function name is provisional.)
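Until something like the proposed canonical_name exists, a rough approximation might be possible with the getlocale method, assuming a version of Unicode::Collate::Locale that provides it: build a collator once per new tag, then key the long-lived cache by the locale it actually resolved to, so that tags such as 'en' and 'it' that both fall back to the default table end up sharing one object. This is a minimal sketch; get_collator, %locale_for_tag and %collator_for_locale are hypothetical names, not part of Unicode::Collate. Unlike canonical_name, it still pays the construction cost once per distinct tag; it only avoids keeping duplicates in memory.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

# Hypothetical user-side cache (not part of Unicode::Collate).
my %collator_for_locale;   # resolved locale name => shared collator
my %locale_for_tag;        # requested tag        => resolved locale name

sub get_collator {
    my ($tag) = @_;

    # Fast path: this tag has been seen before, reuse its collator.
    if ( defined( my $resolved = $locale_for_tag{$tag} ) ) {
        return $collator_for_locale{$resolved};
    }

    # First time this tag is seen: build a collator and ask which
    # locale it resolved to (e.g. 'default' when there is no tailoring).
    my $collator = Unicode::Collate::Locale->new( locale => $tag );
    my $resolved = $collator->getlocale;
    $locale_for_tag{$tag} = $resolved;

    # Keep only the first collator built for each resolved locale; a
    # duplicate is dropped when $collator goes out of scope.
    $collator_for_locale{$resolved} //= $collator;
    return $collator_for_locale{$resolved};
}

my @sorted = get_collator('it')->sort(qw(pera mela banana));
```

Because the cache lives in the application's own hashes rather than in the Unicode::Collate package namespace, the application stays in control of its memory and can release everything by clearing both hashes.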

Yes, makes sense, but I'm starting to wonder if Unicode::Collate is too
heavyweight a solution. Perhaps something based around Sort::ArbBiLex might
produce good enough results for most languages.

Thanks for the reply
-- 
Rich
[EMAIL PROTECTED]

