Re: Unicode::Collate string replacements and case sensitivity

2011-05-05 Thread SADAHIRO Tomoyuki
On Thu, 28 Apr 2011 10:06:58 -0700 (PDT) Frank Müller wrote: > dear all, > I'm trying to do some string replacements with Unicode::Collate which > usually work very well, but these replacements seem to be case > insensitive by default - how can I change this? look at this

Unicode::Collate string replacements and case sensitivity

2011-04-29 Thread Frank Müller
dear all, I'm trying to do some string replacements with Unicode::Collate which usually work very well, but these replacements seem to be case insensitive by default - how can I change this? look at this simple example: my $myCollator = Unicode::Collate->new( normalization => undef,
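A possible explanation, sketched under the assumption that the truncated constructor call above goes on to set a low collation level (the related whitespace thread uses `level => 1`): at level 1 only base letters are compared, so case and accents are ignored, while level 3 also distinguishes case. The sample strings and variable names below are illustrative and not taken from the original post.

```perl
use strict;
use warnings;
use Unicode::Collate;

# level => 1 compares base letters only: case and accents are ignored.
my $loose  = Unicode::Collate->new(normalization => undef, level => 1);
# level => 3 additionally compares accents (level 2) and case (level 3).
my $strict = Unicode::Collate->new(normalization => undef, level => 3);

# gsubst() modifies its first argument in place, so it needs variables.
my $text_loose  = "Perl perl PERL";
my $text_strict = "Perl perl PERL";

# gsubst() returns the number of replacements it made.
my $n_loose  = $loose->gsubst($text_loose,   "perl", "X");
my $n_strict = $strict->gsubst($text_strict, "perl", "X");

print "level 1: $n_loose replacement(s): $text_loose\n";
print "level 3: $n_strict replacement(s): $text_strict\n";
```

With the default collation table, the level 1 collator should replace all three words and the level 3 collator only the lowercase one.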

Re: Unicode::Collate string replacements and whitespace

2010-09-27 Thread Frank Müller
.@wt if $to_be_pushed; > -       } else { > +       } elsif ($to_be_pushed) { >             push @subWt, [ \...@wt ]; >         } >      } > > Regards, > SADAHIRO Tomoyuki > > > dear all, > > most probably I'm missing something quite obvious and very simple, >

Re: Unicode::Collate string replacements and whitespace

2010-09-22 Thread SADAHIRO Tomoyuki
code yet. > I'm making some string replacements with Unicode::Collate > which generally works fine but for whitespace. I have the following > simple code (adopted from the module documentation): > > my $myCollator = Unicode::Collate->new( normalization => undef, level => 1 )

Re: [ANN] Unicode::Collate 0.54 released

2010-07-27 Thread Sastry
Good Job Sadahiro! - Ravi Sastry Kadali On Mon, Jul 26, 2010 at 7:35 PM, SADAHIRO Tomoyuki wrote: > Hello, all. > > Unicode::Collate 0.54 [1] supports a C-compiled DUCET [2],[3] via XSUB, > which may save time when a new collator is constructed. > > If you want to use the com

[ANN] Unicode::Collate 0.54 released

2010-07-26 Thread SADAHIRO Tomoyuki
Hello, all. Unicode::Collate 0.54 [1] supports a C-compiled DUCET [2],[3] via XSUB, which may save time when a new collator is constructed. If you want to use the compiled DUCET, do not pass (table => 'allkeys.txt') or any other table to Unicode::Collate->new. Though Un

Unicode::Collate

2010-03-20 Thread Neil Shadrach
Unicode::Collate provides a straightforward mechanism for modifying the sort order to take into account, for example, language-specific variations. I think this is illustrated with the variations required for traditional Spanish. Nevertheless I might have expected to see derived modules providing

Re: Unicode::Collate, useful but useless

2007-04-15 Thread Nicholas Clark
CET 5.0.0 with the release of 5.8.9, it could break things for people who have installed Unicode::Collate with 5.8.8 (or earlier) and are currently using DUCET 4.1.0. So it wouldn't be a great idea. Nicholas Clark

Re: Unicode::Collate, useful but useless

2007-04-15 Thread Rafael Garcia-Suarez
2007 +++ perl/MANIFEST Sun Apr 15 17:12:34 2007 @@ -2845,6 +2845,7 @@ lib/Time/localtime.pm By-name interface to Perl's builtin localtime lib/Time/localtime.t Test for Time::localtime lib/Time/tm.pm Internal object for Time::{gm,local}time +

Re: Unicode::Collate, useful but useless

2007-04-15 Thread SADAHIRO Tomoyuki
+2845,7 @@ lib/Time/localtime.pm By-name interface to Perl's builtin localtime lib/Time/localtime.t Test for Time::localtime lib/Time/tm.pm Internal object for Time::{gm,local}time +lib/Unicode/Collate/allkeys.txt Unicode::Collate lib/Unicode/Co

Re: Unicode::Collate, useful but useless

2007-04-15 Thread SADAHIRO Tomoyuki
On 12 Apr 2007 15:36:31 -, Rafael Garcia-Suarez wrote > Éric Cholet wrote in perl.unicode : > > Okay, I know, it wants a Unicode Collation Element Table, it's well > > documented in the pod where to get such a table. > > But: > > - it wants this fi

Re: Unicode::Collate, useful but useless

2007-04-12 Thread Éric Cholet
On 12 Apr 07, at 17:36, Rafael Garcia-Suarez wrote: Éric Cholet wrote in perl.unicode : % perl -MUnicode::Collate -e 'Unicode::Collate->new' Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC (@INC contains: /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/li

Re: Unicode::Collate, useful but useless

2007-04-12 Thread Éric Cholet
On 12 Apr 07, at 22:34, Sébastien Aperghis-Tramoni wrote: Éric Cholet wrote: % perl -MUnicode::Collate -e 'Unicode::Collate->new' Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC Okay, I know, it wants a Unicode Collation Element Table, it's wel

Re: Unicode::Collate, useful but useless

2007-04-12 Thread Sébastien Aperghis-Tramoni
Éric Cholet wrote: % perl -MUnicode::Collate -e 'Unicode::Collate->new' Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC Okay, I know, it wants a Unicode Collation Element Table, it's well documented in the pod where to get such a table. But: - it w

Re: Unicode::Collate, useful but useless

2007-04-12 Thread Rafael Garcia-Suarez
Éric Cholet wrote in perl.unicode : > % perl -MUnicode::Collate -e 'Unicode::Collate->new' > Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC > (@INC contains: /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/lib/ > perl5/site_perl/5.8.8/mach /usr/l

Unicode::Collate, useful but useless

2007-04-12 Thread Éric Cholet
% perl -MUnicode::Collate -e 'Unicode::Collate->new' Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC (@INC contains: /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/lib/ perl5/site_perl/5.8.8/mach /usr/local/lib/perl5/site_perl/5.8.8 /usr/ local/lib/perl5/si

Re: PAR + Unicode::Collate troubles

2004-09-08 Thread Steve Hay
Bob Hallissy wrote: >(PS: It would be nice if people would remove or at least obfuscate the >original poster's email address when they do a quoted reply.) > I find that when I hit "Reply All" in my mail client (Netscape 7.1) the poster's name in the quoted reply appears as just the name (i.e. wi

Re: PAR + Unicode::Collate troubles

2004-09-08 Thread Bob_Hallissy
On 07/09/2004 18:07:33 Steve Hay wrote: >Use the "-a" option: Bingo! Thanks (and yes, I did have to update my PAR installation) Bob (PS: It would be nice if people would remove or at least obfuscate the original poster's email address when they do a quoted reply.)

Re: PAR + Unicode::Collate troubles

2004-09-07 Thread Steve Hay
[EMAIL PROTECTED] wrote: >Using the Perl Packager (PP), I'm trying to build a PAR-based standalone >EXE that utilizes Unicode::Collate. No problem getting Unicode::Collate >into the package, but that module requires a "keys" file (typically >'allkeys.txt'

PAR + Unicode::Collate troubles

2004-09-07 Thread Bob_Hallissy
Using the Perl Packager (PP), I'm trying to build a PAR-based standalone EXE that utilizes Unicode::Collate. No problem getting Unicode::Collate into the package, but that module requires a "keys" file (typically 'allkeys.txt') to exist in the folder lib/Unicode/Col

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-31 Thread Rich
s, the size of allkeys.txt is an issue - I did a Data dump of a Unicode::Collate instance and it's pretty big! >> 1) >> >> my %collators; >> >> for ( $server_loop ) >> { >>my $lang_tag = Server->requested_lang_tag; >> >>
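The per-language approach discussed in this thread can be sketched with `Unicode::Collate::Locale`, which ships with current Unicode::Collate distributions (it was only tentative when this thread was written). This is a minimal sketch, not the posters' code: the helper name `collator_for`, the word list and the locale names are assumptions chosen for illustration. Each collator is built once per process and reused, because construction is the expensive step.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate::Locale;

binmode(STDOUT, ':encoding(UTF-8)');

my %collators;   # locale name => collator object, built on first use

# Hypothetical helper: return the cached collator for a locale name.
sub collator_for {
    my ($locale) = @_;
    $collators{$locale} ||= Unicode::Collate::Locale->new(locale => $locale);
    return $collators{$locale};
}

my @words = ("zebra", "öl", "apple");

# Swedish tailoring sorts "ö" after "z"; a locale without tailoring
# (here 'en') falls back to the default collation order.
my @swedish = collator_for('sv')->sort(@words);
my @english = collator_for('en')->sort(@words);

print "sv: @swedish\n";
print "en: @english\n";
```

In a server loop, the locale name would be derived from the request's language tag; how tags map to locale names is left open here.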

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-30 Thread SADAHIRO Tomoyuki
On Mon, 29 Mar 2004 23:44:00 +0100 Rich <[EMAIL PROTECTED]> wrote: > I now realise that some per-language tailoring would be needed for sensible > results. Unicode::Collate::Locale seems like the kind of thing I was > looking for, and any tailoring is better than none :) >

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-30 Thread Rich
Sadahiro Tomoyuki wrote: > I write Unicode::Collate::Locale (tentatively) for linguistic tailoring > of UCA. To use it, Unicode::Collate should search allkeys.txt > from any directories in @INC (at present it searches table files > only under the directory where it is located.)

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-28 Thread Jarkko Hietaniemi
> I think, for a script representing usually one language, > allkeys.txt defines fairly acceptable collation order. > For example, order of hiragana and katakana is approximately > compliant with the custom of the Japanese language. > > In contrast, for a script representing many languages > (say,

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-27 Thread SADAHIRO Tomoyuki
> 1) I'll know the preferred language via a RFC2616 language tag. > 2) All data will be utf8 encoded Unicode. > 3) The required language may differ for each request. > > I guess Unicode::Collate is the way to go, so can I simply have one > Unicode::Collate instance per

How to use Unicode::Collate in multilinguage apps?

2004-03-26 Thread Rich
) The required language may differ for each request. I guess Unicode::Collate is the way to go, so can I simply have one Unicode::Collate instance per process using the default allkeys.txt table file? Will that give sensible results for most (all?) languages, or do I need to customise the collat

Re: Unicode::Collate question

2003-12-08 Thread Eric Cholet
/e <<< \u00c6/E" for Spanish: "&N < n\u0303 <<< N\u0303" "&C < ch <<< Ch <<< CH" "&l < ll <<< Ll <<< LL" However Unicode::Collate also allows linguistic tailoring. Cert

Re: Unicode::Collate question

2003-12-06 Thread SADAHIRO Tomoyuki
acter-based and may be more intuitive: for French: "[backwards 2]&A << \u00e6/e <<< \u00c6/E" for Spanish: "&N < n\u0303 <<< N\u0303" "&C < ch <<< Ch <<< CH" "&l < ll <

Re: Unicode::Collate question

2003-12-04 Thread Jarkko Hietaniemi
Has anyone had a look at the OpenI18N/ICU locale data? The locales there are all UTF-8 and have java rule based collation data, so they *might* be useful for creating a more comprehensive (and accurate) set of sort modules? The downside is this data is pretty rough ATM but does seem to be improv

Re: Unicode::Collate question

2003-12-04 Thread Rich
Sadahiro Tomoyuki wrote: > >> So I guess I need a Lingua::XX::Sort module for each language I operate >> on, >> in my original posting I was misled to believe that Unicode::Collate >> would >> be the tool to use. >> >> Thanks to all for the useful li

Re: Unicode::Collate question

2003-12-02 Thread SADAHIRO Tomoyuki
> So I guess I need a Lingua::XX::Sort module for each language I operate > on, > in my original posting I was misled to believe that Unicode::Collate > would > be the tool to use. > > Thanks to all for the useful links provided in this thread. As far as I found, CPAN p

Re: Unicode::Collate question

2003-12-02 Thread Eric Cholet
lar French dictionary uses, but the link you provide thinks otherwise, precisely because of the "backwards accents" rule, it sorts those words as (ignoring capitalization) cote côte coté côté So I guess I need a Lingua::XX::Sort module for each language I operate on, in my original post

Re: Unicode::Collate question

2003-12-02 Thread Rafael Garcia-Suarez
Eric Cholet wrote in perl.unicode : > > So is it just by chance that these French words are accurately sorted? > > % perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort > qw(côte côté cote coté)' > cote coté côte côté Until recently, Spanish dictionaries used to treat 'll' vowel as a
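To make the comparison in this thread concrete, here is a small sketch that sorts the same four words three ways: with Perl's built-in `sort` (code point order), with a default `Unicode::Collate` object, and with the `backwards => 2` option, which reverses accent weights at level 2 as in the French "backwards accents" rule mentioned elsewhere in the thread. The variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

binmode(STDOUT, ':encoding(UTF-8)');

my @words = qw(côte côté cote coté);

# Built-in sort compares code points only.
my @by_codepoint = sort @words;

# Default UCA order: accents are compared from left to right.
my @by_uca = Unicode::Collate->new->sort(@words);

# French-style order: accent (level 2) weights are compared from the end.
my @by_french = Unicode::Collate->new(backwards => 2)->sort(@words);

print "code points: @by_codepoint\n";
print "default UCA: @by_uca\n";
print "backwards 2: @by_french\n";
```

For this particular word list the first two orders coincide, which suggests why plain `sort` looked correct in the one-liner; only the `backwards => 2` collator yields the order cote côte coté côté.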

Re: Unicode::Collate question

2003-12-01 Thread Jarkko Hietaniemi
Ok, this is in line with how I understood this paragraph in perluniintro: The short answer is that by default, Perl compares strings ("lt", "le", "cmp", "ge", "gt") based only on the code points of the characters. In the above case, the answer is "aft

Re: Unicode::Collate question

2003-12-01 Thread Eric Cholet
On 1 Dec 03, at 16:46, Jarkko Hietaniemi wrote: Thank you both for your replies. What about sorting words in one particular language, is Perl's sort() good enough? I'm wondering, since language isn't one of sort()'s arguments. First we need to define "good enough"... again, if you are sorting

Re: Unicode::Collate question

2003-12-01 Thread Jarkko Hietaniemi
Thank you both for your replies. What about sorting words in one particular language, is Perl's sort() good enough? I'm wondering, since language isn't one of sort()'s arguments. First we need to define "good enough"... again, if you are sorting "simple" English or Hawaiian, you are probably fine

Re: Unicode::Collate question

2003-12-01 Thread Eric Cholet
On 29 Nov 03, at 16:30, Jarkko.Hietaniemi wrote: I want to correctly sort words in a variety of languages, currently French, English, Spanish, Portuguese, German and Arabic. I am using Perl 5.8.1 and unicode. I think I need Unicode::Collate to have *correct* sorting. Is this correct? In

RE: Unicode::Collate question

2003-11-30 Thread Edward Batutis
> -Original Message- > From: Jarkko.Hietaniemi [mailto:[EMAIL PROTECTED] ... > the UCA is not "correct" for any particular language ... Not by design, no, but it is fine for English and Italian, for example. > I think it is worth pointing out that trying to sort multilingual > data is pra

Re: Unicode::Collate question

2003-11-29 Thread Jarkko . Hietaniemi
I want to correctly sort words in a variety of languages, currently French, English, Spanish, Portuguese, German and Arabic. I am using Perl 5.8.1 and unicode. I think I need Unicode::Collate to have *correct* sorting. Is this correct? In addition to the problems listed by Sadahiro (most

Re: Unicode::Collate question

2003-11-29 Thread SADAHIRO Tomoyuki
[excuse me, I sent cc to [EMAIL PROTECTED]; I expect some help and/or suggestions may be given there] > Greetings, > > I hope you won't mind a few questions related to your module > Unicode::Collate. > > I want to correctly sort words in a variety of languages, curr

[ANN] Unicode::Collate 0.28 released

2003-09-07 Thread SADAHIRO Tomoyuki
Hello, Unicode::Collate 0.28 is released. (0.27 was released only last week...) It is available from CPAN: http://search.cpan.org/author/SADAHIRO/Unicode-Collate-0.28/ Changes against v0.27 are: - Fixed another inconsistency under (normalization => undef): Non-contiguous contraction

[ANN] Unicode::Collate 0.27 released

2003-09-01 Thread SADAHIRO Tomoyuki
Hello, Unicode::Collate 0.27 is released. It is available from CPAN: http://search.cpan.org/author/SADAHIRO/Unicode-Collate-0.27/ Changes against v0.26 are: - The maximum length of contracted CE was not checked. Collation of a large string including a first letter of a contraction that

[ANN] Unicode::Normalize 0.25 and Unicode::Collate 0.22 released

2003-06-09 Thread SADAHIRO Tomoyuki
Hello, all. This update should fix internal functions to convert Unicode codepoints vs Unicode characters [named pack_U() and unpack_U()]. In EBCDIC boxes, unpack_U() seems need rewriting. Testing is welcome. http://search.cpan.org/author/SADAHIRO/Unicode-Collate-0.25/ http://search.cpan.org

Re: Unicode::Collate 0.23 Released

2002-09-08 Thread hv
SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote: :Unicode::Collate 0.23 is released. : :Changes between 0.21 -> 0.23 are: : :0.23 Wed Sep 04 19:25:20 2002 :- fix: scalar match() no longer returns an lvalue substr ref. :- fix: "Ignorable after variable" should be mad

Re: Unicode::Collate 0.23 Released

2002-09-05 Thread Nicholas Clark
On Thu, Sep 05, 2002 at 08:36:50AM -0600, Mark Leisher wrote: > > Tomoyuki> Unicode::Collate 0.23 is released. > > Could you remind us where to find it again? Thanks! I can find it on CPAN: http://search.cpan.org/author/SADAHIRO/Unicode-Collate-0.23/ (start at search.

Re: Unicode::Collate 0.23 Released

2002-09-05 Thread SADAHIRO Tomoyuki
On Thu, 5 Sep 2002 08:36:50 -0600 (MDT) Mark Leisher <[EMAIL PROTECTED]> wrote: > > Tomoyuki> Unicode::Collate 0.23 is released. > > Could you remind us where to find it again? Thanks! Oh, sorry. CPAN distributes it. http://search.cpan.org/author/SADAHIRO

Re: Unicode::Collate 0.23 Released

2002-09-05 Thread Mark Leisher
Tomoyuki> Unicode::Collate 0.23 is released. Could you remind us where to find it again? Thanks! - Mark Leisher Computing Research Lab The mountain remains unmoved at New Mexico State Univers

Unicode::Collate 0.23 Released

2002-09-05 Thread SADAHIRO Tomoyuki
Hi, all. Unicode::Collate 0.23 is released. Changes between 0.21 -> 0.23 are: 0.23 Wed Sep 04 19:25:20 2002 - fix: scalar match() no longer returns an lvalue substr ref. - fix: "Ignorable after variable" should be made level 3 ignorable even if alternat

Re: [Announce] Unicode::Collate 0.21

2002-08-22 Thread hv
SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote: :Unicode::Collate 0.21 is uploaded to CPAN. : :* Some tests are added for UCA version 9. :* "keys.txt" is based on allkeys-3.1.1.txt, on Unicode 3.1.1. : The size of "keys.txt" is reduced to about half. Thanks, belatedly app

[Announce] Unicode::Collate 0.21

2002-08-03 Thread SADAHIRO Tomoyuki
Hello, all. Unicode::Collate 0.21 is uploaded to CPAN. * Some tests are added for UCA version 9. * "keys.txt" is based on allkeys-3.1.1.txt, on Unicode 3.1.1. The size of "keys.txt" is reduced to about half. Regards, SADAHIRO Tomoyuki

[Announce] Unicode::Collate 0.20 -> UCA version 9

2002-07-25 Thread SADAHIRO Tomoyuki
Hello, all. Unicode::Collate 0.20 has been uploaded to CPAN and is also available from http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-0.20.tar.gz The differences from the latest version, 0.12, are: * UCA version 9 is supported. (cf. http://www.unicode.org/reports/tr10/) * A new

Unicode-Collate-0.08

2001-08-21 Thread SADAHIRO Tomoyuki
Hello, everyone. Now Unicode::Collate 0.08 is available from CPAN. http://search.cpan.org/search?dist=Unicode-Collate-0.08 new method: index() $position = $UCA->index($string, $substring); ($position, $length) = $UCA->index($string, $substring); -- see 6.8 Searching, U
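As an illustration of the `index()` method announced above, the following sketch follows the example in the module's documentation as found in later releases; the `level => 1` and `normalization => undef` settings and the sample strings come from that documentation, not from this announcement, and the behaviour of version 0.08 itself may differ. In list context `index()` returns the position and length of the match, which can be passed straight to `substr`.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

binmode(STDOUT, ':encoding(UTF-8)');

# normalization => undef is required for index() in current releases;
# level => 1 makes the search ignore case and accents.
my $UCA = Unicode::Collate->new(normalization => undef, level => 1);

my $string    = "Ich muß studieren Perl.";
my $substring = "MÜSS";

my $match;
if (my ($position, $length) = $UCA->index($string, $substring)) {
    # The match is taken from the original string, not from the pattern.
    $match = substr($string, $position, $length);
}

print defined $match ? "matched: $match\n" : "no match\n";
```

At level 1, "ß" is compared as "ss" and "Ü" as "u", so the search term matches "muß" in the original string.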