On Thu, 28 Apr 2011 10:06:58 -0700 (PDT)
Frank Müller wrote:
> dear all,
> I'm trying to do some string replacements with Unicode::Collate which
> usually work very well, but these replacements seem to be case
> insensitive by default - how can I change this? look at this
dear all,
I'm trying to do some string replacements with Unicode::Collate which
usually work very well, but these replacements seem to be case
insensitive by default - how can I change this? look at this simple
example:
my $myCollator = Unicode::Collate->new( normalization => undef,
.@wt if $to_be_pushed;
> - } else {
> + } elsif ($to_be_pushed) {
> push @subWt, [ \...@wt ];
> }
> }
>
> Regards,
> SADAHIRO Tomoyuki
>
> > dear all,
> > most probably I'm missing something quite obvious and very simple,
>
code yet.
> I'm making some string replacements with Unicode::Collate
> which generally works fine but for whitespace. I have the following
> simple code (adopted from the module documentation):
>
> my $myCollator = Unicode::Collate->new( normalization => undef, level => 1 )
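A minimal sketch of how the constructor options above affect matching, assuming a Unicode::Collate recent enough to ship its own collation table. The variable names and sample strings are illustrative only, and this is not the fix discussed in the thread. With `level => 1` only primary weights are compared, so case is ignored; with the default `variable => 'shifted'`, whitespace is ignorable. Raising `level` to 3 makes case significant, and `variable => 'non-ignorable'` makes whitespace significant.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Primary weights only: case differences are ignored.
my $loose = Unicode::Collate->new(normalization => undef, level => 1);

# Up to tertiary weights: case differences are significant.
my $cased = Unicode::Collate->new(normalization => undef, level => 3);

# Primary weights only, but whitespace and punctuation are not ignorable.
my $spaced = Unicode::Collate->new(
    normalization => undef,
    level         => 1,
    variable      => 'non-ignorable',
);

my @words     = ("Perl", "PERL", "perl");
my @loose_out = @words;
my @cased_out = @words;

# gsubst() edits its first argument in place and returns the match count.
$loose->gsubst($_, "perl", "X") for @loose_out;
$cased->gsubst($_, "perl", "X") for @cased_out;

print "level 1: @loose_out\n";   # X X X
print "level 3: @cased_out\n";   # Perl PERL X

print "shifted:       ", ($loose->eq("a b", "ab")  ? "equal" : "different"), "\n";
print "non-ignorable: ", ($spaced->eq("a b", "ab") ? "equal" : "different"), "\n";
```

Under these assumptions, only the level 3 collator leaves "Perl" and "PERL" untouched, and only the non-ignorable collator treats "a b" and "ab" as different.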
Good Job Sadahiro!
- Ravi Sastry Kadali
On Mon, Jul 26, 2010 at 7:35 PM, SADAHIRO Tomoyuki wrote:
> Hello, all.
>
> Unicode::Collate 0.54 [1] supports a C-compiled DUCET [2],[3] via XSUB,
> which may save time when a new collator is constructed.
>
> If you want to use the com
Hello, all.
Unicode::Collate 0.54 [1] supports a C-compiled DUCET [2],[3] via XSUB,
which may save time when a new collator is constructed.
If you want to use the compiled DUCET, don't say (table => 'allkeys.txt')
nor any other table in Unicode::Collate->new.
Though Un
Unicode::Collate provides a straightforward mechanism for modifying the
sort order to take into account language-specific variations for example.
This is illustrated with the variations required for traditional Spanish I
think. Nevertheless I might have expected to see derived modules providing
CET 5.0.0
with the release of 5.8.9, it could break things for people who have installed
Unicode::Collate with 5.8.8 (or earlier) and are currently using DUCET 4.1.0
So it wouldn't be a great idea.
Nicholas Clark
2007
+++ perl/MANIFEST Sun Apr 15 17:12:34 2007
@@ -2845,6 +2845,7 @@
lib/Time/localtime.pm By-name interface to Perl's builtin localtime
lib/Time/localtime.t Test for Time::localtime
lib/Time/tm.pm Internal object for Time::{gm,local}time
+
+2845,7 @@
lib/Time/localtime.pm By-name interface to Perl's builtin localtime
lib/Time/localtime.t Test for Time::localtime
lib/Time/tm.pm Internal object for Time::{gm,local}time
+lib/Unicode/Collate/allkeys.txt Unicode::Collate
lib/Unicode/Co
On 12 Apr 2007 15:36:31 -, Rafael Garcia-Suarez wrote
> Éric Cholet wrote in perl.unicode :
> > Okay, I know, it wants a Unicode Collation Element Table, it's well
> > documented in the pod where to get such a table.
> > But:
> > - it wants this fi
On 12 Apr 07, at 17:36, Rafael Garcia-Suarez wrote:
Éric Cholet wrote in perl.unicode :
% perl -MUnicode::Collate -e 'Unicode::Collate->new'
Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC
(@INC contains: /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/li
On 12 Apr 07, at 22:34, Sébastien Aperghis-Tramoni wrote:
Éric Cholet wrote:
% perl -MUnicode::Collate -e 'Unicode::Collate->new'
Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC
Okay, I know, it wants a Unicode Collation Element Table, it's
wel
Éric Cholet wrote:
% perl -MUnicode::Collate -e 'Unicode::Collate->new'
Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC
Okay, I know, it wants a Unicode Collation Element Table, it's well
documented in the pod where to get such a table.
But:
- it w
Éric Cholet wrote in perl.unicode :
> % perl -MUnicode::Collate -e 'Unicode::Collate->new'
> Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC
> (@INC contains: /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/lib/
> perl5/site_perl/5.8.8/mach /usr/l
% perl -MUnicode::Collate -e 'Unicode::Collate->new'
Unicode::Collate: Can't locate Unicode/Collate/allkeys.txt in @INC
(@INC contains: /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/lib/
perl5/site_perl/5.8.8/mach /usr/local/lib/perl5/site_perl/5.8.8 /usr/
local/lib/perl5/si
Bob Hallissy wrote:
>(PS: It would be nice if people would remove or at least obfuscate the
>original poster's email address when they do a quoted reply.)
>
I find that when I hit "Reply All" in my mail client (Netscape 7.1) the
poster's name in the quoted reply appears as just the name (i.e. wi
On 07/09/2004 18:07:33 Steve Hay wrote:
>Use the "-a" option:
Bingo! Thanks (and yes, I did have to update my PAR installation)
Bob
(PS: It would be nice if people would remove or at least obfuscate the
original poster's email address when they do a quoted reply.)
[EMAIL PROTECTED] wrote:
>Using the Perl Packager (PP), I'm trying to build a PAR-based standalone
>EXE that utilizes Unicode::Collate. No problem getting Unicode::Collate
>into the package, but that module requires a "keys" file (typically
>'allkeys.txt'
Using the Perl Packager (PP), I'm trying to build a PAR-based standalone
EXE that utilizes Unicode::Collate. No problem getting Unicode::Collate
into the package, but that module requires a "keys" file (typically
'allkeys.txt') to exist in the folder lib/Unicode/Col
s, the size of allkeys.txt is an issue - I did a Data dump of a
Unicode::Collate instance and it's pretty big!
>> 1)
>>
>> my %collators;
>>
>> for ( $server_loop )
>> {
>>my $lang_tag = Server->requested_lang_tag;
>>
>>
On Mon, 29 Mar 2004 23:44:00 +0100
Rich <[EMAIL PROTECTED]> wrote:
> I now realise that some per-language tailoring would be needed for sensible
> results. Unicode::Collate::Locale seems like the kind of thing I was
> looking for, and any tailoring is better than none :)
>
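One way to combine a per-request language with per-language tailoring is to cache one collator per locale. A rough sketch, assuming a Unicode::Collate distribution that bundles Unicode::Collate::Locale; `collator_for` and the sample words are hypothetical, and `es__traditional` is chosen only because the traditional Spanish treatment of 'ch' and 'll' makes the tailoring visible.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my %collators;   # one collator per locale name, built on first use

# Hypothetical helper: return the cached collator for a locale name.
sub collator_for {
    my ($locale) = @_;
    return $collators{$locale}
        ||= Unicode::Collate::Locale->new(locale => $locale);
}

my @words = qw(luz llama lima curva chico);

my @default     = collator_for('en')->sort(@words);
my @traditional = collator_for('es__traditional')->sort(@words);

print "default:     @default\n";       # chico curva lima llama luz
print "traditional: @traditional\n";   # curva chico lima luz llama
```

Building a collator is the expensive step, so the cache pays for itself when the same languages recur across requests.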
Sadahiro Tomoyuki wrote:
> I am writing Unicode::Collate::Locale (tentatively) for linguistic tailoring
> of UCA. To use it, Unicode::Collate should search for allkeys.txt
> in any directory in @INC (at present it searches for table files
> only under the directory where it is located.)
> I think, for a script representing usually one language,
> allkeys.txt defines fairly acceptable collation order.
> For example, order of hiragana and katakana is approximately
> compliant with the custom of the Japanese language.
>
> In contrast, for a script representing many languages
> (say,
> 1) I'll know the preferred language via a RFC2616 language tag.
> 2) All data will be utf8 encoded Unicode.
> 3) The required language may differ for each request.
>
> I guess Unicode::Collate is the way to go, so can I simply have one
> Unicode::Collate instance per
) The required language may differ for each request.
I guess Unicode::Collate is the way to go, so can I simply have one
Unicode::Collate instance per process using the default allkeys.txt table
file?
Will that give sensible results for most (all?) languages, or do I need to
customise the collat
/e <<< \u00c6/E"
for Spanish:
"&N < n\u0303 <<< N\u0303"
"&C < ch <<< Ch <<< CH"
"&l < ll <<< Ll <<< LL"
However Unicode::Collate also allows linguistic tailoring.
Cert
acter-based and may be more intuitive:
for French:
"[backwards 2]&A << \u00e6/e <<< \u00c6/E"
for Spanish:
"&N < n\u0303 <<< N\u0303"
"&C < ch <<< Ch <<< CH"
"&l < ll <
Has anyone had a look at the OpenI18N/ICU locale data?
The locales there are all UTF-8 and have java rule based collation
data, so
they *might* be useful for creating a more comprehensive (and
accurate) set
of sort modules? The downside is this data is pretty rough ATM but does
seem to be improv
Sadahiro Tomoyuki wrote:
>
>> So I guess I need a Lingua::XX::Sort module for each language I operate
>> on,
>> in my original posting I was misled to believe that Unicode::Collate
>> would
>> be the tool to use.
>>
>> Thanks to all for the useful li
> So I guess I need a Lingua::XX::Sort module for each language I operate
> on,
> in my original posting I was misled to believe that Unicode::Collate
> would
> be the tool to use.
>
> Thanks to all for the useful links provided in this thread.
As far as I found, CPAN p
lar French dictionary
uses,
but the link you provide thinks otherwise, precisely because of the
"backwards accents" rule, it sorts those words as (ignoring
capitalization)
cote côte coté côté
So I guess I need a Lingua::XX::Sort module for each language I operate
on,
in my original post
Eric Cholet wrote in perl.unicode :
>
> So is it just by chance that these French words are accurately sorted?
>
> % perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort
> qw(côte côté cote coté)'
> cote coté côte côté
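For comparison, a small sketch of the same four words sorted three ways, assuming a Unicode::Collate that ships its own collation table. The variable names are illustrative. For this particular word list the code point order happens to coincide with the default UCA order; the `backwards => 2` option reverses the accent comparison and gives the French dictionary order discussed in this thread.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

my @words = qw(côte côté cote coté);

# Plain sort(): compares code points only.
my @by_codepoint = sort @words;

# Default UCA ordering: accents compared left to right.
my @uca_default = Unicode::Collate->new->sort(@words);

# Accents compared from the end of the word (French dictionary rule).
my @uca_backwards = Unicode::Collate->new(backwards => 2)->sort(@words);

binmode(STDOUT, ':encoding(UTF-8)');
print "code point:   @by_codepoint\n";    # cote coté côte côté
print "UCA default:  @uca_default\n";     # cote coté côte côté
print "UCA backward: @uca_backwards\n";   # cote côte coté côté
```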
Until recently, Spanish dictionaries used to treat 'll' vowel
as a
Ok, this is in line with how I understood this paragraph in
perluniintro:
The short answer is that by default, Perl compares strings
("lt",
"le", "cmp", "ge", "gt") based only on the code points of
the char-
acters. In the above case, the answer is "aft
On 1 Dec 03, at 16:46, Jarkko Hietaniemi wrote:
Thank you both for your replies. What about sorting words in one
particular
language, is Perl's sort() good enough? I'm wondering, since language
isn't
one of sort()'s arguments.
First we need to define "good enough"... again, if you are sorting
Thank you both for your replies. What about sorting words in one
particular
language, is Perl's sort() good enough? I'm wondering, since language
isn't
one of sort()'s arguments.
First we need to define "good enough"... again, if you are sorting
"simple" English or Hawaiian, you are probably fine
On 29 Nov 03, at 16:30, Jarkko.Hietaniemi wrote:
I want to correctly sort words in a variety of languages, currently
French, English, Spanish, Portuguese, German and Arabic. I am using
Perl 5.8.1 and unicode. I think I need Unicode::Collate to have
*correct* sorting. Is this correct?
In
> -Original Message-
> From: Jarkko.Hietaniemi [mailto:[EMAIL PROTECTED]
...
> the UCA is not "correct" for any particular language ...
Not by design, no, but it is fine for English and Italian, for example.
> I think it is worth pointing out that trying to sort multilingual
> data is pra
I want to correctly sort words in a variety of languages, currently
French, English, Spanish, Portuguese, German and Arabic. I am using
Perl 5.8.1 and unicode. I think I need Unicode::Collate to have
*correct* sorting. Is this correct?
In addition to the problems listed by Sadahiro (most
[excuse me, I sent cc to [EMAIL PROTECTED];
I expect some helps and/or suggestions may be given there]
> Greetings,
>
> I hope you won't mind a few questions related to your module
> Unicode::Collate.
>
> I want to correctly sort words in a variety of languages, curr
Hello, Unicode::Collate 0.28 is released.
(0.27 was released only last week...)
It is available from CPAN:
http://search.cpan.org/author/SADAHIRO/Unicode-Collate-0.28/
Changes against v0.27 are:
- Fixed another inconsistency under (normalization => undef):
Non-contiguous contraction
Hello, Unicode::Collate 0.27 is released.
It is available from CPAN:
http://search.cpan.org/author/SADAHIRO/Unicode-Collate-0.27/
Changes against v0.26 are:
- The maximum length of contracted CE was not checked.
Collation of a large string including a first letter of
a contraction that
Hello, all.
This update should fix internal functions
to convert Unicode codepoints vs Unicode characters
[named pack_U() and unpack_U()].
In EBCDIC boxes, unpack_U() seems to need rewriting.
Testing is welcome.
http://search.cpan.org/author/SADAHIRO/Unicode-Collate-0.25/
http://search.cpan.org
SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
:Unicode::Collate 0.23 is released.
:
:Changes between 0.21 -> 0.23 are:
:
:0.23 Wed Sep 04 19:25:20 2002
:- fix: scalar match() no longer returns an lvalue substr ref.
:- fix: "Ignorable after variable" should be mad
On Thu, Sep 05, 2002 at 08:36:50AM -0600, Mark Leisher wrote:
>
> Tomoyuki> Unicode::Collate 0.23 is released.
>
> Could you remind us where to find it again? Thanks!
I can find it on CPAN:
http://search.cpan.org/author/SADAHIRO/Unicode-Collate-0.23/
(start at search.
On Thu, 5 Sep 2002 08:36:50 -0600 (MDT)
Mark Leisher <[EMAIL PROTECTED]> wrote:
>
> Tomoyuki> Unicode::Collate 0.23 is released.
>
> Could you remind us where to find it again? Thanks!
Oh, sorry. CPAN distributes it.
http://search.cpan.org/author/SADAHIRO
Tomoyuki> Unicode::Collate 0.23 is released.
Could you remind us where to find it again? Thanks!
-
Mark Leisher
Computing Research Lab        The mountain remains unmoved at
New Mexico State Univers
Hi, all.
Unicode::Collate 0.23 is released.
Changes between 0.21 -> 0.23 are:
0.23 Wed Sep 04 19:25:20 2002
- fix: scalar match() no longer returns an lvalue substr ref.
- fix: "Ignorable after variable" should be made level 3 ignorable
even if alternat
SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
:Unicode::Collate 0.21 is uploaded to CPAN.
:
:* Some tests are added for UCA version 9.
:* "keys.txt" is based on allkeys-3.1.1.txt, on Unicode 3.1.1.
: The size of "keys.txt" is reduced to about half.
Thanks, belatedly app
Hello, all.
Unicode::Collate 0.21 is uploaded to CPAN.
* Some tests are added for UCA version 9.
* "keys.txt" is based on allkeys-3.1.1.txt, on Unicode 3.1.1.
The size of "keys.txt" is reduced to about half.
Regards,
SADAHIRO Tomoyuki
Hello, all.
Unicode::Collate 0.20 has been uploaded to CPAN
and is also available from
http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-0.20.tar.gz
The differences from the latest version, 0.12, are:
* UCA version 9 is supported.
(cf. http://www.unicode.org/reports/tr10/)
* A new
Hello, everyone.
Now Unicode::Collate 0.08 is available from CPAN.
http://search.cpan.org/search?dist=Unicode-Collate-0.08
new method: index()
$position = $UCA->index($string, $substring);
($position, $length) = $UCA->index($string, $substring);
-- see 6.8 Searching, U
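A short sketch of how index() might be used, modelled on the example in the module's documentation and assuming a Unicode::Collate that ships its own collation table. At level 1 the match ignores case and accents, and 'ß' is primary-equal to 'ss', so the matched span can differ in length from the search string; the returned length is what makes substr() safe here.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

my $UCA = Unicode::Collate->new(normalization => undef, level => 1);

my $string    = "Ich muß studieren Perl.";
my $substring = "MÜSS";

my $match;
# In list context index() returns (position, length), or an empty list
# when nothing matches.
if (my ($position, $length) = $UCA->index($string, $substring)) {
    $match = substr($string, $position, $length);
}

binmode(STDOUT, ':encoding(UTF-8)');
print defined $match ? "matched: $match\n" : "no match\n";   # matched: muß
```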