ail sooner or later because many language-specific rules are simply
contradictory.
Thank you both for your replies. What about sorting words in one
particular language: is Perl's sort() good enough? I'm wondering, since
language isn't one of sort()'s arguments.
--
Eric Cholet
x00C1 >
0x00C0.
So is it just by chance that these French words are accurately sorted?
% perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'
cote coté côte côté
Thanks,
--
Eric Cholet
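
For readers following the thread, here is a minimal sketch comparing the built-in sort() with Unicode::Collate on the same four words. The helper names (by_codepoint, by_uca, by_french) are made up for this illustration, and the use of the backwards => 2 option to model the traditional French rule (accent differences weighed from the end of the word) is an assumption about what the poster needs, not something stated in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Collate;

# The four French words from the example, written with escapes so the
# script does not depend on the encoding of the source file.
my @words = ("c\x{F4}te", "c\x{F4}t\x{E9}", "cote", "cot\x{E9}");

# Built-in sort: plain code point comparison, no language knowledge.
sub by_codepoint { return sort @_ }

# Default Unicode Collation Algorithm ordering.
sub by_uca { return Unicode::Collate->new->sort(@_) }

# Hypothetical French variant: level 2 (accent) weights compared backwards.
sub by_french { return Unicode::Collate->new(backwards => 2)->sort(@_) }

binmode(STDOUT, ':encoding(UTF-8)');
print join(' ', by_codepoint(@words)), "\n";
print join(' ', by_uca(@words)), "\n";
print join(' ', by_french(@words)), "\n";
```

For these particular words the code point order and the default collation order coincide, which may be why the one-liner above looks correct; only the backwards variant places "côte" before "coté".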
ing I was misled to believe that Unicode::Collate would
be the tool to use.
Thanks to all for the useful links provided in this thread.
--
Eric Cholet
ainly its interface requires hard-coding of weights and
may be less user-friendly.
#!perl
use strict;
use warnings;
use Unicode::Collate;
[snip]
That rocks. It is the answer to my original post, which started this
thread!
Thank you very much.
--
Eric Cholet
Here's another naive question from a Unicode newbie:
Is there a way, using Perl's Unicode support, to remove
accents from a string? I looked at \pM but can't figure
out how it works; I wasn't able to match anything with it.
Thanks,
--
Eric Cholet
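
A minimal sketch of the decompose-then-strip approach mentioned later in the thread (Unicode::Normalize plus \pM). The function name strip_accents is hypothetical. One possible reason \pM matches nothing is that a precomposed character such as U+00E9 contains no separate combining mark until the string has been decomposed with NFD; that is an assumption about the poster's situation, not something the thread confirms.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Remove accents by decomposing each character into base letter plus
# combining marks, then deleting the marks (\pM).
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);   # e.g. U+00E9 becomes "e" + U+0301
    $decomposed =~ s/\pM//g;       # drop every combining mark
    return $decomposed;
}

binmode(STDOUT, ':encoding(UTF-8)');
print strip_accents("c\x{F4}t\x{E9}"), "\n";   # prints "cote"
```

Letters that have no canonical decomposition, such as the Eszett (U+00DF), are left untouched by this approach.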
(\W+)/;
print '2 ', $x =~ /([\W]+)/;
print '3 ', $x =~ /(\w+)/;
...prints:
1
2 ß
3 Großbritannien
I do not understand why the Eszett matches [\W] in #2. Same behavior
if I replace the Eszett with another, non ASCII, "letter", e.g. "Ã".
--
Eric Cholet
decode("iso-8859-1", "Großbritannien");
...which yields the same results of course:
1
2 ß
3 Großbritannien
--
#!/usr/bin/perl -w
use strict;
use encoding 'utf8';
my $x = 'Großbritannien';
$\ = "\n";
print '1 ', $x =~ /(\W+)/;
print '2 ', $x =~ /([\W]+)/;
print '3 ', $x =~ /(\w+)/;
exit(0);
--
Eric Cholet
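
For comparison, a sketch of the same three tests as they might be written today. It assumes Perl 5.14 or later, where the /u modifier forces Unicode semantics, and it spells the Eszett as an escape so the result does not depend on the source file's encoding. The helper first_match is hypothetical. Under these assumptions \W and [\W] should agree, and neither should match the Eszett.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = "Gro\x{DF}britannien";   # U+00DF is the Eszett

# Return the first capture group of $regex in $string, or undef.
sub first_match {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $1 : undef;
}

my $non_word  = first_match($word, qr/(\W+)/u);     # test 1
my $non_class = first_match($word, qr/([\W]+)/u);   # test 2
my $whole     = first_match($word, qr/(\w+)/u);     # test 3

binmode(STDOUT, ':encoding(UTF-8)');
print '1 ', defined $non_word  ? $non_word  : '', "\n";
print '2 ', defined $non_class ? $non_class : '', "\n";
print '3 ', defined $whole     ? $whole     : '', "\n";
```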
just fine,
as shown in Andreas' bug report.
--
Eric Cholet
On 28 Dec 2003, at 04:45, SADAHIRO Tomoyuki wrote:
On Sat, 27 Dec 2003 13:30:19 +0100
Eric Cholet <[EMAIL PROTECTED]> wrote:
Here's another naive question from a unicode newbie:
Is there a way, using perl's unicode support, to remove
accents from a string? I looked at \pM but
locale dependent.
I reverted back to my carefully crafted tr()s... Incidentally
much faster than the Unicode::Normalize / remove \pM approach.
--
Eric Cholet
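
The tr() calls themselves are not shown in the thread. As a rough illustration of the idea only, here is a sketch with a hypothetical mapping table covering a handful of lowercase accented letters common in French. The function name strip_accents_tr and the choice of letters are assumptions. A hand-written table like this handles only the characters it lists.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: 15 accented lowercase letters mapped one-to-one
# onto their base letters. Anything not listed passes through unchanged.
sub strip_accents_tr {
    my ($text) = @_;
    $text =~ tr/\x{E0}\x{E2}\x{E4}\x{E9}\x{E8}\x{EA}\x{EB}\x{EE}\x{EF}\x{F4}\x{F6}\x{F9}\x{FB}\x{FC}\x{E7}/aaaeeeeiiooouuuc/;
    return $text;
}

print strip_accents_tr("c\x{F4}t\x{E9}"), "\n";   # prints "cote"
```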
ailed docs that will
put you on track. Worked for me! And then if you have specific
problems with some code you will get good answers from
this list, I know I did.
--
Eric Cholet
--
Eric Cholet
you have any idea how I can create a page like Google's?
Thank you.
Teddy
--
Eric Cholet
to the desired encoding:
print Encode::encode('utf8', $s);
--
Eric Cholet
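
A small sketch of what that call does, assuming $s holds a decoded character string. The helper name to_utf8 is hypothetical, and the sketch uses the strict 'UTF-8' encoding name rather than the 'utf8' spelling in the message; both names are accepted by Encode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Turn a character string into UTF-8 bytes suitable for output.
sub to_utf8 {
    my ($chars) = @_;
    return Encode::encode('UTF-8', $chars);
}

my $s     = "c\x{F4}t\x{E9}";   # 4 characters
my $bytes = to_utf8($s);        # 6 bytes: each accented letter takes two

print length($s), " characters, ", length($bytes), " bytes\n";
print $bytes, "\n";
```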
tities are a way to encode non-ASCII characters
into an ASCII representation; this is orthogonal to the XML document's
encoding or the XML parser's output encoding.
--
Eric Cholet
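
To illustrate the point, a minimal sketch that rewrites every non-ASCII character as a numeric character reference, so the result is pure ASCII whatever encoding the document declares. The function name to_ascii_refs is hypothetical, and the sketch deliberately ignores the escaping of markup characters such as & and <.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace each character above U+007F with a hexadecimal character
# reference of the form &#xHHHH;.
sub to_ascii_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord($1))/ge;
    return $text;
}

print to_ascii_refs("c\x{F4}t\x{E9}"), "\n";   # prints "c&#xF4;t&#xE9;"
```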
me a headache). Is this a price to pay when using Perl
Unicode strings?
--
Eric Cholet