Re: Unicode::Collate question

2003-12-01 Thread Eric Cholet
ail sooner or later because many language-specific rules simply are contradictory. Thank you both for your replies. What about sorting words in one particular language, is Perl's sort() good enough? I'm wondering, since language isn't one of sort()'s arguments. -- Eric Cholet

Re: Unicode::Collate question

2003-12-01 Thread Eric Cholet
x00C1 > 0x00C0. So is it just by chance that these French words are accurately sorted? % perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)' cote coté côte côté Thanks, -- Eric Cholet

Re: Unicode::Collate question

2003-12-02 Thread Eric Cholet
ing I was misled to believe that Unicode::Collate would be the tool to use. Thanks to all for the useful links provided in this thread. -- Eric Cholet

Re: Unicode::Collate question

2003-12-08 Thread Eric Cholet
ainly its interface requires hard code of weights and may be less user-friendly. #!perl use strict; use warnings; use Unicode::Collate; [snip] That rocks. It is the answer to my original post which started this thread! Thank you much. -- Eric Cholet
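
The script quoted in that message is snipped, so the following is not it. It is only a minimal sketch of sorting with Unicode::Collate, assuming the goal is French dictionary order, where accent differences are weighed from the end of the word. The `backwards => 2` setting and the helper name `sort_french` are choices made for this illustration.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

# Assumption: French-style ordering, i.e. level 2 (accents) is
# compared backwards. Omit the option for the default UCA order.
my $french = Unicode::Collate->new(backwards => 2);

# Hypothetical helper: return the words in collation order.
sub sort_french {
    my @words = @_;
    return $french->sort(@words);
}

binmode(STDOUT, ':encoding(UTF-8)');
print join(' ', sort_french(qw(côte côté cote coté))), "\n";
```

Perl's built-in sort() compares code points (or uses the current locale under `use locale`), so it has no notion of per-language rules such as this one.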

removing accents

2003-12-27 Thread Eric Cholet
Here's another naive question from a unicode newbie: Is there a way, using perl's unicode support, to remove accents from a string? I looked at \pM but can't figure out how it works, I wasn't able to match anything with it. Thanks, -- Eric Cholet

\W and [\W]

2003-12-31 Thread Eric Cholet
(\W+)/; print '2 ', $x =~ /([\W]+)/; print '3 ', $x =~ /(\w+)/; ...prints: 1 2 ß 3 Großbritannien I do not understand why the Eszett matches [\W] in #2. Same behavior if I replace the Eszett with another, non ASCII, "letter", e.g. "Ã". -- Eric Cholet

Re: \W and [\W]

2003-12-31 Thread Eric Cholet
decode("iso-8859-1", "Großbritannien"); ...which yields the same results of course: 1 2 ß 3 Großbritannien -- #!/usr/bin/perl -w use strict; use encoding 'utf8'; my $x = 'Großbritannien'; $\ = "\n"; print '1 ', $x =~ /(\W+)/; print '2 ', $x =~ /([\W]+)/; print '3 ', $x =~ /(\w+)/; exit(0); -- Eric Cholet

Re: \W and [\W]

2004-01-01 Thread Eric Cholet
just fine, as shown in Andreas' bug report. -- Eric Cholet

Re: removing accents

2004-01-02 Thread Eric Cholet
On 28 Dec 03, at 04:45, SADAHIRO Tomoyuki wrote: On Sat, 27 Dec 2003 13:30:19 +0100 Eric Cholet <[EMAIL PROTECTED]> wrote: Here's another naive question from a unicode newbie: Is there a way, using perl's unicode support, to remove accents from a string? I looked at \pM but

Re: removing accents

2004-01-03 Thread Eric Cholet
locale dependent. I reverted back to my carefully crafted tr()s... Incidentally much faster than the Unicode::Normalize / remove \pM approach. -- Eric Cholet
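
For reference, the Unicode::Normalize / remove \pM approach mentioned above can be sketched roughly as follows. This is an illustration, not the code from the thread; the helper name `strip_accents` is made up here. Letters that have no canonical decomposition (ß, ø, and so on) pass through unchanged.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD NFC);

# Hypothetical helper: decompose, drop combining marks, recompose.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);     # "é" becomes "e" + U+0301
    $decomposed =~ s/\p{M}+//g;      # \pM matches the combining marks
    return NFC($decomposed);
}

binmode(STDOUT, ':encoding(UTF-8)');
print strip_accents('côté'), "\n";
```

\pM only matches once the string has been decomposed; in a precomposed string such as "côté" there are no separate mark characters to match.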

Re: need help

2004-01-14 Thread Eric Cholet
ailed docs that will put you on track. Worked for me! And then if you have specific problems with some code you will get good answers from this list, I know I did. -- Eric Cholet

Re: httpi support

2004-01-22 Thread Eric Cholet
-- Eric Cholet

Re: Creating a UTF-8 web page

2004-04-07 Thread Eric Cholet
you have any idea how can I create a page like Google's? Thank you. Teddy -- Eric Cholet

Re: Creating a UTF-8 web page

2004-04-08 Thread Eric Cholet
to the desired encoding : print Encode::encode('utf8', $s); -- Eric Cholet
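
A minimal sketch of that last step, assuming the page content is held in a Perl character string; the helper name `page_bytes` and the sample text are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn a character string into UTF-8 bytes for output.
sub page_bytes {
    my ($s) = @_;
    return Encode::encode('utf8', $s);
}

my $s = "caf\x{E9}";        # 4 characters, the last one is U+00E9
my $bytes = page_bytes($s); # 5 bytes: U+00E9 becomes 0xC3 0xA9

binmode(STDOUT);            # raw bytes, no extra encoding layer
print $bytes, "\n";
```

The HTTP Content-Type header or a meta tag should declare the same charset, otherwise browsers may misread the bytes.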

Re: Segfault using HTML::Entities

2004-06-30 Thread Eric Cholet
tities are a way to encode non ASCII characters into an ASCII representation-- this is orthogonal to the XML document's encoding or the XML parser's output encoding. -- Eric Cholet
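
A small sketch of that point, assuming HTML::Entities (from the HTML-Parser distribution) is installed: the encoded form is plain ASCII, whatever encoding the surrounding document uses.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

my $text  = "caf\x{E9}";              # contains U+00E9
my $ascii = encode_entities($text);   # entity form, ASCII only
my $back  = decode_entities($ascii);  # original characters again

print "$ascii\n";
```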

utf8::SWASHNEW

2005-04-29 Thread Eric Cholet
me a headache). Is this a price to pay when using Perl unicode strings? -- Eric Cholet