Re: Removing Accents from unicode strings

2005-10-31 Thread David Graff


[EMAIL PROTECTED] said:
 I need to convert strings obtained from a MySQL database in UTF-8 format
 into a file format to be uploaded to specific hardware (specifically
 GPS units).  Some of these formats may only allow unaccented characters, so I
 need a way to convert accented characters into their respective base
 characters, e.g. Unicode 'ó' into ASCII 'o', 'ä' into 'a' and so forth.

 Is there an easy way to do this in Perl? 

There's a prior thread on this list about this very topic:

http://www.mail-archive.com/perl-unicode@perl.org/msg02000.html

Also, I've posted a couple of different approaches on www.perlmonks.org -- 
here's my favorite:

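#!/usr/bin/perl -CDS

use strict;
require 5.008;

# Load the Unicode name table shipped with Perl and keep only the entries
# for Latin letters.  The code assumes each entry is one line that starts
# with a four-digit hex code point.
my @charnames = grep /\tLATIN \S+ LETTER/, split( /^/, do 'unicore/Name.pl' );

my %accents;

# For each base letter, collect every "LATIN <case> LETTER <X> WITH ..."
# character into a single string, keyed by the unaccented letter.
for my $c ( split //, qq/AEIOUCNYaeioucny/ ) {
    my $case = ( $c eq lc $c ) ? 'SMALL' : 'CAPITAL';
    $accents{$c} =
        join( '', map { chr hex( substr $_, 0, 4 ) }
                  grep /\tLATIN $case LETTER \U$c WITH/, @charnames );
}

# now use each element of %accents as a character class:

# read input line by line (STDIN or files named on the command line)
while (<>) {
    for my $c ( keys %accents ) {
        s/[$accents{$c}]/$c/g;
    }
    print;
}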

__END__

Another way would be to simply hard-code a set of tr/..././ steps, one
for each lower-case and upper-case unaccented letter (placed on the right),
with all its accented variants on the left.  Tedious to code, but very fast
at run-time.
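
A minimal sketch of that hard-coded tr/// idea, not taken from the original post: the helper name strip_accents is hypothetical, and the tables cover only the accented letters of the Latin-1 range (U+00C0 to U+00FF), so a real converter would likely need longer lists. It relies on the documented tr/// behavior that a replacement list shorter than the search list has its last character repeated, so every character on the left maps to the single letter on the right. The input is assumed to be already decoded into Perl characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: maps accented Latin-1 letters to their base letters.
# Code points are written as \x{..} escapes so the source file stays ASCII.
sub strip_accents {
    my ($s) = @_;
    $s =~ tr/\x{E0}-\x{E5}/a/;          # a with grave .. a with ring
    $s =~ tr/\x{C0}-\x{C5}/A/;
    $s =~ tr/\x{E8}-\x{EB}/e/;          # e with grave .. e with diaeresis
    $s =~ tr/\x{C8}-\x{CB}/E/;
    $s =~ tr/\x{EC}-\x{EF}/i/;
    $s =~ tr/\x{CC}-\x{CF}/I/;
    $s =~ tr/\x{F2}-\x{F6}\x{F8}/o/;    # includes o with stroke
    $s =~ tr/\x{D2}-\x{D6}\x{D8}/O/;
    $s =~ tr/\x{F9}-\x{FC}/u/;
    $s =~ tr/\x{D9}-\x{DC}/U/;
    $s =~ tr/\x{E7}/c/;                 # c with cedilla
    $s =~ tr/\x{C7}/C/;
    $s =~ tr/\x{F1}/n/;                 # n with tilde
    $s =~ tr/\x{D1}/N/;
    $s =~ tr/\x{FD}\x{FF}/y/;           # y with acute, y with diaeresis
    $s =~ tr/\x{DD}/Y/;
    return $s;
}

# The two characters from the question: o with acute, a with diaeresis.
print strip_accents("\x{F3}\x{E4}"), "\n";    # prints "oa"
```

Because each tr/// table is built at compile time, the per-string cost is a fixed number of linear passes, which is where the run-time speed comes from.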

Dave Graff

Removing Accents from unicode strings

2005-10-30 Thread Agnar Renolen

Hello,

I need to convert strings obtained from a MySQL database in UTF-8 format
into a file format to be uploaded to specific hardware (specifically
GPS units).  Some of these formats may only allow unaccented characters, so
I need a way to convert accented characters into their respective base
characters, e.g. Unicode 'ó' into ASCII 'o', 'ä' into 'a' and so forth.

Is there an easy way to do this in Perl?

Agnar Renolen
Trondheim, NORWAY
