On Sun, 25 Nov 2007, Ivan Bogdanov wrote:
Hi,
I have some problems with transliteration from Cyrillic text into
Latin.
In my mind, I have two ways to solve the problem:
1) using the tr/// operator, but I think it is not the best way, because
one symbol in Russian might become two symbols in translit.
2) using two arrays, something like this:
my @cstring=qw(а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш
щ ъ ы ь э ю я);
my @lstring=qw(a b v g d e yo g z i y k l m n o p r s t u f x c ch
sh shch ' yi ' e yu ya);
and then substitute each symbol. And here I have a problem.
Ohhh, I have a string like this - $string = "Попов П П", and I need to
convert it to $string = "Popov_PP". Can anybody help me with this?
Hi Ivan,
This is really not about perl-ldap, but since perl-ldap is such a
low-volume list, I'll answer and hope no one will mind. Krome
tovo, mne nravitsya russkiy yazik! (Besides, I like the Russian
language!)
sub translit {
    # hopefully formatted to avoid text wrapping in email clients
    my $input_string  = shift @_;
    my $output_string = '';

    # We'll transliterate all spaces to underscores, then get rid
    # of the last underscore (the one between the initials). Note this
    # presumes that input will always follow the format you describe
    # above. This handles space chars *only*! Other whitespace chars
    # (\t, etc.) will be skipped and will generate a warning. This sub
    # is specific to the described problem and is *not* a generic
    # transliterate function, although it would be trivial to convert
    # it to one.

    # The following should be the actual Cyrillic chars that
    # my keyboard can't type. With real Cyrillic literals the script
    # needs 'use utf8;' and the input must be a decoded character
    # string, otherwise substr() walks bytes instead of letters.
    my @cstring = qw(? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
                     ? ? ? ? ? ? ? ? ? ?);

    # Add our space char to the Cyrillic array here, since I can't
    # quickly think of a way to make 'qw' handle it.
    push @cstring, ' ';

    # note added '_' char at end of Latin array!
    my @lstring = qw(a b v g d e yo g z i y k l m n o p r s t u f
                     x c ch sh shch ' yi ' e yu ya _);

    for (my $i = 0; $i < length($input_string); $i++) {
        my $char = substr($input_string, $i, 1);

        # don't transliterate numbers
        if ($char =~ /\d/) {
            # don't use '.=' for Perl 6 compatibility
            $output_string = "$output_string$char";
            next;
        }

        # The table holds lowercase letters only, so compare in
        # lowercase and restore the capital afterwards.
        my $lower = lc $char;
        my $found = 0;
        for (my $j = 0; $j < scalar(@cstring); $j++) {
            if ($lower eq $cstring[$j]) {
                my $latin = $char eq $lower ? $lstring[$j]
                                            : ucfirst $lstring[$j];
                $output_string = "$output_string$latin";
                $found = 1;
                last;    # each char appears only once in the table
            }
        }

        # Give a warning if we're missing a char from @cstring,
        # or if input contained a bad character.
        unless ($found) {
            my $msg = sprintf("%s%s%s%s", 'translit() skipping ',
                                          'unknown character ',
                                          $char,
                                          ' in input string');
            warn($msg);
        }
    }

    # "Popov_P_P" becomes "Popov_PP"
    $output_string =~ s/(\w+_\w+)_(\w+)/$1$2/;
    return $output_string;
}
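
The conversion to a generic function mentioned in the comments could
look roughly like the sketch below. It is only an illustration and has
not been run against your data: it swaps the two parallel arrays for a
hash lookup, and the names translit_generic, translit_name and
%translit_map are made up for this example. It assumes the script is
saved as UTF-8 and that the input is already a decoded character string
(for example via decode() from the Encode module). The table is the
same one as in your message.

```perl
use strict;
use warnings;
use utf8;    # this source file contains UTF-8 Cyrillic literals

# Same lowercase table as in the original message, copied as-is.
# (Adjust to taste; many schemes use 'zh' rather than 'g' for the 8th letter.)
my @cyr = qw(а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я);
my @lat = qw(a b v g d e yo g z i y k l m n o p r s t u f x c ch sh shch ' yi ' e yu ya);

# Hypothetical name: maps one Cyrillic letter to one or more Latin letters.
my %translit_map;
@translit_map{@cyr} = @lat;

# Generic part: replace every known Cyrillic letter, keep everything else.
sub translit_generic {
    my ($text) = @_;
    $text =~ s{(\p{Cyrillic})}{
        my $lower = lc $1;
        my $latin = $translit_map{$lower};
        !defined $latin ? $1               # not in the table: leave untouched
        : $1 eq $lower  ? $latin           # lowercase letter
        :                 ucfirst $latin   # capital letter, e.g. 'Shch'
    }ge;
    return $text;
}

# Problem-specific part: "Surname I I" becomes "Surname_II".
sub translit_name {
    my ($name) = @_;
    my ($surname, @initials) = split ' ', translit_generic($name);
    return @initials ? $surname . '_' . join('', @initials) : $surname;
}

print translit_name("Попов П П"), "\n";    # prints Popov_PP
```

The hash gives one lookup per letter instead of a scan through the
array, and the s///ge form lets one Cyrillic letter expand to several
Latin ones, which is the part tr/// cannot do.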
You really should ask something like this on a more general perl list,
though.
Da zdravstvuet Perl! (Long live Perl!)
--
Craig Dunigan
IS Technical Services Specialist
Middleware - EIS - DoIT
University of Wisconsin, Madison
opinions expressed are my own, not the University's