On Sun, 25 Nov 2007, Ivan Bogdanov wrote:

Hi,

I have some problems with transliteration from Cyrillic text into
Latin.
In my mind, I have two ways to solve the problem:
1) using the tr/// operator, but I don't think it's the best way, because
one Russian letter may become two or more letters in translit.
2) using two arrays, something like this:
   my @cstring=qw(а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш
щ ъ ы ь э ю я);
   my @lstring=qw(a b v g d e yo g z i y k l m n o p r s t u f x c ch
sh shch ' yi ' e yu ya);

and then substitute each symbol. And here I have a problem.
For example, I have a string like $string = "Попов П П", and I need to
convert it to $string = "Popov_PP". Can anybody help me with this?
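The tr/// concern in point 1 can be worked around with two passes: s/// with a hash for the letters that become several Latin characters, then tr/// for the letters that map one-to-one. This is a minimal sketch assuming UTF-8 source and lowercase input only. translit_lower and %multi are illustrative names. Here ж is mapped to zh rather than the g in the list above, because g is already used for г.

```perl
use strict;
use warnings;
use utf8;    # the Cyrillic literals in this file are UTF-8 text

# Hypothetical table of the letters whose Latin form is longer
# than one character (lowercase only in this sketch).
my %multi = (
    'ё' => 'yo', 'ж' => 'zh', 'ч' => 'ch', 'ш' => 'sh',
    'щ' => 'shch', 'ы' => 'yi', 'ю' => 'yu', 'я' => 'ya',
);

sub translit_lower {
    my ($s) = @_;

    # Pass 1: one-to-many letters need s///, because tr/// always
    # replaces one character with exactly one character.
    $s =~ s/([ёжчшщыюя])/$multi{$1}/g;

    # Pass 2: every remaining letter maps one-to-one, so tr/// fits.
    # ъ and ь both become an apostrophe, as in the list above.
    $s =~ tr/абвгдезийклмнопрстуфхцъьэ/abvgdeziyklmnoprstufxc''e/;

    return $s;
}

print translit_lower('щука'), "\n";    # prints "shchuka"
```

Pass 1 only emits ASCII and pass 2 only touches Cyrillic, so no letter gets transliterated twice.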



Hi Ivan,

Although this isn't really a perl-ldap question, perl-ldap is such a low-volume list that I'll answer and hope no one minds. Krome tovo, mne nravitsya russkiy yazik! (Besides, I like the Russian language!)

use strict;
use warnings;
use utf8;    # lets Perl read the Cyrillic literals below as characters

sub translit {
    # hopefully formatted to avoid text wrapping in email clients

    my $input_string = shift @_;
    my $output_string = '';

    # We'll transliterate all spaces to underscores, then remove the
    # underscore between the two initials.  Note this presumes that
    # input will always follow the format you describe above, as a
    # decoded character string (e.g. via Encode::decode).  This
    # handles space chars *only*!  Other whitespace chars (\t, etc.)
    # will be skipped and will generate a warning.  Capitals are
    # matched via lc() and re-capitalized with ucfirst().  This sub is
    # specific to the described problem and is *not* a generic
    # transliterate function, although it would be trivial to convert
    # it to one.

    # The Russian alphabet, in the same order as @lstring below.
    my @cstring = qw(а б в г д е ё ж з и й к л м н о п р с т у ф х
                     ц ч ш щ ъ ы ь э ю я);
    # Add our space char to the Cyrillic array here, since 'qw'
    # splits on whitespace and so can't hold it.
    push @cstring, ' ';

    # note added '_' char at end of Latin array!
    # (ж is 'zh' here; 'g' would clash with г)
    my @lstring = qw(a b v g d e yo zh z i y k l m n o p r s t u f
                     x c ch sh shch ' yi ' e yu ya _);

    for my $char (split //, $input_string) {
        # don't transliterate numbers
        if ($char =~ /\d/) {
            $output_string .= $char;
            next;
        }

        my $lower = lc $char;
        my $found = 0;

        for my $j (0 .. $#cstring) {
            if ($lower eq $cstring[$j]) {
                # keep capitals capital: 'П' becomes 'P', not 'p'
                $output_string .= ($char eq $lower) ? $lstring[$j]
                                                    : ucfirst $lstring[$j];
                $found = 1;
                last;
            }
        }

        # Give a warning if we're missing a char from @cstring,
        # or if input contained a bad character.
        unless ($found) {
            warn sprintf("translit() skipping unknown character U+%04X"
                         . " in input string\n", ord $char);
        }
    }

    # "Popov_P_P" -> "Popov_PP"
    $output_string =~ s/(\w+_\w+)_(\w+)/$1$2/;

    return $output_string;
}
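If the input really is always "Surname I I", the underscore fix-up can be avoided by splitting the name first and joining the parts. This is a hedged sketch assuming decoded UTF-8 input. %latin, translit_word and translit_name are hypothetical names, and a hash lookup stands in for the inner loop over @cstring.

```perl
use strict;
use warnings;
use utf8;    # the Cyrillic literals in this file are UTF-8 text

# Hypothetical lookup table built from the two lists in the thread
# (ж -> 'zh' rather than 'g', so that it differs from г).
my %latin;
@latin{ qw(а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я) }
    = qw(a b v g d e yo zh z i y k l m n o p r s t u f x c ch sh shch ' yi ' e yu ya);

# Transliterate one word, keeping capitals capital (П -> P, Щ -> Shch).
sub translit_word {
    my ($word) = @_;
    my $out = '';
    for my $char (split //, $word) {
        my $lower = lc $char;
        if (exists $latin{$lower}) {
            $out .= $char eq $lower ? $latin{$lower} : ucfirst $latin{$lower};
        }
        elsif ($char =~ /\d/) {
            $out .= $char;    # digits pass through unchanged
        }
        else {
            warn sprintf("skipping unknown character U+%04X\n", ord $char);
        }
    }
    return $out;
}

# "Surname I I" -> "Surname_II": split on any run of whitespace,
# then glue the initials together after a single underscore.
sub translit_name {
    my ($name) = @_;
    my ($surname, @initials) = split ' ', $name;
    return translit_word($surname) . '_'
         . join('', map { translit_word($_) } @initials);
}

print translit_name('Попов П П'), "\n";    # prints "Popov_PP"
```

Each character now costs one hash lookup instead of a scan of all 33 letters. Splitting with split ' ' also means tabs or doubled spaces between the parts no longer produce warnings or stray underscores.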


You really should ask something like this on a more general perl list, though.

Da zdravstvuet Perl! (Long live Perl!)

--
Craig Dunigan
IS Technical Services Specialist
Middleware - EIS - DoIT
University of Wisconsin, Madison

opinions expressed are my own, not the University's
