I am trying to build a Unicode-based transliteration table from Cyrillic to 
7-bit ASCII and would like to request the assistance of the Unicode list 
members.

The goal is to improve an existing program I wrote that automatically 
detects the encoding of Cyrillic text (8-bit character sets such as DOS 
CP 866, Windows CP 1251, or KOI-8, as well as UTF-8) and optionally 
transliterates the text to a 7-bit ASCII representation that an English 
speaker can reasonably sound out.
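
For readers unfamiliar with this kind of detection, here is a minimal Python 
sketch of one possible heuristic. It is not the method used by the program 
described above. It assumes the input is either valid UTF-8 or one of three 
8-bit code pages, and that running Russian text is mostly lowercase. The 
names CANDIDATES, count_lowercase_cyrillic, and guess_encoding are 
hypothetical, and "koi8_r" is assumed to be the KOI-8 variant in question.

```python
# Candidate 8-bit encodings (Python codec names); an assumption for this sketch.
CANDIDATES = ("cp866", "cp1251", "koi8_r")


def count_lowercase_cyrillic(text):
    """Count characters in U+0430..U+045F, the lowercase Cyrillic range."""
    return sum(1 for ch in text if "\u0430" <= ch <= "\u045f")


def guess_encoding(data):
    """Guess the encoding of a byte string holding Cyrillic text.

    Heuristic only: valid UTF-8 wins outright; otherwise pick the 8-bit
    code page whose decoding yields the most lowercase Cyrillic letters.
    """
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    return max(
        CANDIDATES,
        key=lambda name: count_lowercase_cyrillic(
            data.decode(name, errors="replace")
        ),
    )
```

The heuristic works because the three code pages place lowercase letters in 
different byte ranges, so a wrong guess tends to produce uppercase letters or 
box-drawing characters. Short or all-uppercase input can easily fool it.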

Lots of Cyrillic-Latin transliterations are available for the Russian 
alphabet.  I am looking for one that targets only the 7-bit ASCII set, which 
rules out ISO 9, and covers more than just Russian, which rules out many 
others.  One-to-one correspondence between letters is not a goal; it is 
perfectly OK to transliterate U+0429 to "SHCH".  Likewise, round-tripping is 
not a goal; U+0428 + U+0427 and U+0429 would both be expected to map to 
"SHCH".

What I do want is something that generates a usable pronunciation without 
resorting to digits, or to letters like Q, purely for the sake of uniqueness. 
It should be based on Unicode values and cover as much of the Unicode 
Cyrillic block as possible, including the new characters planned for 
Unicode 3.2.
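
A table keyed by Unicode code point, with the many-to-one behavior described 
earlier, could be sketched in Python as follows. Only the code points 
mentioned in this message are included. The values "SH" and "CH" for U+0428 
and U+0427 are inferred from the "SHCH" example, and the names TRANSLIT and 
transliterate, as well as the "?" fallback for unmapped characters, are 
assumptions made for illustration.

```python
# Partial, illustrative table: code point -> 7-bit ASCII string.
TRANSLIT = {
    0x0410: "A",
    0x0411: "B",
    0x0412: "V",
    0x0413: "G",
    0x0414: "D",
    0x0427: "CH",    # inferred from U+0428 + U+0427 -> "SHCH"
    0x0428: "SH",    # inferred from U+0428 + U+0427 -> "SHCH"
    0x0429: "SHCH",  # one letter may expand to several
    0x0430: "a",
    0x0431: "b",
    0x0432: "v",
    0x0433: "g",
    0x0434: "d",
}


def transliterate(text, fallback="?"):
    """Map each character to ASCII; ASCII input passes through unchanged."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # Unmapped characters become the fallback (an assumption).
            out.append(TRANSLIT.get(ord(ch), fallback))
    return "".join(out)
```

Because the output is built by simple concatenation, U+0429 alone and the 
sequence U+0428 U+0427 both yield "SHCH", so the mapping cannot be reversed, 
which matches the stated non-goal of round-tripping.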

I know that neither UTC nor WG2 engages in the very controversial business of 
assigning canonical transliterations between scripts, and I am not asking 
them to.  I would like the private assistance of capable list members in 
providing an unofficial solution.

Part of a possible transliteration table might look like the following:

    U+0410      A
    U+0411      B
    U+0412      V
    U+0413      G
    U+0414      D
    ...
    U+0430      a
    U+0431      b
    U+0432      v
    U+0433      g
    U+0434      d
    ...
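
A table in this layout is easy to load mechanically. The following Python 
sketch assumes each entry is "U+" followed by four to six hex digits, then 
whitespace, then the ASCII replacement; lines that do not match, such as the 
"..." placeholders, are skipped. The names LINE_RE and parse_table are 
hypothetical.

```python
import re

# One entry per line: "U+XXXX", whitespace, then the replacement string.
LINE_RE = re.compile(r"^\s*U\+([0-9A-Fa-f]{4,6})\s+(\S+)\s*$")


def parse_table(text):
    """Parse 'U+XXXX  replacement' lines into a {code point: str} dict."""
    table = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank lines, "..." and anything else
        replacement = match.group(2)
        if not replacement.isascii():
            raise ValueError("replacement is not 7-bit ASCII: %r" % line)
        table[int(match.group(1), 16)] = replacement
    return table
```

Rejecting non-ASCII replacements at load time keeps the 7-bit guarantee in 
one place rather than relying on every contributor to check by hand.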

If anyone would like to help, please e-mail me privately at [EMAIL PROTECTED],
or you can write to the list if you feel your response would be of interest 
to the list at large.

Thanks,

-Doug Ewell
 Fullerton, California
