Re attached: The machine readable section in the passports should be seen as a code rather than text.
The rules for this section originate from the OCR-B standard, with a highly limited character repertoire to ensure reliable scanning. The travel documents contain the name in the original script (e.g., Cyrillic), plus the transliterated name in the Latin script (if the original script is different), plus the machine readable section produced in OCR-B.

The transliteration practices may and do change from time to time; e.g., Russian passports used to be transliterated with French as the target language, whereas now the target language is English. Also, several of the former Soviet Union countries have recently introduced their own transliteration schemes from Cyrillic to Latin.

Actually, the original question addresses fallbacks rather than transliteration.

Sincerely,
Erkki I. Kolehmainen

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Ilya Zakharevich
Sent: 2 November 2013 01:34
To: Jukka K. Korpela
Cc: [email protected]
Subject: Re: How to remove accents while conforming to language standards?

On Fri, Nov 01, 2013 at 07:32:44PM +0200, Jukka K. Korpela wrote:
> 2013-11-01 17:37, Jennifer Wong wrote:
>
> >I would like to ask for advice on removing accents from characters.
>
> To address first the question you ask in the Subject line, “How to
> remove accents while conforming to language standards?”, but do not
> ask in the message body, the answer is: You can’t.

Of course, he can. He even provided an algorithm to do it. (And to address “it is as acceptable as stripping the vowels from English”: stripping vowels from English CAN be done, and it MUST be done if the context requires it.)

This mailing list bursts with reasonable, insightful people. This question comes up again and again; how come it is ALWAYS the same answer that pops out, an answer which is meaningless, not helpful, and, MOREOVER, wrong?
I suspect that what the participants wanted to write was that such processes are usually LOSSY, not that they CANNOT be done. Given that the initial question was more or less explicitly formulated as “how to minimize the losses?”, I think that what is happening in this thread is even less forgivable than the other times it has happened here…

When one MUST convert into an accent-less form [for human consumption] (a situation in which, being in the US, I frequently find myself), SOME losses are usually tolerable. One approach (which is very often applicable) is “lossy; so what?”: just strip away, and be happy.

If minimization of losses is important, this question was also answered on this list. Checking “my database of useful answers”

  http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Useful_tidbits_from_Unicode_mailing_list_%28unsorted%29

I see:

  Transliteration on passports (see p.IV-48)
  http://www.icao.int/publications/Documents/9303_p1_v1_cons_en.pdf

[BTW, the URL for the database contains a misprint; nowadays, most of the entries are sorted into categories. “This one”, though, is not sorted.]

Hope this helps,
Ilya
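
For illustration, the “lossy; so what?” approach mentioned above can be sketched in Python as follows. This is only a minimal sketch, not the algorithm from the original question and not the ICAO transliteration table: it assumes that canonical decomposition (NFD) followed by dropping combining marks is an acceptable fallback, and the FALLBACKS table and the name strip_accents are hypothetical, chosen here for letters that have no canonical decomposition.

```python
import unicodedata

# Hypothetical fallback table for letters with no canonical decomposition.
# Illustrative only; not taken from any standard or from the thread.
FALLBACKS = {
    "ß": "ss", "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE",
    "ł": "l", "Ł": "L", "đ": "d", "Đ": "D",
}


def strip_accents(text: str) -> str:
    """Lossy removal of accents: decompose, then drop combining marks."""
    # NFD splits e.g. "ö" into "o" followed by U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        # A nonzero canonical combining class marks a combining character.
        if unicodedata.combining(ch):
            continue
        # Letters such as "ø" do not decompose; use the table if present.
        out.append(FALLBACKS.get(ch, ch))
    return "".join(out)


print(strip_accents("Köln"))    # Koln
print(strip_accents("Łódź"))    # Lodz
print(strip_accents("Straße"))  # Strasse
```

The result is lossy by design: “Köln” becomes “Koln”, whereas a language-aware fallback might prefer “Koeln”. Where minimizing losses matters, a per-language or per-context table (such as the passport rules cited above) would replace the generic stripping step.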

