> Looks interesting.  How are you approaching the complication that transliteration is 
> between pairs of languages?

I know what you mean: Gorbachev is Gorbatschow in German.

I think the rules that we have in ICU are probably English-centric wherever the 
target language makes a difference.
Note that some of the transliterators, like uppercasing and any-name, are just 
wrappers around Unicode functions, and so they are not language-dependent.
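
To make the "wrappers around Unicode functions" point concrete, here is a rough sketch in plain Python. It uses Python's standard library as a stand-in, not the ICU API; the function names to_upper and to_names are made up for illustration, and the \N{...} output format is only an assumption about what a name-based transform might emit.

```python
import unicodedata

def to_upper(text):
    # Unicode case mapping; nothing here depends on a source/target language pair.
    return text.upper()

def to_names(text):
    # Replace each character by its Unicode character name.
    # Characters without a name (e.g. control characters) are passed through unchanged.
    parts = []
    for ch in text:
        name = unicodedata.name(ch, None)
        parts.append("\\N{%s}" % name if name else ch)
    return "".join(parts)

if __name__ == "__main__":
    print(to_upper("gorbachev"))
    print(to_names("\u0451"))  # U+0451, the Cyrillic letter io
```

Both functions consult only the Unicode character data, so the result is the same regardless of which language the text is meant for.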

The strength of the API is that you can supply your own rules, either at runtime or 
at compile time. If you need different rules for Finnish as the target language, 
you can modify the ICU rules or supply a completely different set of your own.
The rule syntax is somewhat similar to regular expressions.
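
The idea of one rule set per target language, with overrides supplied at runtime, can be sketched as a toy in plain Python. This is not ICU's rule syntax, rule data, or API: the RULES tables, the transliterate function, and the extra_rules parameter are all hypothetical, and the tables cover only the letters needed for the one name from this thread.

```python
# Toy per-target-language rule tables (illustrative only, not ICU data).
# "\u0451" is the Cyrillic letter io as a single code point.
COMMON = {"Г": "G", "о": "o", "р": "r", "б": "b", "а": "a"}
RULES = {
    "en": dict(COMMON, **{"ч": "ch", "\u0451": "e", "в": "v"}),
    "de": dict(COMMON, **{"ч": "tsch", "\u0451": "o", "в": "w"}),
}

def transliterate(text, target, extra_rules=None):
    """Apply the rule table for `target`, optionally overridden at runtime."""
    rules = dict(RULES[target])
    if extra_rules:
        rules.update(extra_rules)  # caller-supplied rules win
    # Try longer source strings first so multi-character rules take priority.
    sources = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for src in sources:
            if text.startswith(src, i):
                out.append(rules[src])
                i += len(src)
                break
        else:
            out.append(text[i])  # no rule matched: copy the character through
            i += 1
    return "".join(out)

if __name__ == "__main__":
    name = "Горбач\u0451в"
    print(transliterate(name, "en"))
    print(transliterate(name, "de"))
    # Hypothetical runtime overrides for a different target convention.
    print(transliterate(name, "en", extra_rules={"ч": "tš", "\u0451": "o"}))
```

The same source string yields "Gorbachev" with the English table and "Gorbatschow" with the German one, which is the language-pair problem in miniature; the last call shows how a caller could swap in their own rules without touching the built-in tables.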

See the (draft, somewhat outdated) user guide chapter: 
http://oss.software.ibm.com/icu/userguide/Transliteration.html
and the API references: 
http://oss.software.ibm.com/icu/apiref/class_Transliterator.html and 
http://oss.software.ibm.com/icu/apiref/utrans_h.html

markus
