> Looks interesting. How are you approaching the complication that transliteration is
> between pairs of languages?
I know what you mean: Gorbachev is Gorbatschow in German.
I think the rules that we ship in ICU are probably English-centric wherever the target
language makes a difference.
Note that some of the transliterator functions, like uppercasing and any-name, are just
wrappers around Unicode functions, and so are not language-dependent.
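To illustrate why these need no language-pair rules, here is a rough sketch using
Python's standard library rather than ICU itself. The function names any_upper and
any_name are made up for this example, and the \N{...} output form is only my
approximation of what an any-name transform produces.

```python
import unicodedata

def any_upper(text):
    # Default Unicode case mapping; no source/target language pair is involved.
    return text.upper()

def any_name(text):
    # Replace each character with its Unicode character name.
    # Characters without a name fall back to the placeholder "UNKNOWN".
    return "".join("\\N{%s}" % unicodedata.name(ch, "UNKNOWN") for ch in text)

print(any_upper("горбачёв"))  # ГОРБАЧЁВ
print(any_name("Г"))          # \N{CYRILLIC CAPITAL LETTER GHE}
```

Both results come straight from the Unicode character data, so they are the same
whether the reader is English, German, or Finnish.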
The strength of the API is that you can roll your own rules, both at runtime and at
compile-time. If you need different rules for Finnish as a target language for
transliteration, then you can modify the ICU rules or supply a whole different set of
your own.
The rules are written somewhat similarly to regular expressions.
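As a toy illustration of per-target-language rule sets, here is a sketch in plain
Python. It does not use ICU or the ICU rule syntax. The rule tables and the
transliterate helper are invented for this example and cover only the letters in
"Горбачёв"; they are not the rules shipped with ICU.

```python
# Toy rule tables, assumed for illustration only.
COMMON = {"Г": "G", "о": "o", "р": "r", "б": "b", "а": "a"}
RUSSIAN_TO_ENGLISH = {**COMMON, "ч": "ch", "ё": "e", "в": "v"}
RUSSIAN_TO_GERMAN = {**COMMON, "ч": "tsch", "ё": "o", "в": "w"}
# A custom set: start from an existing table and override what differs.
RUSSIAN_TO_FINNISH = {**RUSSIAN_TO_ENGLISH, "ч": "tš", "ё": "o"}

def transliterate(text, rules):
    """Apply the longest matching rule at each position; copy unmatched text."""
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matched at this position: pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)

print(transliterate("Горбачёв", RUSSIAN_TO_ENGLISH))  # Gorbachev
print(transliterate("Горбачёв", RUSSIAN_TO_GERMAN))   # Gorbatschow
print(transliterate("Горбачёв", RUSSIAN_TO_FINNISH))  # Gorbatšov
```

The point is only that the engine stays the same and the rule set is swapped per
target language. Real ICU rules are richer (context, ranges, and so on).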
See the (draft, somewhat outdated) user guide chapter:
http://oss.software.ibm.com/icu/userguide/Transliteration.html
and the API references:
http://oss.software.ibm.com/icu/apiref/class_Transliterator.html and
http://oss.software.ibm.com/icu/apiref/utrans_h.html
markus