When Bob suggested to me that we use ICU in Sword, my reaction was that it was just 
too big and didn't offer us enough to make it worth adding.  I think it deserves 
some further consideration though, and we should consider adding it in 1.7 if not 
1.5.3.

The encoding conversions aren't that important to me, except for downgrading to 
Latin-1, because I believe we should still keep the modules in UTF-8 (and eventually 
convert those that still remain in other encodings to UTF-8).  But ICU has a lot of 
other things to offer us, the coolest of which (IMO) are locale information and 
transliteration.

You can see some of the locale info that ICU contains in its data files at 
http://oss.software.ibm.com/developerworks/opensource/icu/localeexplorer/.

But definitely check out the demo of their transliterator at 
http://oss.software.ibm.com/developerworks/opensource/icu/translitdemo.  It still 
needs some work, because it doesn't appear to handle pre-composed or combining 
characters.

Examples are the transliteration of the RST Genesis 1:1 from "В начале 
сотворил Бог небо и землю" to "V nachalè sotvorìl Bog nèbo ì 
zèmlù" or LXX Genesis 1:1 from "εν αρχη εποιησεν ο θεος τον 
ουρανον και την γην" to "en archē epoiēsen o theos ton ouranon kai 
tēn gēn".

It claims to handle a lot of non-Roman scripts, such as Katakana, as well, so we might 
consider transliteration to Latin-1 as a means of supporting front-ends that can't 
support UTF-8.
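
To make the Latin-1 downgrade idea a bit more concrete, here is a rough sketch in 
Python using only the standard library.  It is NOT ICU and not anything that exists 
in Sword: the GREEK_TO_LATIN table is a tiny made-up sample, and the to_latin1 name is 
just for illustration.  It composes combining sequences first, maps what the table 
knows, strips accents that Latin-1 can't hold (note that "ē" is not a Latin-1 
character), and falls back to "?" for everything else.

```python
import unicodedata

# Hypothetical, deliberately tiny sample table (an assumption for this sketch);
# a real transliterator such as ICU's uses far more complete rule sets.
GREEK_TO_LATIN = {
    "α": "a", "ε": "e", "η": "ē", "ν": "n", "ρ": "r", "χ": "ch",
}


def to_latin1(text: str) -> bytes:
    """Downgrade Unicode text to Latin-1 bytes (illustrative helper only)."""
    out = []
    # NFC turns base letter + combining mark into a pre-composed character
    # where one exists, so both spellings are treated the same way.
    for ch in unicodedata.normalize("NFC", text):
        ch = GREEK_TO_LATIN.get(ch, ch)
        try:
            ch.encode("latin-1")
            out.append(ch)
        except UnicodeEncodeError:
            # Not representable in Latin-1: decompose and drop combining
            # marks, so "ē" becomes plain "e".
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    # Anything still outside Latin-1 is replaced with "?".
    return "".join(out).encode("latin-1", errors="replace")


if __name__ == "__main__":
    print(to_latin1("εν αρχη"))
```

A front-end that can't display UTF-8 could be handed the bytes from something like 
this, while the modules themselves stay in UTF-8.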

--Chris
