Hi Chris,

> In terms of combining characters vs. precomposed, all you really need to
> do is to remember to use a single normalization form. Unicode sort of
> informally suggests that NFC is best. W3C specifically recommends using
> NFC (see http://www.w3.org/TR/charmod-norm/). Roughly, NFC
> normalization consists of taking a string, decomposing all characters,
> then combining any codepoints that can be combined, provided the
> precombined codepoints are not compatibility codepoints. The way to
> ensure that a string is NFC normalized is to just normalize it with
> something like the uconv program I mentioned.
>
> I really don't know whether Extended Greek is NFC or not. So the last
> step before creating the Sword module should be normalization.

Actually, NFC is all about precomposed characters. The Extended Greek
characters are exactly that: the (precomposed) Greek letters with their
diacriticals. I use icu4j for all my tests and conversions, and when I ask
it to convert a text to NFC it does use the Extended Greek characters. So
my almost certain answer is yes, Extended Greek is NFC.

The real problem with most accented Greek texts is that they use some
diacriticals that are not combining diacriticals. The visual result may be
the same, but when converting to NFC these are left as they are. That is
wrong, because there are precomposed characters that would nicely replace
them. The issue is that Unicode provides many ways for Greek text to
'look' the same.

This is what I am trying to correct in some texts (including the WH),
which at some points use diacriticals that are not combining! I think this
is the result of scanning.

What I do (and I think it is correct, any thoughts here?) is take the
Greek text, decompose it (icu4j -> NFD), replace all non-combining
diacriticals with combining ones (changing their order so they can be
normalized correctly), and NFC it again. The result should be a text with
ONLY Extended Greek characters (and NO stand-alone diacriticals AT ALL).
After doing this, NFC->NFD->NFC gives back the same text.

Any comments/corrections on the above are highly welcome.

In Christ,
Costas
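
For illustration, the decompose / replace / reorder / recompose procedure described above could be sketched roughly as follows. This is not the icu4j code used for the actual conversions: it uses Python's standard unicodedata module instead, the function name fix_greek_diacritics and the mapping table are hypothetical and not exhaustive, and it assumes that each spacing diacritical directly follows the letter it belongs to (texts that put the mark before the letter would need an extra step).

```python
import unicodedata

# Spacing (non-combining) diacriticals, as they appear AFTER NFD, mapped to
# their combining counterparts. Illustrative assumption, not a complete list.
SPACING_TO_COMBINING = {
    "\u1fbf": "\u0313",  # GREEK PSILI -> COMBINING COMMA ABOVE
    "\u1fbd": "\u0313",  # GREEK KORONIS -> COMBINING COMMA ABOVE
    "\u1ffe": "\u0314",  # GREEK DASIA -> COMBINING REVERSED COMMA ABOVE
    "\u00b4": "\u0301",  # ACUTE ACCENT (NFD of GREEK OXIA U+1FFD) -> COMBINING ACUTE
    "\u0384": "\u0301",  # GREEK TONOS -> COMBINING ACUTE
    "\u0060": "\u0300",  # GRAVE ACCENT (NFD of GREEK VARIA U+1FEF) -> COMBINING GRAVE
    "\u1fc0": "\u0342",  # GREEK PERISPOMENI -> COMBINING GREEK PERISPOMENI
    "\u00a8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "\u037a": "\u0345",  # GREEK YPOGEGRAMMENI -> COMBINING GREEK YPOGEGRAMMENI
}

# Marks that must come before the accent for NFC to compose them.
BREATHING_OR_DIAERESIS = {"\u0313", "\u0314", "\u0308"}


def _rank(mark: str) -> int:
    """Sort key: breathings and diaeresis first, everything else after."""
    return 0 if mark in BREATHING_OR_DIAERESIS else 1


def fix_greek_diacritics(text: str) -> str:
    # Step 1: decompose, so precomposed letters and combined spacing marks
    # (e.g. U+1FCE PSILI AND OXIA) are split into their parts.
    decomposed = unicodedata.normalize("NFD", text)

    # Step 2: replace spacing diacriticals with combining ones.
    replaced = "".join(SPACING_TO_COMBINING.get(ch, ch) for ch in decomposed)

    # Step 3: within each run of combining marks, move breathings and
    # diaeresis to the front (stable sort keeps all other marks in order).
    out, run = [], []
    for ch in replaced:
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            out.extend(sorted(run, key=_rank))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=_rank))

    # Step 4: recompose.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    # alpha + spacing oxia + spacing psili (marks in the "wrong" order)
    fixed = fix_greek_diacritics("\u03b1\u1ffd\u1fbf")
    print([hex(ord(c)) for c in fixed])
```

With this mapping, alpha followed by a spacing oxia and a spacing psili comes out as the single precomposed character U+1F04. One detail to be aware of: alpha with an acute alone comes out as U+03AC from the basic Greek block rather than U+1F71, because U+1F71 canonically decomposes to U+03AC and NFC never recomposes it.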
