Great input!
With composed characters, the placement of accents is determined by the font designer, so they are placed correctly. Rendering of decomposed characters is an approximation. So, yes, composed looks better, provided the end user has a font that completely supports the composed characters.
Font selection is critical with composed. Many fonts support unaccented Greek letters, so when decomposed text is rendered with a font that lacks the accents, the accents are represented as boxes (or perhaps something else). This is really ugly, but it is possible to ignore the "static" and read the unaccented letters. With the same font, composed text is entirely unreadable.
End users will need help making good selections. Currently, in BibleDesktop, we provide a Font Picker that gives an alphabetized list of every font on the system and a bunch of sizes that the user might like to use. This does not give the user any help in picking out a good font.
With regard to searching, it is important that the search string is normalized the same way that the text is normalized. One of the challenges with decomposed is that when two accents are provided, they can be in more than one order. That is why one normalizes decomposed by re-composing and then decomposing. Also, I have heard that for some accented letters there is more than one composed form. This is why one often normalizes composed by decomposing and then recomposing.
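To make the two cases above concrete, here is a minimal sketch using Python's standard unicodedata module (not the Sword or ICU API). The helper name same_after is made up for illustration. It shows two decomposed spellings that differ only in mark order, and two composed code points for the same accented alpha, becoming equal once normalized.

```python
import unicodedata

ALPHA = "\u03b1"          # GREEK SMALL LETTER ALPHA
ACUTE = "\u0301"          # COMBINING ACUTE ACCENT
YPOGEGRAMMENI = "\u0345"  # COMBINING GREEK YPOGEGRAMMENI (iota subscript)

def same_after(form, a, b):
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Two decomposed spellings of the same letter, marks in different order.
marks_acute_first = ALPHA + ACUTE + YPOGEGRAMMENI
marks_subscript_first = ALPHA + YPOGEGRAMMENI + ACUTE

# Two composed code points for accented alpha: tonos and oxia.
alpha_tonos = "\u03ac"
alpha_oxia = "\u1f71"

raw_equal = marks_acute_first == marks_subscript_first
nfd_equal = same_after("NFD", marks_acute_first, marks_subscript_first)
nfc_equal = same_after("NFC", alpha_tonos, alpha_oxia)
```

Note that normalization only reorders marks whose combining classes differ. A breathing and an accent share the same class, so their relative order is kept as typed.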
Lucene provides for the ability to have multiple indexes per word (think of these as columns in a database table). One can index the unaccented form, the composed form, a transliterated form, ...
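As a rough illustration of the "several columns per word" idea, the sketch below builds a tiny in-memory index with a composed field and an unaccented field. It is plain Python, not Lucene, and the names strip_accents, index_forms and build_index are invented for this example. The two sample verse fragments are only there to give the index something to hold.

```python
import unicodedata
from collections import defaultdict

def strip_accents(word):
    """Drop all combining marks (accents, breathings, iota subscript)."""
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)

def index_forms(word):
    """Return the forms of a word that get indexed, one per field."""
    return {
        "composed": unicodedata.normalize("NFC", word),
        "unaccented": strip_accents(word).lower(),
    }

def build_index(verses):
    """Map field -> form -> set of verse references containing that form."""
    index = {"composed": defaultdict(set), "unaccented": defaultdict(set)}
    for ref, text in verses.items():
        for word in text.split():
            for field, form in index_forms(word).items():
                index[field][form].add(ref)
    return index

# Sample fragments, for illustration only.
verses = {
    "John 1:1": "ἐν ἀρχῇ ἦν ὁ λόγος",
    "John 1:14": "ὁ λόγος σὰρξ ἐγένετο",
}
index = build_index(verses)
hits = index["unaccented"]["λογος"]
```

A transliterated field could be added the same way, given a transliteration table; that part is left out here.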
ICU provides the ability for front-ends to be independent of the implementation decisions of the Sword API. The frontend can always compose what it is handed before it is rendered.
What is important is for Sword to normalize non-latin1 text as modules and indexes are built. The normalization form that is chosen should be published so that front-end developers can code accordingly.
Eeli Kaikkonen wrote:
This Accented Greek NT thing is great, and I'd like to share some thoughts about it. I'm not a specialist in fonts, Greek or unicode, so be warned. I hope this gives some new and useful thoughts if nothing else.
First, about de/precomposed characters. If the text uses decomposed characters, the font renderer has to compose them. Bible software cannot make any difference; the result may be good looking or bad looking depending on the renderer. If the text uses precomposed characters, the renderer renders the glyphs straight from the font file, and it's up to the font author to make a good looking glyph. Of course, even with precomposed characters the font file may have bugs which the renderer cannot fix, and the renderer may not correctly handle situations where a glyph in the font file is actually a link to some other glyphs (this is not the same thing as de/precomposing!).
So there are many renderers, many fonts and many combinations, and any of them may have bugs. Using decomposed characters adds more chances of having bugs. Therefore I would prefer the precomposed form.
I did not know there are free (like in thought) unicode fonts covering Greek Extended before this thread. I like FreeFont, though it has some bugs. I think I found the reason for those bugs, now I have to report them.
If there are reasons for the Sword library being Free Software, there are also reasons for fonts being Free or Open. In my humble opinion we as individuals and the Sword project as a whole could support FreeFont project in some way or another. Finding and reporting bugs is one way. It would be quite short-sighted to choose a good looking but non-free font for use with Sword.
Fonts are of course not the problem of the library, but of the frontends. However, I think there are many developers here who are working on the frontends. Bible software could even include the font files, and that would help the users because they would not have to find a proper font on their system. At least the software developers could add pointers to the Free font files to the documentation.
Here is more information about FreeFont: http://www.nongnu.org/freefont/ http://savannah.nongnu.org/projects/freefont/
I have put some screenshots in my www pages. I think they show quite clearly that precomposed is better than decomposed. I copied the text shown in
http://crosswire.org/study/parallelstudy.jsp?add=WHNU&add=WHAC&add=WHACD.
Unfortunately I did not get that page (the fonts) working with Konqueror or Firefox. The CSS is too complicated to edit by hand and makes the worst possible mistake usability-wise: it overrides settings which the user had already got right. I could use FreeFont, Gentium or some other font, and I think that the browsers could handle them. But the CSS gives other font names, and either I don't have them or they don't include Greek Extended properly.
Anyways, I copied some verses to KWord and OpenOffice (I use Debian GNU/Linux). They render the fonts differently. Both render the precomposed characters well. Both have problems with decomposed characters. Look at verse 1, Iakobos and diaspora, and verse 4, ina eete. I used two fonts, Gentium and FreeSerif. FreeSerif looks better. FreeFont also has sans serif and monospace fonts, and the sans serif looks even better or is easier to read at small sizes.
Here are the screenshots, they are large pictures: http://iki.fi/eelik/kwordjacobgreek.png http://iki.fi/eelik/oojacobgreek.png
Then, about searching. If you want to do the search using accents you have to know exactly what you want. Remember that accents may depend on words other than the one you are searching for. Also, if you don't know Greek very well it might be hard to remember the accents even though you remember the word. Only rarely does someone really want to search for accents. Mostly those who use Sword want to do biblical interpretation, not linguistic research. Therefore I think that accents should in some way or another be excluded from searching.
For canonical New Testament the best solution might be using search with Strong's numbers or some equivalent. There already are modules with Strong's numbers and morphological tags and the new modules also have at least the morphological tags. Those tags give the possibility to search by any form of the word, and accents may be ignored. Doing syntactical analysis becomes possible too, and it is not a small advantage.
It is up to a frontend software to make this kind of search usable. For the Sword library it would be enough to offer the search for text letter by letter, and search for numbers/tags.
If someone wants to have search with Greek words and accents, precomposed form would be better. I think it is faster to do a search with precomposed characters because there is less to compare. Only if someone wants to search for a word where e.g. "the last alpha may have grave OR acute" the decomposed form would be better. And actually even then the frontend could alter the search string by normalizing and making the proper OR statements.
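The "grave OR acute" case can be handled against precomposed text roughly like this. It is only a sketch in Python with the standard re and unicodedata modules; or_pattern is a made-up helper, and it assumes the text being searched is already in NFC.

```python
import re
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT

def or_pattern(word, letter="\u03b1", marks=(ACUTE, GRAVE)):
    """Build a regex matching `word` with any of `marks` on its last `letter`.

    `word` is given without an accent on that letter; each variant is
    composed to NFC so it can be compared with NFC text directly.
    """
    word = unicodedata.normalize("NFC", word)
    pos = word.rfind(letter)
    if pos < 0:
        raise ValueError("letter not found in word")
    variants = [
        unicodedata.normalize("NFC", word[:pos + 1] + mark + word[pos + 1:])
        for mark in marks
    ]
    return re.compile("|".join(re.escape(v) for v in variants))

# alpha with psili, lambda, lambda, plain alpha: "alla" without the final accent
pattern = or_pattern("\u1f00\u03bb\u03bb\u03b1")

# NFC text containing the word once with grave and once with acute
text = unicodedata.normalize(
    "NFC", "\u1f00\u03bb\u03bb\u1f70 \u03ba\u03b1\u1f76 \u1f00\u03bb\u03bb\u03ac"
)
matches = pattern.findall(text)
```

The frontend builds the alternatives; the library only needs to match plain strings.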
Troy wrote that we could "b)NFC both the search string and the text before searching". But why NFC the text before searching? The text should be normalized to NFC or NFD already; there is no reason to offer a non-normalized module. The search string can be normalized to any known form, whether it be NFC or NFD, if the form of the text module is known. (Normalization forms are quite hard to understand from reading the Unicode documentation. I suppose NFC means the most precomposed form and NFD the most decomposed.)
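In other words, only the search string needs work at search time. A small sketch in Python, assuming the module text was stored in a known form when it was built; find_in_module and the sample text are invented for the example and are not the Sword API.

```python
import unicodedata

def find_in_module(module_text, query, module_form="NFC"):
    """Normalize only the query to the module's known form, then search.

    `module_text` is assumed to be stored in `module_form` already.
    Returns the index of the first match, or -1 if there is none.
    """
    normalized_query = unicodedata.normalize(module_form, query)
    return module_text.find(normalized_query)

# Module text normalized once, when the module was built.
module_text = unicodedata.normalize("NFC", "ἐν ἀρχῇ ἦν ὁ λόγος")

# A query that happens to arrive in decomposed form.
query = unicodedata.normalize("NFD", "λόγος")

raw_position = module_text.find(query)             # query not normalized
position = find_in_module(module_text, query)      # query normalized to NFC
```

If the module were published as NFD instead, passing module_form="NFD" would be the only change.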
The bottom line is this:
1. Precomposed is better. I don't see any reason to use decomposed text in modules.
2. It would be good to support Free or Open fonts in some way or another.