Chris Little wrote:
> MorphGNT and an updated Tisch, both from morphgnt.org are up in the beta
> area.
> Both of these modules use composed UTF-8 characters.

In April 2005 we had a discussion on whether Greek should be composed or decomposed. I don't remember coming to a resolution. Are we going with composed?

To summarize: some frontends (including different browsers viewing the Bible Tool) handled composed text better than decomposed; others did the opposite. Font choice had a significant impact on the results. It was noted that we could have filters for composition or decomposition to transform the text as the frontend needed.

If we allow modules to vary in this regard, could/should we have an entry in the conf indicating the normalization? Perhaps with one of the values NFC, NFD, NFKD, NFKC, FCD?

Should osis2mod normalize to an agreed-upon form?

How should Greek (or any other accented text) be indexed with Lucene? Should we index various representations: fully (de)composed, unaccented, transliterated? It seems that the frontend needs to know how the index is represented so that it can appropriately normalize user input. Right now Lucene indexes what it is handed and the user is responsible for matching that.

In Him,
DM

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
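
To make the indexing question above concrete, here is a rough sketch in Python of what "normalize user input to match the index" could look like. It is only an illustration, not anything SWORD or Lucene does today: the function names and the idea of passing the index's form as a parameter are assumptions made for this example. Python's standard unicodedata module handles NFC, NFD, NFKC and NFKD; FCD is not covered here.

```python
import unicodedata

def normalize_text(text, form="NFC"):
    """Return text in the given Unicode normalization form.

    `form` stands in for whatever a module or index would declare
    (hypothetical; no such conf entry is assumed to exist).
    """
    return unicodedata.normalize(form, text)

def strip_accents(text):
    """Produce an unaccented variant: decompose, drop combining marks, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

def query_matches(query, indexed_term, index_form="NFC"):
    """Compare user input with an indexed term after normalizing the query."""
    return normalize_text(query, index_form) == indexed_term

# "logos" with a precomposed omicron-with-tonos (U+03CC)
composed = "\u03bb\u03cc\u03b3\u03bf\u03c2"
# The same word with omicron followed by a combining acute accent (U+0301)
decomposed = unicodedata.normalize("NFD", composed)

print(composed == decomposed)                   # False: different code points
print(query_matches(decomposed, composed))      # True once the query is normalized
print(strip_accents(composed) == strip_accents(decomposed))  # True
```

The point of the sketch is that the comparison only succeeds when the query is put into the same form as the indexed text, which is why the frontend would need to know how the index was built.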