There are similar issues when it comes to searching the Burmese Judson module (Myanmar script is very complex).
KS (a contact with a close interest in Myanmar languages) recently wrote this in an email to me:

"The normal Sword library has the option to use Lucene for searching. However, the StandardAnalyser assumes that words are space based and seems to ignore Unicode marks. This results in very bad search results for any language based on the Myanmar script. I've therefore downloaded the CLucene library 0.9.23 from git and patched it to call a Myanmar specific tokenizer if the LanguageBasedAnalyzer is used. The LanguageBasedAnalyzer defaults to the StandardTokenizer if no language specific tokenizer is found. Once I've tested this some more, I hope to submit it to the CLucene project and see if they will incorporate it. Would Crosswire be interested in accepting patches to use CLucene 0.9.23 (rather than 0.9.21 as seems to be the case at present) and the LanguageBasedAnalyzer?"

David
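
P.S. A small illustration of the two problems KS describes, and of the "language-specific tokenizer with a standard fallback" idea. This is only a sketch in Python: it is not CLucene code and not KS's patch. The function names, the `TOKENIZERS` table and the "my" language key are hypothetical, and the syllable rule is a deliberately crude assumption (break before a consonant unless it carries an asat or is part of a stacked pair). It ignores kinzi, digits, punctuation and real word boundaries.

```python
import unicodedata

ASAT = "\u103a"    # MYANMAR SIGN ASAT (marks a syllable-final consonant)
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA (stacks the following consonant)


def standard_tokens(text):
    """Space-based splitting, standing in for a whitespace-oriented analyzer."""
    return text.split()


def strip_marks(text):
    """Drop every combining mark (general categories Mn, Mc, Me)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("M"))


def is_myanmar_consonant(ch):
    return "\u1000" <= ch <= "\u1021"


def myanmar_syllable_tokens(text):
    """Crude syllable breaker (assumption, see the note above)."""
    tokens = []
    for chunk in text.split():
        current = ""
        for i, ch in enumerate(chunk):
            nxt = chunk[i + 1] if i + 1 < len(chunk) else ""
            prev = chunk[i - 1] if i > 0 else ""
            starts_syllable = (
                is_myanmar_consonant(ch)
                and nxt not in (ASAT, VIRAMA)
                and prev != VIRAMA
            )
            if starts_syllable and current:
                tokens.append(current)
                current = ""
            current += ch
        if current:
            tokens.append(current)
    return tokens


# Hypothetical registry: language code -> tokenizer.
TOKENIZERS = {"my": myanmar_syllable_tokens}


def tokenize(text, lang=None):
    """Use a language-specific tokenizer if one exists, else the standard one."""
    return TOKENIZERS.get(lang, standard_tokens)(text)


# A Burmese phrase written without spaces (8 code points, 4 of them marks).
sample = "\u1019\u103c\u1014\u103a\u1019\u102c\u1005\u102c"

print(tokenize(sample))        # one undivided token
print(tokenize(sample, "my"))  # three syllable-sized tokens
print(len(strip_marks(sample)), "of", len(sample), "code points survive mark stripping")
```

With space-based splitting the whole phrase is indexed as a single token, so a search for any part of it finds nothing. Discarding the marks leaves only the bare consonants, which no longer spell the original text.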
