Regarding languages with diacritics, accents, cantillation, etc.:

The SWORD M.O. is to have one set of StripFilters that massage both:
o       the body of the text being searched
o       the target search string

so we can get sane results.

With Greek we've been fairly intentional about stripping accents and ms markup from both the module text and the search input before searching. I would bet we still have some last-minute code somewhere that does special things if we're in a Greek text; obviously this should be remedied. I doubt we've done the same for Hebrew. For example, I would bet unaccented Greek searches work fine in SWORDWeb, but consonant-only Hebrew searches do not. In any case, the proper way to make things work is to have appropriate StripFilter entries in wlc.conf, and to be sure Xiphos calls module.StripText(userInputSearchText) before calling SWORD's search mechanism, so that we're comparing equivalent texts.
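
To illustrate the idea of running one strip step over both sides of the comparison, here is a rough sketch in Python. It is not SWORD's StripFilter code: strip_marks, search, and the sample verses are made up for illustration, and the sketch assumes that dropping every Unicode combining mark after NFD decomposition is an acceptable stand-in for what a real filter would do.

```python
import unicodedata
from typing import Dict, List


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop all combining marks.

    This removes Greek accents and breathings as well as Hebrew vowel
    points and cantillation, leaving only the base letters.
    (Hypothetical helper, not part of SWORD.)
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


def search(verses: Dict[str, str], query: str) -> List[str]:
    """Return references whose stripped body contains the stripped query.

    The same strip_marks() is applied to the body and to the query, so
    the comparison is always between equivalent forms.
    """
    needle = strip_marks(query)
    return [ref for ref, body in verses.items() if needle in strip_marks(body)]


# Sample data for illustration only.
VERSES = {
    "Gen 1:1": "אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ",
    "John 1:1": "Ἐν ἀρχῇ ἦν ὁ λόγος",
}

if __name__ == "__main__":
    print(search(VERSES, "שמים"))     # consonant-only Hebrew query
    print(search(VERSES, "שָׁמָיִם"))  # pointed Hebrew query
    print(search(VERSES, "λογος"))    # unaccented Greek query
```

Because the query goes through the same function as the text, it should not matter whether the user types or pastes a pointed, accented, or bare form.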

Does this make sense?

        -Troy.





Troy A. Griffitts wrote:
SWORDWeb seems to work fine. I'd appreciate it if we could have constructive, factual input instead of useless statements like "it's SWORD's fault". Thanks.

http://crosswire.org/study/wordsearchresults.jsp?searchTerm=שָׁמָיִם

Is anyone willing to put in the time to investigate whether proper UTF-8 is being sent into the SWORD engine when text is copied and pasted from Xiphos?

    -Troy.



Matthew Talbert wrote:
I don't know for sure if this is the same bug, but I know that CLucene
has severe issues (read: complete inability) with Unicode support.  If
you are using a CLucene indexed module, this could definitely be a
contributing factor to the problem.  In BibleTime we don't use SWORD's
search features, we re-implement that ourselves with CLucene, and our
result is a similar problem with Unicode modules that have indices.

The searches work nearly the same for indexed and non-indexed
modules, so it's SWORD, not CLucene. I would be interested in hearing
what Unicode issues CLucene has. The only one I recall is the
inability to prefix a search with a wildcard (which is very useful for
languages such as French).

Matthew

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page


