DM and I thought about this a while back in connection with some problems we had with Farsi. Essentially there are three scenarios for each diacritic sign: not there, integrated or extra. Modules are usually either a mixture of integrated and extra diacritics, or more or less purely one or the other.
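To make the three scenarios concrete, here is a rough Python sketch. It assumes that "integrated" means a precomposed code point and "extra" means a base letter followed by a separate combining mark; that reading is my assumption, not something settled in this thread. The helper name `fold_diacritics` is made up for illustration and is not part of SWORD. The sample word is taken from the Czech search phrase quoted further down.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same Czech word in the three scenarios (escapes used to make the
# code points explicit).
integrated = "k\u0159\u00ed\u017e"   # precomposed letters
extra = "kr\u030ci\u0301z\u030c"     # base letters + combining marks
absent = "kriz"                      # diacritics left out entirely

# The two accented spellings look identical but differ as code points...
assert integrated != extra
# ...and agree once both are brought to the same normalisation form.
assert unicodedata.normalize("NFC", extra) == integrated
# Folding maps all three scenarios to a single search key.
assert fold_diacritics(integrated) == fold_diacritics(extra) == absent
```

Folding like this is lossy by design, so a key such as "kriz" also matches genuinely unaccented words. It does not cover transliterations such as ae for ä either; those would need a separate mapping.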
Search entries depend heavily on the keyboard available: a German searching on a German keyboard will use umlauts, a German searching on a British keyboard will use ae, ue or oe, and someone else searching a German text might well search simply for a, o or u.

So the best way forward appeared at the time to be to normalise both the text and the search entry and accept the possibility of extraneous results, particularly around Latin-based scripts.

Alternatively, and I think there is a lot of mileage in this, we could demand that modules are designed cleanly in terms of diacritics (i.e. only sequential) and rectified wherever there is a problem. Subsequently only the search entries would need to be normalised or, even better, could be subject to user settings.

Peter

-------- Original Message --------
> Date: Sat, 13 Sep 2008 08:43:08 +0100
> From: "Troy A. Griffitts" <[EMAIL PROTECTED]>
> To: SWORD Support Volunteers <[EMAIL PROTECTED]>, [EMAIL PROTECTED], SWORD Developers' Collaboration Forum <[email protected]>
> Subject: Re: [sword-support] Locales

> I would guess if we build Lucene indexes for that Bible, Lucene
> would search ignoring accents?
>
> Or that module is not UTF-8?
>
> We have filters that we use on ancient Greek texts that allow searching
> regardless of diacritics. He could add a set for any language, but I'm
> not sure if this is the right location to place responsibility. Maybe
> if it was an ICU filter that could work for any language, like if it's
> just a normalization problem. We could use that one filter for all
> Bibles like we do the filter for Greek.
>
> Not sure, just thinking out loud.
>
> -Troy.
>
>
> Peter von Kaehne wrote:
> > Thanks. This is a known problem which causes a lot of difficulties in
> > all languages which rely on diacritics.
> >
> > There is a plan to improve the search facility.
> >
> > Peter
> >
> > -------- Original Message --------
> >> Date: Fri, 12 Sep 2008 19:57:58 +0200 (CEST)
> >> To: [EMAIL PROTECTED]
> >> Subject: [sword-support] Locales
> >
> >> Peace and love to my brothers and sisters in Jesus Christ, our Lord,
> >> from Jan, His weak servant.
> >>
> >> I am sorry to inform you about an error in the search engine of The
> >> Bible Tool. While using Czech, the search does not correctly interpret
> >> all the letters with diacritics, e.g.
> >>
> >> while typing the request:
> >>
> >> Nesl svůj kříž
> >>
> >> http://www.crosswire.org/study/wordsearchresults.jsp?searchTerm=Nesl+sv%C5%AFj+k%C5%99%C3%AD%C5%BE
> >>
> >> the result says that there is
> >> >0 result in the text of Czech Ekumenicky Cesky preklad<
> >> even though the searched text was copied & pasted directly from it.
> >>
> >> I hope it needs only a minor repair, since the search gives good
> >> results when looking for phrases without Czech-specific letters.
> >>
> >> Wish: the search default is "exact match", hence:
> >> >Co jsem napsal, napsal< gives a result
> >> but
> >> >co jsem napsal, napsal< gives 0 results.
> >> As people use the search to help their poor memory, I wish to really
> >> help them with less "censorious" matching criteria. These can be
> >> useful in the "Advanced search".
> >>
> >> God helps to your "Opus Dei"
