Re: Problems with the spellchecker

Grant Ingersoll Mon, 16 Mar 2009 13:26:00 -0700

What's the saying?  "It's not a bug, it's a feature!"

The QueryConverter is by definition a simple implementation that handles the basics and is designed to be replaced by those with specific needs. http://wiki.apache.org/solr/SpellCheckComponent#head-8a3a4d45708be416cec61a9387131cd52fcdbbbf

It would probably be good to at least have a few different implementations for handling various common scenarios.


-Grant


On Mar 16, 2009, at 11:14 AM, Stéphane Tellier wrote:

Hi,
We think there may be a bug in the spellchecker. It concerns accents and ISO Latin special characters. If I make a request like this (for
the word "considération"):

http://localhost/solr/spellCheckCompRH?q=considération&spellcheck=on&spellcheck.dictionary=file

and if I have a good number of words in my dictionary, it returns
suggestions for "consid" and "ration". It looks like it is treating the
"é" character as a space or a separator.
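The symptom can be reproduced outside Solr with a few lines of Java. This is a minimal sketch that assumes a plain \w+ pattern as a simplified stand-in for the converter's regex (the real SpellingQueryConverter pattern is more involved); the class name AsciiWordSplit is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AsciiWordSplit {
    // Simplified stand-in for an ASCII-only word regex (assumption: not the exact
    // pattern used by SpellingQueryConverter).
    public static final Pattern WORD = Pattern.compile("\\w+");

    // Collect every run of characters matched by WORD.
    public static List<String> tokens(String query) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(query);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // By default \w is [a-zA-Z_0-9], so U+00E9 (e with acute accent) is not a
        // word character and splits the query in two.
        System.out.println(tokens("consid\u00e9ration")); // [consid, ration]
    }
}
```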

Having looked through the code, I found the class
SpellingQueryConverter, which seems to do the work. I think the problem is the regular expression: the predefined character class \w does not work for special characters. As defined by the Java API, \w = [a-zA-Z_0-9], which does not include accented ISO Latin characters such as "é". I didn't find a
regular expression that would handle all of this, but I think
it would be important to fix it for the next version.
The version we're working with is the nightly build of 2009-03-04 (because
we need the better tuned-up facet module, which is much faster).
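A Unicode-aware pattern keeps the word whole. The sketch below (hypothetical class UnicodeWordSplit, a standalone tokenizer rather than a Solr plugin) shows two options: an explicit class built from Unicode properties, and the Pattern.UNICODE_CHARACTER_CLASS flag, which only exists from Java 7 onward and so was not available to builds of this era.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeWordSplit {
    // Option 1: explicit Unicode properties. Letters of any script, combining marks
    // (so a decomposed "e" + U+0301 stays attached), digits, and underscore.
    public static final Pattern EXPLICIT = Pattern.compile("[\\p{L}\\p{M}\\p{N}_]+");

    // Option 2 (Java 7+ only): make \w itself follow Unicode rules.
    public static final Pattern FLAGGED =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // Collect every run of characters matched by the given word pattern.
    public static List<String> tokens(Pattern word, String query) {
        List<String> out = new ArrayList<>();
        Matcher m = word.matcher(query);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String precomposed = "consid\u00e9ration"; // accented e as one code point
        String decomposed = "conside\u0301ration"; // e followed by a combining acute accent
        // Each call yields exactly one token: the whole word.
        System.out.println(tokens(EXPLICIT, precomposed).size()); // 1
        System.out.println(tokens(EXPLICIT, decomposed).size());  // 1
        System.out.println(tokens(FLAGGED, precomposed).size());  // 1
    }
}
```

Including \p{M} matters if queries can arrive in decomposed (NFD) form; \p{L} alone would split at the combining accent. Using this in Solr would mean replacing the QueryConverter, as suggested in the reply; that plugin code is not shown here.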

Thanks.
--
View this message in context: 
http://www.nabble.com/Problems-with-the-spellchecker-tp22540347p22540347.html
Sent from the Solr - User mailing list archive at Nabble.com.
