What's the saying?  "It's not a bug, it's a feature!"

The QueryConverter is by definition a simple implementation that handles the basics and is designed to be replaced by those with specific needs. http://wiki.apache.org/solr/SpellCheckComponent#head-8a3a4d45708be416cec61a9387131cd52fcdbbbf

It would probably be good to at least have a few different implementations for handling various common scenarios.

-Grant


On Mar 16, 2009, at 11:14 AM, Stéphane Tellier wrote:


Hi,

We think there may be a bug in the spellchecker. It concerns accents and ISO-Latin special characters. If I make a request like this (for
the word "considération"):

http://localhost/solr/spellCheckCompRH?q=considération&spellcheck=on&spellcheck.dictionary=file

and if I have a good number of words in my dictionary, it will return
suggestions for "consid" and "ration". It looks like it's treating the
"é" character as a space or a separator.

Having looked through the code, I have found the class
SpellingQueryConverter, which seems to do the work. I think that the problem
is the regular expression: the predefined character class \w does not match
accented characters. As defined by the Java API, \w = [a-zA-Z_0-9], which
does not include ISO accented characters. I didn't find a
regular expression that would be able to work all this out, but I think that
it would be important to fix that for the next version.
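
To illustrate the behaviour described above, here is a minimal, standalone sketch. It does not use the actual SpellingQueryConverter pattern; the class name AccentTokenDemo and both patterns are assumptions made up for the example. It only shows that the default \w in java.util.regex splits the word at "é", and that a class built from the Unicode properties \p{L} and \p{N} might be one way to keep it whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Solr.
public class AccentTokenDemo {
    // Default \w in java.util.regex is ASCII-only: [a-zA-Z_0-9].
    static final Pattern ASCII_WORDS = Pattern.compile("\\w+");

    // Possible alternative (an assumption, not the Solr fix):
    // any Unicode letter or digit, plus underscore.
    static final Pattern UNICODE_WORDS = Pattern.compile("[\\p{L}\\p{N}_]+");

    // Collects every match of the pattern in the input, in order.
    static List<String> tokens(Pattern pattern, String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "considération" with a precomposed é (U+00E9).
        String word = "consid\u00e9ration";
        System.out.println(tokens(ASCII_WORDS, word));   // [consid, ration]
        System.out.println(tokens(UNICODE_WORDS, word)); // [considération]
    }
}
```

Note that this only covers precomposed characters. If the input arrives with combining accents (for example "e" followed by U+0301), the pattern would probably also need \p{M} in the character class.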

The version we're working with is the nightly build of 2009-03-04 (because
we need the improved facet module, which is considerably faster).

Thanks.
--
View this message in context: 
http://www.nabble.com/Problems-with-the-spellchecker-tp22540347p22540347.html
Sent from the Solr - User mailing list archive at Nabble.com.

