Problems with the spellchecker

Stéphane Tellier Mon, 16 Mar 2009 08:15:01 -0700
Hi,
 
    We think there may be a bug in the spellchecker. It concerns accented
and other ISO Latin special characters. If I send a request like this (for
the word "considération"):
 
http://localhost/solr/spellCheckCompRH?q=considération&spellcheck=on&spellcheck.dictionary=file
 
and if I have a good number of words in my dictionary, it returns
suggestions for "consid" and "ration". It looks like it is treating the
"é" character as a space or a separator.
 
Having looked through the code, I found the class SpellingQueryConverter,
which seems to do the work. I think the problem is the regular expression:
the predefined character class \w does not match special characters. As
defined by the Java API, \w = [a-zA-Z_0-9], which does not include accented
ISO Latin characters. I haven't found a regular expression that handles all
of this, but I think it would be important to fix it for the next version.
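
To illustrate the suspected cause, here is a minimal sketch. The class name
WordTokenDemo, the method findWords and both patterns are made up for this
example; they are simplified stand-ins, not the actual pattern used in
SpellingQueryConverter. The second pattern, built on the Unicode categories
\p{L} and \p{N}, is only one possible direction for a fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordTokenDemo {
    // ASCII-only word characters: by default \w means [a-zA-Z_0-9] in Java.
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // Assumed alternative: any Unicode letter, any Unicode digit, or underscore.
    static final Pattern UNICODE_WORD = Pattern.compile("[\\p{L}\\p{N}_]+");

    // Collects every substring of the input that the pattern matches.
    static List<String> findWords(Pattern pattern, String input) {
        List<String> words = new ArrayList<String>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // The accented e is written as a Unicode escape so the result does
        // not depend on the encoding of the source file.
        String query = "consid\u00e9ration";

        // The ASCII pattern stops at the accented letter: two fragments.
        System.out.println(findWords(ASCII_WORD, query));

        // The Unicode pattern keeps the word together: one token.
        System.out.println(findWords(UNICODE_WORD, query));
    }
}
```

With the ASCII pattern the word is split at the accented letter, which
matches the "consid" and "ration" suggestions described above.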
 
The version we're working with is the nightly build of 2009-03-04 (because
we need the better-tuned facet module, which is considerably faster).
 
Thanks.
-- 
View this message in context: 
http://www.nabble.com/Problems-with-the-spellchecker-tp22540347p22540347.html
Sent from the Solr - User mailing list archive at Nabble.com.