Hi, all.
Sorry for any duplication - seems like what I sent yesterday never made it
through...
We're having some trouble with the Solr spellcheck response. We're running
version 3.1.
Overview: If we search for something really ugly like:
"kljhklsdjahfkljsdhf book rck"
then when we get back the response, there's a suggestions list for 'rck', but
no suggestions list for the other two words. For 'book', that's fine, because
it is 'spelled correctly' (i.e. we got hits on the word) and there shouldn't be
any suggestions. For the ugly thing, though, there aren't any hits.
The problem is that when we're handling the result, we can't tell the
difference between no suggestions for a 'correctly spelled' term, and no
suggestions for something that's odd like this.
(Now - this is happening with searches that aren't as obviously garbage - i.e.
words that are real words but just don't show up in the index and have
no suggestions - this was just to illustrate the point).
Our setup:
We're running multiple shards, which may be part of the issue. For example,
'book' might be found in one of the shards, but not another.
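To make the sharding concern concrete, here is a rough sketch of one way per-shard results for a single token could be merged. It is an illustration only, not Solr's merge logic and not our actual component: the class ShardMerge and its merge method are made-up names, and it assumes each shard reports a correctly spelled token by echoing it back as its only suggestion and reports an unknown token as an empty list.

```java
import java.util.*;

// Hypothetical sketch: merge the suggestion lists that each shard returned
// for ONE token. Assumed convention (not stock Solr behavior):
//   [token]  -> this shard found the token in its index
//   []       -> this shard checked the token and had nothing to suggest
public class ShardMerge {
    static List<String> merge(String token, List<List<String>> perShard) {
        Set<String> merged = new LinkedHashSet<String>();
        for (List<String> shard : perShard) {
            // If any shard knows the word, treat it as correctly spelled.
            if (shard.size() == 1 && shard.get(0).equals(token)) {
                return Collections.singletonList(token);
            }
            // Otherwise collect suggestions, dropping duplicates but
            // keeping first-seen order.
            merged.addAll(shard);
        }
        // An empty result means no shard had any suggestion at all.
        return new ArrayList<String>(merged);
    }
}
```

With this rule, 'book' found in one shard and missing from another still comes back as correctly spelled, while a token that every shard reports as empty stays visibly empty rather than disappearing.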
I don't *think* this has anything to do with our schema, since it's really how
the Search Suggestions are being returned to us. But, here are some bits and
pieces:
From schema.xml:
From solrconfig.xml:
textSpell
default
textSpell
./spellchecker
What we'd really like to see is the response coming back with an indication
that a word wasn't found / had no suggestions. We've hacked around in the code
a little bit to do this, but were wondering if anyone has come across this, and
what approaches you've taken.
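As a rough sketch of what we'd like to be able to do on the client side: if the response carried an entry for every token, the three cases could be told apart. The code below is illustrative only. The map stands in for the parsed suggestions section of a response, TokenClassifier and Status are made-up names, and it assumes a convention where a correctly spelled token is echoed back as its own single suggestion and a token with no suggestions gets an empty list.

```java
import java.util.*;

// Hypothetical client-side helper; not part of Solr or SolrJ.
public class TokenClassifier {
    enum Status { CORRECT, HAS_SUGGESTIONS, NO_SUGGESTIONS, NOT_REPORTED }

    static Status classify(String token, Map<String, List<String>> suggestions) {
        List<String> list = suggestions.get(token);
        if (list == null) {
            // No entry at all: this is the ambiguous case we see today.
            return Status.NOT_REPORTED;
        }
        if (list.isEmpty()) {
            // Token was checked, but nothing similar exists in the index.
            return Status.NO_SUGGESTIONS;
        }
        if (list.size() == 1 && list.get(0).equals(token)) {
            // Assumed convention: the token echoed back means "spelled correctly".
            return Status.CORRECT;
        }
        return Status.HAS_SUGGESTIONS;
    }
}
```

The NOT_REPORTED value is kept so that a response from an unmodified spell checker is still handled, just without the extra information.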
We created new classes which extend IndexBasedSpellChecker and
SpellCheckComponent, as follows (package and imports excluded for (sort of)
brevity). The methods are as taken from the overridden classes, with changes
noted by "SD" type comments...
/**
 * This is a slight modification of Solr's
 * AbstractLuceneSpellChecker.getSuggestions(SpellingOptions).
 * The modification allows correctly spelled words to be returned in the
 * suggestions. Working in tandem with the SirsiDynixSpellCheckComponent, it
 * allows words with no suggestions to be returned from the spell check
 * component, even in a sharded search.
 * Changes are marked with SD in the comments.
 */
public class SirsiDynixIndexBasedSpellChecker extends IndexBasedSpellChecker {
  @Override
  public SpellingResult getSuggestions(SpellingOptions options) throws IOException {
    // Detect whether this call is one shard's part of a distributed search
    boolean shardRequest = false;
    SolrParams params = options.customParams;
    if (params != null) {
      shardRequest = "true".equals(params.get(ShardParams.IS_SHARD));
    }
    SpellingResult result = new SpellingResult(options.tokens);
    IndexReader reader = determineReader(options.reader);
    Term term = field != null ? new Term(field, "") : null;
    float theAccuracy = (options.accuracy == Float.MIN_VALUE)
        ? spellChecker.getAccuracy() : options.accuracy;
    int count = Math.max(options.count,
        AbstractLuceneSpellChecker.DEFAULT_SUGGESTION_COUNT);
    for (Token token : options.tokens) {
      String tokenText = new String(token.buffer(), 0, token.length());
      String[] suggestions = spellChecker.suggestSimilar(tokenText,
          count,
          field != null ? reader : null, //workaround LUCENE-1295
          field,
          options.onlyMorePopular, theAccuracy);
      if (suggestions.length == 1 && suggestions[0].equals(tokenText)) {
        //These are spelled the same, continue on
        List<String> suggList = Arrays.asList(suggestions); //SD added
        result.add(token, suggList); //SD added
        continue;
      }
      if (options.extendedResults && reader != null && field != null) {
        // Extended results: record document frequencies for the token
        // and for each suggestion
        term = term.createTerm(tokenText);
        result.add(token, reader.docFreq(term));
        int countLimit = Math.min(options.count, suggestions.length);
        if (countLimit > 0) {
          for (int i = 0; i < countLimit; i++) {
            term = term.createTerm(suggestions[i]);
            result.add(token, suggestions[i], reader.docFreq(term));
          }
        } else if (shardRequest) {
          // On a shard request, record the token with an empty suggestion list
          List<String> suggList = Collections.emptyList();
          result.add(token, suggList);
        }
      } else {
        if (suggestions.length > 0) {
          List<String> suggList = Arrays.asList(suggestions);
          if (suggestions.length > options.count) {
            suggList = suggList.subList(0, options.count);
          }
          result.add(token, suggList);
        } else if (shardRequest) {
          // On a shard request, record the token with an empty suggestion list
          List<String> suggList = Collections.emptyList();
          result.add(token, suggList);
        }
      }
    }
    return result;
  }
}
/**
* This is a