[ 
https://issues.apache.org/jira/browse/SOLR-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793158#action_12793158
 ] 

Shalin Shekhar Mangar commented on SOLR-1676:
---------------------------------------------

Although it is not documented anywhere, SpellCheckComponent passes 
max(spellcheck.count, 5) to the Lucene spellchecker; see 
AbstractLuceneSpellChecker line 141 in trunk.

bq. The effect is that with a low value for spellcheck.count you might miss 
good hits. In other words, the first item with spellcheck.count==1 is not 
always the same item as with e.g. spellcheck.count==10. 

That is true. It is a trade-off between accuracy and performance. We cannot 
avoid this without internally fetching all results (or a large number of them) 
and scoring all of them with a distance metric, which can make it very slow.
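
To make the trade-off concrete, here is a toy model of the behaviour. It is 
not the Lucene or Solr code. It assumes the max(spellcheck.count, 5) floor 
mentioned above and the "multiply by 10" candidate pool described in the 
issue. The class name, the Candidate type, the scores, and the demo data are 
all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: this is NOT the Lucene SpellChecker or the Solr component.
class SpellcheckCountSketch {

    // Hypothetical candidate: a word plus a distance-based score (higher is better).
    static final class Candidate {
        final String word;
        final double score;

        Candidate(String word, double score) {
            this.word = word;
            this.score = score;
        }
    }

    // Solr side, as described in the comment: Lucene receives max(spellcheck.count, 5).
    static int effectiveCount(int spellcheckCount) {
        return Math.max(spellcheckCount, 5);
    }

    // Lucene side, as described in the issue: only the first numSug * 10
    // candidates (in retrieval order) are rescored; the rest are ignored.
    static List<String> suggest(List<Candidate> retrievalOrder, int spellcheckCount) {
        int poolSize = Math.min(retrievalOrder.size(), effectiveCount(spellcheckCount) * 10);
        List<Candidate> pool = new ArrayList<>(retrievalOrder.subList(0, poolSize));
        // Rescore: best distance score first.
        pool.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(spellcheckCount, pool.size()); i++) {
            result.add(pool.get(i).word);
        }
        return result;
    }

    // Made-up data: 80 candidates, with the best one at position 60,
    // which lies outside a pool of 50 but inside a pool of 100.
    static List<Candidate> demoCandidates() {
        List<Candidate> candidates = new ArrayList<>();
        for (int i = 0; i < 80; i++) {
            candidates.add(new Candidate("cand" + i, 0.5));
        }
        candidates.set(0, new Candidate("early", 0.7));
        candidates.set(60, new Candidate("best", 0.9));
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println(suggest(demoCandidates(), 1));         // pool of 50 candidates
        System.out.println(suggest(demoCandidates(), 10).get(0)); // pool of 100 candidates
    }
}
```

In this model, spellcheck.count==1 rescores only 50 candidates and never sees 
the candidate at position 60, while spellcheck.count==10 rescores up to 100 and 
does. Enlarging the pool fixes the ordering at the cost of scoring more 
candidates per query.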

Do you have any suggestion on how we could improve the documentation?



> spellcheck.count has confusing default and documentation
> --------------------------------------------------------
>
>                 Key: SOLR-1676
>                 URL: https://issues.apache.org/jira/browse/SOLR-1676
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.4
>            Reporter: Daniel Naber
>            Priority: Minor
>
> It seems spellcheck.count does not just limit the number of results returned, 
> as the documentation claims. Instead, this value is given to the Lucene 
> SpellChecker class which multiplies it by 10 and then only fetches the first 
> spellcheck.count*10 candidates, ignoring all others. The effect is that with 
> a low value for spellcheck.count you might miss good hits. In other words, 
> the first item with spellcheck.count==1 is not always the same item as with 
> e.g. spellcheck.count==10.
> The fix could be to fix the documentation (the comments in the sample 
> solrconfig.xml) to mention this and use a better default.
> The Lucene SpellChecker class says about the numSug parameter: "Thus, you 
> should set this value to *at least* 5 for a good suggestion."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
