Markus,

With "maxCollationTries=0", the spellchecker does not go out and query each collation to see how many hits it would produce, so it doesn't know the hit counts. That is why, if you also specify "collateExtendedResults=true", all the hit counts are zero. It would probably be better in this case if it did not report "hits" in the extended response at all. (On the other hand, if you're seeing zeros and "maxCollationTries>0", then you've hit a bug!)
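For what it's worth, here is a minimal sketch of the parameter combination that does report real hit counts (the host, core name, and query are placeholders; adjust for your setup):

```
http://localhost:8983/solr/mycore/select
  ?q=huup%20stapel
  &spellcheck=true
  &spellcheck.collate=true
  &spellcheck.collateExtendedResults=true
  &spellcheck.maxCollationTries=5
```

With maxCollationTries set above zero, each collation in the extended results should carry the actual "hits" value from its test query.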
"thresholdTokenFrequency" is, in my opinion, a pretty blunt instrument for getting rid of bad suggestions. It removes all of the rare terms, on the assumption that if a term is rare in the data it is either a mistake or not worth suggesting at all. But if you're using "maxCollationTries", the suggestions that don't fit will be filtered out automatically, making "thresholdTokenFrequency" less necessary. (On the other hand, if you're using IndexBasedSpellChecker, "thresholdTokenFrequency" will make the dictionary smaller and "spellcheck.build" run faster... This is solved entirely in 4.0 with DirectSolrSpellChecker...)

For the apps here, I've been using "maxCollationTries=10" and have been getting good results. Keep in mind that even though you're allowing it to try up to 10 queries to find a viable collation, so long as you set "maxCollations" to something low it will (hopefully) seldom need to try more than a couple before finding one with hits. (I always ask for only 1 collation, as we just re-apply the spelling correction automatically if the original query returns nothing.) Also, if "spellcheck.count" is low it might not have enough terms available to try, so you may need to raise that value as well when raising "maxCollationTries".

The worse problem, in my opinion, is the fact that it won't ever suggest words that are already in the index (even if you use "thresholdTokenFrequency" to remove them from the dictionary). For that there is https://issues.apache.org/jira/browse/SOLR-2585, which is part of Solr 4. The only other workaround is "onlyMorePopular", which has its own issues (see http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount).
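As a sketch, the settings I described above might look like this as request-handler defaults in solrconfig.xml (the handler name and dictionary name here are assumptions, not from your config):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.collate">true</str>
    <!-- ask for only one collation, but allow up to 10 test queries to find one with hits -->
    <str name="spellcheck.maxCollations">1</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <!-- keep the per-term suggestion count high enough that the collator has candidates to try -->
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```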
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, June 06, 2012 5:22 AM
To: solr-user@lucene.apache.org
Subject: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

Hi,

We've had some issues with a bad zero-hits collation being returned for a two-word query where one word was only one edit away from the required collation. With spellcheck.maxCollations set to a reasonable number we saw the various suggestions without the required collation. We decreased thresholdTokenFrequency to make it appear in the list of collations. However, with collateExtendedResults=true the hits field for each collation was zero, which is incorrect.

Required collation=huub stapel (two hits) and q=huup stapel

"collation":{
  "collationQuery":"heup stapel",
  "hits":0,
  "misspellingsAndCorrections":{
    "huup":"heup"}},
"collation":{
  "collationQuery":"hugo stapel",
  "hits":0,
  "misspellingsAndCorrections":{
    "huup":"hugo"}},
"collation":{
  "collationQuery":"hulp stapel",
  "hits":0,
  "misspellingsAndCorrections":{
    "huup":"hulp"}},
"collation":{
  "collationQuery":"hup stapel",
  "hits":0,
  "misspellingsAndCorrections":{
    "huup":"hup"}},
"collation":{
  "collationQuery":"huub stapel",
  "hits":0,
  "misspellingsAndCorrections":{
    "huup":"huub"}},
"collation":{
  "collationQuery":"huur stapel",
  "hits":0,
  "misspellingsAndCorrections":{
    "huup":"huur"}}}}}

Now, with maxCollationTries set to 3 or higher, we finally get the required collation, which is the only collation able to return results. How can we determine the best value for maxCollationTries relative to the decrease in thresholdTokenFrequency? And why is hits always zero? This is with today's build and distributed search enabled.

Thanks,
Markus