Re: another spellchecker question

2008-04-23 Thread Shalin Shekhar Mangar
Hi Geoffrey,
Yes, this is a known caveat of the Lucene contrib spellchecker, which Solr uses.
From the Lucene SpellChecker javadocs:

   * As the Lucene similarity that is used to fetch the most relevant n-grammed terms
   * is not the same as the edit distance strategy used to calculate the best
   * matching spell-checked word from the hits that Lucene found, one usually has
   * to retrieve a couple of numSug's in order to get the true best match.
   *
   * I.e. if numSug == 1, don't count on that suggestion being the best one.
   * Thus, you should set this value to at least 5 for a good suggestion.
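To make the javadoc's point concrete, here is a toy two-stage suggester. It is not Lucene's code: stage one ranks dictionary words by a plain count of shared character bigrams (a simple stand-in for Lucene's n-gram similarity scoring), and the candidate pool is just the top `numSug` words, whereas the real SpellChecker may size its pool differently. Stage two re-ranks that pool by Levenshtein edit distance. The class name, helper names, and three-word dictionary are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TwoStageSpellDemo {

    // Distinct character bigrams of a word, e.g. "hanna" -> {ha, an, nn, na}.
    static Set<String> bigrams(String w) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= w.length(); i++) {
            grams.add(w.substring(i, i + 2));
        }
        return grams;
    }

    // Stage 1 stand-in: number of bigrams shared with the query.
    // (NOT Lucene's actual similarity; just a simple overlap count.)
    static int overlap(String query, String candidate) {
        Set<String> shared = bigrams(query);
        shared.retainAll(bigrams(candidate));
        return shared.size();
    }

    // Stage 2: classic Levenshtein edit distance via dynamic programming.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Keep the numSug best words by n-gram overlap, then re-rank them by edit distance.
    static List<String> suggest(String query, List<String> dictionary, int numSug) {
        List<String> pool = new ArrayList<>(dictionary);
        pool.sort(Comparator.comparingInt((String w) -> -overlap(query, w))
                            .thenComparing(Comparator.naturalOrder()));
        pool = new ArrayList<>(pool.subList(0, Math.min(numSug, pool.size())));
        // List.sort is stable, so equal distances keep their stage-1 order.
        pool.sort(Comparator.comparingInt(w -> editDistance(query, w)));
        return pool;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("johannah", "manna", "banana");
        System.out.println(suggest("hanna", dict, 1)); // [johannah]
        System.out.println(suggest("hanna", dict, 3)); // [manna, banana, johannah]
    }
}
```

With `numSug == 1` only `johannah` survives stage one (4 shared bigrams, but edit distance 3), so it is returned. Asking for three lets `manna` (edit distance 1) into the pool, where it moves to the front: the same kind of shift seen in the "hanna" results quoted below.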

So what you're seeing is by design. We should probably change the default
number of suggestions requested from the Lucene spellchecker to 5 and return
only the top result when the user asks Solr for a single suggestion.
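A minimal sketch of that proposed change, assuming a pluggable backend: always ask the underlying spellchecker for at least 5 suggestions, then trim to the number the user requested. The `Suggester` interface, the `MinPoolSuggester` class, and the fake backend are hypothetical and are not Solr's code; in Lucene the backend call would be something like `SpellChecker.suggestSimilar(word, numSug)`.

```java
import java.util.ArrayList;
import java.util.List;

public class MinPoolSuggester {

    // Hypothetical abstraction over the underlying spellchecker call.
    interface Suggester {
        List<String> suggest(String word, int numSug);
    }

    // Per the javadoc advice, never ask the backend for fewer than 5.
    static final int MIN_NUM_SUG = 5;

    private final Suggester backend;

    MinPoolSuggester(Suggester backend) {
        this.backend = backend;
    }

    // Request max(count, MIN_NUM_SUG) from the backend, then return only
    // the top `count` results the user actually asked for.
    List<String> suggest(String word, int count) {
        List<String> all = backend.suggest(word, Math.max(count, MIN_NUM_SUG));
        return new ArrayList<>(all.subList(0, Math.min(count, all.size())));
    }

    public static void main(String[] args) {
        // Fake backend that records the requested count and echoes the
        // suggestionCount=5 list from this thread, best first.
        int[] requested = new int[1];
        Suggester fake = (word, n) -> {
            requested[0] = n;
            List<String> out = List.of("Manna", "Nanna", "Sanna", "Vanna", "Shanna");
            return out.subList(0, Math.min(n, out.size()));
        };
        MinPoolSuggester s = new MinPoolSuggester(fake);
        System.out.println(s.suggest("hanna", 1)); // [Manna]
        System.out.println(requested[0]);          // 5
    }
}
```

The user-facing count and the internal pool size are decoupled, so asking for one suggestion returns the head of the larger, edit-distance-ranked list instead of whatever a one-candidate pool happened to contain.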

On Wed, Apr 23, 2008 at 5:58 PM, Geoffrey Young [EMAIL PROTECTED] wrote:

 hi :)

 I've noticed that (with solr 1.2) the returned order (as well as the
 actual matched set) is affected by the number of matches you ask for:

  q=hanna&suggestionCount=1
  suggestions:[Yanna]

  q=hanna&suggestionCount=2
  suggestions:[Manna, Yanna]

  q=hanna&suggestionCount=5
  suggestions:[Manna, Nanna, Sanna, Vanna, Shanna]

 note how the #1 result is completely missing from the top 5... or at
 least that's how I _used_ to think about the sets :)

 unfortunately, extendedResults seems to be a 1.3-only option, so I can't
 see what's going on here.  but I guess I'm asking if this is expected
 behavior.

 --Geoff




-- 
Regards,
Shalin Shekhar Mangar.


Re: another spellchecker question

2008-04-23 Thread Geoffrey Young



great, thanks for all that - I'm still trying to figure out where all 
the relevant docs live.  you've been really helpful.


--Geoff