See https://issues.apache.org/jira/browse/LUCENE-1417 and
http://lucene.markmail.org/message/sktohlgqxcpmpf7z?q=list:org%2Eapache%2Elucene%2Esolr-user+spellchecker+Rennie
In short, frequency is only the second-order sort level; suggestions
are ranked by string distance first. I think the sort should be made
pluggable. A patch would be most welcome. I don't have time to produce
one at the moment, but I can shepherd it through.
FWIW, you might also try the Jaro-Winkler (JW) distance as the
default. Edit distance is not as good, since it treats differences
the same no matter where in the word they occur, whereas most people
tend to make spelling mistakes later on in a word, which I believe JW
takes into account when scoring.
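As a rough sketch of what trying JW as the default could look like, the
"default" entry from the configuration quoted below could be given the
same distanceMeasure setting that the "jarowinkler" entry already uses.
Every name and value here is copied from that quoted configuration;
combining distanceMeasure with accuracy in one entry is an assumption,
and the 0.5 threshold may need retuning because JW scores are on a
different scale than edit-distance scores.

```xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <str name="field">word</str>
  <!-- Assumption: swap the distance measure on the default checker,
       mirroring the "jarowinkler" entry in the quoted config -->
  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  <str name="spellcheckIndexDir">./spellchecker1</str>
  <!-- Carried over unchanged; may need adjusting for JW scores -->
  <str name="accuracy">0.5</str>
</lst>
```

Alternatively, leaving the configuration as it is and adding
spellcheck.dictionary=jarowinkler to the request should select the
existing JW-backed checker without touching the default.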
On Nov 11, 2008, at 11:52 AM, Jeff Newburn wrote:
Ok. I have managed to get the search component added (you rock,
Grant). I am now having some interesting issues with the suggestions.
We sell shoes online, so I am trying to get it to spellcheck brand
names. When I search for "konverse" with spelling on, it correctly
returns "converse". However, when I search for "nice" (instead of
"nike"), I get all sorts of results that are not sorted by frequency.
I have even turned on onlyMorePopular, but it still returns all of the
different words in no apparent order. Nike is by far the most frequent
term; how do I get it to the top?
I am currently using the svn build of Solr 1.4. I have included the
configuration as well as the result set returned for the spelling
suggestions.
Below is the configuration:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<!--<str name="queryAnalyzerFieldType">textSpell</str>-->
<str name="buildOnCommit">true</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="field">word</str>
<str name="spellcheckIndexDir">./spellchecker1</str>
<str name="accuracy">0.5</str>
</lst>
<lst name="spellchecker">
<str name="name">jarowinkler</str>
<str name="field">word</str>
<!-- Use a different Distance Measure -->
<str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
<str name="spellcheckIndexDir">./spellchecker2</str>
</lst>
<lst name="spellchecker">
<str name="classname">solr.FileBasedSpellChecker</str>
<str name="name">file</str>
<str name="sourceLocation">spellings.txt</str>
<str name="characterEncoding">UTF-8</str>
<str name="indexDir">./spellcheckerFile</str>
</lst>
</searchComponent>
Return results:
<lst name="spellcheck">
<lst name="suggestions">
<lst name="nice">
<int name="numFound">20</int>
<int name="startOffset">0</int>
<int name="endOffset">4</int>
<int name="origFreq">0</int>
<lst name="suggestion">
<int name="frequency">47</int>
<str name="word">Mice</str>
</lst>
<lst name="suggestion">
<int name="frequency">26</int>
<str name="word">Vice</str>
</lst>
<lst name="suggestion">
<int name="frequency">14</int>
<str name="word">Nice</str>
</lst>
<lst name="suggestion">
<int name="frequency">4</int>
<str name="word">Bice</str>
</lst>
<lst name="suggestion">
<int name="frequency">1</int>
<str name="word">Dice</str>
</lst>
<lst name="suggestion">
<int name="frequency">4099</int>
<str name="word">Nike</str>
</lst>
On 11/11/08 4:39 AM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
Hi Jeff,
A SearchComponent lets you connect functionality to any Request
Handler, so you can inline spelling requests (or other things like
MoreLikeThis) with your queries and avoid making an extra request.
I walk through a lot of this in my article on Solr 1.3 for IBM
devWorks:
http://www.ibm.com/developerworks/java/library/j-solr-update/?S_TACT=105AGX01&S_CMP=HP
You can also refer to the Wiki at:
http://wiki.apache.org/solr/SearchComponent
and specifically:
http://wiki.apache.org/solr/SpellCheckComponent
It works independently of the query parser (e.g. dismax).
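A minimal sketch of that wiring, assuming the component is registered
under the name "spellcheck" as in the SpellCheckComponent wiki page. The
handler name /spellCheckCompRH and the default values are illustrative
choices, not settings taken from this thread; the dismax-specific
parameters (qf and so on) are left out and would need to be filled in
for a real schema.

```xml
<!-- Illustrative handler name; any SearchHandler can be used -->
<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Parse the main query with dismax; add qf etc. for your schema -->
    <str name="defType">dismax</str>
    <!-- Turn spelling suggestions on for every request to this handler -->
    <str name="spellcheck">true</str>
    <!-- Which configured spellchecker to consult -->
    <str name="spellcheck.dictionary">default</str>
    <!-- Illustrative value: number of suggestions to return -->
    <str name="spellcheck.count">5</str>
  </lst>
  <!-- Run the spellcheck component after the standard components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With something like this in place, a single request to the handler
should return both the dismax search results and the spelling
suggestions, and individual defaults can still be overridden per
request.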
-Grant
On Nov 10, 2008, at 7:00 PM, Jeff Newburn wrote:
I am still relatively new to Solr. I have gotten the
SpellCheckerRequestHandler working the way I would like. Now I am
diving into the search component version of the spell checker. I was
hoping someone could help explain two things: what specifically does
the search component offer, and how would I go about applying it to
all searches that use the dismax type?
-Jeff
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ