On Feb 4, 2009, at 11:02 AM, Marcus Stratmann wrote:
Hello,
I'm trying to learn how to use the spell checkers of Solr (1.3). I
found out that FileBasedSpellChecker and IndexBasedSpellChecker
produce different outputs.
IndexBasedSpellChecker returns
<lst name="spellcheck">
<lst name="suggestions">
<lst name="gane">
<int name="numFound">1</int>
<int name="startOffset">0</int>
<int name="endOffset">4</int>
<int name="origFreq">0</int>
<lst name="suggestion">
<int name="frequency">85</int>
<str name="word">game</str>
</lst>
</lst>
<bool name="correctlySpelled">false</bool>
</lst>
</lst>
whereas FileBasedSpellChecker returns
<lst name="spellcheck">
<lst name="suggestions">
<lst name="gane">
<int name="numFound">1</int>
<int name="startOffset">0</int>
<int name="endOffset">4</int>
<arr name="suggestion">
<str>game</str>
</arr>
</lst>
</lst>
</lst>
The differences are the use of <lst> versus <arr> to mark up
the suggestions, the missing frequencies, and the missing
"correctlySpelled" element in the FileBasedSpellChecker output. Is that
a bug or a feature? Or are there simply no universal rules for the
format of the output? The differences make parsing more difficult if
you use both IndexBasedSpellChecker and FileBasedSpellChecker.
Are you sending the same query to both? The frequency and word entries
only get printed when extendedResults == true. correctlySpelled only gets
printed when there is index frequency information. The
FileBasedSpellChecker has no frequency information, so it isn't
returned.
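To compare the two checkers on equal terms, both requests should carry the same query and the same spellcheck.extendedResults flag. Below is a minimal Python sketch for building such requests. The base URL, the handler path, and the dictionary names "default" and "file" are assumptions; they must match the request handler and the spellchecker names configured in your solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the handler must have the SpellCheckComponent attached.
SOLR_SELECT = "http://localhost:8983/solr/select"


def spellcheck_url(query, dictionary, extended=True):
    """Build a spellcheck request for one named dictionary (sketch)."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.q": query,
        # Must match a spellchecker name in solrconfig.xml (assumed names below).
        "spellcheck.dictionary": dictionary,
        # Send the same flag to both dictionaries so their output is comparable.
        "spellcheck.extendedResults": "true" if extended else "false",
    }
    return SOLR_SELECT + "?" + urlencode(params)


# Hypothetical dictionary names: one index-based, one file-based.
for name in ("default", "file"):
    print(spellcheck_url("gane", name))
```

Even with identical parameters, the file-based dictionary will still lack frequencies and correctlySpelled, for the reason given above.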
The logic for constructing this is all handled in the
SpellCheckComponent.toNamedList() method and is completely separated
from the individual SpellChecker implementations.
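Because the layout is decided in one place, a client can normalize both variants itself. Below is a minimal sketch using Python's xml.etree.ElementTree, assuming the response fragments look like the two samples in this thread. The function names are hypothetical. Missing frequencies and a missing correctlySpelled flag come back as None instead of raising. It also accepts <lst> items wrapped in an <arr name="suggestion">, on the assumption that some versions may nest extended items that way.

```python
import xml.etree.ElementTree as ET


def _extended_item(lst):
    """Read one extended <lst name="suggestion"> into (word, frequency)."""
    word = lst.find("str[@name='word']")
    freq = lst.find("int[@name='frequency']")
    return (word.text if word is not None else None,
            int(freq.text) if freq is not None else None)


def parse_suggestions(spellcheck_xml):
    """Return ({term: [(word, freq_or_None), ...]}, correctly_spelled_or_None)."""
    root = ET.fromstring(spellcheck_xml)  # the <lst name="spellcheck"> element
    suggestions = root.find("lst[@name='suggestions']")
    result = {}
    if suggestions is None:
        return result, None
    for term in suggestions.findall("lst"):  # one <lst> per misspelled term
        items = []
        for child in term:
            if child.get("name") != "suggestion":
                continue  # skip numFound, startOffset, endOffset, origFreq
            if child.tag == "arr":
                for entry in child:
                    if entry.tag == "str":  # plain word, no frequency
                        items.append((entry.text, None))
                    elif entry.tag == "lst":  # extended item inside an array
                        items.append(_extended_item(entry))
            elif child.tag == "lst":  # extended item, as in the index sample
                items.append(_extended_item(child))
        result[term.get("name")] = items
    # correctlySpelled sits inside <lst name="suggestions"> when present.
    flag = suggestions.find("bool[@name='correctlySpelled']")
    correctly = None if flag is None else flag.text == "true"
    return result, correctly
```

For the IndexBasedSpellChecker sample this yields ({"gane": [("game", 85)]}, False); for the FileBasedSpellChecker sample it yields ({"gane": [("game", None)]}, None).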
HTH,
Grant
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/