Re: Differences in output of spell checkers

Grant Ingersoll Wed, 04 Feb 2009 12:05:44 -0800


On Feb 4, 2009, at 11:02 AM, Marcus Stratmann wrote:

Hello,
I'm trying to learn how to use the spell checkers of Solr 1.3. I found out that FileBasedSpellChecker and IndexBasedSpellChecker produce different outputs.
IndexBasedSpellChecker says

<lst name="spellcheck">
        <lst name="suggestions">
                <lst name="gane">
                        <int name="numFound">1</int>
                        <int name="startOffset">0</int>
                        <int name="endOffset">4</int>
                        <int name="origFreq">0</int>
                        <lst name="suggestion">
                                <int name="frequency">85</int>
                                <str name="word">game</str>
                        </lst>
                </lst>
                <bool name="correctlySpelled">false</bool>
        </lst>
</lst>

whereas FileBasedSpellChecker returns

<lst name="spellcheck">
        <lst name="suggestions">
                <lst name="gane">
                        <int name="numFound">1</int>
                        <int name="startOffset">0</int>
                        <int name="endOffset">4</int>
                        <arr name="suggestion">
                                <str>game</str>
                        </arr>
                </lst>
        </lst>
</lst>
The differences are the use of <lst> versus <arr> for the markup of the suggestions, the missing frequencies, and the missing "correctlySpelled" in FileBasedSpellChecker. Is that a bug or a feature? Or are there simply no universal rules for the format of the output? The differences make parsing more difficult if you use both IndexBasedSpellChecker and FileBasedSpellChecker.

Are you sending the same query to both? The frequency/word pairs only get printed when extendedResults is true (spellcheck.extendedResults=true). correctlySpelled only gets printed when there is index frequency information. The FileBasedSpellChecker has no frequency information, so it isn't returned.
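To compare the two checkers fairly, it helps to send an identical request to each dictionary with extended results switched on. A minimal sketch in Python, assuming a request handler at the hypothetical path /solr/spell with the SpellCheckComponent attached, and dictionaries named "default" (index-based) and "file" (file-based); both names are assumptions that depend on your solrconfig.xml. The parameters spellcheck, spellcheck.q, spellcheck.dictionary and spellcheck.extendedResults are the standard spellcheck request parameters.

```python
from urllib.parse import urlencode


def spellcheck_url(base, word, dictionary, extended=True):
    """Build a spellcheck request URL (handler path in `base` is hypothetical)."""
    params = [
        ("q", word),
        ("spellcheck", "true"),
        ("spellcheck.q", word),
        ("spellcheck.dictionary", dictionary),  # dictionary name from solrconfig.xml
        ("spellcheck.extendedResults", "true" if extended else "false"),
    ]
    return base + "?" + urlencode(params)


# Same query, two dictionaries ("default" and "file" are assumed names).
base = "http://localhost:8983/solr/spell"
for name in ("default", "file"):
    print(spellcheck_url(base, "gane", name))
```

Even with identical parameters, the file-based dictionary has no frequencies to report, so its response will still lack the frequency and correctlySpelled entries.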

The logic for constructing this output is all handled in the SpellCheckComponent.toNamedList() method and is completely separate from the individual SpellChecker implementations.
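If one client has to consume responses from both dictionaries, one option is to normalize the two layouts quoted above on the client side. A minimal Python sketch using the standard library's xml.etree.ElementTree; the function name parse_spellcheck is hypothetical, it takes the <lst name="spellcheck"> fragment as a string, and it only assumes the two layouts shown in this thread (it is not part of Solr).

```python
import xml.etree.ElementTree as ET


def parse_spellcheck(xml_text):
    """Return ({token: [(word, frequency_or_None), ...]}, correctly_spelled_or_None).

    Accepts the <lst name="spellcheck"> fragment in either layout seen here:
      extended: <lst name="suggestion"><int name="frequency"/><str name="word"/></lst>
      plain:    <arr name="suggestion"><str>word</str></arr>
    """
    root = ET.fromstring(xml_text)
    suggestions = {}
    block = root.find("lst[@name='suggestions']")
    if block is None:
        return suggestions, None

    for token in block.findall("lst"):  # one <lst> per misspelled token
        pairs = []
        for node in token:
            if node.get("name") != "suggestion":
                continue  # skip numFound, startOffset, endOffset, origFreq
            if node.tag == "arr":
                # Plain form: bare words, no frequency available.
                pairs.extend((s.text, None) for s in node.findall("str"))
            elif node.tag == "lst":
                # Extended form: word plus its frequency.
                word = node.find("str[@name='word']")
                freq = node.find("int[@name='frequency']")
                if word is not None:
                    pairs.append((word.text, int(freq.text) if freq is not None else None))
        suggestions[token.get("name")] = pairs

    # correctlySpelled only appears when frequency information exists.
    flag = block.find("bool[@name='correctlySpelled']")
    correct = (flag.text == "true") if flag is not None else None
    return suggestions, correct


FILE_BASED = """<lst name="spellcheck"><lst name="suggestions">
<lst name="gane"><int name="numFound">1</int><int name="startOffset">0</int>
<int name="endOffset">4</int><arr name="suggestion"><str>game</str></arr></lst>
</lst></lst>"""

print(parse_spellcheck(FILE_BASED))  # ({'gane': [('game', None)]}, None)
```

Applied to the IndexBasedSpellChecker response above, the same function returns {'gane': [('game', 85)]} with correctlySpelled False, so downstream code only has to deal with a frequency that may be None.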


HTH,
Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
