[Solr Wiki] Update of "SpellCheckingAnalysis" by GrantIngersoll

Apache Wiki Thu, 05 Feb 2009 11:51:10 -0800



The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/SpellCheckingAnalysis

New page:
= Introduction =

Analysis is a very important factor in spell checking.  Stemming and other 
techniques that alter tokens are not recommended, because the spell checker 
will then offer stems, which are often not real words, as suggestions.  
Instead, use a minimal tokenization/analysis chain, such as the 
!StandardAnalyzer, or even the !WhitespaceTokenizer followed by a simple 
lowercasing filter and a filter that removes apostrophes and similar 
characters.  As with most things in search, there are tradeoffs, so evaluate 
the results in your own application.
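For illustration, the !WhitespaceTokenizer-based chain described above might be expressed as the following field type. This is a sketch, not a tested configuration: it assumes Solr's `solr.WhitespaceTokenizerFactory`, `solr.LowerCaseFilterFactory` and `solr.PatternReplaceFilterFactory` are available in your Solr version, and the type name `spellMinimal` is hypothetical.

{{{
<!-- Hypothetical field type "spellMinimal": a minimal, non-stemming chain -->
<fieldType name="spellMinimal" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer with no type attribute is used at both index and query time -->
  <analyzer>
    <!-- Split on whitespace only -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase so "Apple" and "apple" map to one spelling term -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Strip apostrophes from each token -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="'" replacement="" replace="all"/>
  </analyzer>
</fieldType>
}}}

Note that removing apostrophes this way turns a token like "don't" into "dont"; whether that is acceptable depends on your content.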

That being said, a common configuration for spell checking is:

{{{
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-time chain: no stemming filter, so terms keep their surface form -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Query-time chain: the same filters plus synonym expansion -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
}}}

Define a field named "spell" that uses this field type, use a <copyField> to 
copy your main text fields into it, and then configure your spell checker to 
derive its spelling index from the "spell" field.
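A sketch of that wiring, assuming the "spell" field type defined above. The source field names `title` and `body` are hypothetical placeholders for your own text fields. The spell checker settings follow the `SpellCheckComponent` layout used in the Solr example `solrconfig.xml`; check the parameter names against the documentation for your Solr version.

{{{
<!-- schema.xml: a field of type "spell", fed by copyField -->
<!-- multiValued is needed because several source fields are copied into it -->
<field name="spell" type="spell" indexed="true" stored="false" multiValued="true"/>
<!-- "title" and "body" are hypothetical source fields -->
<copyField source="title" dest="spell"/>
<copyField source="body" dest="spell"/>

<!-- solrconfig.xml: build the spelling index from the "spell" field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Analyze incoming spell check queries with the "spell" field type -->
  <str name="queryAnalyzerFieldType">spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Field whose indexed terms become the spelling dictionary -->
    <str name="field">spell</str>
    <!-- Directory where the separate spelling index is written -->
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
}}}

Because the spelling dictionary is built from indexed terms, the analysis chain on the "spell" field directly determines what the spell checker can suggest.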
