solr spell suggestions help

Rohan Thakur Fri, 05 Apr 2013 04:14:58 -0700

Hi all,

I'm having some issues with Solr spell suggestions.


1) First of all, are index-based spell suggestions (IndexBasedSpellChecker)
better in any way than the direct spell suggestions (DirectSolrSpellChecker)
that Solr 4.1 provides?

2) Is there a way to get suggestions for a word when I provide only a prefix
of it? For example, when I query sam I should get samsung as one of the
suggestions.

3) Why am I not getting suggestions when the query differs from the correct
word by two or more characters? If I query for wirlpool (8 characters) I get
whirlpool (9 characters, the correct spelling) as a suggestion, but when I
query for wirlpol (7 characters) it reports the word as misspelled and does
not show any suggestions. Likewise, pansonic (8 characters) gives panasonic
(9 characters) as a suggestion, but when I remove one more character and
search for panonic (7 characters) it does not return any suggestions. How can
I correct this? Also, when I search for ipo it does not return ipod as a
suggestion.

4) When I search for microwave ovan, it does not report any misspelling even
though ovan is wrong; it returns the results for microwave and says the query
is correct. This happens whenever one term in the query is correct and the
others are not: it does not point out the misspelled term, it just returns
the results for the correct one. How can I correct this? Similarly, when I
query for microvave oven, it shows the results for oven and says the query is
correct.

5) One more case: when I query for plntronies (the correct word is
plantronics) it does not return any suggestion, but when I query for
plantronies it returns plantronics as a suggestion. Why is that happening?
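
To make the comparisons in points 3 and 5 concrete, here is a small Python sketch that computes the plain Levenshtein distance for each pair and compares it with the maxEdits value of 1 from the solrconfig.xml in this message. It is only an illustration: the function and variable names are made up for this example, and it is an assumption that the "internal" distance measure of DirectSolrSpellChecker counts edits the same way for these words.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Mirrors <int name="maxEdits">1</int> from the config in this message.
MAX_EDITS = 1

# (query as typed, intended word) pairs taken from points 3 and 5.
PAIRS = [
    ("wirlpool", "whirlpool"),
    ("wirlpol", "whirlpool"),
    ("pansonic", "panasonic"),
    ("panonic", "panasonic"),
    ("plantronies", "plantronics"),
    ("plntronies", "plantronics"),
]

for typed, intended in PAIRS:
    d = levenshtein(typed, intended)
    print(f"{typed} -> {intended}: {d} edit(s), within maxEdits: {d <= MAX_EDITS}")
```

In this sketch the three queries that return a suggestion are each one edit away from the intended word, and the three that return nothing are each two edits away.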

*my schema.xml is:*
<fieldType name="tSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
      <analyzer type="index">
          <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\\\[\]\(\)\-\,\/\+" replacement=" "/>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.LengthFilterFactory" min="2" max="20"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
       <analyzer type="query">
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.LengthFilterFactory" min="2" max="20"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
     </fieldType>

<field name="spell" type="tSpell" indexed="true" stored="true" />
<copyField source="title" dest="spell" />



*my solrconfig.xml is :*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">



    <!-- Multiple "Spell Checkers" can be declared and used by this
         component
      -->

    <!-- a spellchecker built from a field of the main index -->
    <lst name="spellchecker">
    <!--
        Optional, it is required when more than one spellchecker is configured.
        Select non-default name with spellcheck.dictionary in request handler.
    -->
      <str name="name">default</str>

      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
      <!--
        Load tokens from the following field for spell checking,
        analyzer for the field's type as defined in schema.xml are used
    -->
      <str name="field">spell</str>
      <str name="distanceMeasure">internal</str>
      <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
      <float name="accuracy">0.3</float>
      <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
      <int name="maxEdits">1</int>
      <!-- the minimum shared prefix when enumerating terms -->
      <int name="minPrefix">1</int>
      <!-- maximum number of inspections per result. -->
      <int name="maxInspections">5</int>
      <!-- minimum length of a query term to be considered for correction -->
      <int name="minQueryLength">4</int>
      <!-- maximum threshold of documents a query term can appear to be considered for correction -->
      <float name="maxQueryFrequency">0.01</float>
      <!-- uncomment this to require suggestions to occur in 1% of the documents
          <float name="thresholdTokenFrequency">.01</float>
      -->
    </lst>

    <!-- a spellchecker that can break or combine words.  See "/spell" handler below for usage -->
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spell</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">3</int>
       <!--  <int name="minBreakLength">5</int> -->
    </lst>

    <!-- a spellchecker that uses a different distance measure -->

       <lst name="spellchecker">
         <str name="name">jarowinkler</str>
         <str name="field">spell</str>
         <str name="classname">solr.DirectSolrSpellChecker</str>
         <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
       </lst>



    <!-- a spellchecker that use an alternate comparator

         comparatorClass be one of:
          1. score (default)
          2. freq (Frequency first, then score)
          3. A fully qualified class name
      -->
    <!--
       <lst name="spellchecker">
         <str name="name">freq</str>
         <str name="field">lowerfilt</str>
         <str name="classname">solr.DirectSolrSpellChecker</str>
         <str name="comparatorClass">freq</str>
      -->

    <!-- A spellchecker that reads the list of words from a file -->

     <!--  <lst name="spellchecker">
         <str name="classname">solr.FileBasedSpellChecker</str>
         <str name="name">file</str>
         <str name="sourceLocation">spellings.txt</str>
         <str name="characterEncoding">UTF-8</str>
         <str name="spellcheckIndexDir">./spellcheckerFile</str>
       </lst>
     -->
     <!-- This field type's analyzer is used by the QueryConverter to tokenize the value for "q" parameter -->
    <str name="queryAnalyzerFieldType">tSpell</str>
  </searchComponent>


 <!--
    The SpellingQueryConverter to convert raw (CommonParams.Q) queries into tokens.  Uses a simple regular expression
    to strip off field markup, boosts, ranges, etc. but it is not guaranteed to match an exact parse from the query parser.

    Optional, defaults to solr.SpellingQueryConverter
   -->

  <queryConverter name="queryConverter" class="solr.SpellingQueryConverter"/>

  <!-- A request handler for demonstrating the spellcheck component.

       NOTE: This is purely as an example.  The whole purpose of the
       SpellCheckComponent is to hook it into the request handler that
       handles your normal user queries so that a separate request is
       not needed to get suggestions.

       IN OTHER WORDS, THERE IS REALLY GOOD CHANCE THE SETUP BELOW IS
       NOT WHAT YOU WANT FOR YOUR PRODUCTION SYSTEM!

       See http://wiki.apache.org/solr/SpellCheckComponent for details
       on the request parameters.
    -->
  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="df">spell</str>
      <!-- Solr will use suggestions from both the 'default' spellchecker
           and from the 'wordbreak' spellchecker and combine them.
           collations (re-written queries) can include a combination of
           corrections from both spellcheckers -->
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
         <!--<str name="spellcheck.dictionary">jarowinkler</str> -->
     <!-- <str name="spellcheck.dictionary">file</str> -->
      <!-- omp = Only More Popular -->
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
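
For anyone who wants to reproduce the cases above against the /spell handler, a request URL can be put together roughly as in the Python sketch below. The base URL (host, port and core name) and the helper name are placeholders for illustration, not my actual setup, and the spellcheck parameters only repeat what the handler defaults already set.

```python
from urllib.parse import urlencode

# Placeholder base URL: host, port and core name are assumptions for this example.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def spell_url(query: str) -> str:
    """Build a request URL for the /spell handler for the given query string."""
    params = [
        ("q", query),
        ("spellcheck", "true"),
        # Repeated parameter: ask for both configured dictionaries.
        ("spellcheck.dictionary", "default"),
        ("spellcheck.dictionary", "wordbreak"),
        ("spellcheck.collate", "true"),
        ("wt", "json"),
    ]
    return f"{SOLR_BASE}/spell?{urlencode(params)}"


# One URL per test query mentioned in this message.
for q in ["wirlpol", "panonic", "ipo", "microwave ovan", "plntronies"]:
    print(spell_url(q))
```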



thanks in advance
regards
Rohan
