Hi,

We have used the spellcheck component with the configs below to get the best
collation (exact collation) when a query has either a single term or multiple
terms.

As Charles mentioned above, we do check getOriginalFrequency() for each term
in our service before we send the spellcheck response to the client. This may
not be the case for you, but hope this helps.
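For reference, here is a minimal sketch of that kind of check, written in Python rather than against SolrJ. It assumes the extended-results suggestions have already been parsed into a dict keyed by the original term, holding that term's origFreq and its suggestion list. This shape is hypothetical; the exact JSON layout depends on the Solr version and the json.nl setting. The "keep the term if origFreq >= 1" rule is also an assumed threshold for illustration, not necessarily the rule our service applies.

```python
from typing import Any, Dict, List

# Hypothetical parsed shape (not the raw Solr JSON):
# {original_term: {"origFreq": int, "suggestion": [{"word": str, "freq": int}, ...]}}
Suggestions = Dict[str, Dict[str, Any]]


def choose_terms(suggestions: Suggestions, min_orig_freq: int = 1) -> Dict[str, List[str]]:
    """Map each original term to its candidate words, original term first if kept.

    An original term is kept when its origFreq (how often it occurs in the
    spellcheck field) is at least min_orig_freq; otherwise only the suggested
    corrections are returned. min_orig_freq=1 is an assumed threshold.
    """
    candidates: Dict[str, List[str]] = {}
    for term, info in suggestions.items():
        words = [s["word"] for s in info.get("suggestion", [])]
        if info.get("origFreq", 0) >= min_orig_freq:
            # The term exists in the index, so it is a valid choice by itself.
            words = [term] + [w for w in words if w != term]
        candidates[term] = words
    return candidates


if __name__ == "__main__":
    # "brown" is valid on its own; "fo" is a partial term.
    # The frequencies here are made up for illustration.
    example = {
        "brown": {"origFreq": 12, "suggestion": [{"word": "brownstone", "freq": 3}]},
        "fo": {"origFreq": 0, "suggestion": [{"word": "fox", "freq": 7}, {"word": "fog", "freq": 2}]},
    }
    print(choose_terms(example))
    # prints {'brown': ['brown', 'brownstone'], 'fo': ['fox', 'fog']}
```

This is what lets the single-term case ("brown" with only "brownstone" suggested and no collation) keep the original term.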

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">100</int>
    <str name="df">textSpell</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <int name="spellcheck.count">5</int>
    <str name="spellcheck.alternativeTermCount">15</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.maxCollations">100</str>
    <str name="spellcheck.collateParam.mm">100%</str>
    <str name="spellcheck.collateParam.q.op">AND</str>
    <str name="spellcheck.maxCollationTries">1000</str>
    <str name="q.op">OR</str>
    <!-- ... -->
  </lst>
</requestHandler>
...
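Since the comment in that handler notes that request parameters override the defaults, the same settings can also be sent per request, which is handy for experimenting. A minimal Python sketch using only urllib.parse.urlencode; the base URL and core name are placeholders, and spellcheck.collateExtendedResults is an extra parameter (not in the defaults above) that makes Solr report hits per collation.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and core name are assumptions.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def spellcheck_url(q: str, base_url: str = BASE_URL) -> str:
    """Build a /select URL that overrides the handler's spellcheck defaults."""
    params = [
        ("q", q),
        ("wt", "json"),
        ("spellcheck", "true"),
        # Repeating spellcheck.dictionary queries both spellcheckers.
        ("spellcheck.dictionary", "default"),
        ("spellcheck.dictionary", "wordbreak"),
        ("spellcheck.extendedResults", "true"),
        ("spellcheck.collate", "true"),
        # Adds hits (and corrections) to each collation in the response.
        ("spellcheck.collateExtendedResults", "true"),
        ("spellcheck.maxCollations", "100"),
        ("spellcheck.maxCollationTries", "1000"),
        ("spellcheck.collateParam.mm", "100%"),
        ("spellcheck.collateParam.q.op", "AND"),
    ]
    # A list of pairs (rather than a dict) keeps the repeated key.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(spellcheck_url("brown fo"))
```

Note that urlencode escapes the "%" in 100% as %25 and the space in the query as "+", which is what Solr expects on the wire.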

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- <str name="classname">solr.DirectSolrSpellChecker</str> -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str> -->
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>

</searchComponent>
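On Nitin's question about keeping only the collations with the most hits: once spellcheck.collateExtendedResults=true is set (it is not in the config above), each collation carries its hits count, and the filtering and ordering can be done on the client. A minimal Python sketch, assuming the collations have already been pulled out of the response into a list of dicts; top_n and min_hits are illustrative knobs, not Solr parameters.

```python
from typing import Any, Dict, List

# Each collation is assumed to be a dict with "collationQuery" and "hits",
# the fields reported per collation when spellcheck.collateExtendedResults=true.
Collation = Dict[str, Any]


def best_collations(collations: List[Collation], top_n: int = 5, min_hits: int = 1) -> List[Collation]:
    """Return at most top_n collations having at least min_hits hits, most hits first."""
    eligible = [c for c in collations if c.get("hits", 0) >= min_hits]
    # sorted() stays stable with reverse=True, so ties keep Solr's original order.
    return sorted(eligible, key=lambda c: c.get("hits", 0), reverse=True)[:top_n]


if __name__ == "__main__":
    # Collation queries from Charles's "brown fo" example; hit counts are made up.
    collations = [
        {"collationQuery": "brownstone fox", "hits": 1},
        {"collationQuery": "brown fog", "hits": 4},
        {"collationQuery": "brown fox", "hits": 9},
    ]
    for c in best_collations(collations, top_n=2):
        print(c["collationQuery"], c["hits"])
    # prints:
    # brown fox 9
    # brown fog 4
```

Sorting on the client avoids relying on whatever order Solr happens to return the collations in; spellcheck.maxCollations and spellcheck.maxCollationTries still bound how many candidates you get to choose from.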



*Rajesh.*

On Fri, Feb 20, 2015 at 8:42 AM, Nitin Solanki <nitinml...@gmail.com> wrote:

> How can I get only the best collations, i.e. the ones with the most hits,
> sorted by hits?
>
> On Wed, Feb 18, 2015 at 3:53 AM, Reitzel, Charles <
> charles.reit...@tiaa-cref.org> wrote:
>
> > Hi Nitin,
> >
> > I was trying many different options for a couple of different queries. In
> > fact, I have collations working OK now with the Suggester and WFSTLookup.
> > The problem may have been due to a different dictionary and/or lookup
> > implementation and the specific options I was sending.
> >
> > In general, we're using spellcheck for search suggestions. The Suggester
> > component (vs. the Suggester spellcheck implementation) doesn't handle all
> > of our cases, but we can get things working using the spellcheck interface.
> > What gives us particular trouble are the cases where a term may be valid
> > by itself, but also be the start of longer words.
> >
> > The specific terms are acronyms specific to our business.   But I'll
> > attempt to show generic examples.
> >
> > E.g. a partial term like "fo" can expand to fox, fog, etc., and a full term
> > like brown can also expand to something like brownstone. And, yes, the
> > collation "brownstone fox" is nonsense. But assume, for the sake of
> > argument, it appears in our documents somewhere.
> >
> > For a multiple-term query with a spelling error (or a partially typed term):
> > brown fo
> >
> > We get collations in descending order of hits, like ...
> > "brown fox",
> > "brown fog",
> > "brownstone fox".
> >
> > So far, so good.
> >
> > For a single-term query, brown, we get a single suggestion, brownstone,
> > and no collations.
> >
> > So we don't know whether to keep the term brown!
> >
> > At this point, we need spellcheck.extendedResults=true so we can look at
> > the origFreq value in the suggested corrections. Unfortunately, the
> > Suggester (spellcheck dictionary) does not populate the original frequency
> > information, and without this information the SpellCheckComponent cannot
> > format the extended results.
> >
> > However, with a simple change to Suggester.java, it was easy to get the
> > needed frequency information and use it to make a sound decision to keep
> > or drop the input term. But I'd be much obliged if anyone knows a better
> > way to go about it.
> >
> > Configs below.
> >
> > Thanks,
> > Charlie
> >
> > <!-- SpellCheck component -->
> >   <searchComponent class="solr.SpellCheckComponent" name="suggestSC">
> >     <lst name="spellchecker">
> >       <str name="name">suggestDictionary</str>
> >       <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >       <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
> >       <str name="field">text_all</str>
> >       <float name="threshold">0.00000001</float>
> >       <str name="exactMatchFirst">true</str>
> >       <str name="buildOnCommit">true</str>
> >     </lst>
> >   </searchComponent>
> >
> > <!-- Request Handler -->
> > <requestHandler name="/tcSuggest" class="solr.SearchHandler">
> >   <lst name="defaults">
> >     <str name="title">Search Suggestions (spellcheck)</str>
> >     <str name="echoParams">explicit</str>
> >     <str name="wt">json</str>
> >     <str name="rows">0</str>
> >     <str name="defType">edismax</str>
> >     <str name="df">text_all</str>
> >     <str name="fl">id,name,ticker,entityType,transactionType,accountType</str>
> >     <str name="spellcheck">true</str>
> >     <str name="spellcheck.count">5</str>
> >     <str name="spellcheck.dictionary">suggestDictionary</str>
> >     <str name="spellcheck.alternativeTermCount">5</str>
> >     <str name="spellcheck.collate">true</str>
> >     <str name="spellcheck.extendedResults">true</str>
> >     <str name="spellcheck.maxCollationTries">10</str>
> >     <str name="spellcheck.maxCollations">5</str>
> >   </lst>
> >   <arr name="last-components">
> >     <str>suggestSC</str>
> >   </arr>
> > </requestHandler>
> >
> > -----Original Message-----
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Tuesday, February 17, 2015 3:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Collations are not working fine.
> >
> > Hi Charles,
> > Will you please send the configuration you tried? It will help solve my
> > problem. Have you sorted the collations on hits or on the frequencies of
> > the suggestions? If you did, then please assist me.
> >
> > On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles <
> > charles.reit...@tiaa-cref.org> wrote:
> >
> > > I have been working with collations the last couple of days, and I kept
> > > adding collation-related parameters until it started working for me. It
> > > seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.
> > >
> > > But, I am using the Suggester with the WFSTLookupFactory.
> > >
> > > Also, I needed to patch the suggester to get frequency information in
> > > the spellcheck response.
> > >
> > > -----Original Message-----
> > > From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
> > > Sent: Friday, February 13, 2015 3:48 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Collations are not working fine.
> > >
> > > Hi Nitin,
> > >
> > > Can you try the config below? It seems to be working for us.
> > >
> > > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> > >
> > >      <str name="queryAnalyzerFieldType">text_general</str>
> > >
> > >
> > >   <lst name="spellchecker">
> > > <str name="name">wordbreak</str>
> > > <str name="classname">solr.WordBreakSolrSpellChecker</str>
> > > <str name="field">textSpell</str>
> > > <str name="combineWords">true</str>
> > > <str name="breakWords">false</str>
> > > <int name="maxChanges">5</int>
> > >   </lst>
> > >
> > >    <lst name="spellchecker">
> > > <str name="name">default</str>
> > > <str name="field">textSpell</str>
> > > <str name="classname">solr.IndexBasedSpellChecker</str>
> > > <str name="spellcheckIndexDir">./spellchecker</str>
> > > <str name="accuracy">0.75</str>
> > > <float name="thresholdTokenFrequency">0.01</float>
> > > <str name="buildOnCommit">true</str>
> > > <str name="spellcheck.maxResultsForSuggest">5</str>
> > >      </lst>
> > >
> > >
> > >   </searchComponent>
> > >
> > >
> > >
> > > <str name="spellcheck">true</str>
> > > <str name="spellcheck.dictionary">default</str>
> > > <str name="spellcheck.dictionary">wordbreak</str>
> > > <int name="spellcheck.count">5</int>
> > > <str name="spellcheck.alternativeTermCount">15</str>
> > > <str name="spellcheck.collate">true</str>
> > > <str name="spellcheck.onlyMorePopular">false</str>
> > > <str name="spellcheck.extendedResults">true</str>
> > > <str name ="spellcheck.maxCollations">100</str>
> > > <str name="spellcheck.collateParam.mm">100%</str>
> > > <str name="spellcheck.collateParam.q.op">AND</str>
> > > <str name="spellcheck.maxCollationTries">1000</str>
> > >
> > >
> > > *Rajesh.*
> > >
> > > On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James <james.d...@ingramcontent.com>
> > > wrote:
> > >
> > > > Nitin,
> > > >
> > > > Can you post the full spellcheck response when you query:
> > > >
> > > > q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> > > >
> > > > James Dyer
> > > > Ingram Content Group
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > > > Sent: Friday, February 13, 2015 1:05 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Collations are not working fine.
> > > >
> > > > Hi James Dyer,
> > > >
> > > > I did as you told me and used WordBreakSolrSpellChecker instead of
> > > > shingles, but collations are still not being returned.
> > > > For instance, I tried to get the collation "gone with the wind" by
> > > > searching "gone wthh thes wint" on field=gram_ci but didn't succeed,
> > > > even though I am getting the suggestions *with* for wthh, *the* for
> > > > thes, and *wind* for wint.
> > > > Also, "gone with the wind" occurs 167 times in my documents. I don't
> > > > know whether I am missing something or not.
> > > > Please check my Solr configuration below:
> > > >
> > > > URL: localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> > > >
> > > > solrconfig.xml:
> > > >
> > > > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> > > >     <str name="queryAnalyzerFieldType">textSpellCi</str>
> > > >     <lst name="spellchecker">
> > > >       <str name="name">default</str>
> > > >       <str name="field">gram_ci</str>
> > > >       <str name="classname">solr.DirectSolrSpellChecker</str>
> > > >       <str name="distanceMeasure">internal</str>
> > > >       <float name="accuracy">0.5</float>
> > > >       <int name="maxEdits">2</int>
> > > >       <int name="minPrefix">0</int>
> > > >       <int name="maxInspections">5</int>
> > > >       <int name="minQueryLength">2</int>
> > > >       <float name="maxQueryFrequency">0.9</float>
> > > >       <str name="comparatorClass">freq</str>
> > > >     </lst>
> > > > <lst name="spellchecker">
> > > >       <str name="name">wordbreak</str>
> > > >       <str name="classname">solr.WordBreakSolrSpellChecker</str>
> > > >       <str name="field">gram</str>
> > > >       <str name="combineWords">true</str>
> > > >       <str name="breakWords">true</str>
> > > >       <int name="maxChanges">5</int>
> > > >     </lst>
> > > > </searchComponent>
> > > >
> > > > <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
> > > >     <lst name="defaults">
> > > >       <str name="df">gram_ci</str>
> > > >       <str name="spellcheck.dictionary">default</str>
> > > >       <str name="spellcheck">on</str>
> > > >       <str name="spellcheck.extendedResults">true</str>
> > > >       <str name="spellcheck.count">25</str>
> > > >       <str name="spellcheck.onlyMorePopular">true</str>
> > > >       <str name="spellcheck.maxResultsForSuggest">100000000</str>
> > > >       <str name="spellcheck.alternativeTermCount">25</str>
> > > >       <str name="spellcheck.collate">true</str>
> > > >       <str name="spellcheck.maxCollations">50</str>
> > > >       <str name="spellcheck.maxCollationTries">50</str>
> > > >       <str name="spellcheck.collateExtendedResults">true</str>
> > > >     </lst>
> > > >     <arr name="last-components">
> > > >       <str>spellcheck</str>
> > > >     </arr>
> > > >   </requestHandler>
> > > >
> > > > schema.xml:
> > > >
> > > > <field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
> > > > multiValued="false"/>
> > > >
> > > > </fieldType><fieldType name="textSpellCi" class="solr.TextField"
> > > > positionIncrementGap="100">
> > > >        <analyzer type="index">
> > > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >         <filter class="solr.LowerCaseFilterFactory"/>
> > > > </analyzer>
> > > >     <analyzer type="query">
> > > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >         <filter class="solr.LowerCaseFilterFactory"/>
> > > > </analyzer>
> > > > </fieldType>
> > > >
> > >
> > > **********************************************************************
> > > This e-mail may contain confidential or privileged information.
> > > If you are not the intended recipient, please notify the sender
> > > immediately and then delete it.
> > >
> > > TIAA-CREF
> > > **********************************************************************
> > >
> >
>
