How can I get only the best collations, i.e. those with the most hits, and
sort them by hit count?
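
One client-side option, sketched here under assumptions rather than as a confirmed Solr feature: request spellcheck.collateExtendedResults=true (as in the /spell handler quoted below) so that each collation carries a hit count, then filter and sort in the client. The sketch assumes the collations have already been pulled out of the response into a list of dicts with collationQuery and hits keys; how they are nested in the JSON differs between Solr versions, so that extraction step is left out. The function name best_collations, the limit parameter and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def best_collations(collations: List[Dict[str, Any]], limit: int = 3) -> List[Dict[str, Any]]:
    """Return up to `limit` collations that have at least one hit, most hits first."""
    # Drop collations that matched no documents.
    with_hits = [c for c in collations if c.get("hits", 0) > 0]
    # sorted() is stable, so collations with equal hits keep their original order.
    ranked = sorted(with_hits, key=lambda c: c["hits"], reverse=True)
    return ranked[:limit]


# Hypothetical sample data shaped like extended collation entries.
sample = [
    {"collationQuery": "brownstone fox", "hits": 2},
    {"collationQuery": "brown fox", "hits": 40},
    {"collationQuery": "brown fog", "hits": 7},
    {"collationQuery": "brown foe", "hits": 0},
]

for c in best_collations(sample, limit=2):
    print(c["collationQuery"], c["hits"])
```

Sorting in the client keeps the Solr configuration unchanged; spellcheck.maxCollations still caps how many collations are returned in the first place.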

On Wed, Feb 18, 2015 at 3:53 AM, Reitzel, Charles <
charles.reit...@tiaa-cref.org> wrote:

> Hi Nitin,
>
> I was trying many different options for a couple different queries.   In
> fact, I have collations working ok now with the Suggester and WFSTLookup.
>  The problem may have been due to a different dictionary and/or lookup
> implementation and the specific options I was sending.
>
> In general, we're using spellcheck for search suggestions. The Suggester
> component (as opposed to the Suggester spellcheck implementation) doesn't
> handle all of our cases, but we can get things working using the spellcheck
> interface. What gives us particular trouble are the cases where a term may
> be valid by itself but also be the start of longer words.
>
> The specific terms are acronyms specific to our business.   But I'll
> attempt to show generic examples.
>
> E.g. a partial term like "fo" can expand to fox, fog, etc. and a full term
> like brown can also expand to something like brownstone.   And, yes, the
> collation "brownstone fox" is nonsense.  But assume, for the sake of
> argument, it appears in our documents somewhere.
>
> For multiple term query with a spelling error (or partially typed term):
> brown fo
>
> We get collations in order of hits, descending like ...
> "brown fox",
> "brown fog",
> "brownstone fox".
>
> So far, so good.
>
> For a single term query, brown, we get a single suggestion, brownstone and
> no collations.
>
> So, we don't know to keep the term brown!
>
> At this point, we need spellcheck.extendedResults=true and look at the
> origFreq value in the suggested corrections.  Unfortunately, the Suggester
> (spellcheck dictionary) does not populate the original frequency
> information.  And, without this information, the SpellCheckComponent cannot
> format the extended results.
>
> However, with a simple change to Suggester.java, it was easy to get the
> needed frequency information and use it to make a sound decision to keep or
> drop the input term. But I'd be much obliged if there is a better way to
> go about it.
>
> Configs below.
>
> Thanks,
> Charlie
>
> <!-- SpellCheck component -->
>   <searchComponent class="solr.SpellCheckComponent" name="suggestSC">
>     <lst name="spellchecker">
>       <str name="name">suggestDictionary</str>
>       <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>       <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
>       <str name="field">text_all</str>
>       <float name="threshold">0.00000001</float>
>       <str name="exactMatchFirst">true</str>
>       <str name="buildOnCommit">true</str>
>     </lst>
>   </searchComponent>
>
> <!-- Request Handler -->
> <requestHandler name="/tcSuggest" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="title">Search Suggestions (spellcheck)</str>
>     <str name="echoParams">explicit</str>
>     <str name="wt">json</str>
>     <str name="rows">0</str>
>     <str name="defType">edismax</str>
>     <str name="df">text_all</str>
>     <str name="fl">id,name,ticker,entityType,transactionType,accountType</str>
>     <str name="spellcheck">true</str>
>     <str name="spellcheck.count">5</str>
>     <str name="spellcheck.dictionary">suggestDictionary</str>
>     <str name="spellcheck.alternativeTermCount">5</str>
>     <str name="spellcheck.collate">true</str>
>     <str name="spellcheck.extendedResults">true</str>
>     <str name="spellcheck.maxCollationTries">10</str>
>     <str name="spellcheck.maxCollations">5</str>
>   </lst>
>   <arr name="last-components">
>     <str>suggestSC</str>
>   </arr>
> </requestHandler>
>
> -----Original Message-----
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, February 17, 2015 3:17 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Collations are not working fine.
>
> Hi Charles,
> Will you please send the configuration that you tried? It will help me
> solve my problem. Have you sorted the collations by hits or by the
> frequencies of the suggestions? If you did, then please assist me.
>
> On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles <
> charles.reit...@tiaa-cref.org> wrote:
>
> > I have been working with collations the last couple days and I kept
> > adding the collation-related parameters until it started working for me.
> > It seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.
> >
> > But, I am using the Suggester with the WFSTLookupFactory.
> >
> > Also, I needed to patch the suggester to get frequency information in
> > the spellcheck response.
> >
> > -----Original Message-----
> > From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
> > Sent: Friday, February 13, 2015 3:48 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Collations are not working fine.
> >
> > Hi Nitin,
> >
> > Can you try the config below? It seems to be working for us.
> >
> > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> >   <str name="queryAnalyzerFieldType">text_general</str>
> >
> >   <lst name="spellchecker">
> >     <str name="name">wordbreak</str>
> >     <str name="classname">solr.WordBreakSolrSpellChecker</str>
> >     <str name="field">textSpell</str>
> >     <str name="combineWords">true</str>
> >     <str name="breakWords">false</str>
> >     <int name="maxChanges">5</int>
> >   </lst>
> >
> >   <lst name="spellchecker">
> >     <str name="name">default</str>
> >     <str name="field">textSpell</str>
> >     <str name="classname">solr.IndexBasedSpellChecker</str>
> >     <str name="spellcheckIndexDir">./spellchecker</str>
> >     <str name="accuracy">0.75</str>
> >     <float name="thresholdTokenFrequency">0.01</float>
> >     <str name="buildOnCommit">true</str>
> >     <str name="spellcheck.maxResultsForSuggest">5</str>
> >   </lst>
> > </searchComponent>
> >
> >
> >
> > <str name="spellcheck">true</str>
> > <str name="spellcheck.dictionary">default</str>
> > <str name="spellcheck.dictionary">wordbreak</str>
> > <int name="spellcheck.count">5</int>
> > <str name="spellcheck.alternativeTermCount">15</str>
> > <str name="spellcheck.collate">true</str>
> > <str name="spellcheck.onlyMorePopular">false</str>
> > <str name="spellcheck.extendedResults">true</str>
> > <str name="spellcheck.maxCollations">100</str>
> > <str name="spellcheck.collateParam.mm">100%</str>
> > <str name="spellcheck.collateParam.q.op">AND</str>
> > <str name="spellcheck.maxCollationTries">1000</str>
> >
> >
> > *Rajesh.*
> >
> > On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
> >
> > > Nitin,
> > >
> > > Can you post the full spellcheck response when you query:
> > >
> > > q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> > >
> > > James Dyer
> > > Ingram Content Group
> > >
> > >
> > > -----Original Message-----
> > > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > > Sent: Friday, February 13, 2015 1:05 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Collations are not working fine.
> > >
> > > Hi James Dyer,
> > > I did as you suggested and used WordBreakSolrSpellChecker instead of
> > > shingles, but collations are still not being returned.
> > > For instance, I tried to get the collation "gone with the wind" by
> > > searching for "gone wthh thes wint" on field=gram_ci, but didn't succeed,
> > > even though I am getting the suggestions *with* for wthh, *the* for thes,
> > > and *wind* for wint.
> > > Also, "gone with the wind" occurs 167 times in my documents. I don't know
> > > whether I am missing something.
> > > Please check my Solr configuration below:
> > >
> > > *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> > >
> > > *solrconfig.xml:*
> > >
> > > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> > >     <str name="queryAnalyzerFieldType">textSpellCi</str>
> > >     <lst name="spellchecker">
> > >       <str name="name">default</str>
> > >       <str name="field">gram_ci</str>
> > >       <str name="classname">solr.DirectSolrSpellChecker</str>
> > >       <str name="distanceMeasure">internal</str>
> > >       <float name="accuracy">0.5</float>
> > >       <int name="maxEdits">2</int>
> > >       <int name="minPrefix">0</int>
> > >       <int name="maxInspections">5</int>
> > >       <int name="minQueryLength">2</int>
> > >       <float name="maxQueryFrequency">0.9</float>
> > >       <str name="comparatorClass">freq</str>
> > >     </lst>
> > > <lst name="spellchecker">
> > >       <str name="name">wordbreak</str>
> > >       <str name="classname">solr.WordBreakSolrSpellChecker</str>
> > >       <str name="field">gram</str>
> > >       <str name="combineWords">true</str>
> > >       <str name="breakWords">true</str>
> > >       <int name="maxChanges">5</int>
> > >     </lst>
> > > </searchComponent>
> > >
> > > <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
> > >     <lst name="defaults">
> > >       <str name="df">gram_ci</str>
> > >       <str name="spellcheck.dictionary">default</str>
> > >       <str name="spellcheck">on</str>
> > >       <str name="spellcheck.extendedResults">true</str>
> > >       <str name="spellcheck.count">25</str>
> > >       <str name="spellcheck.onlyMorePopular">true</str>
> > >       <str name="spellcheck.maxResultsForSuggest">100000000</str>
> > >       <str name="spellcheck.alternativeTermCount">25</str>
> > >       <str name="spellcheck.collate">true</str>
> > >       <str name="spellcheck.maxCollations">50</str>
> > >       <str name="spellcheck.maxCollationTries">50</str>
> > >       <str name="spellcheck.collateExtendedResults">true</str>
> > >     </lst>
> > >     <arr name="last-components">
> > >       <str>spellcheck</str>
> > >     </arr>
> > >   </requestHandler>
> > >
> > > *Schema.xml: *
> > >
> > > <field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>
> > >
> > > <fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
> > >     <analyzer type="index">
> > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > >         <filter class="solr.LowerCaseFilterFactory"/>
> > >     </analyzer>
> > >     <analyzer type="query">
> > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > >         <filter class="solr.LowerCaseFilterFactory"/>
> > >     </analyzer>
> > > </fieldType>
> > >
> >
> > *************************************************************************
> > This e-mail may contain confidential or privileged information.
> > If you are not the intended recipient, please notify the sender
> > immediately and then delete it.
> >
> > TIAA-CREF
> > *************************************************************************
> >
>
>
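
For the single-term case Charlie describes above (brown vs. brownstone), the keep-or-drop decision could look roughly like the sketch below. It assumes the per-term entry from spellcheck.extendedResults=true has already been extracted into a dict with an origFreq value and a suggestion list of word/freq pairs, which, per the message above, the Suggester only populates after a patch. The function name candidate_terms, the min_orig_freq threshold and the frequencies in the sample are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def candidate_terms(term: str, entry: Dict[str, Any], min_orig_freq: int = 1) -> List[Tuple[str, int]]:
    """Return (word, freq) candidates for one input term, most frequent first.

    The input term itself is kept as a candidate when its origFreq reaches
    min_orig_freq, i.e. when it occurs in the index in its own right.
    """
    candidates = [(s["word"], s["freq"]) for s in entry.get("suggestion", [])]
    orig_freq = entry.get("origFreq", 0)
    if orig_freq >= min_orig_freq:
        candidates.append((term, orig_freq))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Hypothetical entry for the query term "brown"; the numbers are invented.
entry = {
    "numFound": 1,
    "origFreq": 120,
    "suggestion": [{"word": "brownstone", "freq": 15}],
}

print(candidate_terms("brown", entry))
```

A term with origFreq of 0 simply drops out of the list, so a genuine misspelling is replaced while a valid prefix such as brown is kept alongside its expansions.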
