Hi, we have used the spellcheck component with the configs below to get the best collation (exact collation) when a query has either a single term or multiple terms.
As Charles mentioned above, we do have a check on getOriginalFrequency() for each term in our service before we send the spellcheck response to the client. This may not be the case for you. Hope this helps.

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">100</int>
    <str name="df">textSpell</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <int name="spellcheck.count">5</int>
    <str name="spellcheck.alternativeTermCount">15</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.maxCollations">100</str>
    <str name="spellcheck.collateParam.mm">100%</str>
    <str name="spellcheck.collateParam.q.op">AND</str>
    <str name="spellcheck.maxCollationTries">1000</str>
    <str name="q.op">OR</str>
    . . ..
  </lst>
</requestHandler>
. . .
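
For anyone wondering what such a frequency check might look like, here is a minimal sketch. It is not our actual service code. It assumes the original frequency of each query term has already been read from the extended spellcheck results (for example via getOriginalFrequency() on each suggestion) and passes the values in as a plain map, so the sketch has no Solr dependency. The class name, the method name and the "frequency greater than zero means keep the term" rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrigFreqCheck {

    // Hypothetical helper: maps each query term to the origFreq reported for it
    // and returns only the terms that look misspelled (never seen in the index).
    // Assumption: a term with origFreq > 0 exists in the index and is kept as typed.
    static List<String> termsToCorrect(Map<String, Integer> origFreqByTerm) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : origFreqByTerm.entrySet()) {
            if (entry.getValue() <= 0) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up frequencies, for illustration only.
        Map<String, Integer> freqs = new LinkedHashMap<>();
        freqs.put("brown", 42); // present in the index: keep it
        freqs.put("fo", 0);     // not in the index: candidate for correction
        System.out.println(termsToCorrect(freqs));
    }
}
```

With a check like this, suggestions and collations would only be passed on to the client for terms that come back from termsToCorrect; a real service may want a higher threshold than zero.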
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- <str name="classname">solr.DirectSolrSpellChecker</str> -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str> -->
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>
</searchComponent>

Rajesh.

On Fri, Feb 20, 2015 at 8:42 AM, Nitin Solanki <nitinml...@gmail.com> wrote:

> How to get only the best collations whose hits are more and need to sort
> them?
>
> On Wed, Feb 18, 2015 at 3:53 AM, Reitzel, Charles <charles.reit...@tiaa-cref.org> wrote:
>
> > Hi Nitin,
> >
> > I was trying many different options for a couple different queries. In
> > fact, I have collations working ok now with the Suggester and WFSTLookup.
> > The problem may have been due to a different dictionary and/or lookup
> > implementation and the specific options I was sending.
> >
> > In general, we're using spellcheck for search suggestions. The Suggester
> > component (vs. Suggester spellcheck implementation), doesn't handle all of
> > our cases. But we can get things working using the spellcheck interface.
> > What gives us particular troubles are the cases where a term may be valid
> > by itself, but also be the start of longer words.
> >
> > The specific terms are acronyms specific to our business. But I'll
> > attempt to show generic examples.
> >
> > E.g.
> > a partial term like "fo" can expand to fox, fog, etc. and a full term
> > like brown can also expand to something like brownstone. And, yes, the
> > collation "brownstone fox" is nonsense. But assume, for the sake of
> > argument, it appears in our documents somewhere.
> >
> > For multiple term query with a spelling error (or partially typed term):
> > brown fo
> >
> > We get collations in order of hits, descending like ...
> > "brown fox",
> > "brown fog",
> > "brownstone fox".
> >
> > So far, so good.
> >
> > For a single term query, brown, we get a single suggestion, brownstone and
> > no collations.
> >
> > So, we don't know to keep the term brown!
> >
> > At this point, we need spellcheck.extendedResults=true and look at the
> > origFreq value in the suggested corrections. Unfortunately, the Suggester
> > (spellcheck dictionary) does not populate the original frequency
> > information. And, without this information, the SpellCheckComponent cannot
> > format the extended results.
> >
> > However, with a simple change to Suggester.java, it was easy to get the
> > needed frequency information and use it to make a sound decision to keep or
> > drop the input term. But I'd be much obliged if there is a better way to
> > go about it.
> >
> > Configs below.
> >
> > Thanks,
> > Charlie
> >
> > <!-- SpellCheck component -->
> > <searchComponent class="solr.SpellCheckComponent" name="suggestSC">
> >   <lst name="spellchecker">
> >     <str name="name">suggestDictionary</str>
> >     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
> >     <str name="field">text_all</str>
> >     <float name="threshold">0.00000001</float>
> >     <str name="exactMatchFirst">true</str>
> >     <str name="buildOnCommit">true</str>
> >   </lst>
> > </searchComponent>
> >
> > <!-- Request Handler -->
> > <requestHandler name="/tcSuggest" class="solr.SearchHandler">
> >   <lst name="defaults">
> >     <str name="title">Search Suggestions (spellcheck)</str>
> >     <str name="echoParams">explicit</str>
> >     <str name="wt">json</str>
> >     <str name="rows">0</str>
> >     <str name="defType">edismax</str>
> >     <str name="df">text_all</str>
> >     <str name="fl">id,name,ticker,entityType,transactionType,accountType</str>
> >     <str name="spellcheck">true</str>
> >     <str name="spellcheck.count">5</str>
> >     <str name="spellcheck.dictionary">suggestDictionary</str>
> >     <str name="spellcheck.alternativeTermCount">5</str>
> >     <str name="spellcheck.collate">true</str>
> >     <str name="spellcheck.extendedResults">true</str>
> >     <str name="spellcheck.maxCollationTries">10</str>
> >     <str name="spellcheck.maxCollations">5</str>
> >   </lst>
> >   <arr name="last-components">
> >     <str>suggestSC</str>
> >   </arr>
> > </requestHandler>
> >
> > -----Original Message-----
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Tuesday, February 17, 2015 3:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Collations are not working fine.
> >
> > Hi Charles,
> > Will you please send the configuration which you tried.
> > It will help to solve my problem. Have you sorted the collations on hits or
> > frequencies of suggestions? If you did then please assist me.
> >
> > On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles <charles.reit...@tiaa-cref.org> wrote:
> >
> > > I have been working with collations the last couple days and I kept adding
> > > the collation-related parameters until it started working for me. It
> > > seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.
> > >
> > > But, I am using the Suggester with the WFSTLookupFactory.
> > >
> > > Also, I needed to patch the suggester to get frequency information in
> > > the spellcheck response.
> > >
> > > -----Original Message-----
> > > From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
> > > Sent: Friday, February 13, 2015 3:48 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Collations are not working fine.
> > >
> > > Hi Nitin,
> > >
> > > Can u try with the below config, we have these config seems to be
> > > working for us.
> > >
> > > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> > >
> > >   <str name="queryAnalyzerFieldType">text_general</str>
> > >
> > >   <lst name="spellchecker">
> > >     <str name="name">wordbreak</str>
> > >     <str name="classname">solr.WordBreakSolrSpellChecker</str>
> > >     <str name="field">textSpell</str>
> > >     <str name="combineWords">true</str>
> > >     <str name="breakWords">false</str>
> > >     <int name="maxChanges">5</int>
> > >   </lst>
> > >
> > >   <lst name="spellchecker">
> > >     <str name="name">default</str>
> > >     <str name="field">textSpell</str>
> > >     <str name="classname">solr.IndexBasedSpellChecker</str>
> > >     <str name="spellcheckIndexDir">./spellchecker</str>
> > >     <str name="accuracy">0.75</str>
> > >     <float name="thresholdTokenFrequency">0.01</float>
> > >     <str name="buildOnCommit">true</str>
> > >     <str name="spellcheck.maxResultsForSuggest">5</str>
> > >   </lst>
> > >
> > > </searchComponent>
> > >
> > > <str name="spellcheck">true</str>
> > > <str name="spellcheck.dictionary">default</str>
> > > <str name="spellcheck.dictionary">wordbreak</str>
> > > <int name="spellcheck.count">5</int>
> > > <str name="spellcheck.alternativeTermCount">15</str>
> > > <str name="spellcheck.collate">true</str>
> > > <str name="spellcheck.onlyMorePopular">false</str>
> > > <str name="spellcheck.extendedResults">true</str>
> > > <str name="spellcheck.maxCollations">100</str>
> > > <str name="spellcheck.collateParam.mm">100%</str>
> > > <str name="spellcheck.collateParam.q.op">AND</str>
> > > <str name="spellcheck.maxCollationTries">1000</str>
> > >
> > > Rajesh.
> > >
> > > On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
> > >
> > > > Nitin,
> > > >
> > > > Can you post the full spellcheck response when you query:
> > > >
> > > > q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> > > >
> > > > James Dyer
> > > > Ingram Content Group
> > > >
> > > > -----Original Message-----
> > > > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > > > Sent: Friday, February 13, 2015 1:05 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Collations are not working fine.
> > > >
> > > > Hi James Dyer,
> > > > I did the same as you told me. Used
> > > > WordBreakSolrSpellChecker instead of shingles. But still collations
> > > > are not coming or working.
> > > > For instance, I tried to get collation of "gone with the wind" by
> > > > searching "gone wthh thes wint" on field=gram_ci but didn't succeed.
> > > > Even, I am getting the suggestions of wtth as *with*, thes as *the*,
> > > > wint as *wind*.
> > > > Also I have documents which contains "gone with the wind" having 167
> > > > times in the documents. I don't know that I am missing something or not.
> > > > Please check my below solr configuration:
> > > >
> > > > URL: localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> > > >
> > > > solrconfig.xml:
> > > >
> > > > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> > > >   <str name="queryAnalyzerFieldType">textSpellCi</str>
> > > >   <lst name="spellchecker">
> > > >     <str name="name">default</str>
> > > >     <str name="field">gram_ci</str>
> > > >     <str name="classname">solr.DirectSolrSpellChecker</str>
> > > >     <str name="distanceMeasure">internal</str>
> > > >     <float name="accuracy">0.5</float>
> > > >     <int name="maxEdits">2</int>
> > > >     <int name="minPrefix">0</int>
> > > >     <int name="maxInspections">5</int>
> > > >     <int name="minQueryLength">2</int>
> > > >     <float name="maxQueryFrequency">0.9</float>
> > > >     <str name="comparatorClass">freq</str>
> > > >   </lst>
> > > >   <lst name="spellchecker">
> > > >     <str name="name">wordbreak</str>
> > > >     <str name="classname">solr.WordBreakSolrSpellChecker</str>
> > > >     <str name="field">gram</str>
> > > >     <str name="combineWords">true</str>
> > > >     <str name="breakWords">true</str>
> > > >     <int name="maxChanges">5</int>
> > > >   </lst>
> > > > </searchComponent>
> > > >
> > > > <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
> > > >   <lst name="defaults">
> > > >     <str name="df">gram_ci</str>
> > > >     <str name="spellcheck.dictionary">default</str>
> > > >     <str name="spellcheck">on</str>
> > > >     <str name="spellcheck.extendedResults">true</str>
> > > >     <str name="spellcheck.count">25</str>
> > > >     <str name="spellcheck.onlyMorePopular">true</str>
> > > >     <str name="spellcheck.maxResultsForSuggest">100000000</str>
> > > >     <str name="spellcheck.alternativeTermCount">25</str>
> > > >     <str name="spellcheck.collate">true</str>
> > > >     <str name="spellcheck.maxCollations">50</str>
> > > >     <str name="spellcheck.maxCollationTries">50</str>
> > > >     <str
> > > >       name="spellcheck.collateExtendedResults">true</str>
> > > >   </lst>
> > > >   <arr name="last-components">
> > > >     <str>spellcheck</str>
> > > >   </arr>
> > > > </requestHandler>
> > > >
> > > > Schema.xml:
> > > >
> > > > <field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
> > > >        multiValued="false"/>
> > > >
> > > > </fieldType>
> > > > <fieldType name="textSpellCi" class="solr.TextField"
> > > >            positionIncrementGap="100">
> > > >   <analyzer type="index">
> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >   </analyzer>
> > > > </fieldType>
> > >
> > > *************************************************************************
> > > This e-mail may contain confidential or privileged information.
> > > If you are not the intended recipient, please notify the sender
> > > immediately and then delete it.
> > >
> > > TIAA-CREF
> > > *************************************************************************