Re: Spellcheck field element and collation issues

Brendan Grainger Tue, 23 Jul 2013 14:42:23 -0700

Thanks James. That's it! Now:

http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0


returns:

<lst name="collation">
<str name="collationQuery">perform hvac</str>
<int name="hits">4</int>
<lst name="misspellingsAndCorrections">
<str name="perfrm">perform</str>
<str name="hvc">hvac</str>
</lst>
</lst>
<lst name="collation">
<str name="collationQuery">performed hvac</str>
<int name="hits">4</int>
<lst name="misspellingsAndCorrections">
<str name="perfrm">performed</str>
<str name="hvc">hvac</str>
</lst>
</lst>
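
A minimal sketch of /select defaults along the lines James suggested, assuming the edismax parser is acceptable for this handler. The field names are the ones from my original df; using edismax, and keeping markup_texts as the single-field df fallback, are illustrative assumptions rather than a record of the exact change:

```xml
<lst name="defaults">
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <!-- Assumption: switch the parser to edismax so that qf is honoured -->
  <str name="defType">edismax</str>
  <!-- qf may list several fields; df may name only one -->
  <str name="qf">markup_texts title_texts</str>
  <str name="df">markup_texts</str>
</lst>
```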

If you have time, I'm still slightly unclear on the field element in the
spellcheck configuration. Maybe I should explain how I think it works:

1. You create a relatively unanalyzed field type (e.g. no stemming)
2. You copy text you want to be used to build the spellcheck index into
that field.
3. Build the spellcheck sidecar index (a no-op if using DirectSpellChecker,
in which case I assume it still reads from the dedicated spellcheck field
that the text was copied into).
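
Steps 1 and 2 could be sketched in schema.xml roughly as follows. The text_spell type and the spellcheck field are the ones quoted further down the thread; the copyField source names are an assumption (borrowed from the df field names), since the actual directives were not posted:

```xml
<!-- Step 1: dedicated field using the lightly analyzed text_spell type -->
<field name="spellcheck" type="text_spell" stored="false"
       omitTermFreqAndPositions="true" multiValued="true"/>

<!-- Step 2: copy the text that should feed the spelling dictionary.
     Source field names here are assumed, not taken from the real schema. -->
<copyField source="markup_texts" dest="spellcheck"/>
<copyField source="title_texts" dest="spellcheck"/>
```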

When executing a spellcheck request, Solr uses the analyzer specified in
queryAnalyzerFieldType to tokenize the query passed in via the q or
spellcheck.q parameter, and this tokenized text is the input to the
spellchecking instance.
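
In solrconfig.xml terms, that split might look like the sketch below. It swaps in solr.DirectSolrSpellChecker (Solr's wrapper around Lucene's DirectSpellChecker) purely for illustration, which is an assumption and not the configuration used in this thread. The comments restate the understanding described above, not confirmed behaviour:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Query time: the query analyzer of this field type tokenizes q / spellcheck.q -->
  <str name="queryAnalyzerFieldType">text_spell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Assumption: DirectSolrSpellChecker, so no sidecar index is built -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- Source of candidate terms: read from this field of the main index -->
    <str name="field">spellcheck</str>
  </lst>
</searchComponent>
```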

Does that sound right?

Thanks
Brendan







On Tue, Jul 23, 2013 at 5:15 PM, Dyer, James
<james.d...@ingramcontent.com> wrote:

> I don't believe you can specify more than 1 field on "df" (default field).
>  What you want, I think, is "qf" (query fields), which is available only if
> using dismax/edismax.
>
> http://wiki.apache.org/solr/SearchHandler#df
> http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
> Sent: Tuesday, July 23, 2013 3:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Spellcheck field element and collation issues
>
> Hi James,
>
> If I try:
>
>
> http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
>
> I get the same result:
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">7</int>
> <lst name="params">
> <str name="indent">true</str>
> <str name="q">Perfrm HVC</str>
> <str name="maxCollationTries">0</str>
> <str name="rows">0</str>
> </lst>
> </lst>
> <result name="response" numFound="0" start="0"></result>
> <lst name="spellcheck">
> <lst name="suggestions">
> <lst name="perfrm">
> <int name="numFound">3</int>
> <int name="startOffset">0</int>
> <int name="endOffset">6</int>
> <int name="origFreq">0</int>
> <arr name="suggestion">
> <lst>
> <str name="word">perform</str>
> <int name="freq">4</int>
> </lst>
> <lst>
> <str name="word">performed</str>
> <int name="freq">1</int>
> </lst>
> <lst>
> <str name="word">performance</str>
> <int name="freq">3</int>
> </lst>
> </arr>
> </lst>
> <lst name="hvc">
> <int name="numFound">2</int>
> <int name="startOffset">7</int>
> <int name="endOffset">10</int>
> <int name="origFreq">0</int>
> <arr name="suggestion">
> <lst>
> <str name="word">hvac</str>
> <int name="freq">4</int>
> </lst>
> <lst>
> <str name="word">have</str>
> <int name="freq">5</int>
> </lst>
> </arr>
> </lst>
> <bool name="correctlySpelled">false</bool>
> </lst>
> </lst>
> </response>
>
> However, you're right that my df field for the /select handler is in fact:
>
>      <str name="df">markup_texts title_texts</str>
>
> I would note that if I specify the query as follows:
>
>
> http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)+OR+title_texts:(Perfrm%20HVC)&rows=0&maxCollationTries=0
>
> which is what I thought specifying a df would effectively do, I get
> collation results:
>
> <lst name="collation">
> <str name="collationQuery">
> markup_texts:(perform hvac) OR title_texts:(perform hvac)
> </str>
> <int name="hits">4</int>
> <lst name="misspellingsAndCorrections">
> <str name="perfrm">perform</str>
> <str name="hvc">hvac</str>
> <str name="perfrm">perform</str>
> <str name="hvc">hvac</str>
> </lst>
> </lst>
> <lst name="collation">
> <str name="collationQuery">
> markup_texts:(perform hvac) OR title_texts:(performed hvac)
> </str>
> <int name="hits">4</int>
> <lst name="misspellingsAndCorrections">
> <str name="perfrm">perform</str>
> <str name="hvc">hvac</str>
> <str name="perfrm">performed</str>
> <str name="hvc">hvac</str>
> </lst>
> </lst>
>
> I think I'm confused about the relationship between the q parameter and
> what the field and queryAnalyzerFieldType are for in the spellcheck
> component definition, i.e. what is this for:
>
>    <str name="field">spellcheck</str>
>
> is it even needed if I've specified how the spelling index terms should
> be analyzed with:
>
>    <str name="queryAnalyzerFieldType">text_spell</str>
>
> Thanks again
> Brendan
>
>
>
>
>
> On Tue, Jul 23, 2013 at 3:58 PM, Dyer, James
> <james.d...@ingramcontent.com> wrote:
>
> > Try tacking &maxCollationTries=0 onto the URL and see if the collation
> > returns.
> >
> > If you get a collation, then try the same URL with the collation as the
> > "q" parameter.  Does that get results?
> >
> > My suspicion here is that you are assuming that "markup_texts" is the
> > default search field for "/select" but in fact it isn't.
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -----Original Message-----
> > From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
> > Sent: Tuesday, July 23, 2013 2:43 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Spellcheck field element and collation issues
> >
> > Hi James,
> >
> > I get the following response for that query:
> >
> > <response>
> > <lst name="responseHeader">
> > <int name="status">0</int>
> > <int name="QTime">8</int>
> > <lst name="params">
> > <str name="indent">true</str>
> > <str name="q">Perfrm HVC</str>
> > <str name="rows">0</str>
> > </lst>
> > </lst>
> > <result name="response" numFound="0" start="0"></result>
> > <lst name="spellcheck">
> > <lst name="suggestions">
> > <lst name="perfrm">
> > <int name="numFound">3</int>
> > <int name="startOffset">0</int>
> > <int name="endOffset">6</int>
> > <int name="origFreq">0</int>
> > <arr name="suggestion">
> > <lst>
> > <str name="word">perform</str>
> > <int name="freq">4</int>
> > </lst>
> > <lst>
> > <str name="word">performed</str>
> > <int name="freq">1</int>
> > </lst>
> > <lst>
> > <str name="word">performance</str>
> > <int name="freq">3</int>
> > </lst>
> > </arr>
> > </lst>
> > <lst name="hvc">
> > <int name="numFound">2</int>
> > <int name="startOffset">7</int>
> > <int name="endOffset">10</int>
> > <int name="origFreq">0</int>
> > <arr name="suggestion">
> > <lst>
> > <str name="word">hvac</str>
> > <int name="freq">4</int>
> > </lst>
> > <lst>
> > <str name="word">have</str>
> > <int name="freq">5</int>
> > </lst>
> > </arr>
> > </lst>
> > <bool name="correctlySpelled">false</bool>
> > </lst>
> > </lst>
> > </response>
> >
> > Thanks
> > Brendan
> >
> >
> > On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James
> > <james.d...@ingramcontent.com> wrote:
> >
> > > For this query:
> > >
> > >
> > > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
> > >
> > > ...do you get anything back in the spellcheck response?  Is it correcting
> > > the individual words and not giving collations?  Or are you getting no
> > > individual word suggestions either?
> > >
> > > James Dyer
> > > Ingram Content Group
> > > (615) 213-4311
> > >
> > >
> > > -----Original Message-----
> > > From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
> > > Sent: Tuesday, July 23, 2013 1:47 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Spellcheck field element and collation issues
> > >
> > > Hi All,
> > >
> > > I have an IndexBasedSpellChecker component configured as follows (note that
> > > the field parameter is set to the spellcheck field):
> > >
> > >   <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> > >
> > >     <str name="queryAnalyzerFieldType">text_spell</str>
> > >
> > >     <lst name="spellchecker">
> > >       <str name="name">default</str>
> > >       <str name="classname">solr.IndexBasedSpellChecker</str>
> > >       <!--
> > >           Load tokens from the following field for spell checking;
> > >           the analyzer for the field's type as defined in schema.xml is used
> > >       -->
> > >       <str name="field">spellcheck</str>
> > >       <str name="spellcheckIndexDir">./spellchecker</str>
> > >       <float name="thresholdTokenFrequency">.0001</float>
> > >     </lst>
> > >   </searchComponent>
> > >
> > > with the corresponding field type for spellcheck:
> > >
> > >     <fieldType name="text_spell" class="solr.TextField"
> > > positionIncrementGap="100" omitNorms="true">
> > >       <analyzer type="index">
> > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > >         <filter class="solr.StopFilterFactory"
> > >                 ignoreCase="true"
> > >                 words="lang/stopwords_en.txt"
> > >                 enablePositionIncrements="true"
> > >                 />
> > >         <filter class="solr.LowerCaseFilterFactory"/>
> > >         <filter class="solr.StandardFilterFactory"/>
> > >       </analyzer>
> > >       <analyzer type="query">
> > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > >         <filter class="solr.SynonymFilterFactory"
> > > synonyms="moto_synonyms.txt" ignoreCase="true" expand="true"/>
> > >         <filter class="solr.StopFilterFactory"
> > >                 ignoreCase="true"
> > >                 words="lang/stopwords_en.txt"
> > >                 enablePositionIncrements="true"
> > >                 />
> > >         <filter class="solr.LowerCaseFilterFactory"/>
> > >         <filter class="solr.StandardFilterFactory"/>
> > >       </analyzer>
> > >     </fieldType>
> > >
> > > and field:
> > >
> > >     <!-- spellcheck field is multivalued because it has the title and
> > >       markup fields copied into it -->
> > >     <field name="spellcheck" type="text_spell" stored="false"
> > > omitTermFreqAndPositions="true" multiValued="true"/>
> > >
> > > Values from a markup and title field are copied into the spellcheck field.
> > >
> > > My /select request handler has the following defaults:
> > >
> > >     <lst name="defaults">
> > >       <str name="echoParams">explicit</str>
> > >       <int name="rows">10</int>
> > >       <str name="df">markup_texts title_texts</str>
> > >
> > >       <!-- Spell checking defaults -->
> > >       <str name="spellcheck">true</str>
> > >       <str name="spellcheck.collateExtendedResults">true</str>
> > >       <str name="spellcheck.extendedResults">true</str>
> > >       <str name="spellcheck.maxCollations">2</str>
> > >       <str name="spellcheck.maxCollationTries">5</str>
> > >       <str name="spellcheck.count">5</str>
> > >       <str name="spellcheck.collate">true</str>
> > >
> > >       <str name="spellcheck.maxResultsForSuggest">5</str>
> > >       <str name="spellcheck.alternativeTermCount">5</str>
> > >
> > >      </lst>
> > >
> > >
> > > When I issue a search like this:
> > >
> > >
> > > http://localhost:8981/solr/articles/select?indent=true&spellcheck.q=markup_texts:(Perfrm%20HVC)&q=Perfrm%20HVC&rows=0
> > >
> > > I get collations:
> > >
> > > <lst name="collation">
> > > <str name="collationQuery">markup_texts:(perform hvac)</str>
> > > <int name="hits">4</int>
> > > <lst name="misspellingsAndCorrections">
> > > <str name="perfrm">perform</str>
> > > <str name="hvc">hvac</str>
> > > </lst>
> > > </lst>
> > > <lst name="collation">
> > > <str name="collationQuery">markup_texts:(performed hvac)</str>
> > > <int name="hits">4</int>
> > > <lst name="misspellingsAndCorrections">
> > > <str name="perfrm">performed</str>
> > > <str name="hvc">hvac</str>
> > > </lst>
> > > </lst>
> > >
> > > However, if I remove the spellcheck.q parameter I do not, i.e. no
> > > collations are returned for the following:
> > >
> > >
> > > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
> > >
> > >
> > >
> > > If I specify the fields being searched over for the q parameter I get
> > > collations:
> > >
> > >
> > > http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)&rows=0
> > >
> > > <lst name="collation">
> > > <str name="collationQuery">markup_texts:(perform hvac)</str>
> > > <int name="hits">4</int>
> > > <lst name="misspellingsAndCorrections">
> > > <str name="perfrm">perform</str>
> > > <str name="hvc">hvac</str>
> > > </lst>
> > > </lst>
> > > <lst name="collation">
> > > <str name="collationQuery">markup_texts:(performed hvac)</str>
> > > <int name="hits">4</int>
> > > <lst name="misspellingsAndCorrections">
> > > <str name="perfrm">performed</str>
> > > <str name="hvc">hvac</str>
> > > </lst>
> > > </lst>
> > >
> > >
> > > I'm a bit confused as to what the value for field should be in the
> > > spellcheck component definition. In fact, what is its purpose here: just
> > > the input for building the spellchecking index? If so, then why do I even
> > > need to specify the queryAnalyzerFieldType?
> > >
> > > Also, why do I need to explicitly specify the field in the query or
> > > spellcheck.q to get collations?
> > >
> > > Thanks and sorry for the rather long question.
> > >
> > > Brendan
> > >
> >
> >
> >
> > --
> > Brendan Grainger
> > www.kuripai.com
> >
>
>
>
> --
> Brendan Grainger
> www.kuripai.com
>



-- 
Brendan Grainger
www.kuripai.com
