Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

Markus Jelsma Wed, 12 Jan 2011 15:49:14 -0800

Here's another thread on the subject:
http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html


And slightly off topic: you might also want to look at using common grams;
they are really useful for phrase queries that contain stopwords.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
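
To give a rough idea of what common grams do, here is a small Python sketch.
It is only an illustration of the idea, not the filter's actual code: the
function name, the "_" separator default and the word list are assumptions
made for the example. Each token is kept, and a joined pair is added wherever
one of two neighbouring tokens is a common word, so a phrase query can match
on the rarer pair instead of on the stopword alone.

```python
def common_grams(tokens, common_words, sep="_"):
    """Return the tokens plus a joined pair for each adjacent pair that
    contains at least one common word (illustrative sketch only)."""
    common = {w.lower() for w in common_words}
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # always keep the original token
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # Add a pair token only when a common word is involved.
            if tok.lower() in common or nxt.lower() in common:
                out.append(tok + sep + nxt)
    return out


if __name__ == "__main__":
    print(common_grams(["the", "life", "of", "pi"], ["the", "of"]))
    # ['the', 'the_life', 'life', 'life_of', 'of', 'of_pi', 'pi']
```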


> Here is what debug says each of these queries parse to:
> 
> 1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
> 2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
> 3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
> 4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
> 
> 1. +DisjunctionMaxQuery((Title:life))
> 2. +((DisjunctionMaxQuery((Title:life)))~1)
> 3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
> 4. +((DisjunctionMaxQuery((Contributor:the))
> DisjunctionMaxQuery((Contributor:life | Title:life)))~2)
> 
> I see what's going on here.  Because "the" is a stop word for Title, it
> gets removed from the first part of the expression.  This means that
> "Contributor" is required to contain "the".  dismax does the same thing
> too.  I guess I should have run debug before asking the mail list!
> 
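The effect described above can be reproduced with a toy model. This is a
sketch under stated assumptions, not how Solr is implemented: the analyzers
are stand-ins (lowercasing only, no stemming), the stopword set holds just
"the", and every clause is treated as required, which is what the ~2 in the
debug output for query 4 suggests. Each query term is analyzed once per field
in "qf"; a field whose analyzer drops the term simply contributes nothing to
that term's clause, but the clause itself survives if any other field keeps it.

```python
STOPWORDS = {"the"}  # assumed content of stopwords.txt for this sketch


def analyze_stemmed(term):
    """Stand-in for the Title query analyzer: lowercases, drops stopwords."""
    t = term.lower()
    return None if t in STOPWORDS else t


def analyze_simple(term):
    """Stand-in for the Contributor query analyzer: lowercases only."""
    return term.lower()


ANALYZERS = {"Title": analyze_stemmed, "Contributor": analyze_simple}


def build_clauses(query, qf):
    """One clause per query term; each clause lists the (field, token)
    pairs that survived that field's analysis."""
    clauses = []
    for term in query.split():
        disjuncts = []
        for field in qf:
            token = ANALYZERS[field](term)
            if token is not None:
                disjuncts.append((field, token))
        if disjuncts:  # a term dropped by every field yields no clause
            clauses.append(disjuncts)
    return clauses


def matches(doc, clauses):
    """A document matches only if every clause hits in at least one field."""
    hits = sum(
        any(token in doc.get(field, ()) for field, token in clause)
        for clause in clauses
    )
    return hits >= len(clauses)


if __name__ == "__main__":
    doc = {"Title": {"life"}, "Contributor": {"smith"}}
    print(build_clauses("the life", ["Title", "Contributor"]))
    print(matches(doc, build_clauses("the life", ["Title"])))                 # True
    print(matches(doc, build_clauses("the life", ["Title", "Contributor"])))  # False
```

With only Title in "qf", "the" disappears and one clause remains. Once
Contributor is added, "the" survives as a clause of its own that only
Contributor can satisfy, so the document no longer matches.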
> It looks like the only workarounds I have are to either filter out the
> stopwords in the client when this happens, or enable stop words for all
> the fields that are used in "qf" alongside stopword-enabled fields.
> Unless... someone has a better idea?
> 
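For the first workaround (filtering in the client), something along these
lines might do as a starting point. It is a minimal sketch: the helper names
are made up for the example, the stopword set has to be kept in sync with
stopwords.txt by hand, and the naive whitespace split ignores quoted phrases
and edismax operators, so treat it as an illustration rather than a drop-in
fix.

```python
from urllib.parse import urlencode


def strip_stopwords(q, stopwords):
    """Remove stopwords from a whitespace-separated query string.
    If every term is a stopword, return the query unchanged."""
    kept = [t for t in q.split() if t.lower() not in stopwords]
    return " ".join(kept) if kept else q


def build_params(q, qf, stopwords):
    """Build the request parameters with the cleaned query (hypothetical helper)."""
    return urlencode({
        "q": strip_stopwords(q, stopwords),
        "defType": "edismax",
        "qf": " ".join(qf),
    })


if __name__ == "__main__":
    print(build_params("the life", ["Title", "Contributor"], {"the"}))
    # q=life&defType=edismax&qf=Title+Contributor
```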
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Wednesday, January 12, 2011 4:44 PM
> To: solr-user@lucene.apache.org
> Cc: Jayendra Patil
> Subject: Re: StopFilterFactory and "qf" containing some fields that use it
> and some that do not
> 
> > Have used edismax and Stopword filters as well. But usually use the fq
> > parameter e.g. fq=title:the life and never had any issues.
> 
> That is because filter queries are not relevant for the mm parameter which
> is being used for the main query.
> 
> > Can you turn on debugQuery and check what query is formed for all
> > the combinations you mentioned?
> > 
> > Regards,
> > Jayendra
> > 
> > On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James <james.d...@ingrambook.com> wrote:
> > > I'm running into a problem with StopFilterFactory in conjunction with
> > > (e)dismax queries that have a mix of fields, only some of which use
> > > StopFilterFactory.  It seems that if even 1 field on the "qf" parameter
> > > does not use StopFilterFactory, then stop words are not removed when
> > > searching any fields.  Here's an example of what I mean:
> > > 
> > > - I have 2 fields indexed:
> > >   - Title is "textStemmed", which includes StopFilterFactory (see
> > >     below).
> > >   - Contributor is "textSimple", which does not include
> > >     StopFilterFactory (see below).
> > > - "The" is a stop word in stopwords.txt
> > > - q=life&defType=edismax&qf=Title  ... returns 277,635 results
> > > - q=the life&defType=edismax&qf=Title ... returns 277,635 results
> > > - q=life&defType=edismax&qf=Title Contributor  ... returns 277,635
> > >   results
> > > - q=the life&defType=edismax&qf=Title Contributor ... returns 0
> > >   results
> > > 
> > > It seems as if the stop words are not being stripped from the query
> > > because "qf" contains a field that doesn't use StopFilterFactory.  I
> > > did testing with combining Stemmed fields with not Stemmed fields in
> > > "qf" and it seems as if stemming gets applied regardless.  But stop
> > > words do not.
> > > 
> > > Does anyone have ideas on what is going on?  Is this a feature or
> > > possibly a bug?  Any known workarounds?  Any advice is appreciated.
> > > 
> > > James Dyer
> > > E-Commerce Systems
> > > Ingram Content Group
> > > (615) 213-4311
> > > ________________________________
> > > <fieldType name="textSimple" class="solr.TextField"
> > > positionIncrementGap="100">
> > > <analyzer type="index">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > </analyzer>
> > > <analyzer type="query">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > </analyzer>
> > > </fieldType>
> > > 
> > > <fieldType name="textStemmed" class="solr.TextField"
> > > positionIncrementGap="100">
> > > <analyzer type="index">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true" />
> > > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> > > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> > > stemEnglishPossessive="1" />
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > <filter class="solr.PorterStemFilterFactory"/>
> > > </analyzer>
> > > <analyzer type="query">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > > ignoreCase="true" expand="true"/>
> > > <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true" />
> > > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> > > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> > > stemEnglishPossessive="1" />
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > <filter class="solr.PorterStemFilterFactory"/>
> > > </analyzer>
> > > </fieldType>
