Here's another thread on the subject:
http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html

And slightly off topic: you might also want to look at using common grams; they are really useful for phrase queries that contain stopwords.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory

> Here is what debug says each of these queries parse to:
>
> 1. q=life&defType=edismax&qf=Title ... returns 277,635 results
> 2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
> 3. q=life&defType=edismax&qf=Title Contributor ... returns 277,635
> 4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
>
> 1. +DisjunctionMaxQuery((Title:life))
> 2. +((DisjunctionMaxQuery((Title:life)))~1)
> 3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
> 4. +((DisjunctionMaxQuery((Contributor:the))
> DisjunctionMaxQuery((Contributor:life | Title:life)))~2)
>
> I see what's going on here. Because "the" is a stop word for Title, it
> gets removed from first part of the expression. This means that
> "Contributor" is required to contain "the". dismax does the same thing
> too. I guess I should have run debug before asking the mail list!
>
> It looks like the only workarounds I have is to either filter out the
> stopwords in the client when this happens, or enable stop words for all
> the fields that are used in "qf" with stopword-enabled fields.
> Unless...someone has a better idea??
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Wednesday, January 12, 2011 4:44 PM
> To: solr-user@lucene.apache.org
> Cc: Jayendra Patil
> Subject: Re: StopFilterFactory and "qf" containing some fields that use it
> and some that do not
>
> > Have used edismax and Stopword filters as well. But usually use the fq
> > parameter e.g. fq=title:the life and never had any issues.
>
> That is because filter queries are not relevant for the mm parameter which
> is being used for the main query.
>
> > Can you turn on the debugQuery and check whats the Query formed for all
> > the combinations you mentioned.
> >
> > Regards,
> > Jayendra
> >
> > On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James
> > <james.d...@ingrambook.com>wrote:
> > > I'm running into a problem with StopFilterFactory in conjunction with
> > > (e)dismax queries that have a mix of fields, only some of which use
> > > StopFilterFactory. It seems that if even 1 field on the "qf" parameter
> > > does not use StopFilterFactory, then stop words are not removed when
> > > searching any fields. Here's an example of what I mean:
> > >
> > > - I have 2 fields indexed:
> > >   - Title is "textStemmed", which includes StopFilterFactory (see below).
> > >   - Contributor is "textSimple", which does not include StopFilterFactory (see below).
> > > - "The" is a stop word in stopwords.txt
> > > - q=life&defType=edismax&qf=Title ... returns 277,635 results
> > > - q=the life&defType=edismax&qf=Title ... returns 277,635 results
> > > - q=life&defType=edismax&qf=Title Contributor ... returns 277,635 results
> > > - q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
> > >
> > > It seems as if the stop words are not being stripped from the query
> > > because "qf" contains a field that doesn't use StopFilterFactory. I
> > > did testing with combining Stemmed fields with not Stemmed fields in
> > > "qf" and it seems as if stemming gets applied regardless. But stop
> > > words do not.
> > >
> > > Does anyone have ideas on what is going on? Is this a feature or
> > > possibly a bug? Any known workarounds? Any advice is appreciated.
> > >
> > > James Dyer
> > > E-Commerce Systems
> > > Ingram Content Group
> > > (615) 213-4311
> > > ________________________________
> > > <fieldType name="textSimple" class="solr.TextField"
> > > positionIncrementGap="100">
> > > <analyzer type="index">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > </analyzer>
> > > <analyzer type="query">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > </analyzer>
> > > </fieldType>
> > >
> > > <fieldType name="textStemmed" class="solr.TextField"
> > > positionIncrementGap="100">
> > > <analyzer type="index">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true" />
> > > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> > > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> > > stemEnglishPossessive="1" />
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > <filter class="solr.PorterStemFilterFactory"/>
> > > </analyzer>
> > > <analyzer type="query">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > > ignoreCase="true" expand="true"/>
> > > <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true" />
> > > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> > > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> > > stemEnglishPossessive="1" />
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > <filter class="solr.PorterStemFilterFactory"/>
> > > </analyzer>
> > > </fieldType>
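
For anyone finding this thread later, the behaviour James describes can be modelled in a few lines of Python. This is only a rough sketch of the idea, not Solr code: the field names and the single stopword come from the thread, while `analyze`, `build_clauses` and `strip_stopwords` are hypothetical helpers made up for illustration. Stemming, synonyms and the real mm calculation are deliberately left out.

```python
# Simplified model of how (e)dismax expands a query across "qf" fields.
# Not Solr code: field names and the stopword come from the thread,
# all function names and the data layout are hypothetical.

STOPWORDS = {"the"}  # thread: "The" is a stop word in stopwords.txt

# Query-time analysis per field: Title removes stopwords, Contributor does not.
FIELD_REMOVES_STOPWORDS = {"Title": True, "Contributor": False}


def analyze(field, term):
    """Return the analyzed term for a field, or None if the field drops it."""
    term = term.lower()
    if FIELD_REMOVES_STOPWORDS[field] and term in STOPWORDS:
        return None
    return term


def build_clauses(query, fields):
    """Build one disjunction per query word, omitting fields that drop the word."""
    clauses = []
    for word in query.split():
        disjuncts = [(f, analyze(f, word)) for f in fields]
        disjuncts = [(f, t) for f, t in disjuncts if t is not None]
        if disjuncts:  # a word dropped by every field yields no clause at all
            clauses.append(disjuncts)
    return clauses


def strip_stopwords(query):
    """Client-side workaround from the thread: drop stopwords before querying."""
    return " ".join(w for w in query.split() if w.lower() not in STOPWORDS)


if __name__ == "__main__":
    print(build_clauses("the life", ["Title"]))
    print(build_clauses("the life", ["Title", "Contributor"]))
    print(strip_stopwords("the life"))
```

With only Title in the field list, "the life" yields a single clause on Title:life. With both fields it yields two clauses, and the first one can only be satisfied by Contributor:the. If mm requires every clause to match (as the ~2 in the debug output suggests), a document has to contain "the" in Contributor, which would explain the 0 results. Stripping the stopword in the client reduces the query to "life" and avoids the extra clause.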