Reviving this thread. You say:

> I do wonder...what if (e)dismax had a flag you could set that would tell it
> that if any analyzers removed a term, then that term would become optional
> for any fields for which it remained? I'm not sure what the development
> effort would be, but perhaps it would be a nice way to circumvent this
> problem in a future release...
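The flag proposed in the quote above can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not Solr code: each field's query analyzer is reduced to "lowercase, then drop stopwords if the field uses a stop filter", the stopword set is an assumed stand-in for stopwords.txt, and build_query and its optional_if_removed parameter are hypothetical names, not an existing (e)dismax option. It builds one DisjunctionMaxQuery-style clause per query term and, with the flag on, demotes any term that some field's analyzer removed to an optional clause.

```python
# Toy model of the flag proposed in this thread (see SOLR-3085); not Solr code.
# Assumptions: each field's query analyzer is just "lowercase, then drop
# stopwords if the field uses a stop filter", and clauses are rendered as
# strings in the style of debugQuery output.

STOPWORDS = {"the", "a", "an", "of"}  # assumed stand-in for stopwords.txt

# Title (textStemmed) removes stopwords; Contributor (textSimple) does not.
FIELD_STOPWORDS = {"Title": STOPWORDS, "Contributor": set()}


def analyze(field, term):
    """Return the analyzed term for a field, or None if the analyzer drops it."""
    token = term.lower()
    return None if token in FIELD_STOPWORDS[field] else token


def build_query(q, qf, optional_if_removed=False):
    """Build one DisjunctionMaxQuery-style clause per whitespace-separated term.

    Returns (required, optional) lists of clause strings. With the
    hypothetical optional_if_removed flag, a term that at least one field's
    analyzer removed becomes optional instead of required.
    """
    required, optional = [], []
    for term in q.split():
        kept = []
        for field in qf:
            token = analyze(field, term)
            if token is not None:
                kept.append(f"{field}:{token}")
        if not kept:
            continue  # removed by every field: the term vanishes entirely
        clause = "DisjunctionMaxQuery((" + " | ".join(kept) + "))"
        removed_somewhere = len(kept) < len(qf)
        if optional_if_removed and removed_somewhere:
            optional.append(clause)
        else:
            required.append(clause)
    return required, optional


# Current behavior: "the" becomes a required clause only Contributor can match.
print(build_query("the life", ["Title", "Contributor"]))
# Proposed behavior: "the" is demoted to an optional clause.
print(build_query("the life", ["Title", "Contributor"], optional_if_removed=True))
```

With the flag off, "the life" over Title and Contributor yields two required clauses, which mirrors the ~2 in the debug output quoted later in this thread; with it on, only the "life" clause is required. Whether optional clauses should still count toward mm is a design question this sketch leaves open.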
I created a JIRA issue to investigate whether it is possible to implement this. See https://issues.apache.org/jira/browse/SOLR-3085

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13. jan. 2011, at 17:36, Dyer, James wrote:

> I appreciate the reply and blog posting. For now, I just enabled stopwords
> for all the fields on "qf". We have a very short list anyhow, and our legacy
> search engine didn't even allow field-by-field configuration (stopwords are
> global on that system).
>
> I do wonder...what if (e)dismax had a flag you could set that would tell it
> that if any analyzers removed a term, then that term would become optional
> for any fields for which it remained? I'm not sure what the development
> effort would be, but perhaps it would be a nice way to circumvent this
> problem in a future release...
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Thursday, January 13, 2011 9:54 AM
> To: solr-user@lucene.apache.org; markus.jel...@openindex.io
> Cc: Dyer, James
> Subject: Re: StopFilterFactory and "qf" containing some fields that use it and some that do not
>
> It's a known 'issue' in dismax (really an inherent part of dismax's
> design, with no clear way to do anything about it) that qf over fields
> with different stop word definitions will produce odd results for a
> query with a stopword.
>
> Here's my understanding of what's going on:
> http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
>
> On 1/12/2011 6:48 PM, Markus Jelsma wrote:
>> Here's another thread on the subject:
>> http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
>>
>> And, slightly off topic: you might also want to look at using common grams;
>> they are really useful for phrase queries that contain stopwords.
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
>>
>>
>>> Here is what debug says each of these queries parses to:
>>>
>>> 1. q=life&defType=edismax&qf=Title ... returns 277,635 results
>>> 2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
>>> 3. q=life&defType=edismax&qf=Title Contributor ... returns 277,635 results
>>> 4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
>>>
>>> 1. +DisjunctionMaxQuery((Title:life))
>>> 2. +((DisjunctionMaxQuery((Title:life)))~1)
>>> 3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
>>> 4. +((DisjunctionMaxQuery((Contributor:the))
>>>    DisjunctionMaxQuery((Contributor:life | Title:life)))~2)
>>>
>>> I see what's going on here. Because "the" is a stop word for Title, it
>>> gets removed from the first part of the expression. This means that
>>> Contributor is required to contain "the". dismax does the same thing
>>> too. I guess I should have run debug before asking the mailing list!
>>>
>>> It looks like the only workarounds I have are to either filter out the
>>> stopwords in the client when this happens, or enable stop words on every
>>> field in "qf" that is listed alongside a stopword-enabled field.
>>> Unless...someone has a better idea??
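The first workaround mentioned above (filtering out stopwords in the client) could look roughly like the following Python sketch. It assumes the client keeps its own copy of stopwords.txt and a hand-maintained list of which qf fields use StopFilterFactory; neither is read from Solr here, and the function names are hypothetical. It strips stopwords only when qf mixes filtered and unfiltered fields, and falls back to the original query if every term is a stopword.

```python
# Hypothetical client-side workaround: strip stopwords from the user's query
# before sending it, but only when qf mixes fields that use StopFilterFactory
# with fields that do not. The stopword set and the field list below are
# assumptions the client has to keep in sync with the Solr schema by hand.
from urllib.parse import urlencode

STOPWORDS = frozenset({"the", "a", "an", "of"})  # assumed copy of stopwords.txt
FIELDS_WITH_STOPFILTER = frozenset({"Title"})   # assumed: fields of type textStemmed


def needs_stripping(qf):
    """True if some, but not all, of the qf fields apply a stop filter."""
    fields = set(qf.split())
    filtered = fields & FIELDS_WITH_STOPFILTER
    return bool(filtered) and filtered != fields


def strip_stopwords(q, qf):
    """Drop stopwords from q when qf mixes stopword and non-stopword fields.

    If every term is a stopword, q is returned unchanged rather than
    sending an empty query.
    """
    if not needs_stripping(qf):
        return q
    kept = [t for t in q.split() if t.lower() not in STOPWORDS]
    return " ".join(kept) if kept else q


def build_params(q, qf):
    """Encode edismax request parameters after client-side stopword stripping."""
    return urlencode({"q": strip_stopwords(q, qf), "defType": "edismax", "qf": qf})


print(build_params("the life", "Title Contributor"))
```

When every qf field uses the stop filter, the query is passed through untouched, because Solr already removes the stopword from every clause in that case (compare query 2 above).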
>>>
>>> James Dyer
>>> E-Commerce Systems
>>> Ingram Content Group
>>> (615) 213-4311
>>>
>>> -----Original Message-----
>>> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
>>> Sent: Wednesday, January 12, 2011 4:44 PM
>>> To: solr-user@lucene.apache.org
>>> Cc: Jayendra Patil
>>> Subject: Re: StopFilterFactory and "qf" containing some fields that use it and some that do not
>>>
>>>> I have used edismax and stopword filters as well, but I usually use the fq
>>>> parameter, e.g. fq=title:the life, and have never had any issues.
>>>
>>> That is because filter queries are not affected by the mm parameter, which
>>> applies only to the main query.
>>>
>>>> Can you turn on debugQuery and check what query is formed for each of
>>>> the combinations you mentioned?
>>>>
>>>> Regards,
>>>> Jayendra
>>>>
>>>> On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James <james.d...@ingrambook.com> wrote:
>>>>> I'm running into a problem with StopFilterFactory in conjunction with
>>>>> (e)dismax queries that have a mix of fields, only some of which use
>>>>> StopFilterFactory. It seems that if even one field in the "qf" parameter
>>>>> does not use StopFilterFactory, then stop words are not removed when
>>>>> searching any fields. Here's an example of what I mean:
>>>>>
>>>>> - I have 2 fields indexed:
>>>>>   - Title is "textStemmed", which includes StopFilterFactory (see below).
>>>>>   - Contributor is "textSimple", which does not include
>>>>>     StopFilterFactory (see below).
>>>>> - "The" is a stop word in stopwords.txt
>>>>> - q=life&defType=edismax&qf=Title ... returns 277,635 results
>>>>> - q=the life&defType=edismax&qf=Title ... returns 277,635 results
>>>>> - q=life&defType=edismax&qf=Title Contributor ... returns 277,635 results
>>>>> - q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
>>>>>
>>>>> It seems as if the stop words are not being stripped from the query
>>>>> because "qf" contains a field that doesn't use StopFilterFactory.
>>>>> I did testing combining stemmed fields with non-stemmed fields in
>>>>> "qf", and it seems as if stemming gets applied regardless, but stop
>>>>> words do not.
>>>>>
>>>>> Does anyone have ideas on what is going on? Is this a feature or
>>>>> possibly a bug? Any known workarounds? Any advice is appreciated.
>>>>>
>>>>> James Dyer
>>>>> E-Commerce Systems
>>>>> Ingram Content Group
>>>>> (615) 213-4311
>>>>> ________________________________
>>>>> <fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
>>>>>   <analyzer type="index">
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>   </analyzer>
>>>>>   <analyzer type="query">
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>   </analyzer>
>>>>> </fieldType>
>>>>>
>>>>> <fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
>>>>>   <analyzer type="index">
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>             words="stopwords.txt" enablePositionIncrements="true"/>
>>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>>>             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>>>             catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
>>>>>             stemEnglishPossessive="1"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>     <filter class="solr.PorterStemFilterFactory"/>
>>>>>   </analyzer>
>>>>>   <analyzer type="query">
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>>             ignoreCase="true" expand="true"/>
>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>             words="stopwords.txt" enablePositionIncrements="true"/>
>>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>>>             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>>>             catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
>>>>>             stemEnglishPossessive="1"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>     <filter class="solr.PorterStemFilterFactory"/>
>>>>>   </analyzer>
>>>>> </fieldType>
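To see why stemming and stop filtering behave differently per term, here is a toy Python model of the two query-side chains in the schema above. It is a simplification under stated assumptions, not Solr's analysis code: SynonymFilterFactory and WordDelimiterFilterFactory are omitted, the stopword set is an assumed stand-in for stopwords.txt, PorterStemFilterFactory is replaced by a crude strip-a-trailing-"s" rule, and per_term_fields is a hypothetical helper that lists, for each query term, the fields that still produce a token for it.

```python
# Toy model of the two query-side analyzer chains in the schema above.
# Simplifications (assumptions, not Solr behavior): SynonymFilterFactory and
# WordDelimiterFilterFactory are omitted, and PorterStemFilterFactory is
# replaced by a crude "strip one trailing s" rule.

STOPWORDS = {"the", "a", "an", "of"}  # assumed stand-in for stopwords.txt


def text_simple(text):
    """textSimple: whitespace tokenizer, then lowercase."""
    return [t.lower() for t in text.split()]


def toy_stem(token):
    """Crude stand-in for Porter stemming: drop one trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def text_stemmed(text):
    """textStemmed, simplified: whitespace, stop filter (ignoreCase), lowercase, toy stem."""
    tokens = [t for t in text.split() if t.lower() not in STOPWORDS]
    return [toy_stem(t.lower()) for t in tokens]


def per_term_fields(q, analyzers):
    """Map each query term to {field: tokens} for fields that keep a token.

    Dismax builds roughly one clause per term over these fields, so a field
    missing from a term's entry cannot help match that term.
    """
    result = {}
    for term in q.split():
        result[term] = {}
        for field, analyze in analyzers.items():
            tokens = analyze(term)
            if tokens:
                result[term][field] = tokens
    return result


ANALYZERS = {"Title": text_stemmed, "Contributor": text_simple}
print(per_term_fields("the lives", ANALYZERS))
```

Stemming only changes the token a field produces ("lives" becomes "live" for Title), so that term still gets a clause covering both fields. The stop filter leaves Title with no token at all for "the", so the clause for "the" can only be satisfied by Contributor, which matches the behavior seen in the debug output earlier in the thread.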