StopFilterFactory and "qf" containing some fields that use it and some that do not

Dyer, James Wed, 12 Jan 2011 14:18:34 -0800

I'm running into a problem with StopFilterFactory in conjunction with (e)dismax 
queries that have a mix of fields, only some of which use StopFilterFactory.  
It seems that if even one field in the "qf" parameter does not use 
StopFilterFactory, then stop words are not removed when searching any of the 
fields.  Here's an example of what I mean:


- I have 2 fields indexed:
  > Title is "textStemmed", which includes StopFilterFactory (see below).
  > Contributor is "textSimple", which does not include StopFilterFactory
    (see below).
- "The" is a stop word in stopwords.txt
- q=life&defType=edismax&qf=Title  ... returns 277,635 results
- q=the life&defType=edismax&qf=Title ... returns 277,635 results
- q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
- q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
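
The four requests above can be reproduced with properly encoded parameters 
using a small helper like the one below. This is only a sketch: the host, 
port, and handler path in BASE_URL are assumptions and are not taken from the 
setup described here, and build_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with the default select handler.
# Adjust this to match the real deployment.
BASE_URL = "http://localhost:8983/solr/select"


def build_url(q, qf):
    """Return a request URL with q, defType and qf URL-encoded."""
    params = {"q": q, "defType": "edismax", "qf": qf}
    return BASE_URL + "?" + urlencode(params)


# The four combinations listed above.
for q in ("life", "the life"):
    for qf in ("Title", "Title Contributor"):
        print(build_url(q, qf))
```

Spaces in "q" and "qf" are encoded as "+", so the multi-field "qf" value 
reaches Solr as a single parameter.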

It seems as if the stop words are not being stripped from the query because 
"qf" contains a field that doesn't use StopFilterFactory.  I also tested 
combining stemmed and unstemmed fields in "qf", and stemming seems to be 
applied regardless, but stop-word removal is not.
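
One possible reading of these results can be sketched as a toy model. It is 
not Solr code. It assumes that the parser analyzes each query term separately 
for each "qf" field, joins the per-field results into one clause per term, and 
requires every clause to match (as with mm=100%). Under those assumptions, 
"the" is dropped for Title but survives as a required clause on Contributor. 
The analyzers, the sample document, and all function names are hypothetical.

```python
# Toy model only: the analyzers and the query-building rule are simplifying
# assumptions for illustration, not the actual (e)dismax implementation.

STOPWORDS = {"the"}  # stands in for stopwords.txt


def analyze_text_simple(term):
    """Model of textSimple: lowercase only, no stop filter."""
    return [term.lower()]


def analyze_text_stemmed(term):
    """Model of textStemmed: stop filter plus lowercase (stemming omitted)."""
    lowered = term.lower()
    return [] if lowered in STOPWORDS else [lowered]


FIELD_ANALYZERS = {
    "Title": analyze_text_stemmed,
    "Contributor": analyze_text_simple,
}


def build_clauses(query, qf):
    """Build one clause per term; each clause lists (field, token) alternatives.

    A clause is dropped only if every field's analyzer removed the term.
    """
    clauses = []
    for term in query.split():
        alternatives = [
            (field, token)
            for field in qf
            for token in FIELD_ANALYZERS[field](term)
        ]
        if alternatives:
            clauses.append(alternatives)
    return clauses


def matches(doc, clauses):
    """Assume every clause is required; any alternative may satisfy it."""
    return all(
        any(token in doc.get(field, set()) for field, token in clause)
        for clause in clauses
    )


# Hypothetical document: "the" was removed from Title at index time, and the
# contributor name does not contain the word "the".
doc = {"Title": {"life"}, "Contributor": {"jane", "doe"}}

title_only = build_clauses("the life", ["Title"])
both_fields = build_clauses("the life", ["Title", "Contributor"])

print(title_only, matches(doc, title_only))
print(both_fields, matches(doc, both_fields))
```

In this model the Title-only query keeps a single clause for "life" and 
matches, while the two-field query keeps a clause for "the" that only 
Contributor can satisfy, so the document no longer matches. Whether this is 
what actually happens inside (e)dismax is not confirmed here.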

Does anyone have ideas on what is going on?  Is this a feature or possibly a 
bug?  Any known workarounds?  Any advice is appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
________________________________
<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
            stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
            stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
