Reviving this thread.

You say:
> I do wonder...what if (e)dismax had a flag you could set that would tell it 
> that if any analyzers removed a term, then that term would become optional 
> for any fields for which it remained?  I'm not sure what the development 
> effort would be, but perhaps it would be a nice way to circumvent this problem in a 
> future release...

I created a JIRA issue to investigate whether it is possible to implement this. See 
https://issues.apache.org/jira/browse/SOLR-3085
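To make the idea concrete, here is a rough sketch of the proposed behavior. It is a hypothetical Python model, not Solr code and not necessarily what SOLR-3085 will end up implementing: the field names come from this thread, while the stopword list, the simplified analyzers (lowercasing plus optional stop word removal) and the helper names are assumptions. A clause becomes optional when at least one qf field's analyzer removed its term; if no clause is left required, at least one clause must still match.

```python
# Hypothetical model of the proposed flag: if any qf field's analyzer removes a
# query term, that term's clause becomes optional (it stops counting toward mm).
# Field names are from the thread; the stopword list and analyzers are assumed.

STOPWORDS = {"the", "a", "an"}                          # assumed stopwords.txt
USES_STOPWORDS = {"Title": True, "Contributor": False}  # Title = textStemmed


def analyze(field, text):
    """Lowercase and split; drop stop words only for stopword-enabled fields."""
    tokens = text.lower().split()
    if USES_STOPWORDS[field]:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def build_clauses(q, qf):
    """Return (disjuncts, optional) for each user term that survives somewhere."""
    clauses = []
    for term in q.split():
        disjuncts = [(f, tok) for f in qf for tok in analyze(f, term)]
        removed_somewhere = any(not analyze(f, term) for f in qf)
        if disjuncts:
            clauses.append((disjuncts, removed_somewhere))
    return clauses


def matches(doc, clauses):
    """Every required clause must match; if nothing is required, at least
    one (optional) clause must match."""
    indexed = {f: set(analyze(f, v)) for f, v in doc.items()}
    hits = [any(tok in indexed[f] for f, tok in d) for d, _ in clauses]
    required = [h for h, (_, optional) in zip(hits, clauses) if not optional]
    return all(required) if required else any(hits)


doc = {"Title": "The Life", "Contributor": "Smith"}
print(matches(doc, build_clauses("the life", ["Title", "Contributor"])))  # True
```

Under these rules, q=the life over Title and Contributor matches a document titled "The Life" again, while q=the on its own still has to find "the" in Contributor.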

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13 Jan 2011, at 17:36, Dyer, James wrote:

> I appreciate the reply and blog posting.  For now, I just enabled stopwords 
> for all the fields in "qf".  We have a very short list anyhow and our legacy 
> search engine didn't even allow field-by-field configuration (stopwords are 
> global on that system).
> 
> I do wonder...what if (e)dismax had a flag you could set that would tell it 
> that if any analyzers removed a term, then that term would become optional 
> for any fields for which it remained?  I'm not sure what the development 
> effort would be, but perhaps it would be a nice way to circumvent this problem in a 
> future release...
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> 
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
> Sent: Thursday, January 13, 2011 9:54 AM
> To: solr-user@lucene.apache.org; markus.jel...@openindex.io
> Cc: Dyer, James
> Subject: Re: StopFilterFactory and "qf" containing some fields that use it 
> and some that do not
> 
> It's a known 'issue' in dismax (really an inherent part of dismax's 
> design, with no clear way to do anything about it) that a qf over fields 
> with different stop word definitions will produce odd results for a 
> query containing a stop word.
> 
> Here's my understanding of what's going on: 
> http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
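Jonathan's explanation can be reproduced with a small model. The Python sketch below is a simplified illustration under stated assumptions, not Solr's implementation: whitespace tokenization, lowercasing, a hypothetical three-word stopword list applied only to Title (stemming, synonyms and word delimiting are ignored), and mm=100%, which is consistent with the ~1 and ~2 in the debug output later in this thread. Each user term becomes one clause listing the fields in which it survived analysis, and a clause that is empty in every field is dropped.

```python
# Simplified model of how (e)dismax turns q + qf into clauses, assuming
# whitespace tokenization, lowercasing, stop words removed only in Title
# (stemming, synonyms and word delimiting ignored) and mm=100%.

STOPWORDS = {"the", "a", "an"}  # assumed contents of stopwords.txt


def analyze(field, text):
    """Tokens produced by the (simplified) analyzer of a field."""
    tokens = text.lower().split()
    if field == "Title":  # textStemmed includes StopFilterFactory
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens  # textSimple (Contributor) only lowercases


def build_clauses(q, qf):
    """One DisjunctionMaxQuery-like clause per user term, holding the
    (field, token) pairs that survived analysis; empty clauses are dropped."""
    clauses = []
    for term in q.split():
        disjuncts = [(f, tok) for f in qf for tok in analyze(f, term)]
        if disjuncts:
            clauses.append(disjuncts)
    return clauses


def matches(doc, clauses):
    """mm=100%: every clause must match in at least one of its fields.
    An empty clause list matches nothing in this model."""
    indexed = {f: set(analyze(f, v)) for f, v in doc.items()}
    return bool(clauses) and all(
        any(tok in indexed[f] for f, tok in c) for c in clauses)


doc = {"Title": "The Life", "Contributor": "Smith"}
for q, qf in [("life", ["Title"]),
              ("the life", ["Title"]),
              ("life", ["Title", "Contributor"]),
              ("the life", ["Title", "Contributor"])]:
    clauses = build_clauses(q, qf)
    print(f"{q!r} over {qf}: {len(clauses)} clause(s), match={matches(doc, clauses)}")
```

For a document with Title "The Life" and Contributor "Smith", the first three queries match and the fourth does not: it has two required clauses, and the one for "the" can only be satisfied by Contributor.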
> 
> On 1/12/2011 6:48 PM, Markus Jelsma wrote:
>> Here's another thread on the subject:
>> http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
>> 
>> And, slightly off topic: you might also want to look at using common grams;
>> they are really useful for phrase queries that contain stopwords.
>> 
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
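Roughly, common grams keep every token and add a bigram for each adjacent pair that involves a common word, so a phrase such as "the life" can be matched through a single "the_life" token rather than a bare stop word. The Python sketch below approximates the index-time output only; the common-word set is hypothetical, and the real CommonGramsFilterFactory also tracks token positions and types, which this ignores.

```python
def common_grams(tokens, common_words):
    """Simplified sketch of index-time common-grams output: every unigram is
    kept, and a '_'-joined bigram is added after each adjacent pair of
    tokens where either token is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


print(common_grams(["the", "life", "of", "pi"], {"the", "of"}))
```

Here the call yields ['the', 'the_life', 'life', 'life_of', 'of', 'of_pi', 'pi']; pairs with no common word (e.g. "red fish") produce unigrams only.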
>> 
>> 
>>> Here is what debug says each of these queries parses to:
>>> 
>>> 1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
>>> 2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
>>> 3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635
>>> 4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
>>> 
>>> 1. +DisjunctionMaxQuery((Title:life))
>>> 2. +((DisjunctionMaxQuery((Title:life)))~1)
>>> 3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
>>> 4. +((DisjunctionMaxQuery((Contributor:the))
>>> DisjunctionMaxQuery((Contributor:life | Title:life)))~2)
>>> 
>>> I see what's going on here.  Because "the" is a stop word for Title, it
>>> gets removed from the first part of the expression.  This means that
>>> "Contributor" is required to contain "the".  dismax does the same thing
>>> too.  I guess I should have run debug before asking the mailing list!
>>> 
>>> It looks like the only workarounds I have are to either filter out the
>>> stopwords in the client when this happens, or enable stop words on every
>>> field that appears in "qf" alongside stopword-enabled fields.
>>> Unless...someone has a better idea??
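The first workaround, filtering stop words out on the client, might look like the minimal Python sketch below. The stopword set is a hypothetical stand-in that would have to be kept in sync with stopwords.txt by hand, and the function name is illustrative; the request parameters are the ones used in this thread. If every term is a stop word, the original query is sent unchanged so the search is not emptied.

```python
from urllib.parse import urlencode

STOPWORDS = {"the", "a", "an"}  # assumed; must mirror stopwords.txt by hand


def strip_stopwords(q, stopwords=STOPWORDS):
    """Drop stop words from the user's query before it reaches (e)dismax.
    If nothing would be left, send the original query instead."""
    kept = [t for t in q.split() if t.lower() not in stopwords]
    return " ".join(kept) if kept else q


params = {
    "q": strip_stopwords("the life"),
    "defType": "edismax",
    "qf": "Title Contributor",
}
print(urlencode(params))  # q=life&defType=edismax&qf=Title+Contributor
```

The trade-off is that a stop word can then never be required in Contributor either, which is usually acceptable when the stopword list is short.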
>>> 
>>> James Dyer
>>> E-Commerce Systems
>>> Ingram Content Group
>>> (615) 213-4311
>>> 
>>> -----Original Message-----
>>> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
>>> Sent: Wednesday, January 12, 2011 4:44 PM
>>> To: solr-user@lucene.apache.org
>>> Cc: Jayendra Patil
>>> Subject: Re: StopFilterFactory and "qf" containing some fields that use it
>>> and some that do not
>>> 
>>>> I have used edismax and stopword filters as well, but I usually use the fq
>>>> parameter, e.g. fq=title:the life, and have never had any issues.
>>> That is because filter queries are not affected by the mm parameter, which
>>> applies only to the main query.
>>> 
>>>> Can you turn on debugQuery and check what query is formed for each of
>>>> the combinations you mentioned?
>>>> 
>>>> Regards,
>>>> Jayendra
>>>> 
>>>> On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James <james.d...@ingrambook.com> wrote:
>>>>> I'm running into a problem with StopFilterFactory in conjunction with
>>>>> (e)dismax queries that have a mix of fields, only some of which use
>>>>> StopFilterFactory.  It seems that if even 1 field on the "qf" parameter
>>>>> does not use StopFilterFactory, then stop words are not removed when
>>>>> searching any fields.  Here's an example of what I mean:
>>>>> 
>>>>> - I have 2 fields indexed:
>>>>>   - Title is "textStemmed", which includes StopFilterFactory (see below).
>>>>>   - Contributor is "textSimple", which does not include StopFilterFactory
>>>>>     (see below).
>>>>> - "The" is a stop word in stopwords.txt
>>>>> - q=life&defType=edismax&qf=Title  ... returns 277,635 results
>>>>> - q=the life&defType=edismax&qf=Title ... returns 277,635 results
>>>>> - q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
>>>>> - q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
>>>>> 
>>>>> It seems as if the stop words are not being stripped from the query
>>>>> because "qf" contains a field that doesn't use StopFilterFactory.  I
>>>>> tested combining stemmed fields with non-stemmed fields in "qf", and
>>>>> it seems as if stemming gets applied regardless, but stop word removal
>>>>> does not.
>>>>> 
>>>>> Does anyone have ideas on what is going on?  Is this a feature or
>>>>> possibly a bug?  Any known workarounds?  Any advice is appreciated.
>>>>> 
>>>>> James Dyer
>>>>> E-Commerce Systems
>>>>> Ingram Content Group
>>>>> (615) 213-4311
>>>>> ________________________________
>>>>> <fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
>>>>>   <analyzer type="index">
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>   </analyzer>
>>>>>   <analyzer type="query">
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>   </analyzer>
>>>>> </fieldType>
>>>>>
>>>>> <fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
>>>>>   <analyzer type="index">
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>             words="stopwords.txt" enablePositionIncrements="true"/>
>>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>>>             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>>>             catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
>>>>>             stemEnglishPossessive="1"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>     <filter class="solr.PorterStemFilterFactory"/>
>>>>>   </analyzer>
>>>>>   <analyzer type="query">
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>>             ignoreCase="true" expand="true"/>
>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>             words="stopwords.txt" enablePositionIncrements="true"/>
>>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>>>             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>>>             catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
>>>>>             stemEnglishPossessive="1"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>     <filter class="solr.PorterStemFilterFactory"/>
>>>>>   </analyzer>
>>>>> </fieldType>
