Jonathan has brought it to my attention that BOTH of my failing searches happen 
to have 8 terms, and in each of them one term is repeated:

 "The Beatles as musicians : Revolver through the Anthology"
 "Color-blindness [print/digital]; its dangers and its detection"

but this is a PHRASE search.  
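
Since the whole query is one quoted phrase, the question is whether phrase matching with slop behaves as expected. The sketch below is a brute-force reference check, not Lucene's actual algorithm. It assumes the usual reading of slop (each matched position is shifted by its offset in the query, and the spread of the shifted positions must not exceed the slop) and assumes a repeated query term must be matched by two different document positions. The function name and the token list are made up for illustration; the tokens are copied from the "final query" in the debug output quoted below.

```python
from itertools import product

def sloppy_phrase_matches(doc_terms, query_terms, slop):
    """Brute-force check: does doc_terms contain query_terms within slop?"""
    # Record every position at which each term occurs in the document.
    positions = {}
    for pos, term in enumerate(doc_terms):
        positions.setdefault(term, []).append(pos)

    # Try every way of assigning a document position to each query term.
    candidates = [positions.get(term, []) for term in query_terms]
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # one document position may satisfy only one query term
        # Shift each position by the term's offset in the query; an exact
        # in-order match shifts every position to the same value.
        shifted = [p - i for i, p in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

# Tokens as they appear in the analyzed query (note the repeated "the").
tokens = "the beatl as musician revolv through the antholog".split()

for slop in (0, 1, 3):
    print(slop, sloppy_phrase_matches(tokens, tokens, slop))
```

Under this definition, anything that matches with slop 0 also matches with slop 1 or 3, so the exact phrase found by the lucene QueryParser would be expected to satisfy the dismax "~1" and "~3" queries as well.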

In case it's relevant, both Solr 1.4 and Solr 3.5:
 do NOT use stopwords in the fieldtype;  
 mm is  6<-1 6<90%  for dismax
 qs is 1
 ps is 3
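
For reference, here is a rough sketch of how an mm spec like the one above is usually read. It follows the documented min-should-match rules as I understand them (conditional "N<value" parts are checked left to right, negative values count back from the clause total, percentages are truncated); the function name is made up, and this is not a copy of Solr's code.

```python
def min_should_match(clause_count, spec):
    """Approximate the required number of optional clauses for an mm spec."""
    result = clause_count
    spec = spec.strip()
    if "<" in spec:
        # Conditional parts: "N<value" applies only when there are more
        # than N optional clauses; later parts override earlier ones.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero.
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # A negative value means "all but this many" clauses.
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))

for n in (1, 6, 7, 8, 10):
    print(n, min_should_match(n, "6<-1 6<90%"))
```

Under this sketch, eight separate clauses would require 7 matches. A quoted phrase should normally count as a single clause, though, in which case mm would require just that one clause and would probably not explain the failure.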

And both use this filter last

<filter class="solr.RemoveDuplicatesTokenFilterFactory" />

… but I believe that filter only removes duplicate tokens at the same position, so it should not touch the repeated terms here.
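
To make the filter's effect concrete, here is a small simulation. As far as I understand the documented behavior, RemoveDuplicatesTokenFilter drops a token only when an earlier token with the same text sits at the same position (position increment 0), which typically happens when stemming or synonyms stack tokens. The (text, position increment) pairs and the function name below are illustrative assumptions, not output from the real analyzer.

```python
def remove_duplicates(tokens):
    """Drop tokens whose text repeats at the same position.

    tokens is a list of (text, position_increment) pairs.
    """
    kept = []
    seen_at_position = set()
    for text, pos_inc in tokens:
        if pos_inc > 0:
            # Moving to a new position: forget what was seen before.
            seen_at_position = set()
        elif text in seen_at_position:
            continue  # same text, same position: a true duplicate
        seen_at_position.add(text)
        kept.append((text, pos_inc))
    return kept

# Repeated "its" at different positions: nothing is removed.
phrase = [("its", 1), ("danger", 1), ("and", 1), ("its", 1), ("detect", 1)]
print(remove_duplicates(phrase) == phrase)

# Two identical tokens stacked at one position: the second is removed.
stacked = [("beatles", 1), ("beatl", 0), ("beatl", 0)]
print(remove_duplicates(stacked))
```

If that reading is right, the filter leaves both occurrences of a repeated term in place, so it is unlikely to be what distinguishes the failing phrases.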

Lastly, 

 "Color-blindness [print/digital]; its and its detection"   works   ("dangers" 
is removed, rather than one of the repeated "its").
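
The same experiment can be repeated for every word, to see which removals make the search succeed. The sketch below only builds the leave-one-out phrases and the matching dismax parameters; it does not contact Solr. The field name all_search and the mm, qs and ps values are taken from this thread, while the helper names are made up and the base URL of the Solr instance is left to the reader.

```python
from urllib.parse import urlencode

def leave_one_out(phrase):
    """Return (removed_word, shortened_phrase) for each word in the phrase."""
    words = phrase.split()
    return [
        (word, " ".join(words[:i] + words[i + 1:]))
        for i, word in enumerate(words)
    ]

def dismax_params(phrase, field="all_search"):
    """Build a dismax query string that searches for phrase as a quoted phrase."""
    return urlencode({
        "defType": "dismax",
        "qf": field,
        "pf": field,
        "mm": "6<-1 6<90%",
        "qs": 1,
        "ps": 3,
        "debugQuery": "true",
        "q": '"%s"' % phrase,
    })

title = "Color-blindness [print/digital]; its dangers and its detection"
for removed, shortened in leave_one_out(title):
    # Append the parameters to your own Solr select URL to run each variant.
    print(removed, "->", dismax_params(shortened))
```

Comparing which variants match in 1.4 and in 3.5 should show whether the failure really depends on the repeated term or on the number of terms.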

- Naomi



On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

> So I don't really know what I'm talking about, and I'm not really sure if 
> it's related or not, but your particular query:
> 
> "The Beatles as musicians : Revolver through the Anthology"
> 
> With the lone "word" that's a ':', reminds me of a dismax stopwords-type 
> problem I ran into. Now, I ran into it on 1.4.  I don't know why it would be 
> different on 1.4 and 3.x. And I see you aren't even using a multi-field 
> dismax in your sample query, so it couldn't possibly be what I ran into... I 
> don't think. But I'll write this anyway in case it gives someone some ideas.
> 
> The problem I ran into is caused by different analysis in two fields both 
> used in a dismax, one that ends up keeping ":" as a token, and one that 
> doesn't.  Which ends up having the same effect as the famous 'dismax 
> stopwords problem'.
> 
> Maybe your schema somehow changed so as to produce this problem in 3.x but not 
> in 1.4? Although again I realize the fact that you are only using a single 
> field in your demo dismax query kind of suggests it's not this problem. 
> If you try the query without the ":" and the problem goes away, that 
> might be a hint. Or, maybe someone more skilled at understanding what's in 
> those Solr debug statements than I am (it's kind of all Greek to me) will be 
> able to take this hint and rule out or confirm that it may have something to 
> do with your problem.
> 
> Here I write up the issue I ran into (which may or may not have anything to 
> do with what you ran into)
> 
> http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
> 
> 
> Also, you don't say what your 'mm' is in your dismax queries; that could be 
> relevant if it's got anything to do with anything similar to the issue I'm 
> talking about.
> 
> Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens 
> for 'mm' in such a way that the 'varying field analysis dismax gotcha' can 
> manifest with only one field, if the way dismax counts tokens for 'mm' 
> differs from number of tokens the single field's analysis produces?
> 
> Jonathan
> 
> On 2/22/2012 2:55 PM, Naomi Dushay wrote:
>> I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   
>> I have a test checking for a search result in Solr, and the test passes in 
>> Solr 1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I 
>> just included output from lucene QueryParser to prove the document exists 
>> and is found.
>> 
>> I am completely stumped.
>> 
>> 
>> Here are the debugQuery details:
>> 
>> ***Solr 3.5***
>> 
>> lucene QueryParser:
>> 
>> URL:   q=all_search:"The Beatles as musicians : Revolver through the 
>> Anthology"
>> final query:  all_search:"the beatl as musician revolv through the antholog"
>> 
>> 6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through 
>> the antholog" in 1064395), product of:
>>   1.0 = queryWeight(all_search:"the beatl as musician revolv through the 
>> antholog"), product of:
>>     48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
>> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>>     0.02063975 = queryNorm
>>   6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through 
>> the antholog" in 1064395), product of:
>>     1.0 = tf(phraseFreq=1.0)
>>     48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
>> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>>     0.125 = fieldNorm(field=all_search, doc=1064395)
>> 
>> dismax QueryParser:
>> URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
>> through the Anthology"
>> final query:   +(all_search:"the beatl as musician revolv through the 
>> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
>> antholog"~3)~0.01
>> 
>> (no matches)
>> 
>> 
>> ***Solr 1.4***
>> 
>> lucene QueryParser:
>> 
>> URL:  q=all_search:"The Beatles as musicians : Revolver through the 
>> Anthology"
>> final query:  all_search:"the beatl as musician revolv through the antholog"
>> 
>> 5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the 
>> antholog" in 3469163), product of:
>>   1.0 = tf(phraseFreq=1.0)
>>   48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
>> revolv=820 through=88238 the=3542123 antholog=11205)
>>   0.109375 = fieldNorm(field=all_search, doc=3469163)
>> 
>> dismax QueryParser:
>> URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
>> through the Anthology"
>> final query:  +(all_search:"the beatl as musician revolv through the 
>> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
>> antholog"~3)~0.01
>> 
>> score:
>> 
>> 7.449651 = (MATCH) sum of:
>>   3.7248254 = weight(all_search:"the beatl as musician revolv through the 
>> antholog"~1 in 3469163), product of:
>>     0.7071068 = queryWeight(all_search:"the beatl as musician revolv through 
>> the antholog"~1), product of:
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
>> musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.014681898 = queryNorm
>>     5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through 
>> the antholog" in 3469163), product of:
>>       1.0 = tf(phraseFreq=1.0)
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
>> musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.109375 = fieldNorm(field=all_search, doc=3469163)
>>   3.7248254 = weight(all_search:"the beatl as musician revolv through the 
>> antholog"~3 in 3469163), product of:
>>     0.7071068 = queryWeight(all_search:"the beatl as musician revolv through 
>> the antholog"~3), product of:
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
>> musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.014681898 = queryNorm
>>     5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through 
>> the antholog" in 3469163), product of:
>>       1.0 = tf(phraseFreq=1.0)
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
>> musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.109375 = fieldNorm(field=all_search, doc=3469163)
>> 
>> 
>> 
