Jonathan has brought it to my attention that BOTH of my failing searches happen to have 8 terms, and one of the terms is repeated:

  "The Beatles as musicians : Revolver through the Anthology"
  "Color-blindness [print/digital]; its dangers and its detection"

but this is a PHRASE search.

In case it's relevant, both Solr 1.4 and Solr 3.5:

  do NOT use stopwords in the fieldtype;
  mm is 6<-1 6<90% for dismax
  qs is 1
  ps is 3

And both use this filter last:

  <filter class="solr.RemoveDuplicatesTokenFilterFactory" />

… but I believe that filter only removes duplicate tokens at the same position.

Lastly,

  "Color-blindness [print/digital]; its and its detection"

works ("dangers" is removed, rather than one of the repeated "its").

- Naomi

On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

> So I don't really know what I'm talking about, and I'm not really sure if
> it's related or not, but your particular query:
>
> "The Beatles as musicians : Revolver through the Anthology"
>
> With the lone "word" that's a ':', reminds me of a dismax stopwords-type
> problem I ran into. Now, I ran into it on 1.4. I don't know why it would be
> different on 1.4 and 3.x. And I see you aren't even using a multi-field
> dismax in your sample query, so it couldn't possibly be what I ran into... I
> don't think. But I'll write this anyway in case it gives someone some ideas.
>
> The problem I ran into is caused by different analysis in two fields both
> used in a dismax, one that ends up keeping ":" as a token, and one that
> doesn't. Which ends up having the same effect as the famous 'dismax
> stopwords problem'.
>
> Maybe somehow your schema changed such to produce this problem in 3.x but not
> in 1.4? Although again I realize the fact that you are only using a single
> field in your demo dismax query kind of suggests it's not this problem.
> Wonder if you try the query without the ":", if the problem goes away, that
> might be a hint.
> Or, maybe someone more skilled at understanding what's in
> those Solr debug statements than I am (it's kind of all Greek to me) will be
> able to take this hint and rule out or confirm that it may have something to
> do with your problem.
>
> Here I write up the issue I ran into (which may or may not have anything to
> do with what you ran into)
>
> http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
>
>
> Also, you don't say what your 'mm' is in your dismax queries, that could be
> relevant if it's got anything to do with anything similar to the issue I'm
> talking about.
>
> Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens
> for 'mm' in such a way that the 'varying field analysis dismax gotcha' can
> manifest with only one field, if the way dismax counts tokens for 'mm'
> differs from number of tokens the single field's analysis produces?
>
> Jonathan
>
> On 2/22/2012 2:55 PM, Naomi Dushay wrote:
>> I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.
>> I have a test checking for a search result in Solr, and the test passes in
>> Solr 1.4, but fails in Solr 3.5. Dismax is the desired QueryParser -- I
>> just included output from lucene QueryParser to prove the document exists
>> and is found
>>
>> I am completely stumped.
>>
>>
>> Here are the debugQuery details:
>>
>> ***Solr 3.5***
>>
>> lucene QueryParser:
>>
>> URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
>> final query: all_search:"the beatl as musician revolv through the antholog"
>>
>> 6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
>>   1.0 = queryWeight(all_search:"the beatl as musician revolv through the antholog"), product of:
>>     48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>>     0.02063975 = queryNorm
>>   6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
>>     1.0 = tf(phraseFreq=1.0)
>>     48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>>     0.125 = fieldNorm(field=all_search, doc=1064395)
>>
>> dismax QueryParser:
>> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
>> final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01
>>
>> (no matches)
>>
>>
>> ***Solr 1.4***
>>
>> lucene QueryParser:
>>
>> URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
>> final query: all_search:"the beatl as musician revolv through the antholog"
>>
>> 5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
>>   1.0 = tf(phraseFreq=1.0)
>>   48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>   0.109375 = fieldNorm(field=all_search, doc=3469163)
>>
>> dismax QueryParser:
>> URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
>> final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01
>>
>> score:
>>
>> 7.449651 = (MATCH) sum of:
>>   3.7248254 = weight(all_search:"the beatl as musician revolv through the antholog"~1 in 3469163), product of:
>>     0.7071068 = queryWeight(all_search:"the beatl as musician revolv through the antholog"~1), product of:
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.014681898 = queryNorm
>>     5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
>>       1.0 = tf(phraseFreq=1.0)
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.109375 = fieldNorm(field=all_search, doc=3469163)
>>   3.7248254 = weight(all_search:"the beatl as musician revolv through the antholog"~3 in 3469163), product of:
>>     0.7071068 = queryWeight(all_search:"the beatl as musician revolv through the antholog"~3), product of:
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.014681898 = queryNorm
>>     5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
>>       1.0 = tf(phraseFreq=1.0)
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.109375 = fieldNorm(field=all_search, doc=3469163)
>>
>>
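
The "remove one term and retry" experiment described at the top of this message could be run systematically rather than by hand. The following is only a rough Python sketch of that idea, not part of the actual test setup. It flags repeated terms in an analyzed phrase, generates every leave-one-out variant of a phrase, and builds a dismax request URL for each variant using the qf, pf, qs, ps and mm values quoted above. The endpoint URL, the function names, and the explicit defType=dismax and debugQuery=true parameters are assumptions made for illustration; the original requests may rely on request handler defaults instead.

```python
from collections import Counter
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the real Solr select URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def repeated_terms(analyzed_phrase):
    """Return the terms that occur more than once in a whitespace-separated phrase."""
    counts = Counter(analyzed_phrase.split())
    return sorted(term for term, n in counts.items() if n > 1)


def leave_one_out(phrase):
    """Yield (removed_term, shortened_phrase) for each term position."""
    terms = phrase.split()
    for i, removed in enumerate(terms):
        yield removed, " ".join(terms[:i] + terms[i + 1:])


def dismax_url(phrase, field="all_search"):
    """Build a dismax phrase-query URL with the settings quoted in this thread."""
    params = {
        "defType": "dismax",      # assumption: parser selected explicitly
        "qf": field,
        "pf": field,
        "qs": 1,
        "ps": 3,
        "mm": "6<-1 6<90%",
        "q": '"%s"' % phrase,     # quoted, so the whole phrase is one clause
        "debugQuery": "true",     # assumption: ask Solr for explain output
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    analyzed = "the beatl as musician revolv through the antholog"
    print("repeated terms:", repeated_terms(analyzed))
    title = "The Beatles as musicians : Revolver through the Anthology"
    for removed, variant in leave_one_out(title):
        print(removed, "->", dismax_url(variant))
```

Fetching each URL against both the 1.4 and 3.5 instances and comparing hit counts would show which removals make the 3.5 query match again, and whether the pattern tracks the repeated term or simply the number of terms.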