Jonathan,

I have the same problem without the colon - I tested that, but didn't mention 
it.   

mm can't be the issue either:   in Solr 3.5, if I remove one of the occurrences 
of "the"  (doesn't matter which), I get results.  Removing any other word does 
NOT get results.   And if the query isn't a phrase query, it gets results.
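
For anyone who wants to repeat that test, here is a rough sketch of how the leave-one-out variants can be generated. It is only an illustration in Python, not what my test suite actually runs: the helper names are made up, and the defType and debugQuery parameters are assumptions added for convenience. The qf and pf values are the ones from my original message.

```python
from urllib.parse import urlencode

# The phrase from the failing test, without the lone colon.
PHRASE = "The Beatles as musicians Revolver through the Anthology"


def leave_one_out(phrase):
    """Yield (dropped_word, phrase_without_that_word) for each word position."""
    words = phrase.split()
    for i in range(len(words)):
        yield words[i], " ".join(words[:i] + words[i + 1:])


def dismax_params(phrase):
    """Build the URL parameters for a dismax phrase query (hypothetical helper)."""
    return urlencode({
        "defType": "dismax",      # assumption: dismax selected via defType
        "qf": "all_search",
        "pf": "all_search",
        "q": f'"{phrase}"',       # keep the quotes so it stays a phrase query
        "debugQuery": "true",     # assumption: debug output wanted
    })


for dropped, variant in leave_one_out(PHRASE):
    print(dropped, "->", dismax_params(variant))
```

Each printed parameter string can be appended to the select URL of the Solr instance under test; only the two variants that drop a "the" return results for me in 3.5.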

And no, it can't be related to what you refer to as the "dismax stopwords 
problem", since I can demonstrate the problem with a single field.


I have run into problems in the past with a non-alpha character surrounded by 
spaces tanking my search results for dismax … but I fixed that with this 
fieldType:

    <!-- single token with punctuation terms removed so dismax doesn't look for punctuation terms in these fields -->
    <!-- On client side, Lucene query parser breaks things up by whitespace *before* field analysis for dismax -->
    <!-- so punctuation terms (& : ;) are stopwords to allow results from other fields when these chars are surrounded by spaces in query -->
    <!--  do not lowercase -->
    <fieldType name="string_punct_stop" class="solr.TextField" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose" />
        <!-- removing punctuation for Lucene query parser issues -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_punctuation.txt" enablePositionIncrements="true" />
      </analyzer>
    </fieldType>

My stopwords_punctuation.txt file is:

#Punctuation characters we want to ignore in queries
:
;
&
/

I used this type instead of string for the fields in my dismax qf. Thus, the 
punctuation "terms" in the query are dropped for the fields that were 
formerly string fields.
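
As a rough sketch of what that query-time analysis does for an unquoted query, here is a toy model in Python. It is not Solr code: the function names are made up, the ICU normalization step is left out, and it assumes the query parser has already split the query on whitespace before the field analysis runs.

```python
# Toy model of the query analyzer for string_punct_stop; not Solr code.
# Assumption: contents of stopwords_punctuation.txt as listed above.
PUNCT_STOPWORDS = {":", ";", "&", "/"}


def split_like_query_parser(query):
    """Whitespace splitting that happens before per-field analysis."""
    return query.split()


def analyze_for_string_punct_stop(chunk):
    """The keyword tokenizer keeps the chunk as a single token; the stop
    filter then drops it if it is one of the listed punctuation characters."""
    return [] if chunk in PUNCT_STOPWORDS else [chunk]


def terms_for_field(query):
    """Terms this field type would contribute for an unquoted query."""
    terms = []
    for chunk in split_like_query_parser(query):
        terms.extend(analyze_for_string_punct_stop(chunk))
    return terms
```

With this model, terms_for_field("Beatles : Revolver") gives ["Beatles", "Revolver"], so the lone colon never becomes a required term for these fields. Punctuation attached to a word, as in "a:b", is left alone because the whole chunk is a single token.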

- Naomi

On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

> So I don't really know what I'm talking about, and I'm not really sure if 
> it's related or not, but your particular query:
> 
> "The Beatles as musicians : Revolver through the Anthology"
> 
> With the lone "word" that's a ':', reminds me of a dismax stopwords-type 
> problem I ran into. Now, I ran into it on 1.4.  I don't know why it would be 
> different on 1.4 and 3.x. And I see you aren't even using a multi-field 
> dismax in your sample query, so it couldn't possibly be what I ran into... I 
> don't think. But I'll write this anyway in case it gives someone some ideas.
> 
> The problem I ran into is caused by different analysis in two fields both 
> used in a dismax, one that ends up keeping ":" as a token, and one that 
> doesn't.  Which ends up having the same effect as the famous 'dismax 
> stopwords problem'.
> 
> Maybe somehow your schema changed such to produce this problem in 3.x but not 
> in 1.4? Although again I realize the fact that you are only using a single 
> field in your demo dismax query kind of suggests it's not this problem. 
> Wonder if you try the query without the ":", if the problem goes away, that 
> might be a hint. Or, maybe someone more skilled at understanding what's in 
> those Solr debug statements than I am (it's kind of all greek to me) will be 
> able to take this hint and rule out or confirm that it may have something to 
> do with your problem.
> 
> Here I write up the issue I ran into (which may or may not have anything to 
> do with what you ran into)
> 
> http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
> 
> 
> Also, you don't say what your 'mm' is in your dismax queries, that could be 
> relevant if it's got anything to do with anything similar to the issue I'm 
> talking about.
> 
> Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens 
> for 'mm' in such a way that the 'varying field analysis dismax gotcha' can 
> manifest with only one field, if the way dismax counts tokens for 'mm' 
> differs from number of tokens the single field's analysis produces?
> 
> Jonathan
> 
> On 2/22/2012 2:55 PM, Naomi Dushay wrote:
>> I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   
>> I have a test checking for a search result in Solr, and the test passes in 
>> Solr 1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I 
>> just included output from lucene QueryParser to prove the document exists 
>> and is found
>> 
>> I am completely stumped.
>> 
>> 
>> Here are the debugQuery details:
>> 
>> ***Solr 3.5***
>> 
>> lucene QueryParser:
>> 
>> URL:   q=all_search:"The Beatles as musicians : Revolver through the 
>> Anthology"
>> final query:  all_search:"the beatl as musician revolv through the antholog"
>> 
>> 6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through 
>> the antholog" in 1064395), product of:
>>   1.0 = queryWeight(all_search:"the beatl as musician revolv through the 
>> antholog"), product of:
>>     48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
>> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>>     0.02063975 = queryNorm
>>   6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through 
>> the antholog" in 1064395), product of:
>>     1.0 = tf(phraseFreq=1.0)
>>     48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
>> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>>     0.125 = fieldNorm(field=all_search, doc=1064395)
>> 
>> dismax QueryParser:
>> URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
>> through the Anthology"
>> final query:   +(all_search:"the beatl as musician revolv through the 
>> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
>> antholog"~3)~0.01
>> 
>> (no matches)
>> 
>> 
>> ***Solr 1.4***
>> 
>> lucene QueryParser:
>> 
>> URL:  q=all_search:"The Beatles as musicians : Revolver through the 
>> Anthology"
>> final query:  all_search:"the beatl as musician revolv through the antholog"
>> 
>> 5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the 
>> antholog" in 3469163), product of:
>>   1.0 = tf(phraseFreq=1.0)
>>   48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
>> revolv=820 through=88238 the=3542123 antholog=11205)
>>   0.109375 = fieldNorm(field=all_search, doc=3469163)
>> 
>> dismax QueryParser:
>> URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
>> through the Anthology"
>> final query:  +(all_search:"the beatl as musician revolv through the 
>> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
>> antholog"~3)~0.01
>> 
>> score:
>> 
>> 7.449651 = (MATCH) sum of:
>>   3.7248254 = weight(all_search:"the beatl as musician revolv through the 
>> antholog"~1 in 3469163), product of:
>>     0.7071068 = queryWeight(all_search:"the beatl as musician revolv through 
>> the antholog"~1), product of:
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
>> musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.014681898 = queryNorm
>>     5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through 
>> the antholog" in 3469163), product of:
>>       1.0 = tf(phraseFreq=1.0)
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
>> musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.109375 = fieldNorm(field=all_search, doc=3469163)
>>   3.7248254 = weight(all_search:"the beatl as musician revolv through the 
>> antholog"~3 in 3469163), product of:
>>     0.7071068 = queryWeight(all_search:"the beatl as musician revolv through 
>> the antholog"~3), product of:
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
>> musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.014681898 = queryNorm
>>     5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through 
>> the antholog" in 3469163), product of:
>>       1.0 = tf(phraseFreq=1.0)
>>       48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
>> musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>>       0.109375 = fieldNorm(field=all_search, doc=3469163)
>> 
>> 
>> 
