: Maybe what I really need is a query parser that does not do disjunction
: maximum at all, but somehow still combines different 'qf' type fields with
: different boosts on each field. I personally don't _necessarily_ need the
: actual disjunction max calculation, but I do need combining of
Yeah, I see your points. It's complicated. I'm not sure either.
But the thing is: in order to use a feature like that you'd have to really
think hard about the query analysis of your fields, and which ones will
produce which tokens in which situations.
You need to think really hard about
: It seems like the problem is when different fields in the 'qf' produce a
: different number of tokens for a given query. dismax needs to know the number
: of tokens in the input in order to calculate 'mm', when 'mm' is expressed as a
: percentage, or when different mm's are given for different
Thanks, that's helpful.
It still seems like current behavior does the wrong thing in _many_ cases (I
know a lot of people get tripped up by it, sometimes on this list) -- but I
understand your cases where it does the right thing, and where what I'm
suggesting would be the wrong thing.
: not other) setups/intentions. It's counter-intuitive to me that adding
: a field to the 'qf' set results in _fewer_ hits than the same 'qf' set
agreed .. but that's where looking at the debug info comes in to understand
the reason for that behavior is that your old qf treated part of your
Okay, I figured this one out -- I'm participating in a thread with
myself here, but for benefit of posterity, or if anyone's interested,
it's kind of interesting.
It's actually a variation of the known issue with dismax, mm, and fields
with varying stopwords. Actually a pretty tricky problem
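
For anyone following along, here is a rough toy model of the effect in Python. It is only a sketch under stated assumptions, not Solr's actual code: the field names title_t and title_s, the two analyzer functions, and the sample document are all made up for illustration. The assumption is that dismax builds one clause per whitespace-separated chunk of the query, that each clause holds one alternative per 'qf' field whose analysis yields a token for that chunk, and that 'mm' is then applied to the clause count.

```python
# Toy model of dismax clause building and mm checking.
# NOT Solr's implementation; all names here are hypothetical.

def text_analyzer(chunk):
    """Stand-in for tokenized text analysis: lowercases, drops pure punctuation."""
    token = chunk.lower()
    return token if any(c.isalnum() for c in token) else None

def string_analyzer(chunk):
    """Stand-in for an untokenized field: keeps every chunk, punctuation included."""
    return chunk.lower()

def build_clauses(query, analyzers):
    """Return one clause per query chunk; a clause is a list of (field, token) alternatives."""
    clauses = []
    for chunk in query.split():
        alternatives = []
        for field, analyze in analyzers.items():
            token = analyze(chunk)
            if token is not None:
                alternatives.append((field, token))
        if alternatives:  # a chunk dropped by every field yields no clause at all
            clauses.append(alternatives)
    return clauses

def matches(doc, clauses, mm_percent=100):
    """True if the doc satisfies at least mm_percent of the clauses (rounded down)."""
    required = len(clauses) * mm_percent // 100
    satisfied = sum(
        any(token in doc.get(field, ()) for field, token in alternatives)
        for alternatives in clauses
    )
    return satisfied >= required

query = "churchill : roosevelt"
one_field = {"title_t": text_analyzer}
two_fields = {"title_t": text_analyzer, "title_s": string_analyzer}

# Indexed tokens per field for one made-up document.
doc = {
    "title_t": {"churchill", "and", "roosevelt"},
    "title_s": {"churchill and roosevelt"},
}

print(len(build_clauses(query, one_field)), matches(doc, build_clauses(query, one_field)))
print(len(build_clauses(query, two_fields)), matches(doc, build_clauses(query, two_fields)))
```

With only title_t in the field set, the ":" chunk produces no token, so there are two clauses and the document matches at mm=100%. Adding title_s gives the ":" chunk a clause of its own that only title_s can satisfy, so there are three required clauses and the same document no longer matches. In this model, that is how adding a field to 'qf' ends up returning fewer hits.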
Jonathan:
Thanks for writing that up. You're right, it is arcane.
I've starred this one!
Erick
http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
So to understand, first familiarize
Thanks. I'm trying to think through if there's any hypothetical way for
dismax to be improved to not be subject to this problem. Now that it's
clear that the problem isn't just with stopwords, and that in fact it's
very hard to predict if you'll get the problem and under what input,
when
Okay, let's try the debug trace again without a pf to be less confusing.
One field in qf, that's ordinary text tokenized, and does get hits:
q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t&mm=100%&debugQuery=true&pf=
<str name="rawquerystring">churchill : roosevelt</str>
<str name="querystring">churchill