Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-07-05 Thread Chris Hostetter
: Maybe what I really need is a query parser that does not do disjunction : maximum at all, but somehow still combines different 'qf' type fields with : different boosts on each field. I personally don't _necessarily_ need the : actual disjunction max calculation, but I do need combining of

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-22 Thread Jonathan Rochkind
Yeah, I see your points. It's complicated. I'm not sure either. But the thing is: in order to use a feature like that you'd have to really think hard about the query analysis of your fields, and which ones will produce which tokens in which situations. You need to think really hard about

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-21 Thread Chris Hostetter
: It seems like the problem is when different fields in the 'qf' produce a : different number of tokens for a given query. dismax needs to know the number : of tokens in the input in order to calculate 'mm', when 'mm' is expressed as a : percentage, or when different mm's are given for different
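To make the token-count problem described above concrete, here is a minimal sketch in Python. It is a simplified model, not Solr's actual dismax code. The field names `title_t` and `id_kw`, the two toy analyzers, and the helper functions are all hypothetical. It assumes the query is split on whitespace, that each chunk is analyzed per field, that a chunk producing no tokens in any field is dropped, and that 'mm' as a percentage is applied to the number of surviving clauses.

```python
import re

def analyze_text(chunk):
    # Toy stand-in for a tokenized text field: lowercases and keeps only
    # alphanumeric runs, so a lone ":" produces no tokens at all.
    return re.findall(r"[a-z0-9]+", chunk.lower())

def analyze_keyword(chunk):
    # Toy stand-in for a KeywordTokenizer field: the chunk is one token as-is.
    return [chunk]

# Hypothetical field names, used only for this illustration.
ANALYZERS = {"title_t": analyze_text, "id_kw": analyze_keyword}

def build_clauses(query, qf):
    """One clause per whitespace chunk that yields tokens in at least one qf field."""
    clauses = []
    for chunk in query.split():
        per_field = {}
        for field in qf:
            tokens = ANALYZERS[field](chunk)
            if tokens:
                per_field[field] = tokens
        if per_field:
            clauses.append(per_field)
    return clauses

def required_matches(clauses, mm_percent):
    # Percentage mm is taken against the clause count and rounded down.
    return len(clauses) * mm_percent // 100

def doc_matches(doc, query, qf, mm_percent):
    clauses = build_clauses(query, qf)
    matched = 0
    for per_field in clauses:
        # A clause matches if any single field contains all of its tokens.
        if any(all(t in doc.get(f, set()) for t in toks)
               for f, toks in per_field.items()):
            matched += 1
    return matched >= required_matches(clauses, mm_percent)

doc = {"title_t": {"churchill", "roosevelt"}, "id_kw": {"12345"}}
query = "churchill : roosevelt"

one_field = doc_matches(doc, query, ["title_t"], 100)
two_fields = doc_matches(doc, query, ["title_t", "id_kw"], 100)
print(one_field, two_fields)
```

In this model, with only `title_t` in qf the ":" chunk disappears, leaving two clauses, both of which the document satisfies. Adding `id_kw` to qf keeps ":" alive as a third clause that only the keyword field can match, so mm=100% now requires three matches and the same document no longer qualifies.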

RE: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-21 Thread Jonathan Rochkind
Thanks, that's helpful. It still seems like current behavior does the wrong thing in _many_ cases (I know a lot of people get tripped up by it, sometimes on this list) -- but I understand your cases where it does the right thing, and where what I'm suggesting would be the wrong thing.

RE: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-21 Thread Chris Hostetter
: not other) setups/intentions. It's counter-intuitive to me that adding : a field to the 'qf' set results in _fewer_ hits than the same 'qf' set agreed .. but that's where looking at the debug info comes in to understand the reason for that behavior is that your old qf treated part of your

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-15 Thread Jonathan Rochkind
Okay, I figured this one out -- I'm participating in a thread with myself here, but for benefit of posterity, or if anyone's interested, it's kind of interesting. It's actually a variation of the known issue with dismax, mm, and fields with varying stopwords. Actually a pretty tricky problem

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-15 Thread Erick Erickson
Jonathan: Thanks for writing that up, you're right, it is arcane. I've starred this one! Erick http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/ So to understand, first familiarize

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-15 Thread Jonathan Rochkind
Thanks. I'm trying to think through if there's any hypothetical way for dismax to be improved to not be subject to this problem. Now that it's clear that the problem isn't just with stopwords, and that in fact it's very hard to predict if you'll get the problem and under what input, when

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-14 Thread Jonathan Rochkind
Okay, let's try the debug trace again without a pf to be less confusing. One field in qf, that's ordinary text tokenized, and does get hits: q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t&mm=100%&debugQuery=true&pf= <str name="rawquerystring">churchill : roosevelt</str> <str name="querystring">churchill