Many thanks for your explanation. That really helped me a lot in understanding DisMax - and finally I realized that DisMax is not at all what I need. Actually I do not want results where "blue" is in one field and "tooth" in another (imagine you search for a notebook with blue tooth and get some blue products that accidentally have tooth in some field).

My feeling already was that I have to come up with my own solution mixing parts of DisMax (distribute the query among the fields) and FieldQParserPlugin. So now I will try that out.
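
A minimal sketch of the matching behaviour described above, where a document only counts as a hit if all query terms occur in the same field. This is not Solr code: `matches_within_one_field` and the sample documents are hypothetical, and plain whitespace splitting stands in for each field's analyzer.

```python
def matches_within_one_field(doc, q, qf):
    """Return True if at least one field in qf contains every term of q.

    Assumption: lowercasing plus whitespace splitting replaces the
    per-field analyzer chain that Solr would actually apply.
    """
    terms = q.lower().split()
    for field in qf:
        field_tokens = doc.get(field, "").lower().split()
        if all(term in field_tokens for term in terms):
            return True
    return False


# Hypothetical sample documents.
notebook = {"name": "notebook with blue tooth", "category": "computers"}
toothbrush = {"name": "blue brush", "category": "tooth care"}

fields = ["category", "name"]
notebook_hit = matches_within_one_field(notebook, "blue tooth", fields)
toothbrush_hit = matches_within_one_field(toothbrush, "blue tooth", fields)
```

The second document has "blue" in one field and "tooth" in another, so it is rejected here, whereas the DisMax structure discussed below would accept it.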

Many thanks
Tobi

Chris Hostetter wrote:
: My original assumption for the DisMax Handler was that it will just take the
: original query string and pass it to every field in its field list using the
: field's configured analyzer stack. Maybe in the end add some stuff for the
: special options and so on ... and then send the query to lucene. Can you explain
: why this approach was not chosen?

because then it wouldn't be the DisMaxRequestHandler.

seriously: the point of dismax is to build up a DisjunctionMaxQuery for each "chunk" in the query string and populate those DisjunctionMaxQueries with the Queries produced by analyzing that "chunk" against each field in the qf -- then all of the DisjunctionMaxQueries are grouped into a BooleanQuery with a minNrShouldMatch.

if you look at the query toString from debugQuery (using a non-trivial qf param and a q string containing more than one "chunk") you can see what i mean. your example shows it pretty well actually...

: > : > : > ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)
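
The structure in that debug output can be reproduced with a small sketch: one disjunction per chunk, one alternative per qf field, all chunks grouped together. This only builds a string and is not how Solr constructs the query; `build_dismax_structure` is a hypothetical helper, chunks are assumed to be whitespace-separated, per-field analysis is skipped, and the `~0.1` suffix is assumed to be the tie breaker.

```python
def build_dismax_structure(q, qf, tie=0.1):
    """Return a string shaped like the dismax query toString.

    Assumptions: chunks are split on whitespace, and every field
    produces exactly one term per chunk (no analyzer is applied).
    """
    clauses = []
    for chunk in q.split():
        # One alternative per field, joined the way a disjunction prints.
        alternatives = " | ".join(f"{field}:{chunk}" for field in qf)
        clauses.append(f"({alternatives})~{tie}")
    # All per-chunk disjunctions are grouped into one outer boolean query.
    return "(" + " ".join(clauses) + ")"


structure = build_dismax_structure("blue tooth", ["category", "name"])
print(structure)
```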

the point is to build those DisjunctionMaxQueries -- so that each "chunk" only contributes significantly based on the highest scoring field that chunk appears in ... in your example someone typing "blue tooth" can get a match when a doc matches blue in one field and tooth in another -- that wouldn't be possible with the approach you describe. the Query structure also means that a doc where "tooth" appears in both the category and name fields but "blue" doesn't appear at all won't score as high as a doc that matches "blue" in category and "tooth" in name (although you have to look at the score explanations to really see what i mean by that)
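
The scoring effect described above can be illustrated with a simplified model. This is a sketch and not Lucene's real scoring: every field match is assumed to score 1.0 (no tf-idf, norms or coord), each chunk contributes its best field score plus the tie breaker times the remaining field scores, and the chunk scores are summed. `dismax_score`, `score_doc` and the sample documents are hypothetical.

```python
def dismax_score(field_scores, tie=0.1):
    """Best field score plus tie * the sum of the other matching fields."""
    matches = [s for s in field_scores if s > 0]
    if not matches:
        return 0.0
    best = max(matches)
    return best + tie * (sum(matches) - best)


def score_doc(doc, chunks, qf, tie=0.1):
    """Sum the per-chunk dismax scores, as the outer boolean query would.

    Assumption: a field "matches" a chunk with a flat score of 1.0 when
    the chunk is one of its whitespace-separated tokens.
    """
    total = 0.0
    for chunk in chunks:
        per_field = [
            1.0 if chunk in doc.get(field, "").split() else 0.0
            for field in qf
        ]
        total += dismax_score(per_field, tie)
    return total


qf = ["category", "name"]
chunks = ["blue", "tooth"]

# "blue" in category, "tooth" in name: both chunks contribute fully.
doc_a = {"category": "blue", "name": "tooth"}
# "tooth" in both fields, no "blue": the second match only adds tie * 1.0.
doc_b = {"category": "tooth", "name": "tooth"}

score_a = score_doc(doc_a, chunks, qf)
score_b = score_doc(doc_b, chunks, qf)
```

Under these assumptions `doc_a` outscores `doc_b`, because a second field matching the same chunk adds only the tie-breaker fraction while a match on a different chunk adds a full score.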


There are certainly a lot of improvements that could be made to dismax ... more customization in terms of how the query string is parsed before building up the DisjunctionMaxQueries and calling the individual field analyzers would certainly be one way it could improve ... but so far no one has attempted anything like that.




-Hoss

