Many thanks for your explanation. It really helped me understand
DisMax, and in the end I realized that DisMax is not at all what I
need. I do not want results where "blue" is in one field and "tooth"
in another (imagine you search for a notebook with blue tooth and get
some blue products that just happen to have "tooth" in some field).
I already had the feeling that I would have to come up with my own
solution, mixing parts of DisMax (distributing the query among the
fields) and FieldQParserPlugin. So now I will try that out.
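The matching rule described above, where every query term must occur within one and the same field, can be sketched as a toy model. This is not Solr or Lucene code. The function names are hypothetical, and a lowercase whitespace split stands in for each field's real analyzer chain.

```python
# Toy model of "all query terms must occur in the same field".
# Hypothetical helper names; this is not Solr or Lucene API.


def tokens(text: str) -> list[str]:
    """Stand-in for a field's analyzer chain: lowercase, split on whitespace."""
    return text.lower().split()


def matches_in_single_field(query: str, doc: dict[str, str], fields: list[str]) -> bool:
    """True if at least one field contains every term of the query."""
    terms = tokens(query)
    for field in fields:
        field_terms = set(tokens(doc.get(field, "")))
        if all(term in field_terms for term in terms):
            return True
    return False


fields = ["category", "name"]
notebook = {"category": "notebooks", "name": "Notebook with blue tooth"}
blue_product = {"category": "blue", "name": "tooth brush"}

print(matches_in_single_field("blue tooth", notebook, fields))      # True
print(matches_in_single_field("blue tooth", blue_product, fields))  # False
```

Under this rule a document only matches when a single field covers the whole query, so the "blue" product with "tooth" in another field is excluded.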
Many thanks
Tobi
Chris Hostetter wrote:
: My original assumption about the DisMax handler was that it would just take the
: original query string and pass it to every field in its field list, using each
: field's configured analyzer stack, maybe adding some handling for the special
: options at the end, and then send the query to Lucene. Can you explain
: why this approach was not chosen?
Because then it wouldn't be the DisMaxRequestHandler.
Seriously: the point of dismax is to build up a DisjunctionMaxQuery for
each "chunk" in the query string and populate those DisjunctionMaxQueries
with the Queries produced by analyzing that "chunk" against each field in
the qf. Then all of the DisjunctionMaxQueries are grouped into a
BooleanQuery with a minNrShouldMatch (controlled by the mm param).
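The structure described above can be modeled roughly as follows. This is a toy Python sketch, not Lucene's or Solr's classes. `DisMax`, `OuterBoolean` and `build_dismax_query` are hypothetical names, chunking is assumed to be a plain whitespace split, and "analysis" is just lowercasing. The printed string only imitates the general shape of Lucene's `toString()` output.

```python
from dataclasses import dataclass


@dataclass
class DisMax:
    """One DisjunctionMaxQuery: the same chunk analyzed against every qf field."""
    clauses: list[str]
    tie: float

    def __str__(self) -> str:
        return "(" + " | ".join(self.clauses) + f")~{self.tie}"


@dataclass
class OuterBoolean:
    """Optional (SHOULD) DisMax clauses plus how many of them must match."""
    should: list[DisMax]
    min_should_match: int

    def __str__(self) -> str:
        # Only the clause structure is rendered; min_should_match is kept as a field.
        return "(" + " ".join(str(d) for d in self.should) + ")"


def build_dismax_query(q: str, qf: list[str], tie: float, mm: int) -> OuterBoolean:
    # Assumption: chunks are whitespace-separated words and each field's
    # analyzer simply lowercases; real Solr parsing and analysis are richer.
    chunks = q.split()
    return OuterBoolean(
        should=[DisMax([f"{field}:{c.lower()}" for field in qf], tie) for c in chunks],
        min_should_match=mm,
    )


query = build_dismax_query("blue tooth", ["category", "name"], tie=0.1, mm=1)
print(query)  # ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)
```

Each chunk gets its own disjunction over the fields, which is why a doc can satisfy "blue" through one field and "tooth" through another.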
If you look at the query toString() from debugQuery (using a non-trivial qf
param and a q string containing more than one "chunk"), you can see what I
mean. Your example shows it pretty well, actually:
: > : > : > ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)
The point is to build those DisjunctionMaxQueries so that each "chunk"
only contributes significantly based on the highest-scoring field that
chunk appears in. In your example, someone typing "blue tooth" can get a
match when a doc matches "blue" in one field and "tooth" in another; that
wouldn't be possible with the approach you describe. The query structure
also means that a doc where "tooth" appears in both the category and name
fields but "blue" doesn't appear at all won't score as high as a doc that
matches "blue" in category and "tooth" in name (although you have to look
at the score explanations to really see what I mean by that).
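The scoring effect can be illustrated with a toy model instead of reading score explanations. Assumptions: each field contributes 1.0 if it contains the term and 0.0 otherwise (real Lucene uses tf-idf-style weights and norms), chunks are whitespace-separated, and all helper names are hypothetical. The per-chunk combination follows the DisjunctionMaxQuery rule: the best field score plus the tie-breaker times the sum of the other field scores.

```python
from typing import Optional


def field_score(term: str, text: str) -> float:
    """Toy per-field score: 1.0 if the term occurs in the field, else 0.0 (not tf-idf)."""
    return 1.0 if term in text.lower().split() else 0.0


def dismax_score(term: str, doc: dict[str, str], qf: list[str], tie: float) -> float:
    """Best field score plus tie * (sum of the remaining field scores)."""
    scores = [field_score(term, doc.get(f, "")) for f in qf]
    best = max(scores)
    return best + tie * (sum(scores) - best)


def score(q: str, doc: dict[str, str], qf: list[str], tie: float, mm: int) -> Optional[float]:
    """Sum of per-chunk DisMax scores, or None if fewer than mm chunks match."""
    per_chunk = [dismax_score(c.lower(), doc, qf, tie) for c in q.split()]
    matched = sum(1 for s in per_chunk if s > 0)
    if matched < mm:
        return None
    return sum(per_chunk)


qf = ["category", "name"]
split_doc = {"category": "blue", "name": "tooth"}
tooth_only = {"category": "tooth", "name": "tooth"}

print(score("blue tooth", split_doc, qf, tie=0.1, mm=1))   # 2.0
print(score("blue tooth", tooth_only, qf, tie=0.1, mm=1))  # 1.1
print(score("blue tooth", tooth_only, qf, tie=0.1, mm=2))  # None
```

With a tie-breaker of 0.1, repeating "tooth" in a second field only adds 0.1, so the doc that matches both chunks scores higher; with mm=2 the "tooth"-only doc does not match at all.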
There are certainly a lot of improvements that could be made to dismax.
More customization of how the query string is parsed before building up
the DisjunctionMaxQueries and calling the individual field analyzers
would certainly be one of them, but so far no one has attempted anything
like that.
-Hoss