To avoid the "users only see the first page" problem, one solution is: if
the result set has more than a page of results whose high scores are close
to each other, scramble them.
That is, if the top 20 results range in score from 19.0 to 20.0, they really
are all about the same relevance, so just card-shuffle them.
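A minimal sketch of that as client-side post-processing (Python; the score band and the document list are just illustrative):

    import random

    def shuffle_near_ties(docs, band=1.0):
        """Shuffle docs whose scores are within `band` of the top score.

        `docs` is a list of (doc_id, score) tuples sorted by descending
        score, e.g. parsed from a response requested with fl=id,score.
        """
        if not docs:
            return docs
        top_score = docs[0][1]
        # Near-ties form a prefix of the descending-sorted list.
        head = [d for d in docs if top_score - d[1] <= band]
        tail = docs[len(head):]
        random.shuffle(head)  # card-shuffle the roughly-equal results
        return head + tail

    # The top three results sit between 19.0 and 20.0, so they get reordered.
    results = [("A", 20.0), ("B", 19.7), ("C", 19.1), ("D", 12.4)]
    print(shuffle_near_ties(results, band=1.0))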
It may not be as fine-grained as you want, but also check the
QueryElevationComponent. This takes a preconfigured list of what the
top results should be for a given query and makes those documents the
top results.
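For reference, the stock setup looks something like this (the query text and ids are made up; the handler wiring in your solrconfig.xml may differ):

    <!-- solrconfig.xml: register the component -->
    <searchComponent name="elevator" class="solr.QueryElevationComponent">
      <str name="queryFieldType">string</str>
      <str name="config-file">elevate.xml</str>
    </searchComponent>

    <!-- elevate.xml: pin documents A and B to the top for the query "AAA" -->
    <elevate>
      <query text="AAA">
        <doc id="A" />
        <doc id="B" />
      </query>
    </elevate>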
Presumably, you could use click logs to determine what the top results should be.
I've thought about patching the QueryElevationComponent to apply
boosts rather than a specific sort. Then the file might look like..
    <query text="AAA">
      <doc id="A" boost="5" />
      <doc id="B" boost="4" />
    </query>
And I could write a script that looks at click data once a day to fill
out this file.
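Something like this once-a-day job is what I have in mind (a rough sketch; the click-log format and the boost attribute are hypothetical, since the stock component doesn't read boosts):

    import csv
    from collections import Counter, defaultdict
    from xml.sax.saxutils import quoteattr

    CLICK_LOG = "clicks.csv"      # hypothetical log: one "query,doc_id" row per click
    ELEVATE_FILE = "elevate.xml"  # file the patched component would read
    TOP_N = 3                     # boost at most this many docs per query

    clicks = defaultdict(Counter)
    with open(CLICK_LOG) as f:
        for query, doc_id in csv.reader(f):
            clicks[query][doc_id] += 1

    with open(ELEVATE_FILE, "w") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n<elevate>\n')
        for query, counts in clicks.items():
            out.write("  <query text=%s>\n" % quoteattr(query))
            # Highest-clicked doc gets the biggest boost, then descending.
            for rank, (doc_id, _n) in enumerate(counts.most_common(TOP_N)):
                out.write('    <doc id=%s boost="%d" />\n' % (quoteattr(doc_id), TOP_N - rank + 2))
            out.write("  </query>\n")
        out.write("</elevate>\n")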
Thanks for your time!
yes, applying a boost would be a good addition.
patches are always welcome ;)
Thanks, I didn't know there was so much research in this area.
Most of the papers at those workshops are about tuning the
entire ranking algorithm with machine learning techniques.
I am interested in adding one more feature, click data, to an
existing ranking algorithm. In my case, I have enough click data to work with.
"A Decision Theoretic Framework for Ranking using Implicit Feedback"
uses clicks, but the best part of that paper is all the side comments
about difficulties in evaluation. For example, if someone clicks on
three results, is that three times as good or two failures and a
success? We have to know what the user was actually after before we can say.
Hello folks!
We've been thinking about ways to improve organic search results for a
while (really, who hasn't?) and I'd like to get some ideas on ways to
implement a feedback system that uses user behavior as input.
Basically, it'd work on the premise that what the user actually clicked
on is a strong signal of what was relevant to them.
I've been thinking about the same thing. We have a set of queries
that defy straightforward linguistics and ranking, like figuring
out how to match "charlie brown" to "It's the Great Pumpkin,
Charlie Brown" in October and to "A Charlie Brown Christmas"
in December.
I don't have any solutions yet, but I'm very interested in what other people come up with.
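One crude thing that might work at the application layer (just a sketch, nothing we've tried: pick a hand-tuned dismax boost query by calendar month; the titles and field names here are made up):

    import datetime
    import urllib.parse

    # Hand-tuned seasonal boost queries (hypothetical field names and titles).
    SEASONAL_BQ = {
        10: 'title:"Great Pumpkin"^5',
        12: 'title:"Charlie Brown Christmas"^5',
    }

    def build_params(user_query, month=None):
        """Return dismax params with an optional month-specific boost query."""
        month = month or datetime.date.today().month
        params = {"q": user_query, "defType": "dismax"}
        if month in SEASONAL_BQ:
            params["bq"] = SEASONAL_BQ[month]
        return params

    # In October, "charlie brown" gets nudged toward the Halloween special.
    print(urllib.parse.urlencode(build_params("charlie brown", month=10)))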
OK, I've implemented this before and have written academic papers and
patents related to this task.
Here are some hints:
- you're on the right track with the editorial boosting elevators
- http://wiki.apache.org/solr/UserTagDesign
- be darn careful about assuming that one click is enough evidence (one way to hedge against that is sketched below)
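For instance, a toy version of that last hint (the thresholds and the smoothing prior are made up):

    def click_boost(clicks, impressions, min_clicks=5, prior_ctr=0.05, prior_weight=20.0):
        """Turn raw click counts into a conservative boost.

        A single click produces no boost at all; the smoothed click-through
        rate only moves away from the prior once there is real evidence.
        """
        if clicks < min_clicks:
            return 1.0  # not enough evidence: leave the score alone
        ctr = (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)
        return 1.0 + ctr  # e.g. multiply into an index-time document boost

    print(click_boost(clicks=1, impressions=40))    # 1.0 -> a lone click is ignored
    print(click_boost(clicks=30, impressions=100))  # modest boost backed by data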