On Tue, 2011-01-11 at 12:12 +0100, Julien Piquot wrote:
> I would like to be able to prune my search result by removing the less
> relevant documents. I'm thinking about using the search score: I use
> the search scores of the document set (I assume they are sorted in
> descending order), normalise them (0 would be the lowest value and 1
> the greatest value) and then calculate the gradient of the normalised
> scores. The documents with a gradient below a threshold value would be
> rejected.

As part of experimenting with federated search, this is one approach
we'll be trying out to determine which results to discard when merging.

> If the scores are linearly decreasing, then no document is rejected.
> However, if there is a brutal score drop, then the documents below the
> drop are rejected.

So if we have the scores 1.0, 0.9, 0.2, 0.15, 0.1, 0.05 then the slopes
will be 0.05, 0.4, 0.025, 0.025, 0.025 and with a slope threshold of
0.1, we would discard everything from score 0.2 and below.

It makes sense if the scores are linear with the relevance (a document
with score 0.8 has twice the relevance of one with 0.4). I don't know if
they are, so experiments must be made, and I fear that this is another
demonstration of the inherent problem with quantifying quality.

- Toke
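
For illustration, here is a minimal Python sketch of the pruning idea discussed in this thread. It rests on several assumptions that are not in the original messages: the scores arrive sorted in descending order, "gradient" is taken to mean the plain difference between consecutive min-max-normalised scores, and the list is cut at the first drop that exceeds the threshold. The function name `prune_by_score_drop` and the threshold of 0.5 are made up for the example. Drops computed this way are on a different scale from the slope figures given in the message, so the threshold is not comparable with the 0.1 used there.

```python
def prune_by_score_drop(scores, threshold):
    """Keep the leading scores up to the first large drop.

    Assumption: `scores` is sorted in descending order.
    `threshold` is compared against drops between consecutive
    scores after min-max normalisation to the range [0, 1].
    """
    if len(scores) < 2:
        return list(scores)

    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no drop to detect.
        return list(scores)

    # Normalise: lowest score becomes 0, highest becomes 1.
    normalised = [(s - lo) / (hi - lo) for s in scores]

    for i in range(1, len(normalised)):
        drop = normalised[i - 1] - normalised[i]
        if drop > threshold:
            # Reject this document and everything ranked below it.
            return list(scores[:i])

    # No drop exceeded the threshold: keep everything.
    return list(scores)


# Scores from the example in the message; 0.5 is an arbitrary threshold.
kept = prune_by_score_drop([1.0, 0.9, 0.2, 0.15, 0.1, 0.05], threshold=0.5)
print(kept)
```

With these inputs the cut falls between 0.9 and 0.2, and an evenly spaced list such as 5, 4, 3, 2, 1 is kept whole. Note how sensitive the result is to the threshold: after normalisation the first drop (1.0 to 0.9) is roughly 0.105, so a threshold of 0.1 applied to this definition would already cut after the first document.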