On 4/10/2017 8:59 AM, David Kramer wrote:
> I’ve done quite a bit of searching on this. Pretty much every page I
> find says it’s a bad idea and won’t work well, but I’ve been asked to
> at least try it to reduce the number of completely unrelated results
> returned. We are not trying to normalize the number, or display it as
> a percentage, and I understand why those are not mathematically sound.
> We are relying on Solr for pagination, so we can’t just filter out low
> scores from the results. 

Here's my contribution.  This boils down to nearly the same thing Erick
said, but stated in a very different way: The absolute score value has
zero meaning, for ANY purpose ... not just percentages or
normalization.  If you try to use it, you're asking for disappointment.

Scores only have meaning within a single query, and the only information
that matters is whether the score of one document is higher or lower
than the scores of the other documents in the same result.
Boosting lets you influence those relative scores, but the actual
numeric score of one document in a result doesn't reveal ANYTHING useful
about that document.
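
To make that concrete, here is a small sketch in Python. The numbers are
made up for illustration and are not real Solr output; the helper names
`ranking` and `above_cutoff` are hypothetical. It shows two result sets
whose documents rank identically even though a fixed cutoff treats them
completely differently:

```python
def ranking(scores):
    """Return document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def above_cutoff(scores, cutoff):
    """Return the ids whose score meets a fixed absolute cutoff."""
    return {doc for doc, score in scores.items() if score >= cutoff}


# Hypothetical scores for one query (illustrative values only).
query_a = {"doc1": 8.2, "doc2": 3.1, "doc3": 0.9}

# A second hypothetical query whose scores happen to sit on a smaller
# scale, with the same relative order between documents.
query_b = {doc: score * 0.1 for doc, score in query_a.items()}

# The ordering, which is the only meaningful part, is the same.
same_order = ranking(query_a) == ranking(query_b)

# A fixed cutoff of 1.0 keeps two documents for one query and none for
# the other, although the relative relevance is unchanged.
kept_a = above_cutoff(query_a, 1.0)
kept_b = above_cutoff(query_b, 1.0)
```

Any cutoff you pick will behave like this: reasonable for the queries
you tested it on, arbitrary for the rest.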

I agree with Erick's general advice:  Instead of trying to arbitrarily
decide which documents are scoring too low to be relevant, refine the
query so that irrelevant results are either excluded completely or
outscored by relevant documents, so that the first few pages contain
good results.  Users must be trained to expect irrelevant (and slow)
results if they paginate deeply.  For performance reasons, you should
limit how many pages users can view in a result set.
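
As a rough sketch of what that could look like on the client side, here
is some Python that builds the request parameters.  The function name,
the page limit, the page size and the `mm` value are all assumptions
chosen for illustration, not recommendations; `q`, `fq`, `start`, `rows`,
`defType` and `mm` are standard Solr parameters (`mm` applies to the
dismax/edismax parsers).  Tightening `mm` or adding filter queries is
only one way to keep unrelated documents out of the result:

```python
MAX_PAGE = 20        # assumed limit on how deep users may paginate
ROWS_PER_PAGE = 10   # assumed page size


def build_solr_params(user_query, page, filters=()):
    """Build Solr request parameters with a capped page number.

    `filters` is a sequence of filter-query strings (hypothetical
    examples: "type:product", "in_stock:true").
    """
    # Clamp the requested page into the range 1..MAX_PAGE.
    page = max(1, min(int(page), MAX_PAGE))
    return {
        "q": user_query,
        "defType": "edismax",
        # Require most query terms to match so that documents matching
        # a single incidental term are excluded (illustrative value).
        "mm": "75%",
        # Filter queries exclude documents without affecting scores.
        "fq": list(filters),
        "start": (page - 1) * ROWS_PER_PAGE,
        "rows": ROWS_PER_PAGE,
    }


# A request for page 500 is served as page 20 instead.
params = build_solr_params("solar panel", 500, filters=["type:product"])
```

The resulting dictionary can be passed to whatever HTTP client you use
to call the select handler.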

Thanks,
Shawn
