I wanted to share that I have been polishing a plugin that lets you control
the global term stats of your Solr instance. It is primarily used in
relevance evaluation: when you are working with a smaller test sample of
documents, it lets you mock production document frequency, etc., so you
control the global stats side of BM25 scoring.

However, there could be other use cases where you want to manually
override the "natural" document frequency in prod to match the true
specificity of a term. If "book" only comes up once in the titles of your
book search index, you might arguably want it to be treated as a lot more
common, to match the user's true sense of that term's specificity. That
makes "book" matches less important than, say, a match for a truly
specific term like "woodworking".
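
To make the effect concrete, here is a rough sketch of how the document
frequency feeds into the IDF part of BM25, using the formula from Lucene's
BM25Similarity. The index size and the document frequencies are made-up
numbers for illustration, and this does not show the plugin's own
configuration or API, only the scoring effect of overriding a stat.

```python
import math

def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF term as computed by Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical numbers, chosen only to illustrate the idea.
DOC_COUNT = 100_000          # documents in the index
NATURAL_BOOK_DF = 1          # "book" appears in one title
MANAGED_BOOK_DF = 50_000     # manually assigned doc freq for "book"
WOODWORKING_DF = 20          # "woodworking" is genuinely rare

natural_book_idf = bm25_idf(NATURAL_BOOK_DF, DOC_COUNT)
managed_book_idf = bm25_idf(MANAGED_BOOK_DF, DOC_COUNT)
woodworking_idf = bm25_idf(WOODWORKING_DF, DOC_COUNT)

print(f"book (natural df):  {natural_book_idf:.2f}")
print(f"book (managed df):  {managed_book_idf:.2f}")
print(f"woodworking:        {woodworking_idf:.2f}")
```

With the natural stats, "book" gets a higher IDF than "woodworking" because
it is rarer in the titles. With the managed document frequency, the order
flips and a "woodworking" match counts for more.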

https://github.com/softwaredoug/managed-stats

I'd love to get feedback, and PRs are always welcome.

Thanks
-Doug
