Re: Maximum score estimation

2024-05-22 Thread Adrien Grand
Hi Mikhail, you are correct, it should give an OK upper bound of scores for term queries and for combinations of term queries via BooleanQuery. On Wed, May 22, 2024 at 6:57 PM Mikhail Khludnev wrote: > I'm trying to understand Impacts. Need help. >
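For a BooleanQuery whose scoring clauses are term queries, one simple way to turn per-term maxima into a query-level bound is to add them up: a matching document gets at most each clause's own maximum from every clause it matches. The plain-Java sketch below assumes the per-clause maximum scores are already known (for example from each term query's max score). `BooleanUpperBound` and `disjunctionUpperBound` are hypothetical names, not Lucene API.

```java
import java.util.List;

/**
 * Hypothetical sketch: combine per-clause maximum scores into an upper bound
 * for a BooleanQuery made of scoring term clauses.
 * Every clause contributes at most its own maximum, so the sum of the maxima
 * bounds any document's score. The bound can be loose, because the per-clause
 * maxima may come from different documents.
 */
public class BooleanUpperBound {

    /** Sums the per-clause max scores; an empty query yields 0. */
    static double disjunctionUpperBound(List<Double> clauseMaxScores) {
        double sum = 0.0;
        for (double max : clauseMaxScores) {
            sum += max;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up per-term maxima, one per term clause.
        List<Double> maxima = List.of(2.5, 1.25, 0.25);
        System.out.println(disjunctionUpperBound(maxima)); // prints 4.0
    }
}
```

The sum is exact only when a single document reaches every clause's maximum at once, which is why this reads as an upper bound rather than a true maximum.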

Re: Maximum score estimation

2024-05-22 Thread Mikhail Khludnev
I'm trying to understand Impacts. Need help. https://github.com/apache/lucene/issues/5270#issuecomment-1223383919 Does it mean that advanceShallow(0) followed by getMaxScore(maxDoc-1) gives a good max score estimate, at least for a term query? On Fri, May 10, 2024 at 11:21 PM Mikhail Khludnev wrote: > Hello
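In Lucene, `advanceShallow(target)` moves a scorer's skip data without positioning it on a document, and `getMaxScore(upTo)` returns an upper bound of the scores of documents from that target up to `upTo`, derived from the competitive impacts (each Lucene `Impact` pairs a `freq` with an encoded `norm`). The plain-Java sketch below models only the idea, not Lucene's code: the term's maximum is the best score over all (frequency, length) pairs. It assumes decoded document lengths instead of encoded norms and uses the classic Okapi BM25 term formula with k1 = 1.2 and b = 0.75; `ImpactsMaxScore`, its nested `Impact` class and its methods are hypothetical.

```java
import java.util.List;

public class ImpactsMaxScore {

    /** Hypothetical stand-in for one impact: a term frequency and the
     *  decoded length of the document it came from (Lucene stores an
     *  encoded norm instead). */
    static final class Impact {
        final int freq;
        final int docLength;

        Impact(int freq, int docLength) {
            this.freq = freq;
            this.docLength = docLength;
        }
    }

    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Classic Okapi BM25 contribution of one term to one document. */
    static double bm25(double idf, int freq, int docLength, double avgDocLength) {
        double lengthNorm = K1 * (1 - B + B * docLength / avgDocLength);
        return idf * freq * (K1 + 1) / (freq + lengthNorm);
    }

    /** Best score among the impacts; assumes idf >= 0, so 0 is a safe start. */
    static double maxScore(double idf, List<Impact> impacts, double avgDocLength) {
        double max = 0.0;
        for (Impact impact : impacts) {
            max = Math.max(max, bm25(idf, impact.freq, impact.docLength, avgDocLength));
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up impacts: the highest frequency (10) sits in a very long
        // document, so it is not the highest-scoring pair.
        List<Impact> impacts = List.of(
                new Impact(10, 400),
                new Impact(3, 20));
        System.out.println(maxScore(2.0, impacts, 100.0));
    }
}
```

The example also hints at why guessing the max tf alone can mislead: with these made-up numbers the pair (3, 20) outscores (10, 400), because BM25 penalizes long documents.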

Re: Maximum score estimation

2024-05-10 Thread Mikhail Khludnev
Hello Alessandro. Glad to hear! There hasn't been much progress since the previously published link: just a tiny test. Guessing the max tf doesn't seem reliable. However, I've got another idea: Can't Impacts give us an exact max score like

Re: Maximum score estimation

2024-05-09 Thread Alessandro Benedetti
Hi Mikhail, I was thinking again about this regarding Hybrid Search in Solr and the current https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#scale-function . Was there any progress on this? Any traction? Sooner or later I hope to get some funds to work on this, I keep
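Solr's `scale(x, minTarget, maxTarget)` function maps values linearly into `[minTarget, maxTarget]` using the minimum and maximum observed across documents. The plain-Java sketch below shows that min-max mapping on an array of scores; it is not Solr's implementation, and mapping every value to `minTarget` when all scores are equal is an assumption of this sketch. `MinMaxScale` and `scale` are hypothetical names.

```java
import java.util.Arrays;

public class MinMaxScale {

    /** Linearly maps scores into [minTarget, maxTarget] using the min and
     *  max observed in this result set. If all scores are equal, every
     *  value is mapped to minTarget (an assumption of this sketch). */
    static double[] scale(double[] scores, double minTarget, double maxTarget) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = range == 0
                    ? minTarget
                    : minTarget + (scores[i] - min) * (maxTarget - minTarget) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] bm25 = {12.0, 7.0, 2.0};
        System.out.println(Arrays.toString(scale(bm25, 0.0, 1.0))); // prints [1.0, 0.5, 0.0]
    }
}
```

Because `min` and `max` come from the current result set, the same document can land on a different scaled value for a different query, which is why a query-independent maximum score is attractive for hybrid search.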

Re: Maximum score estimation

2023-02-13 Thread Mikhail Khludnev
Hello. Just FYI, I sketched a little prototype https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53 To estimate the maximum possible score for the query against an index: - it creates a virtual index (LikelyReader), which - contains

Re: Maximum score estimation

2022-12-20 Thread Walter Underwood
Comparing scores within the result set for a single query works fine. Mapping those to [0,1] is fine, too. Comparing scores for different queries, or even for the same query at different times, isn’t valid. Showing the scores to people almost guarantees they’ll compare the scores between

Re: Maximum score estimation

2022-12-19 Thread J. Delgado
Actually, I believe that the Lucene scoring function is based on *Okapi BM25* (BM is an abbreviation of best matching), which is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson
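For reference, the classic Okapi BM25 score of a document $D$ for a query $Q$ with terms $q_1, \dots, q_n$, where $f(q_i, D)$ is the term frequency, $|D|$ the document length, $\mathrm{avgdl}$ the average document length, and $k_1$, $b$ free parameters, is shown below. The second line is a short derivation, assuming $k_1 > 0$ and $\mathrm{IDF}(q_i) \ge 0$ (as with the $\log\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)$ variant Lucene's `BM25Similarity` uses): since $K = k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right) > 0$, the tf factor saturates below $k_1 + 1$, which gives a crude, corpus-statistics-only ceiling on the score. Lucene's exact formula may differ in constant factors and norm encoding.

$$
\text{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
$$

$$
\frac{f\,(k_1 + 1)}{f + K} < k_1 + 1 \quad (K > 0)
\;\;\Rightarrow\;\;
\text{score}(D, Q) \le (k_1 + 1) \sum_{i=1}^{n} \mathrm{IDF}(q_i)
$$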

Re: Maximum score estimation

2022-12-19 Thread Walter Underwood
That article is copied from the old wiki, so it is much earlier than 2019, more like 2009. Unfortunately, the links to the email discussion are all dead, but the issues in the article are still true. If you really want to go down that path, you might be able to do it with a similarity class

Re: Maximum score estimation

2022-12-18 Thread Mikhail Khludnev
Thanks for the reply, Walter. Recently Robert commented on the PR with the link https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages which gives arguments against my proposal. Honestly, I'm still in doubt. On Tue, Dec 6, 2022 at 8:15 PM Walter Underwood wrote: > As you point out, this is

Re: Maximum score estimation

2022-12-06 Thread Walter Underwood
As you point out, this is a probabilistic relevance model. Lucene uses a vector space model. A probabilistic model gives an estimate of how relevant each document is to the query. Unfortunately, its overall relevance isn’t as good as that of a vector space model. You could calculate an ideal score,