Re: Adding vs multiplicating scores when implementing "recency"
Ah, thanks for the explanation.

In reply to Adrien Grand's message of Fri, Sep 17, 2021, 10:11 AM.

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Adding vs multiplicating scores when implementing "recency"
This is one requirement indeed. Since WAND reasons about partially evaluated documents, it also requires that matching one more clause makes the overall score higher, which is why we introduced the requirement that scores must be positive in 8.0. For multiplication, this would require scores that are greater than 1.

If someone really wanted to multiply scores, the easiest way might be to create a query wrapper that takes the log of the scores of the wrapped query, and rely on log(a) + log(b) = log(a * b).

In reply to Michael Sokolov's message of Fri, Sep 17, 2021, 14:47.
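The log identity above can be checked with a few lines of plain Java. This is only a toy sketch, not Lucene code: the class name, method names, and sample scores are all made up for illustration. It shows that ranking by the sum of the logs gives the same order as ranking by the product, and that the log is positive only when the wrapped score is greater than 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration (not Lucene code): ranking by a product equals ranking by a sum of logs. */
final class LogScoreSketch {

    /** Multiplicative combination of a text score and a recency boost. */
    static double product(double textScore, double recencyBoost) {
        return textScore * recencyBoost;
    }

    /** What an additive combination of two log-wrapped clauses would yield. */
    static double sumOfLogs(double textScore, double recencyBoost) {
        return Math.log(textScore) + Math.log(recencyBoost);
    }

    /** Returns document indices sorted by descending key. */
    static List<Integer> rank(double[] keys) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(keys[b], keys[a]));
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical scores for four documents; every value is greater than 1.
        double[] text = {4.0, 2.5, 7.0, 1.5};
        double[] boost = {1.2, 3.0, 1.1, 2.0};
        double[] products = new double[text.length];
        double[] logSums = new double[text.length];
        for (int i = 0; i < text.length; i++) {
            products[i] = product(text[i], boost[i]);
            logSums[i] = sumOfLogs(text[i], boost[i]);
        }
        System.out.println("by product:     " + rank(products));
        System.out.println("by sum of logs: " + rank(logSums));
        // A wrapped score below 1 would produce a negative log.
        System.out.println("log(0.5) = " + Math.log(0.5));
    }
}
```

Both rankings come out as [2, 1, 0, 3] for these sample values, because the logarithm is strictly increasing. A real wrapper would still have to guarantee that the wrapped scores stay above 1.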
Re: Adding vs multiplicating scores when implementing "recency"
Not advocating any particular approach here, just curious: could BMW (Block-Max WAND) also function in the presence of a doc-score (like recency) that is multiplied? My vague understanding is that as long as the scoring formula is monotonic in all of its inputs, and we have block-encoded the inputs, then we could compute a max score for a block?

In reply to Adrien Grand's message of Thu, Sep 16, 2021, 12:41 PM.
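The block-level bound asked about here can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene's implementation: the `Block` type, its fields, and the skip rule are hypothetical. The only property it relies on is that the combining function is non-decreasing in both inputs, which holds for addition and, as long as both inputs are non-negative, for multiplication.

```java
import java.util.function.DoubleBinaryOperator;

/** Toy model of a block-max upper bound (not Lucene's implementation). */
final class BlockMaxSketch {

    /** Per-block maxima that a block-encoded index could store; names are hypothetical. */
    static final class Block {
        final double maxTextScore;
        final double maxRecencyScore;

        Block(double maxTextScore, double maxRecencyScore) {
            this.maxTextScore = maxTextScore;
            this.maxRecencyScore = maxRecencyScore;
        }
    }

    /**
     * Upper bound on the score of any document in the block.
     * Only valid when combine is non-decreasing in both arguments.
     */
    static double upperBound(Block block, DoubleBinaryOperator combine) {
        return combine.applyAsDouble(block.maxTextScore, block.maxRecencyScore);
    }

    /** A block can be skipped when even its best possible score is not competitive. */
    static boolean canSkip(Block block, DoubleBinaryOperator combine, double minCompetitiveScore) {
        return upperBound(block, combine) < minCompetitiveScore;
    }

    public static void main(String[] args) {
        Block block = new Block(3.0, 0.9);
        DoubleBinaryOperator sum = Double::sum;
        DoubleBinaryOperator product = (text, recency) -> text * recency;
        double minCompetitiveScore = 3.0;
        System.out.println("additive bound:       " + upperBound(block, sum));
        System.out.println("multiplicative bound: " + upperBound(block, product));
        System.out.println("skip (additive):       " + canSkip(block, sum, minCompetitiveScore));
        System.out.println("skip (multiplicative): " + canSkip(block, product, minCompetitiveScore));
    }
}
```

The sketch covers only the per-block bound. It says nothing about how a pruning algorithm treats documents whose clauses have been only partially evaluated.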
Re: Adding vs multiplicating scores when implementing "recency"
Hello,

You are correct that the contribution would be additive in that case. We don't provide an easy way to make the contribution multiplicative.

There is some debate about what the best way is to combine BM25 scores with query-independent features, though in the discussions I've seen, contributions were summed up and the debate was more about whether they should be normalized or not.

How much recency impacts ranking indeed depends on the number of terms and how frequent these terms are. One way I interpret the fact that not everyone recommends normalizing scores is that, without normalization, the query score dominates when the query is looking for something very specific, either because it includes many terms or because it uses very specific terms, which may be a feature. This approach also works well for Lucene since dynamic pruning via Block-Max WAND keeps working when query-independent features are incorporated into the final score, which helps figure out the top hits without having to collect all matches.

In reply to Nicolás Lichtmaier's message of Thu, Sep 16, 2021, 5:40 PM.

--
Adrien
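The dependence on the number of terms can be made concrete with some toy arithmetic in plain Java. This is a sketch under assumptions, not Lucene code. The per-term score of 2.0 is invented. The recency formula, weight * pivotDistance / (pivotDistance + distance), follows the shape described in the Javadoc of LongPoint.newDistanceFeatureQuery as far as I recall it. The class and method names are hypothetical.

```java
/** Toy arithmetic (not Lucene code) showing how an additive recency clause gets diluted. */
final class RecencyDilutionSketch {

    /** Assumed shape of the distance feature score: weight * pivot / (pivot + distance). */
    static double distanceFeature(double weight, double pivotDistance, double distance) {
        return weight * pivotDistance / (pivotDistance + distance);
    }

    /** Fraction of the final additive score that comes from the recency clause. */
    static double recencyShare(double textScore, double recencyScore) {
        return recencyScore / (textScore + recencyScore);
    }

    public static void main(String[] args) {
        // A document exactly one pivot distance (say 7 days) away from "now" gets weight / 2.
        double recency = distanceFeature(1.0, 7.0, 7.0);
        // Hypothetical: every matching term contributes the same score of 2.0.
        double perTermScore = 2.0;
        for (int terms = 1; terms <= 4; terms++) {
            double textScore = terms * perTermScore;
            System.out.printf("%d term(s): recency share of final score = %.3f%n",
                    terms, recencyShare(textScore, recency));
        }
    }
}
```

With these made-up numbers the recency share falls from 0.2 for one term to about 0.059 for four terms. Rarer terms, which score higher, would shrink it in the same way.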
Adding vs multiplicating scores when implementing "recency"
In March I asked a question here that got no answers at all. As it is still something that I'd very much like to know, I'll ask again.

To implement "recency" in a search you would add a boolean clause with a LongPoint.newDistanceFeatureQuery(), right? But that's additive, meaning that recency will have a different impact on searches with different numbers of terms, right? With more terms, the recency component's contribution to the score will be more and more "diluted". However... I only see examples doing it this way, and I would need to do something weird to implement a multiplicative change of the score... Am I missing something?

Thanks!