Re: Adding vs multiplying scores when implementing "recency"

2021-09-17 Thread Michael Sokolov

ah, thanks for the explanation

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Adding vs multiplying scores when implementing "recency"

2021-09-17 Thread Adrien Grand

This is one requirement indeed. Since WAND reasons about partially
evaluated documents, it also requires that matching one more clause makes
the overall score higher, which is why we introduced the requirement that
scores must be positive in Lucene 8.0. For multiplication, this would require
scores that are greater than 1.
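
The requirement about partially evaluated documents can be checked with plain arithmetic. In this hypothetical Java sketch (not Lucene code), one more matching clause is combined with a partial score either by addition or by multiplication. With positive clause scores the sum always grows, while the product only grows when the clause score is greater than 1.

```java
// Sketch: WAND may reason about a document after scoring only some of its
// clauses, assuming each additional matching clause can only raise the total.
// All scores below are hypothetical.
final class PartialScore {

    // Additive combination: any positive clause score raises the total
    static double addClause(double partial, double clauseScore) {
        return partial + clauseScore;
    }

    // Multiplicative combination: a clause score below 1 lowers the total
    static double multiplyClause(double partial, double clauseScore) {
        return partial * clauseScore;
    }

    public static void main(String[] args) {
        double partial = 4.0;                  // score from the clauses evaluated so far
        double[] nextClause = {0.5, 1.0, 2.0}; // possible scores of one more matching clause
        for (double c : nextClause) {
            System.out.printf("clause %.1f: sum %.1f, product %.1f%n",
                    c, addClause(partial, c), multiplyClause(partial, c));
        }
    }
}
```

With a clause score of 0.5 the product falls below the partial score, which is exactly the case an upper-bound-based skipping algorithm cannot tolerate.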

If someone really wanted to multiply scores, the easiest way might be to
create a query wrapper that takes the log of the scores of the wrapped
query, and rely on log(a)+log(b) = log(a * b).
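
A minimal Java sketch of why the log trick works, using hypothetical score pairs rather than a real Lucene query wrapper: because log is strictly increasing, ranking by log(a) + log(b) gives the same order as ranking by a * b. It also ties back to the positivity requirement, since log(s) is only positive when s > 1.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of the log trick: summing log-scores ranks documents exactly like
// multiplying the raw scores, because log is strictly increasing.
final class LogTrick {

    static double product(double[] scores) {
        double p = 1;
        for (double s : scores) p *= s;
        return p;
    }

    static double sumOfLogs(double[] scores) {
        double sum = 0;
        for (double s : scores) sum += Math.log(s); // needs s > 0; the term is positive only when s > 1
        return sum;
    }

    public static void main(String[] args) {
        // hypothetical per-document (text score, recency factor) pairs, all > 1
        double[][] docs = {{3.0, 1.2}, {2.0, 2.5}, {4.0, 1.1}};
        Integer[] byProduct = {0, 1, 2};
        Integer[] byLogSum = {0, 1, 2};
        // sort document ids by descending combined score under each method
        Arrays.sort(byProduct, Comparator.comparingDouble((Integer i) -> -product(docs[i])));
        Arrays.sort(byLogSum, Comparator.comparingDouble((Integer i) -> -sumOfLogs(docs[i])));
        System.out.println(Arrays.toString(byProduct) + " " + Arrays.toString(byLogSum));
    }
}
```

Both orderings come out identical; a real wrapper would additionally need to keep every log-score positive, which is where the "greater than 1" condition comes from.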

Re: Adding vs multiplying scores when implementing "recency"

2021-09-17 Thread Michael Sokolov

Not advocating any particular approach here, just curious: could BMW (Block-Max WAND)
also function in the presence of a doc score (like recency) that is
multiplied? My vague understanding is that as long as the scoring
formula is monotonic in all of its inputs, and we have block-encoded
the inputs, then we could compute a max score for a block?
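
The monotonicity argument in the question can be sketched in Java. The combining function here is a plain product of a text score and a recency factor, both hypothetical and assumed non-negative. The bound computed from the block maxima is never exceeded by any document in the block, although it can be loose because the two maxima may come from different documents.

```java
// Sketch: if the combined score f(text, recency) is non-decreasing in each
// input and the inputs are non-negative, then f applied to the per-block
// maxima bounds every document in the block. Here f is a product.
final class MonotoneBlockMax {

    static double combine(double text, double recency) {
        return text * recency; // non-decreasing in each argument when both are >= 0
    }

    // Upper bound computed only from block-level maxima (inputs assumed >= 0)
    static double blockUpperBound(double[] text, double[] recency) {
        double maxText = 0, maxRecency = 0;
        for (double t : text) maxText = Math.max(maxText, t);
        for (double r : recency) maxRecency = Math.max(maxRecency, r);
        return combine(maxText, maxRecency);
    }

    public static void main(String[] args) {
        double[] text = {3.0, 1.5, 4.0};    // hypothetical per-doc text scores in one block
        double[] recency = {0.9, 0.5, 0.2}; // hypothetical per-doc recency factors
        double bound = blockUpperBound(text, recency);
        for (int i = 0; i < text.length; i++) {
            System.out.printf("doc %d: %.2f <= %.2f%n", i, combine(text[i], recency[i]), bound);
        }
    }
}
```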

Re: Adding vs multiplying scores when implementing "recency"

2021-09-16 Thread Adrien Grand

Hello,

You are correct that the contribution would be additive in that case. We
don't provide an easy way to make the contribution multiplicative.

There is some debate about what is the best way to combine BM25 scores with
query-independent features, though in the discussions I've seen
contributions were summed up and the debate was more about whether they
should be normalized or not.
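
For concreteness, a hypothetical Java sketch of the two options being debated: summing the raw query score with a feature, or normalizing the query score first. Dividing by the top query score is only one assumed normalization among many, and it needs the scores of all hits before anything can be ranked.

```java
import java.util.Arrays;

// Sketch: combining a query score with a query-independent feature,
// either as a raw sum or after normalizing the query score.
// Normalizing by the top query score is a hypothetical choice.
final class CombineScores {

    static double[] rawSum(double[] queryScores, double[] features) {
        double[] out = new double[queryScores.length];
        for (int i = 0; i < out.length; i++) out[i] = queryScores[i] + features[i];
        return out;
    }

    static double[] normalizedSum(double[] queryScores, double[] features) {
        // requires every hit's query score up front to find the maximum
        double max = Arrays.stream(queryScores).max().orElse(1.0);
        double[] out = new double[queryScores.length];
        for (int i = 0; i < out.length; i++) out[i] = queryScores[i] / max + features[i];
        return out;
    }

    public static void main(String[] args) {
        double[] query = {12.0, 10.0}; // doc 0 matches the query better
        double[] recency = {0.2, 0.9}; // doc 1 is more recent
        System.out.println(Arrays.toString(rawSum(query, recency)));
        System.out.println(Arrays.toString(normalizedSum(query, recency)));
    }
}
```

With the raw sum the better-matching document 0 stays on top; after normalization the more recent document 1 overtakes it.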

How much recency impacts ranking indeed depends on the number of terms and
how frequent these terms are. One way I interpret the fact that not
everyone recommends normalizing scores is that, without normalization, the
query score dominates when the query is looking for something very
specific, because it includes many terms or because it uses very specific
terms, which may be a feature. This approach also works well for Lucene
since dynamic pruning via Block-Max WAND keeps working when
query-independent features are incorporated into the final score, which
helps figure out the top hits without having to collect all matches.
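
A simplified Java sketch of the check that lets dynamic pruning skip documents when the final score is a sum. The names and numbers are hypothetical and this is not Lucene's implementation: the per-block maximum scores of the clauses are added up, and a block whose upper bound cannot beat the current minimum competitive score can be skipped. A recency clause fits in naturally because a distance-feature clause never scores above its weight.

```java
// Sketch of the pruning test behind (Block-Max) WAND for summed scores:
// skip a block if the sum of its per-clause maxima cannot beat the score
// of the current k-th best hit. Names and numbers are hypothetical.
final class WandBound {

    static double upperBound(double[] clauseMaxScores) {
        double sum = 0;
        for (double m : clauseMaxScores) sum += m;
        return sum;
    }

    static boolean canSkip(double[] clauseMaxScores, double minCompetitiveScore) {
        return upperBound(clauseMaxScores) <= minCompetitiveScore;
    }

    public static void main(String[] args) {
        // block-level max scores of two text terms plus a recency clause
        double[] blockMaxima = {2.5, 1.0, 0.8};
        System.out.println(canSkip(blockMaxima, 5.0)); // prints true: bound 4.3 cannot beat 5.0
        System.out.println(canSkip(blockMaxima, 4.0)); // prints false: the block must be scored
    }
}
```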

-- 
Adrien

Adding vs multiplying scores when implementing "recency"

2021-09-16 Thread Nicolás Lichtmaier

In March I asked a question here that got no answers at all. As it is
still something I'd very much like to know, I'll ask again.

To implement "recency" in a search, you would add a boolean clause with
a LongPoint.newDistanceFeatureQuery(), right? But that's additive,
meaning that recency will have a different impact on searches with
different numbers of terms, right? With more terms, the recency component's
contribution to the score gets more and more "diluted". However... I only
see examples that do it this way, and I would need to do something
weird to implement a multiplicative change of the score... Am I missing
something?
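
The dilution described in the question can be illustrated with a small, self-contained Java sketch. It re-implements, as a hypothetical helper, the scoring formula documented for `LongPoint.newDistanceFeatureQuery(field, weight, origin, pivotDistance)`, namely score = weight * pivotDistance / (pivotDistance + distance), and adds it to made-up per-term BM25 scores. It does not call Lucene itself.

```java
import java.util.Arrays;

// Sketch: how an additive recency clause's share of the total score shrinks
// as the text query gets more terms. Per-term BM25 scores are made up.
final class RecencyDilution {

    // Hypothetical helper mirroring the documented distance-feature formula:
    // score = weight * pivotDistance / (pivotDistance + distance)
    static double distanceFeatureScore(double weight, long origin, long pivotDistance, long value) {
        long distance = Math.abs(value - origin);
        return weight * pivotDistance / (double) (pivotDistance + distance);
    }

    // Fraction of the final (summed) score contributed by recency
    static double recencyShare(double[] termScores, double recency) {
        double text = 0;
        for (double s : termScores) text += s;
        return recency / (text + recency);
    }

    public static void main(String[] args) {
        long now = 1_000_000_000_000L;  // fixed "origin" timestamp in millis
        long oneDay = 86_400_000L;
        // a document published one day before "now", with a 7-day pivot
        double recency = distanceFeatureScore(2.0, now, 7 * oneDay, now - oneDay);
        for (int terms = 1; terms <= 4; terms++) {
            double[] termScores = new double[terms];
            Arrays.fill(termScores, 3.0); // hypothetical BM25 score per matching term
            System.out.printf("%d term(s): recency share = %.3f%n",
                    terms, recencyShare(termScores, recency));
        }
    }
}
```

The recency score stays fixed while its share of the summed score drops with every additional matching term.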


Thanks!

