Re: DisjunctionMinQuery

2023-11-09 Thread Marc D'Mello
Hi all,

Once again, thanks for the responses! After thinking about this a bit more,
Michael's response makes sense to me now. I agree that partial matches
shouldn't be ranked higher than conjunctive matches, so a DisjunctionMinQuery
doesn't make sense for my use case (I think I would need an AndMinQuery or
something like that). This also answers my initial question.

I did have a question about this though:

> in that case you should use something like 1/x as your scoring function
> in the sub-clauses

Doesn't using 1/x as a scoring function, even in the sub-clauses, still
cause an issue where the output score is inversely correlated with the
indexed term score? I think that would break BMW (block-max WAND), right?
Or maybe I am misunderstanding the suggestion.
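
A toy illustration of the concern, not Lucene's actual implementation: block-max style skipping relies on a per-block upper bound on the score, and such a bound is only safe if the score never decreases as the indexed value grows. The sketch assumes a simplified bound computed from the largest stored value in a block; `block_upper_bound`, `true_block_max`, and the sample values are hypothetical and exist only for illustration.

```python
def block_upper_bound(score_fn, block_values):
    """Toy bound: score the largest stored value in the block.

    This is only a valid upper bound if score_fn never decreases
    as the stored value increases (an assumption of this sketch).
    """
    return score_fn(max(block_values))


def true_block_max(score_fn, block_values):
    """The real maximum score over every value in the block."""
    return max(score_fn(v) for v in block_values)


def identity(x):
    """A score that rises with the indexed value."""
    return float(x)


def inverse(x):
    """The 1/x score under discussion; it falls as the indexed value rises."""
    return 1.0 / x


# Hypothetical per-document term-level values stored in one block.
block = [2, 5, 10]

safe_bound = block_upper_bound(identity, block)  # 10.0
safe_max = true_block_max(identity, block)       # 10.0

bad_bound = block_upper_bound(inverse, block)    # 0.1, computed from the value 10
bad_max = true_block_max(inverse, block)         # 0.5, reached at the value 2

# With 1/x the "bound" underestimates the best score in the block, so a
# skipping algorithm that trusted it could wrongly skip competitive documents.
bound_is_violated = bad_bound < bad_max
```

In this toy model the bound would have to be derived from the smallest stored value instead, which is not what a bound built on increasing scores provides.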

Thanks,
Marc

On Thu, Nov 9, 2023 at 10:18 AM Uwe Schindler wrote:

> Hi,
>
> in that case you should use something like 1/x as your scoring function
> in the sub-clauses. In Lucene scores should go up for more relevancy.
> This must also apply for function scoring.
>
> Uwe
>
> On 09.11.2023 at 19:14, Marc D'Mello wrote:
> > Hi Michael,
> >
> > Thanks for the response! So to answer your first question, yes this would
> > keep the lowest score from the matching sub-scorers. Our use case is that
> > we have a custom term-level score overriding term frequency and we want to
> > take the min of that as part of our scoring function. Maybe it's a niche
> > use case?
> >
> > Thanks,
> > Marc
> >
> > On Wed, Nov 8, 2023 at 3:19 PM Michael Froh wrote:
> >
> >> Hi Marc,
> >>
> >> Can you clarify what the semantics of a DisjunctionMinQuery would be? Would
> >> you keep the score for the *lowest* scoring disjunct (plus some tiebreaker
> >> applied to the other matching disjuncts)?
> >>
> >> I'm trying to imagine how that would work compared to the classic DisMax
> >> use-case. Say I'm searching for "dalmatian" using a DisMax query over term
> >> queries against title and body. A match on title is probably going to score
> >> higher than a match against the body, just because the title has a shorter
> >> length (and the doc frequency of individual terms in the title is likely to
> >> be lower, since there are fewer terms overall). With DisMax, a match on
> >> title alone will score higher than a match on body, and the tie-break will
> >> tend to score a match on title and body higher than a match on title alone.
> >>
> >> With a DisMin (assuming you keep the lowest score), then a match on title
> >> and body would probably score lower than a match on title alone. That feels
> >> weird to me, but I might be missing the use-case.
> >>
> >> How would you use a DisMinQuery?
> >>
> >> Thanks,
> >> Froh
> >>
> >>
> >>
> >> On Wed, Nov 8, 2023 at 10:50 AM Marc D'Mello wrote:
> >>
> >>> Hi all,
> >>>
> >>> I noticed we have a DisjunctionMaxQuery
> >>> <https://github.com/apache/lucene/blob/branch_9_7/lucene/core/src/java/org/apache/lucene/search/DisjunctionMaxQuery.java>
> >>> but not a corresponding DisjunctionMinQuery. I was just wondering if there
> >>> was a specific reason for that? Or is it just that it is not a common query
> >>> to use?
> >>>
> >>> Thanks!
> >>> Marc
> >>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: DisjunctionMinQuery

2023-11-09 Thread Uwe Schindler

Hi,

in that case you should use something like 1/x as your scoring function 
in the sub-clauses. In Lucene scores should go up for more relevancy. 
This must also apply for function scoring.
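
One way to read this suggestion, sketched under the assumption that every sub-clause value x is strictly positive: the maximum of the reciprocals equals the reciprocal of the minimum, so a max-style combination over 1/x orders documents by their smallest x. The function names and sample values below are hypothetical and not part of any Lucene API.

```python
def reciprocal(x):
    """Map a strictly positive value x to 1/x; a smaller x gives a larger score."""
    if x <= 0:
        raise ValueError("x must be strictly positive")
    return 1.0 / x


def max_of_reciprocals(values):
    """Combine sub-clause scores the way a pure max (no tie-breaker) would."""
    return max(reciprocal(v) for v in values)


# Hypothetical term-level values, one per matching sub-clause.
doc_a = [4.0, 2.0]
doc_b = [8.0, 5.0]

score_a = max_of_reciprocals(doc_a)  # equals 1 / min(doc_a)
score_b = max_of_reciprocals(doc_b)  # equals 1 / min(doc_b)

# Sort by descending score: the document with the smaller minimum comes first.
ranked = sorted([("doc_a", score_a), ("doc_b", score_b)],
                key=lambda pair: pair[1], reverse=True)
```

Under this reading the final score falls as x rises, so it only fits a setup where a smaller x means a better match.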


Uwe



--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de





Re: DisjunctionMinQuery

2023-11-09 Thread Marc D'Mello
Hi Michael,

Thanks for the response! So to answer your first question, yes this would
keep the lowest score from the matching sub-scorers. Our use case is that
we have a custom term-level score overriding term frequency and we want to
take the min of that as part of our scoring function. Maybe it's a niche
use case?
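
To make the semantics concrete, here is a minimal sketch that compares the documented DisjunctionMaxQuery combination (highest sub-score plus a tie-breaker multiplier applied to the other matching sub-scores) with a hypothetical min-based mirror of it. `dis_min` is an assumption made for illustration, not an existing Lucene class, and the title/body scores are made-up numbers echoing the example from earlier in the thread.

```python
def dis_max(scores, tie_breaker=0.0):
    """Highest sub-score plus tie_breaker times the sum of the other sub-scores."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def dis_min(scores, tie_breaker=0.0):
    """Hypothetical mirror image: lowest sub-score plus a tie-breaker on the rest."""
    worst = min(scores)
    return worst + tie_breaker * (sum(scores) - worst)


# Made-up scores: a title match usually outscores a body match.
title_score, body_score = 3.0, 1.0
tie = 0.5

title_only_max = dis_max([title_score], tie)        # 3.0
both_max = dis_max([title_score, body_score], tie)  # 3.5

title_only_min = dis_min([title_score], tie)        # 3.0
both_min = dis_min([title_score, body_score], tie)  # 2.5
```

With the max-based form, matching both fields beats matching the title alone; with the min-based form the order flips, which is the behavior Michael found surprising.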

Thanks,
Marc

On Wed, Nov 8, 2023 at 3:19 PM Michael Froh wrote:

> Hi Marc,
>
> Can you clarify what the semantics of a DisjunctionMinQuery would be? Would
> you keep the score for the *lowest* scoring disjunct (plus some tiebreaker
> applied to the other matching disjuncts)?
>
> I'm trying to imagine how that would work compared to the classic DisMax
> use-case. Say I'm searching for "dalmatian" using a DisMax query over term
> queries against title and body. A match on title is probably going to score
> higher than a match against the body, just because the title has a shorter
> length (and the doc frequency of individual terms in the title is likely to
> be lower, since there are fewer terms overall). With DisMax, a match on
> title alone will score higher than a match on body, and the tie-break will
> tend to score a match on title and body higher than a match on title alone.
>
> With a DisMin (assuming you keep the lowest score), then a match on title
> and body would probably score lower than a match on title alone. That feels
> weird to me, but I might be missing the use-case.
>
> How would you use a DisMinQuery?
>
> Thanks,
> Froh
>
>
>
> On Wed, Nov 8, 2023 at 10:50 AM Marc D'Mello wrote:
>
> > Hi all,
> >
> > I noticed we have a DisjunctionMaxQuery
> > <https://github.com/apache/lucene/blob/branch_9_7/lucene/core/src/java/org/apache/lucene/search/DisjunctionMaxQuery.java>
> > but not a corresponding DisjunctionMinQuery. I was just wondering if there
> > was a specific reason for that? Or is it just that it is not a common query
> > to use?
> >
> > Thanks!
> > Marc
> >
>