Re: DisjunctionMinQuery
Hi all,

Once again, thanks for the responses! After thinking about this a bit more, I think Michael's response makes sense now. I agree that partial matches shouldn't be ranked higher than conjunctive matches, so a DisjunctionMinQuery doesn't make sense for my use case (I think I would need an AndMinQuery or something like that). This also answers my initial question.

I did have a question about this, though:

> in that case you should use something like 1/x as your scoring function
> in the sub-clauses

Doesn't using 1/x as a scoring function, even in the sub-clauses, still cause an issue where the output score is inversely correlated with the indexed term score? I think that would break BMW (Block-Max WAND), right? Or maybe I am misunderstanding the suggestion.

Thanks,
Marc

On Thu, Nov 9, 2023 at 10:18 AM Uwe Schindler wrote:

> Hi,
>
> in that case you should use something like 1/x as your scoring function
> in the sub-clauses. In Lucene, scores should go up with relevance.
> This must also apply to function scoring.
>
> Uwe
>
> On 09.11.2023 at 19:14, Marc D'Mello wrote:
> > Hi Michael,
> >
> > Thanks for the response! To answer your first question: yes, this would
> > keep the lowest score from the matching sub-scorers. Our use case is that
> > we have a custom term-level score overriding term frequency, and we want
> > to take the min of that as part of our scoring function. Maybe it's a
> > niche use case?
> >
> > Thanks,
> > Marc
> >
> > On Wed, Nov 8, 2023 at 3:19 PM Michael Froh wrote:
> >
> > > Hi Marc,
> > >
> > > Can you clarify what the semantics of a DisjunctionMinQuery would be?
> > > Would you keep the score for the *lowest* scoring disjunct (plus some
> > > tiebreaker applied to the other matching disjuncts)?
> > >
> > > I'm trying to imagine how that would work compared to the classic
> > > DisMax use-case. Say I'm searching for "dalmatian" using a DisMax query
> > > over term queries against title and body. A match on title is probably
> > > going to score higher than a match against the body, just because the
> > > title has a shorter length (and the doc frequency of individual terms
> > > in the title is likely to be lower, since there are fewer terms
> > > overall). With DisMax, a match on title alone will score higher than a
> > > match on body, and the tie-break will tend to score a match on title
> > > and body higher than a match on title alone.
> > >
> > > With a DisMin (assuming you keep the lowest score), a match on title
> > > and body would probably score lower than a match on title alone. That
> > > feels weird to me, but I might be missing the use-case.
> > >
> > > How would you use a DisMinQuery?
> > >
> > > Thanks,
> > > Froh
> > >
> > > On Wed, Nov 8, 2023 at 10:50 AM Marc D'Mello wrote:
> > >
> > > > Hi all,
> > > >
> > > > I noticed we have a DisjunctionMaxQuery
> > > > <https://github.com/apache/lucene/blob/branch_9_7/lucene/core/src/java/org/apache/lucene/search/DisjunctionMaxQuery.java>
> > > > but not a corresponding DisjunctionMinQuery. I was just wondering if
> > > > there was a specific reason for that? Or is it just that it is not a
> > > > common query to use?
> > > >
> > > > Thanks!
> > > > Marc
>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
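Michael's DisMax versus DisMin comparison in this message can be made concrete with a little arithmetic. The sketch below is a toy model in plain Java, not Lucene code: the class name, the `disMin` method, and the example scores (3.0 for title, 1.0 for body, tie-breaker 0.25) are all assumptions chosen for illustration. It models DisMax as the best sub-score plus the tie-breaker times the sum of the other matching sub-scores, and a hypothetical DisMin as the same formula with the lowest sub-score in place of the best.

```java
import java.util.List;

/** Toy model of disjunction scoring: plain arithmetic, no Lucene classes involved. */
final class DisjunctionScoreSketch {

    /** DisMax-style combination: best sub-score plus tieBreaker times the sum of the others.
     *  Assumes at least one matching sub-score. */
    static double disMax(List<Double> matchingScores, double tieBreaker) {
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : matchingScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    /** Hypothetical DisMin: lowest sub-score plus the same tie-break on the others.
     *  Assumes at least one matching sub-score. */
    static double disMin(List<Double> matchingScores, double tieBreaker) {
        double min = Double.POSITIVE_INFINITY;
        double sum = 0.0;
        for (double s : matchingScores) {
            min = Math.min(min, s);
            sum += s;
        }
        return min + tieBreaker * (sum - min);
    }

    public static void main(String[] args) {
        double title = 3.0; // assumed score of the title match
        double body = 1.0;  // assumed score of the body match
        double tie = 0.25;  // assumed tie-breaker

        System.out.println("DisMax, title only:     " + disMax(List.of(title), tie));
        System.out.println("DisMax, title and body: " + disMax(List.of(title, body), tie));
        System.out.println("DisMin, title only:     " + disMin(List.of(title), tie));
        System.out.println("DisMin, title and body: " + disMin(List.of(title, body), tie));
    }
}
```

With these assumed numbers, the document matching both fields scores above the title-only document under DisMax and below it under DisMin, which is the inversion Michael describes.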
Re: DisjunctionMinQuery
Hi,

in that case you should use something like 1/x as your scoring function in the sub-clauses. In Lucene, scores should go up with relevance. This must also apply to function scoring.

Uwe

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de
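One way to read the 1/x suggestion is through the identity max(1/x_i) = 1/min(x_i) for positive x_i: taking the maximum over reciprocal sub-scores selects the same disjunct that a minimum over the raw values would. The sketch below checks that identity in plain Java. It is not Lucene code; the class and method names are hypothetical, and the values 2, 4, 8 are assumed purely for illustration. Note that the combined score still falls as x rises, which is the inverse relationship Marc asks about in his follow-up; the sketch takes no position on how that interacts with block-max skipping.

```java
import java.util.List;

/** Toy check that a max over 1/x picks the same disjunct as a min over x (for x > 0). */
final class ReciprocalScoreSketch {

    /** Maps a positive custom value x to a score that grows as x shrinks. */
    static double reciprocal(double x) {
        return 1.0 / x;
    }

    /** What a max-style disjunction with tie-breaker 0 would keep after the 1/x transform. */
    static double maxOfReciprocals(List<Double> xs) {
        double best = 0.0;
        for (double x : xs) {
            best = Math.max(best, reciprocal(x));
        }
        return best;
    }

    /** Plain minimum over the raw values, i.e. what a DisMin would keep. */
    static double minOf(List<Double> xs) {
        double min = Double.POSITIVE_INFINITY;
        for (double x : xs) {
            min = Math.min(min, x);
        }
        return min;
    }

    public static void main(String[] args) {
        List<Double> xs = List.of(2.0, 4.0, 8.0); // assumed custom term-level values
        double combined = maxOfReciprocals(xs);
        System.out.println("max of 1/x:       " + combined);
        System.out.println("1 / (max of 1/x): " + (1.0 / combined));
        System.out.println("min of x:         " + minOf(xs));
    }
}
```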
Re: DisjunctionMinQuery
Hi Michael,

Thanks for the response! To answer your first question: yes, this would keep the lowest score from the matching sub-scorers. Our use case is that we have a custom term-level score overriding term frequency, and we want to take the min of that as part of our scoring function. Maybe it's a niche use case?

Thanks,
Marc