Re: TF in MoreLikeThis
Sorry for the delay, but better late than never :). I put up a PR here:
https://github.com/apache/lucene/pull/940.

--Petko

On Fri, Apr 1, 2022 at 10:11 AM Petko Minkov wrote:
> Yeah, I'll be happy to. I'll try to get a patch out soon.
Re: TF in MoreLikeThis
Yeah, I'll be happy to. I'll try to get a patch out soon.

On Fri, Apr 1, 2022 at 9:31 AM Adrien Grand wrote:
> From a quick look, your suggestion of passing the term frequency to
> TFIDFSimilarity#tf makes sense.
>
> Would you like to contribute this change? You can find contributing
> guidelines here:
> https://github.com/apache/lucene/blob/main/CONTRIBUTING.md.
Re: TF in MoreLikeThis
From a quick look, your suggestion of passing the term frequency to
TFIDFSimilarity#tf makes sense.

Would you like to contribute this change? You can find contributing
guidelines here:
https://github.com/apache/lucene/blob/main/CONTRIBUTING.md.

On Thu, Mar 31, 2022 at 11:46 PM Petko Minkov wrote:
> Hi,
>
> I was looking at Lucene's code for MoreLikeThis, specifically this line:
> https://github.com/apache/lucene/blob/69b040fc6292ac47d7f7fc8bc3b7fd601794e54b/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L640
>
> It looks like in ClassicSimilarity, TF is a square root, but in the code TF
> is used without the ClassicSimilarity::tf() function called. Is that a bug
> - it will make TF have a disproportionately higher weight compared to IDF?
>
> --Petko

--
Adrien

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
TF in MoreLikeThis
Hi,

I was looking at Lucene's code for MoreLikeThis, specifically this line:
https://github.com/apache/lucene/blob/69b040fc6292ac47d7f7fc8bc3b7fd601794e54b/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L640

It looks like in ClassicSimilarity, TF is a square root, but in the code TF
is used without the ClassicSimilarity::tf() function called. Is that a bug
- it will make TF have a disproportionately higher weight compared to IDF?

--Petko
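
For readers following the thread, here is a minimal sketch of the weighting difference the question describes. It does not use Lucene itself: the two helper formulas are re-implemented to mirror ClassicSimilarity's tf (square root of the frequency) and idf (log((docCount + 1) / (docFreq + 1)) + 1) as I understand them. The class name TfWeightingSketch, the methods rawScore and dampedScore, and all term and document counts are hypothetical and chosen only for illustration; this is not the code from MoreLikeThis or from the PR.

```java
public class TfWeightingSketch {

    // Mirrors ClassicSimilarity#tf: square root of the raw term frequency.
    static float dampedTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Mirrors ClassicSimilarity#idf: log((docCount + 1) / (docFreq + 1)) + 1.
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    // Term score with the raw frequency, as described in the question.
    static float rawScore(int freq, long docFreq, long docCount) {
        return freq * idf(docFreq, docCount);
    }

    // Term score with the frequency passed through tf() first, as suggested in the reply.
    static float dampedScore(int freq, long docFreq, long docCount) {
        return dampedTf(freq) * idf(docFreq, docCount);
    }

    public static void main(String[] args) {
        long docCount = 1_000_000L; // hypothetical index size

        // Hypothetical terms: one frequent but common, one infrequent but rare.
        int commonFreq = 100;
        long commonDocFreq = 500_000L;
        int rareFreq = 4;
        long rareDocFreq = 100L;

        System.out.println("raw    common=" + rawScore(commonFreq, commonDocFreq, docCount)
                + " rare=" + rawScore(rareFreq, rareDocFreq, docCount));
        System.out.println("damped common=" + dampedScore(commonFreq, commonDocFreq, docCount)
                + " rare=" + dampedScore(rareFreq, rareDocFreq, docCount));
    }
}
```

With these made-up numbers, the common term outranks the rare one when the raw frequency is used, and the order flips once the frequency goes through the square root. That is the "disproportionately higher weight" effect in miniature; how much it matters on a real index depends on the actual term statistics.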