Thanks, Jack.

In my case there is only one document: "Foo Foo is in bar". Going by your comment, I should expect the TF to be 2, but I am getting 1. Is there a check so that, when one match is a subset of another, it is counted only once? My class extends DefaultSimilarity.
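For reference, here is a minimal sketch of the arithmetic involved. It assumes the stock DefaultSimilarity formulas as documented, sloppyFreq(distance) = 1 / (distance + 1) and tf(freq) = sqrt(freq), and that neither is overridden. The class SloppyTfSketch and the helper phraseFreq are hypothetical names, not Lucene APIs, and the distance convention (how many positions the terms are displaced from an exact phrase) is also an assumption. It does not settle whether Lucene counts one match or two here; it only shows what each case would yield.

```java
// Hypothetical stand-alone sketch; it does not call Lucene.
class SloppyTfSketch {

    // Mirrors the documented DefaultSimilarity.sloppyFreq: 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Mirrors the documented DefaultSimilarity.tf: sqrt(freq).
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Hypothetical helper: sums sloppyFreq over the candidate matches
    // whose distance fits within the slop.
    static float phraseFreq(int slop, int... distances) {
        float freq = 0f;
        for (int d : distances) {
            if (d <= slop) {
                freq += sloppyFreq(d);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        int slop = 3;
        // Document "Foo Foo is in bar": Foo at positions 0 and 1, bar at 4.
        // Assumed distance for the query "Foo bar": barPos - fooPos - 1.
        int tight = 4 - 1 - 1; // second Foo ... bar
        int loose = 4 - 0 - 1; // first Foo ... bar

        float onlyTightest = phraseFreq(slop, tight);
        float bothMatches = phraseFreq(slop, tight, loose);

        System.out.printf("one match:   freq=%.4f tf=%.4f%n",
                onlyTightest, tf(onlyTightest));
        System.out.printf("two matches: freq=%.4f tf=%.4f%n",
                bothMatches, tf(bothMatches));
    }
}
```

Under these assumptions, counting only the tighter match gives a tf of about 0.58 and counting both gives about 0.76, so a sloppy match would not produce a whole-number tf of 1 or 2 unless tf or sloppyFreq is overridden in the custom class.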
Cheers
Ariya Bala S

On Wed, May 20, 2015 at 2:09 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> Yes.
>
> tf is both 1 and 2 - tf is per document, which is 1 for the first document
> and 2 for the second document.
>
> See:
> http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> -- Jack Krupansky
>
> On Wed, May 20, 2015 at 6:13 AM, ariya bala <ariya...@gmail.com> wrote:
>
> > Hi,
> > I have made a custom class for scoring the similarity
> > (TermFrequencyBiasedSimilarity).
> > The score was deduced by considering just the TF part (achieved by
> > setting IDF=1).
> >
> > Question is:
> > -----------------
> > *Document content:* Foo Foo is in bar
> > *Search query:* Foo bar
> > *slop:* 3
> >
> > With slop 3, there are two matches to the query:
> > Foo is in bar
> > Foo Foo is in bar
> >
> > *Should the Term Frequency be 1 or 2? Also point to the explanation of
> > the logic implemented in Lucene/Solr.*
> >
> > --
> > Cheers
> > *Ariya*

--
*Ariya*