Re: Term Frequency Calculation - Clarification

ariya bala Wed, 20 May 2015 04:46:39 -0700

Thanks Jack.
In my case there is only one document: "Foo Foo is in bar".
As per your comment, I should expect the TF to be 2,
but I am getting 1.
Is there a check so that, when one match is a subset of another, it is counted
only once?
My class extends DefaultSimilarity.
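
For reference, here is a minimal stand-alone sketch of the arithmetic involved.
It does not call Lucene; it only mirrors the formulas documented for
DefaultSimilarity, namely tf(freq) = sqrt(freq) and
sloppyFreq(distance) = 1 / (distance + 1), where the sloppyFreq values of the
counted matches are summed to form the phrase frequency. The class name
TfSketch, the helper phraseFreq, and the distances 2 and 3 for the two matches
are assumptions made for illustration. Which matches the scorer actually counts
is exactly the open question, so both cases are printed.

```java
// Stand-alone sketch: mirrors the documented formulas, does not use Lucene.
class TfSketch {
    // DefaultSimilarity documents tf(freq) as sqrt(freq).
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // DefaultSimilarity documents sloppyFreq(distance) as 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Hypothetical helper: sums sloppyFreq over the matches that are counted.
    static float phraseFreq(int... distances) {
        float freq = 0f;
        for (int d : distances) {
            freq += sloppyFreq(d);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Assumed distances: 2 for "Foo is in bar", 3 for "Foo Foo is in bar".
        System.out.println("only the closer match counted: " + tf(phraseFreq(2)));
        System.out.println("both matches counted:          " + tf(phraseFreq(2, 3)));
    }
}
```

Under these assumptions neither case yields a whole number, so a TF of exactly
1 or 2 would suggest that tf() or sloppyFreq() is overridden in the custom
class.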


Cheers
Ariya Bala S

On Wed, May 20, 2015 at 2:09 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Yes.
>
> tf is both 1 and 2 - tf is per document, which is 1 for the first document
> and 2 for the second document.
>
> See:
>
> http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
>
> -- Jack Krupansky
>
> On Wed, May 20, 2015 at 6:13 AM, ariya bala <ariya...@gmail.com> wrote:
>
> > Hi,
> > I have written a custom class for scoring similarity
> > (TermFrequencyBiasedSimilarity).
> > The score is derived from just the TF part (achieved by setting
> > IDF=1).
> >
> > Question is:
> > -----------------
> > *Document content:* Foo Foo is in bar
> > *Search query:* Foo bar
> > *slop:* 3
> >
> > With slop 3, there are two matches to the query:
> >  Foo is in bar
> >  Foo Foo is in bar
> >
> > *Should the term frequency be 1 or 2? Please also point me to an
> > explanation of the logic implemented in Lucene/Solr.*
> >
> > --
> > Cheers
> > *Ariya *
> >
>



-- 
*Ariya *
