Re: Relevancy debugging - idf score

Alessandro Benedetti Sun, 05 Dec 2021 09:03:00 -0800

It seems like the underlying index changed.
Are those two documents in the same result set?
Is it just one query?
It's definitely curious: even if a commit happened, search results are
consistent within a single searcher.
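The idf values themselves match the printed formula. They differ only because n and N differ between the two explanations. Below is a minimal Python sketch that recomputes idf from the n and N values in the explanations quoted below. The helper name `bm25_idf` is just for illustration.

```python
import math

def bm25_idf(n: int, N: int) -> float:
    """idf exactly as printed in the explain output:
    log(1 + (N - n + 0.5) / (n + 0.5)), natural log."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

# (field, document, n = docs containing "remedi", N = docs with the field),
# copied by hand from the two quoted explanations.
stats = [
    ("narratives", "Doc1", 52, 125),
    ("narratives", "Doc2", 60, 115),
    ("matters",    "Doc1",  9, 107),
    ("matters",    "Doc2", 13,  99),
]

for field, doc, n, N in stats:
    # Each result should agree with the explain output to float precision.
    print(f"{field:<10} {doc}: n={n:>2} N={N:>3} idf={bm25_idf(n, N):.6f}")
```

The formula is applied consistently. However, N (documents with the field) is 125 for one hit and 115 for the other. That means the two scores were computed against different index statistics.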



On Sun, 5 Dec 2021, 16:28 Sjoerd Smeets, <ssme...@gmail.com> wrote:

> Hi all,
>
> I'm debugging the relevancy scores of my query, and I see the following for
> two document hits. My question is: why is the idf score not the same for
> both documents? This is Solr 6.6.
>
> Any guidance would be much appreciated.
>
> Thanks!
>
> *Doc1*
> "71d72354eea23b9eae934ab616e8ce38de69d760": "
> 104.994415 = sum of:
>   104.994415 = sum of:
>     82.89969 = weight(stemmed_data.timenote.narratives:remedi in 22470) [SchemaSimilarity], result of:
>       82.89969 = score(freq=9.0), computed as boost * idf * tf from:
>         100.0 = boost
>         0.87546873 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>           *52 = n, number of documents containing term*
>           *125 = N, total number of documents with field*
>         0.9469177 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>           9.0 = freq, occurrences of term within document
>           1.2 = k1, term saturation parameter
>           0.75 = b, length normalization parameter
>           12312.0 = dl, length of field (approximate)
>           54179.03 = avgdl, average length of field
>     22.09473 = weight(stemmed_data.timenote.matters:remedi in 22470) [SchemaSimilarity], result of:
>       22.09473 = score(freq=4.0), computed as boost * idf * tf from:
>         10.0 = boost
>         2.4308395 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>           *9 = n, number of documents containing term*
>           *107 = N, total number of documents with field*
>         0.9089341 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>           4.0 = freq, occurrences of term within document
>           1.2 = k1, term saturation parameter
>           0.75 = b, length normalization parameter
>           5656.0 = dl, length of field (approximate)
>           50520.543 = avgdl, average length of field
>   0.0 = FunctionQuery(int(s_integer_search.previews)), product of:
>     0.0 = int(s_integer_search.previews)=0
>     1.0 = boost
>   0.0 = FunctionQuery(int(s_integer_search.downloads)), product of:
>     0.0 = int(s_integer_search.downloads)=0
>     1.0 = boost
> "
>
> *Doc2*
> "80302a1ecc44d1e556970ab96c25b1fd3328a854": "
> 84.61461 = sum of:
>   84.61461 = sum of:
>     64.68881 = weight(stemmed_data.timenote.narratives:remedi in 0) [SchemaSimilarity], result of:
>       64.68881 = score(freq=493.0), computed as boost * idf * tf from:
>         100.0 = boost
>         0.65094686 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>           *60 = n, number of documents containing term*
>           *115 = N, total number of documents with field*
>         0.99376476 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>           493.0 = freq, occurrences of term within document
>           1.2 = k1, term saturation parameter
>           0.75 = b, length normalization parameter
>           229400.0 = dl, length of field (approximate)
>           73913.91 = avgdl, average length of field
>     19.9258 = weight(stemmed_data.timenote.matters:remedi in 0) [SchemaSimilarity], result of:
>       19.9258 = score(freq=340.0), computed as boost * idf * tf from:
>         10.0 = boost
>         2.0024805 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>           *13 = n, number of documents containing term*
>           *99 = N, total number of documents with field*
>         0.99505585 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>           340.0 = freq, occurrences of term within document
>           1.2 = k1, term saturation parameter
>           0.75 = b, length normalization parameter
>           147480.0 = dl, length of field (approximate)
>           95534.95 = avgdl, average length of field
>   0.0 = FunctionQuery(int(s_integer_search.previews)), product of:
>     0.0 = int(s_integer_search.previews)=0
>     1.0 = boost
>   0.0 = FunctionQuery(int(s_integer_search.downloads)), product of:
>     0.0 = int(s_integer_search.downloads)=0
>     1.0 = boost
> "
>
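Each per-field score in the explanations above is boost * idf * tf. The k1, b, dl and avgdl values it uses are printed alongside it, and the FunctionQuery terms contribute 0. Below is a rough Python sketch that plugs the printed values back into those formulas. The helper names are hypothetical. Because the explain output shows rounded float values and an approximate dl, the totals should match 104.994415 and 84.61461 only up to rounding.

```python
import math

def bm25_idf(n: float, N: float) -> float:
    # idf as printed: log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25_tf(freq: float, dl: float, avgdl: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    # tf as printed: freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

def field_score(boost: float, n: float, N: float,
                freq: float, dl: float, avgdl: float) -> float:
    # Per-field weight: boost * idf * tf
    return boost * bm25_idf(n, N) * bm25_tf(freq, dl, avgdl)

# Values copied by hand from the quoted explanations;
# the two FunctionQuery clauses are 0 for both hits, so they are omitted.
doc1 = (field_score(100.0, 52, 125, 9.0, 12312.0, 54179.03)       # narratives
        + field_score(10.0, 9, 107, 4.0, 5656.0, 50520.543))      # matters
doc2 = (field_score(100.0, 60, 115, 493.0, 229400.0, 73913.91)    # narratives
        + field_score(10.0, 13, 99, 340.0, 147480.0, 95534.95))   # matters

print(f"Doc1 total: {doc1:.4f}")
print(f"Doc2 total: {doc2:.4f}")
```

Only the idf factor depends on collection-wide statistics (n and N). The tf factor also depends on avgdl, which is collection-wide too. In the quote, avgdl differs between the two hits as well (54179.03 vs 73913.91 for narratives).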
