Good to know you solved it! Yes, distributed IDF is definitely a problem when documents are unevenly distributed across shards.
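To make the effect concrete, here is a minimal sketch that recomputes the idf values from the two explain outputs quoted below, using the formula printed there. The (n, N) pairs differ between the two documents, which is consistent with each shard scoring against its own local statistics. The pooled figure is only an illustration: it pretends the two visible sets of statistics are the whole collection, whereas the real setup has 4 shards and the other two are not shown in this thread. The helper name `bm25_idf` is made up for this example.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """idf as printed in the explain output: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


# (n, N) for stemmed_data.timenote.narratives:remedi, copied from the explains.
doc1_n, doc1_N = 52, 125   # statistics used when scoring Doc1
doc2_n, doc2_N = 60, 115   # statistics used when scoring Doc2

idf_doc1 = bm25_idf(doc1_n, doc1_N)
idf_doc2 = bm25_idf(doc2_n, doc2_N)

# Assumption for illustration only: treat the two sets of local statistics
# as if they covered the whole collection and sum them, which is roughly
# what a collection-wide statistics cache would feed into the same formula.
idf_pooled = bm25_idf(doc1_n + doc2_n, doc1_N + doc2_N)

print(f"idf with Doc1's local stats: {idf_doc1:.6f}")
print(f"idf with Doc2's local stats: {idf_doc2:.6f}")
print(f"idf with pooled stats:       {idf_pooled:.6f}")
```

With shared statistics both documents would be scored with the same idf for the same term, so only tf would separate them.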
Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant
www.sease.io

On Sun, 5 Dec 2021 at 17:19, Sjoerd Smeets <ssme...@gmail.com> wrote:

> Found it!
>
> I had to enable the ExactStatsCache.
>
> Found a description over here. Thanks for pointing me in the right
> direction.
>
> https://solr.pl/en/2019/05/20/distributed-idf/
>
>
> On Sun, Dec 5, 2021 at 11:09 AM Sjoerd Smeets <ssme...@gmail.com> wrote:
>
>> Hi Alessandro,
>>
>> Thanks for your reply! Yes, the documents are in the same result list,
>> I'm not doing any indexing at the moment, and I executed a commit just
>> to be sure. Still the same result. It is an environment with 4 shards.
>> Perhaps that plays a factor?
>>
>> Thanks,
>> Sjoerd
>>
>> On Sun, Dec 5, 2021 at 11:02 AM Alessandro Benedetti <
>> a.benede...@sease.io> wrote:
>>
>>> It seems like the underlying index changed.
>>> Are those two documents in the same result set?
>>> Is it just one query?
>>> It's definitely curious: even if a commit happened, search results are
>>> consistent within one searcher.
>>>
>>>
>>> On Sun, 5 Dec 2021, 16:28 Sjoerd Smeets, <ssme...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm debugging the relevancy scores of my query and I see the following
>>>> for two document hits. My question is, why is the idf score not the
>>>> same for both documents? This is Solr 6.6.
>>>>
>>>> Any guidance would be much appreciated.
>>>>
>>>> Thanks!
>>>>
>>>> *Doc1*
>>>> "71d72354eea23b9eae934ab616e8ce38de69d760": "
>>>> 104.994415 = sum of:
>>>>   104.994415 = sum of:
>>>>     82.89969 = weight(stemmed_data.timenote.narratives:remedi in 22470) [SchemaSimilarity], result of:
>>>>       82.89969 = score(freq=9.0), computed as boost * idf * tf from:
>>>>         100.0 = boost
>>>>         0.87546873 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>>>>           *52 = n, number of documents containing term*
>>>>           *125 = N, total number of documents with field*
>>>>         0.9469177 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>>>>           9.0 = freq, occurrences of term within document
>>>>           1.2 = k1, term saturation parameter
>>>>           0.75 = b, length normalization parameter
>>>>           12312.0 = dl, length of field (approximate)
>>>>           54179.03 = avgdl, average length of field
>>>>     22.09473 = weight(stemmed_data.timenote.matters:remedi in 22470) [SchemaSimilarity], result of:
>>>>       22.09473 = score(freq=4.0), computed as boost * idf * tf from:
>>>>         10.0 = boost
>>>>         2.4308395 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>>>>           *9 = n, number of documents containing term*
>>>>           *107 = N, total number of documents with field*
>>>>         0.9089341 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>>>>           4.0 = freq, occurrences of term within document
>>>>           1.2 = k1, term saturation parameter
>>>>           0.75 = b, length normalization parameter
>>>>           5656.0 = dl, length of field (approximate)
>>>>           50520.543 = avgdl, average length of field
>>>>   0.0 = FunctionQuery(int(s_integer_search.previews)), product of:
>>>>     0.0 = int(s_integer_search.previews)=0
>>>>     1.0 = boost
>>>>   0.0 = FunctionQuery(int(s_integer_search.downloads)), product of:
>>>>     0.0 = int(s_integer_search.downloads)=0
>>>>     1.0 = boost
>>>> "
>>>>
>>>> *Doc2*
>>>> "80302a1ecc44d1e556970ab96c25b1fd3328a854": "
>>>> 84.61461 = sum of:
>>>>   84.61461 = sum of:
>>>>     64.68881 = weight(stemmed_data.timenote.narratives:remedi in 0) [SchemaSimilarity], result of:
>>>>       64.68881 = score(freq=493.0), computed as boost * idf * tf from:
>>>>         100.0 = boost
>>>>         0.65094686 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>>>>           *60 = n, number of documents containing term*
>>>>           *115 = N, total number of documents with field*
>>>>         0.99376476 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>>>>           493.0 = freq, occurrences of term within document
>>>>           1.2 = k1, term saturation parameter
>>>>           0.75 = b, length normalization parameter
>>>>           229400.0 = dl, length of field (approximate)
>>>>           73913.91 = avgdl, average length of field
>>>>     19.9258 = weight(stemmed_data.timenote.matters:remedi in 0) [SchemaSimilarity], result of:
>>>>       19.9258 = score(freq=340.0), computed as boost * idf * tf from:
>>>>         10.0 = boost
>>>>         2.0024805 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>>>>           *13 = n, number of documents containing term*
>>>>           *99 = N, total number of documents with field*
>>>>         0.99505585 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>>>>           340.0 = freq, occurrences of term within document
>>>>           1.2 = k1, term saturation parameter
>>>>           0.75 = b, length normalization parameter
>>>>           147480.0 = dl, length of field (approximate)
>>>>           95534.95 = avgdl, average length of field
>>>>   0.0 = FunctionQuery(int(s_integer_search.previews)), product of:
>>>>     0.0 = int(s_integer_search.previews)=0
>>>>     1.0 = boost
>>>>   0.0 = FunctionQuery(int(s_integer_search.downloads)), product of:
>>>>     0.0 = int(s_integer_search.downloads)=0
>>>>     1.0 = boost
>>>> "
>>>>