Hi Alessandro,

Thank you for taking the time to look into my query. I am still trying to understand the function used under computeRelatedness, which uses the number 30 and also some other numbers. Having the foreground count available as an additional parameter would help. It will take me some time to work on your idea, so for now I will continue with what I have.

Thanks again for your inputs.

On Mon, Jul 26, 2021 at 8:18 PM Alessandro Benedetti <[email protected]> wrote:

> Hi Kerwin,
> I was taking a look at your question and the
> *org.apache.solr.search.facet.RelatednessAgg* code, in line :
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Thu, 22 Jul 2021 at 08:27, Kerwin <[email protected]> wrote:
>
> > Hi Solr users,
> >
> > I have a question on the relatedness and Semantic Knowledge Graphs feature
> > in Solr.
> > While the results are good with the out-of-the-box provision, I need some
> > tweaking on the ability to specify filters or parameters based on only the
> > foreground count. Right now only the min_popularity parameter is available,
> > which applies to both the foreground dataset and the background one.
>
> so far so good
>
> > The
> > white paper from Trey Grainger and his team mentions that the z score is
> > used to calculate the score. As per my understanding, the z score assumes a
> > normal distribution and is applicable when sample size > 30, which I assume is
> > the foreground count.
>
> I don't have time right now to go through the paper, but the only place I
> found the '30' magic number in the class is within this
> method: org.apache.solr.search.facet.RelatednessAgg#computeRelatedness
> It's not even defined as a constant nor a variable driven by a param, so
> it's not possible to change it unless we improve the code.
>
> > So I would like to control this value with a
> > parameter or filter. Right now I am getting the approximate count by doing
> > a reverse calculation on the foreground popularity and the background size
> > to get the foreground count. Kindly correct me if my understanding is
> > different from what it should be.
>
> What I recommend is to take a look at the code references I put, and write
> a contribution on your own to add the additional configuration with the
> explanation.
> As a committer, I would be happy to review such work and merge it in if it
> improves the relatedness aggregation (we could take the occasion to also
> rename some of the variables, which seem to not align with Java standards:
> 'min_pop' => minPopularity, etc.)
> Cheers
>
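
For readers following the thread, the reverse calculation mentioned in the original question can be sketched roughly as below. This is only an illustration, not Solr code: it assumes, as the question implies, that the reported foreground popularity is approximately the foreground count divided by the background size. The class name, method names, and the client-side threshold are hypothetical, and because the reported popularity may be rounded, the result is an estimate.

```java
// Hypothetical client-side helper; none of these names exist in Solr.
public final class ForegroundCountEstimate {

    private ForegroundCountEstimate() {}

    /**
     * Estimates the foreground count from a reported foreground popularity.
     * Assumption (taken from the thread, not verified against Solr):
     * foregroundPopularity ~= foregroundCount / backgroundSize.
     */
    static long estimateForegroundCount(double foregroundPopularity, long backgroundSize) {
        if (foregroundPopularity < 0.0 || backgroundSize < 0) {
            throw new IllegalArgumentException("popularity and size must be non-negative");
        }
        // Round to the nearest whole document; the input may itself be rounded.
        return Math.round(foregroundPopularity * backgroundSize);
    }

    /**
     * Hypothetical post-filter applied to facet buckets on the client side:
     * keeps a bucket only if its estimated foreground count reaches the threshold.
     */
    static boolean meetsMinForegroundCount(double foregroundPopularity,
                                           long backgroundSize,
                                           long minForegroundCount) {
        return estimateForegroundCount(foregroundPopularity, backgroundSize) >= minForegroundCount;
    }

    public static void main(String[] args) {
        long backgroundSize = 80_000L;        // illustrative value
        double foregroundPopularity = 0.00125; // illustrative value

        System.out.println(estimateForegroundCount(foregroundPopularity, backgroundSize));
        System.out.println(meetsMinForegroundCount(foregroundPopularity, backgroundSize, 30L));
    }
}
```

A filter like this only works on the client after the response arrives; controlling the threshold inside the aggregation itself would still need the code contribution discussed above.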
