Re: Stats in CustomScoreProvider + (in)correctness of LMDirichletSimilarity

2015-05-02 Thread Stephen Wu
Sorry, my solution for #2 was wrong. I am linking some equations here that should explain a consistent approach. Leaving LMDirichletSimilarity as-is skews the "additive queryNorm" factor. LMDirichletSimilarity should have only

Stats in CustomScoreProvider + (in)correctness of LMDirichletSimilarity

2015-05-02 Thread Stephen Wu
I am having trouble getting collection probabilities for a term to show up in a CustomScoreQuery/CustomScoreProvider. Basically, I am trying to add a per-document weight that amounts to the sum (for each term in the query) of Math.log(collectionProbability). Can anyone help with this? Or feel fr
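
For reference, the quantity described above (the sum over query terms of Math.log(collectionProbability)) might be sketched in plain Java as follows. This is only a sketch under stated assumptions, not the poster's code: the class and method names are hypothetical, and the collection probability is assumed to be the add-one form (totalTermFreq + 1) / (sumTotalTermFreq + 1). In Lucene the two counts would presumably be read with IndexReader.totalTermFreq(Term) and IndexReader.getSumTotalTermFreq(String); here they are passed in directly so the example runs without an index.

```java
public class CollectionLogProb {

    // Assumed collection model: add-one smoothed relative frequency of the
    // term in the whole collection. Other models are possible.
    public static double collectionProbability(long totalTermFreq, long sumTotalTermFreq) {
        return (totalTermFreq + 1.0) / (sumTotalTermFreq + 1.0);
    }

    // Sum of log collection probabilities, one entry per query term.
    // totalTermFreqs[i] is the collection-wide frequency of the i-th query term.
    public static double sumLogCollectionProbability(long[] totalTermFreqs, long sumTotalTermFreq) {
        double sum = 0.0;
        for (long totalTermFreq : totalTermFreqs) {
            sum += Math.log(collectionProbability(totalTermFreq, sumTotalTermFreq));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical counts for a two-term query against a 999-token field.
        long[] totalTermFreqs = {9, 99};
        System.out.println(sumLogCollectionProbability(totalTermFreqs, 999));
    }
}
```

The helper takes raw counts rather than an IndexReader so that it can be called from wherever those statistics happen to be available.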