Someone can check my facts here, but the log-likelihood ratio follows a chi-square distribution (asymptotically, when the two items really are independent; for a 2x2 table of counts that is one degree of freedom). You can get an actual probability, a p-value, from that in the usual way, from its CDF. You would need to tweak the code you see in the project to compute an actual LLR by normalizing the input.
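
To make that concrete, here is a rough sketch in Python rather than the project's own code. It assumes a 2x2 table of co-occurrence counts and one degree of freedom, and the names llr_2x2 and p_value are made up for illustration. The statistic is the textbook G-test form, and for one degree of freedom the upper tail of the chi-square CDF reduces to erfc(sqrt(x / 2)), so nothing beyond the standard library is needed.

```python
import math


def llr_2x2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) statistic for a 2x2 table of counts.

    k11: both events together, k12 / k21: one without the other,
    k22: neither. This is the textbook G-test form, not the project's code.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero count contributes nothing (k * log k -> 0)
            g += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * g


def p_value(llr):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # Guard against tiny negative values caused by floating-point rounding.
    return math.erfc(math.sqrt(max(llr, 0.0) / 2.0))
```

For example, p_value(llr_2x2(100, 1, 1, 100)) is very close to 0, while a table whose counts are exactly what independence predicts, such as (10, 10, 10, 10), gives an LLR of 0 and a p-value of 1.
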
You could then use 1-p as a similarity metric. This isn't how the test statistic is turned into a similarity metric in the project now, but 1-p sounds nicer. Maybe the historical reason was speed, or ignorance.

On Thu, Jun 20, 2013 at 8:53 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> When computing item-item similarity using the log-likelihood similarity
> [1], can I simply apply a sigmoid to the resulting values to get the
> probability that two items are similar?
>
> Is there any other processing I need to do?
>
> Thanks!
>
> [1] http://tdunning.blogspot.ro/2008/03/surprise-and-coincidence.html