Given that ratings have only a small effect on recommendations (i.e. you can reduce to the binary case with much loss), and given that rank based statistics don't like tied scores, I wouldn't expect much gain from Hoeffding's D.
On Sun, Apr 24, 2011 at 1:20 AM, Sean Owen <[email protected]> wrote: > Yes, this Hoeffding *bound*, and one flavor of the Chernoff bound, are not > really similarity metrics. They can construct loose bounds on what the true > mean of a random variable must be given some observations. > > (These bounds have a use, but in another part of the recommender > computation. And in those parts, you know the distribution of the random > variable so can use a better bound.) > > The Hoeffding D statistic (which I had to re-lookup for sure) is what you > want. It is spiritually like the Spearman correlation in that it is based > on > relative ordering of ratings rather than rating values. I have never played > with it but it is an interesting possibility. I imagine it will be slow to > compute, but, could be a useful addition. You would probably want to > refactor SpearmanCorrelationSimilarity a little since it already does the > conversion to rank and then subclass to implement this formula if > interested. > > Negative correlation just means two series are in fact related, but one > goes > up when the other goes down and vice versa, rather than trending together, > which would be positive correlation. > > On Sun, Apr 24, 2011 at 4:58 AM, Lance Norskog <[email protected]> wrote: > > > This paper compares Pearson, Spearman and Hoeffding's D measure as > > similarity measures for DNA matching. It claims Hoeffding is the best. > > http://www.ncbi.nlm.nih.gov/pubmed/19634197 > > > > Chasing down Hoeffding as a similarity measure, the closest I've come > > is the Hoeffding Bound or Additive Chernoff Bound. Page 2, right-hand > > column has a description of the algorithm: > > http://www.cs.washington.edu/homes/pedrod/papers/kdd00.pdf > > > > Is this the right base math? Given this formula for acceptable errors, > > what would be the algorithm for a similarity measure? > > > > Also, what does a negative correlation value mean? Should I just look > > at the absolute value? > > > > -- > > Lance Norskog > > [email protected] > > >
