Yup. The Hoeffding D statistic requires two passes of the preferences vector pair: one to pull the mean of each input vector, the next to calculate against it.
I'm using these correlations for somewhat different uses (evaluating sampler outputs) and coded a commons-math Correlator. It seems to have its own personality but still gives useful results. By which I mean, each input pair gives a similar output from Spearman, Pearsons and Hoeffding D. For example, input pair X gives (1.2,1.1,1.3) and input pair Y gives (3.5,3.3,3.9). So, it is not interesting for user/item recommendation, but it does have its own utility in other problems. On Sun, Apr 24, 2011 at 10:39 AM, Ted Dunning <[email protected]> wrote: > Given that ratings have only a small effect on recommendations (i.e. you can > reduce to the binary case with much loss), and given that rank based > statistics don't like tied scores, I wouldn't expect much gain from > Hoeffding's D. > > On Sun, Apr 24, 2011 at 1:20 AM, Sean Owen <[email protected]> wrote: > >> Yes, this Hoeffding *bound*, and one flavor of the Chernoff bound, are not >> really similarity metrics. They can construct loose bounds on what the true >> mean of a random variable must be given some observations. >> >> (These bounds have a use, but in another part of the recommender >> computation. And in those parts, you know the distribution of the random >> variable so can use a better bound.) >> >> The Hoeffding D statistic (which I had to re-lookup for sure) is what you >> want. It is spiritually like the Spearman correlation in that it is based >> on >> relative ordering of ratings rather than rating values. I have never played >> with it but it is an interesting possibility. I imagine it will be slow to >> compute, but, could be a useful addition. You would probably want to >> refactor SpearmanCorrelationSimilarity a little since it already does the >> conversion to rank and then subclass to implement this formula if >> interested. >> >> Negative correlation just means two series are in fact related, but one >> goes >> up when the other goes down and vice versa, rather than trending together, >> which would be positive correlation. >> >> On Sun, Apr 24, 2011 at 4:58 AM, Lance Norskog <[email protected]> wrote: >> >> > This paper compares Pearson, Spearman and Hoeffding's D measure as >> > similarity measures for DNA matching. It claims Hoeffding is the best. >> > http://www.ncbi.nlm.nih.gov/pubmed/19634197 >> > >> > Chasing down Hoeffding as a similarity measure, the closest I've come >> > is the Hoeffding Bound or Additive Chernoff Bound. Page 2, right-hand >> > column has a description of the algorithm: >> > http://www.cs.washington.edu/homes/pedrod/papers/kdd00.pdf >> > >> > Is this the right base math? Given this formula for acceptable errors, >> > what would be the algorithm for a similarity measure? >> > >> > Also, what does a negative correlation value mean? Should I just look >> > at the absolute value? >> > >> > -- >> > Lance Norskog >> > [email protected] >> > >> > -- Lance Norskog [email protected]
