Given that ratings have only a small effect on recommendations (i.e. you can
reduce to the binary case with much loss), and given that rank based
statistics don't like tied scores, I wouldn't expect much gain from
Hoeffding's D.

On Sun, Apr 24, 2011 at 1:20 AM, Sean Owen <[email protected]> wrote:

> Yes, this Hoeffding *bound*, and one flavor of the Chernoff bound, are not
> really similarity metrics. They can construct loose bounds on what the true
> mean of a random variable must be given some observations.
>
> (These bounds have a use, but in another part of the recommender
> computation. And in those parts, you know the distribution of the random
> variable so can use a better bound.)
>
> The Hoeffding D statistic (which I had to re-lookup for sure) is what you
> want. It is spiritually like the Spearman correlation in that it is based
> on
> relative ordering of ratings rather than rating values. I have never played
> with it but it is an interesting possibility. I imagine it will be slow to
> compute, but, could be a useful addition. You would probably want to
> refactor SpearmanCorrelationSimilarity a little since it already does the
> conversion to rank and then subclass to implement this formula if
> interested.
>
> Negative correlation just means two series are in fact related, but one
> goes
> up when the other goes down and vice versa, rather than trending together,
> which would be positive correlation.
>
> On Sun, Apr 24, 2011 at 4:58 AM, Lance Norskog <[email protected]> wrote:
>
> > This paper compares Pearson, Spearman and Hoeffding's D measure as
> > similarity measures for DNA matching. It claims Hoeffding is the best.
> > http://www.ncbi.nlm.nih.gov/pubmed/19634197
> >
> > Chasing down Hoeffding as a similarity measure, the closest I've come
> > is the Hoeffding Bound or Additive Chernoff Bound. Page 2, right-hand
> > column has a description of the algorithm:
> > http://www.cs.washington.edu/homes/pedrod/papers/kdd00.pdf
> >
> > Is this the right base math? Given this formula for acceptable errors,
> > what would be the algorithm for a similarity measure?
> >
> > Also, what does a negative correlation value mean? Should I just look
> > at the absolute value?
> >
> > --
> > Lance Norskog
> > [email protected]
> >
>

Reply via email to