Yes, this Hoeffding *bound*, and one flavor of the Chernoff bound, are not
really similarity metrics. They can construct loose bounds on what the true
mean of a random variable must be given some observations.

(These bounds have a use, but in another part of the recommender
computation. And in those parts, you know the distribution of the random
variable so can use a better bound.)

The Hoeffding D statistic (which I had to re-lookup for sure) is what you
want. It is spiritually like the Spearman correlation in that it is based on
relative ordering of ratings rather than rating values. I have never played
with it but it is an interesting possibility. I imagine it will be slow to
compute, but, could be a useful addition. You would probably want to
refactor SpearmanCorrelationSimilarity a little since it already does the
conversion to rank and then subclass to implement this formula if
interested.

Negative correlation just means two series are in fact related, but one goes
up when the other goes down and vice versa, rather than trending together,
which would be positive correlation.

On Sun, Apr 24, 2011 at 4:58 AM, Lance Norskog <[email protected]> wrote:

> This paper compares Pearson, Spearman and Hoeffding's D measure as
> similarity measures for DNA matching. It claims Hoeffding is the best.
> http://www.ncbi.nlm.nih.gov/pubmed/19634197
>
> Chasing down Hoeffding as a similarity measure, the closest I've come
> is the Hoeffding Bound or Additive Chernoff Bound. Page 2, right-hand
> column has a description of the algorithm:
> http://www.cs.washington.edu/homes/pedrod/papers/kdd00.pdf
>
> Is this the right base math? Given this formula for acceptable errors,
> what would be the algorithm for a similarity measure?
>
> Also, what does a negative correlation value mean? Should I just look
> at the absolute value?
>
> --
> Lance Norskog
> [email protected]
>

Reply via email to