This paper compares Pearson's correlation, Spearman's rank correlation and Hoeffding's D as similarity measures for DNA matching, and claims that Hoeffding's D is the best of the three: http://www.ncbi.nlm.nih.gov/pubmed/19634197
Chasing down Hoeffding as a similarity measure, the closest I've come is the Hoeffding Bound, also called the Additive Chernoff Bound. Page 2, right-hand column, of this paper has a description of the algorithm: http://www.cs.washington.edu/homes/pedrod/papers/kdd00.pdf

Is this the right base math? Given this formula for acceptable error, what would be the algorithm for a similarity measure?

Also, what does a negative correlation value mean? Should I just look at the absolute value?

-- Lance Norskog [email protected]
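
P.S. To make the question concrete, here is a minimal Python sketch of the bound as I read it on that page: with probability 1 - delta, the true mean of a variable with range R lies within epsilon of the mean of n independent observations, where epsilon = sqrt(R^2 * ln(1/delta) / (2n)). The function and parameter names are my own, not the paper's, and this only computes the error margin. It is not a similarity measure, which is exactly the gap I am asking about.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Error margin epsilon from the Hoeffding bound.

    value_range: R, the width of the interval the variable can take (assumed known).
    delta: allowed probability that the true mean falls outside the margin.
    n: number of independent observations averaged so far.

    Names are illustrative; the formula is epsilon = sqrt(R^2 * ln(1/delta) / (2n)).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # The margin shrinks as more observations are averaged.
    for n in (100, 1000, 10000):
        print(n, hoeffding_epsilon(value_range=1.0, delta=0.05, n=n))
```

The margin falls off as 1/sqrt(n), so quadrupling the number of observations halves epsilon.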
