Thanks, I've been looking at that. Is there a description of how to
interpret those values? An academic paper maybe? The intra-cluster
distance intuitively seems to correspond to something like cohesion. I
don't get the intuition behind inter-cluster distances but Ted thinks
they are the most important.
On 5/16/12 7:32 AM, Jeff Eastman wrote:
Mahout has a ClusterEvaluator and a CDbwEvaluator that compute some
quality metrics (inter-cluster distance, intra-cluster distance, ...)
that you may find useful. Both calculate a set of representative
points from the clustering output and compute the O(n^2) pairwise
metrics over these points rather than over all of the points in each
cluster.
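To make the two quantities concrete, here is a minimal sketch in Python.
It is not Mahout's implementation and does not use Mahout's API: the
function names, the choice of Euclidean distance, and the use of plain
centroids for the inter-cluster figure are all assumptions made for
illustration. Each cluster is assumed to be given as a small list of
representative points.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def intra_cluster_distance(clusters):
    """Mean, over clusters, of the average pairwise distance inside each
    cluster. Smaller values suggest tighter (more cohesive) clusters."""
    per_cluster = [mean_pairwise_distance(pts) for pts in clusters.values()]
    return sum(per_cluster) / len(per_cluster)


def inter_cluster_distance(clusters):
    """Average distance between the centroids of distinct clusters.
    Larger values suggest better separated clusters."""
    return mean_pairwise_distance([centroid(pts) for pts in clusters.values()])


# Hypothetical representative points for two clusters.
clusters = {
    "a": [(0.0, 0.0), (0.0, 1.0)],
    "b": [(10.0, 0.0), (10.0, 1.0)],
}
intra = intra_cluster_distance(clusters)
inter = inter_cluster_distance(clusters)
print(intra, inter)
```

Under these assumptions, a clustering looks better when the intra-cluster
value is small relative to the inter-cluster value. Working on a few
representative points per cluster keeps the pairwise loops cheap.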
On 5/15/12 4:46 PM, Pat Ferrel wrote:
So many questions, such as the best k, how to choose t1 and t2, and
how much dimensionality reduction helps, would have clear answers if
we had a way to judge the quality of clusters.
Various methods were discussed here for a time:
http://www.lucidimagination.com/search/document/dab8c1f3c3addcfe/validating_clustering_output
Has there been any work on building a measure of quality?