Thanks, I've been looking at that. Is there a description of how to interpret those values? An academic paper maybe? The intra-cluster distance intuitively seems to correspond to something like cohesion. I don't get the intuition behind inter-cluster distances but Ted thinks they are the most important.

On 5/16/12 7:32 AM, Jeff Eastman wrote:
Mahout has a ClusterEvaluator and a CDbwEvaluator that compute some quality metrics (inter-cluster distance, intra-cluster distance, ...) that you may find useful. Both calculate a set of representative points from the clustering output and compute the O(n^2) metrics over these points rather than over all of the points in each cluster.
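
To make the two metrics concrete, here is a minimal Python sketch of the general idea. It is not Mahout's code, and the function names are hypothetical. It assumes Euclidean distance, takes intra-cluster distance to be the mean pairwise distance among one cluster's representative points, and takes inter-cluster distance to be the mean distance between cluster centroids. Mahout's evaluators may define these quantities differently.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def intra_cluster_distance(points):
    """Mean pairwise distance among one cluster's representative points.

    Hypothetical definition for illustration; smaller means a tighter cluster.
    """
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0  # a single point has no pairwise distances
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def inter_cluster_distance(clusters):
    """Mean distance between the centroids of every pair of clusters.

    Hypothetical definition for illustration; larger means better separation.
    """
    centroids = [centroid(c) for c in clusters]
    pairs = list(combinations(centroids, 2))
    if not pairs:
        return 0.0  # fewer than two clusters, nothing to compare
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


# Two small clusters of representative points, made up for this example.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]

for i, cluster in enumerate(clusters):
    print("cluster", i, "intra-cluster distance:", intra_cluster_distance(cluster))
print("inter-cluster distance:", inter_cluster_distance(clusters))
```

Under these definitions, a small intra-cluster distance relative to the inter-cluster distance suggests tight, well-separated clusters. Working over a handful of representative points per cluster keeps the pairwise loops cheap.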

On 5/15/12 4:46 PM, Pat Ferrel wrote:
So many questions (the best k, how to choose t1 and t2, how much dimensionality reduction helps) would have clear answers if we had a way to judge the quality of clusters.

Various methods were discussed here for a time: http://www.lucidimagination.com/search/document/dab8c1f3c3addcfe/validating_clustering_output

Has there been any work on building a measure of quality?


