Mahout has a ClusterEvaluator and a CDbwEvaluator that compute some
quality metrics (inter-cluster distance, intra-cluster distance, ...)
that you may find useful. Both calculate a set of representative points
from the clustering output and compute the O(n^2) metrics over these
points rather than all of the points in each cluster.
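
To make the idea concrete, here is a rough sketch in plain Python (not Mahout's API). The function names, the farthest-point heuristic for picking representatives, and the use of mean pairwise Euclidean distance are all assumptions for illustration; the evaluators may define their metrics differently.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(c) / n for c in zip(*points)]


def representative_points(points, k):
    """Pick up to k representatives (hypothetical heuristic).

    Start from the centroid, then repeatedly add the point that is
    farthest from every representative chosen so far.
    """
    reps = [centroid(points)]
    remaining = list(points)
    while len(reps) < k and remaining:
        farthest = max(remaining,
                       key=lambda p: min(euclidean(p, r) for r in reps))
        reps.append(farthest)
        remaining.remove(farthest)
    return reps


def mean_pairwise_distance(points):
    """Mean distance over all pairs; quadratic in len(points)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


# Toy clustering output: two well-separated clusters.
clusters = {
    "a": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "b": [(10, 10), (10, 11), (11, 10), (11, 11)],
}

# Intra-cluster distance is computed over a few representatives only,
# so the quadratic cost depends on k, not on the cluster size.
reps = {name: representative_points(pts, 3) for name, pts in clusters.items()}
intra = {name: mean_pairwise_distance(r) for name, r in reps.items()}

# Inter-cluster distance here is the mean distance between centroids.
inter = mean_pairwise_distance([centroid(pts) for pts in clusters.values()])

print("intra:", intra)
print("inter:", inter)
```

A clustering with small intra-cluster distances relative to the inter-cluster distance is usually considered tighter and better separated, so comparing these numbers across different values of k (or t1/t2) gives one rough way to rank runs.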
On 5/15/12 4:46 PM, Pat Ferrel wrote:
So many questions about best k, how to choose t1 and t2, and how much
dimensionality reduction helps would have clear answers if we had a way to
judge the quality of clusters.
Various methods were discussed here for a time:
http://www.lucidimagination.com/search/document/dab8c1f3c3addcfe/validating_clustering_output
Has there been any work on building a measure of quality?