Re: Judging the quality of clustering

Pat Ferrel Thu, 17 May 2012 12:54:01 -0700

I built a tool that iterates through a list of values for k on the same data and spits out the CDbw and ClusterEvaluator results each time.

When the evaluator or CDbw prunes a cluster, how do I interpret that? They seem to throw out the same clusters on a given run. Also, CDbw always returns an inter-cluster density of 0?
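
A minimal sketch of that kind of sweep over k, assuming a toy k-means and two simple stand-in scores rather than Mahout's CDbwEvaluator or ClusterEvaluator. The names `kmeans`, `cohesion`, `separation`, the sample `points`, and the separation/cohesion ratio are all hypothetical and only illustrate the loop:

```python
import math
from itertools import combinations

def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, iterations=20):
    """Toy Lloyd's k-means seeded with the first k points (an assumption)."""
    centers = list(points[:k])
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean_point(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

def cohesion(centers, groups):
    """Mean distance from each point to its cluster center (lower is tighter)."""
    dists = [distance(p, c) for c, g in zip(centers, groups) for p in g]
    return sum(dists) / len(dists)

def separation(centers):
    """Smallest distance between any two centers (higher is better separated)."""
    return min(distance(a, b) for a, b in combinations(centers, 2))

# Three well-separated groups, interleaved so the first k points differ.
points = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (20, 1),
          (1, 0), (11, 10), (21, 0)]

results = {}
for k in [2, 3, 4]:
    centers, groups = kmeans(points, k)
    results[k] = (cohesion(centers, groups), separation(centers))
    print(k, results[k])

# Illustrative score only: prefer well-separated, tight clusterings.
best_k = max(results, key=lambda k: results[k][1] / results[k][0])
print("best k:", best_k)
```

Cohesion alone keeps improving as k grows, which is why the sketch also tracks how far apart the centers are before picking a k.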


On 5/17/12 5:58 AM, Jeff Eastman wrote:

Yes, that is the paper I used to implement CDbw. I've tried it a few times along with the simpler ClusterEvaluator metrics I took from Mahout In Action and they look to be reasonable - see the tests - though I have no way to judge their absolute values. Anything you can contribute in this area would be most welcome. Perhaps a wiki page?
On 5/16/12 1:14 PM, Pat Ferrel wrote:
The reference was in the code for http://www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf
On 5/16/12 9:56 AM, Pat Ferrel wrote:
Thanks, I've been looking at that. Is there a description of how to interpret those values? An academic paper maybe? The intra-cluster distance intuitively seems to correspond to something like cohesion. I don't get the intuition behind inter-cluster distances but Ted thinks they are the most important.
On 5/16/12 7:32 AM, Jeff Eastman wrote:
Mahout has a ClusterEvaluator and a CDbwEvaluator that compute some quality metrics (inter-cluster distance, intra-cluster distance, ...) that you may find useful. Both calculate a set of representative points from the clustering output and compute the (n^2) metrics over these points rather than all of the points in each cluster.
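
A rough sketch of the representative-points idea described above, not Mahout's actual code. It assumes a farthest-point heuristic for choosing representatives and plain mean pairwise distances as the metrics; the function names and the sample `clusters` are hypothetical:

```python
import math
from itertools import combinations

def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def representative_points(points, count):
    """Pick up to `count` points from one cluster.

    Assumed heuristic: start with the point nearest the centroid, then
    repeatedly add the point farthest from those already chosen.
    """
    center = centroid(points)
    chosen = [min(points, key=lambda p: distance(p, center))]
    while len(chosen) < count:
        remaining = [p for p in points if p not in chosen]
        if not remaining:
            break
        chosen.append(max(remaining,
                          key=lambda p: min(distance(p, c) for c in chosen)))
    return chosen

def intra_cluster_distance(reps):
    """Mean pairwise distance among one cluster's representative points."""
    pairs = list(combinations(reps, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

def inter_cluster_distance(reps_a, reps_b):
    """Mean distance over all cross pairs of two clusters' representatives."""
    total = sum(distance(a, b) for a in reps_a for b in reps_b)
    return total / (len(reps_a) * len(reps_b))

# Hypothetical clustering output: two small, well-separated clusters.
clusters = {
    "a": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "b": [(10, 10), (10, 11), (11, 10), (11, 11)],
}
reps = {name: representative_points(pts, 3) for name, pts in clusters.items()}

intra_a = intra_cluster_distance(reps["a"])
inter_ab = inter_cluster_distance(reps["a"], reps["b"])
print("intra(a):", intra_a)
print("inter(a, b):", inter_ab)
```

Because the pairwise loops run only over the few representatives per cluster, the quadratic cost stays small even when the clusters themselves are large.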
On 5/15/12 4:46 PM, Pat Ferrel wrote:
So many questions about best k, how to choose t1 and t2, and how much help dimensional reduction is would have clear answers if we had a way to judge the quality of clusters.
Various methods were discussed here for a time: http://www.lucidimagination.com/search/document/dab8c1f3c3addcfe/validating_clustering_output
Has there been any work on building a measure of quality?
