If the intra-cluster distance is small (which means the vectors are tightly clustered), then you might not need many iterations to represent the cluster. Similarly, if there are very few vectors per cluster and the intra-cluster distance is also small, then even a single iteration would be fine. That's how I see it.
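
To make that concrete, here is a minimal sketch in plain Java. It is not Mahout code, and the class and method names are made up for illustration. It assumes that each iteration adds, per cluster, the point whose summed distance to the representatives chosen so far is largest, starting from the cluster center. I believe that is roughly what the driver does, but please check the Mahout source before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not Mahout code. All names here are hypothetical.
final class RepresentativePointsSketch {

  // Euclidean distance between two vectors of equal length.
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Starts from the cluster center and, on each iteration, adds the one point
  // whose summed distance to the current representatives is largest
  // (the selection rule is an assumption of this sketch).
  static List<double[]> pick(double[][] points, double[] center, int numIterations) {
    List<double[]> reps = new ArrayList<>();
    reps.add(center);
    for (int iter = 0; iter < numIterations; iter++) {
      double[] best = null;
      double bestTotal = -1.0;
      for (double[] p : points) {
        double total = 0.0;
        for (double[] r : reps) {
          total += distance(p, r);
        }
        if (total > bestTotal) {
          bestTotal = total;
          best = p;
        }
      }
      if (best == null) {
        break; // empty cluster: nothing to add
      }
      reps.add(best);
    }
    return reps;
  }

  public static void main(String[] args) {
    double[] center = {0.0, 0.0};
    double[][] tight = {{0.1, 0.0}, {0.0, 0.1}, {-0.1, 0.0}};
    double[][] spread = {{5.0, 0.0}, {0.0, 5.0}, {-5.0, 0.0}};
    System.out.println("tight cluster:");
    for (double[] r : pick(tight, center, 2)) {
      System.out.println("  " + r[0] + ", " + r[1]);
    }
    System.out.println("spread cluster:");
    for (double[] r : pick(spread, center, 2)) {
      System.out.println("  " + r[0] + ", " + r[1]);
    }
  }
}
```

In the tight cluster every added representative sits within 0.1 of the center, so extra iterations tell you little that the center does not already. In the spread cluster each iteration picks up a point far from the ones already chosen, which is where more iterations start to matter.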

On Thu, Nov 1, 2012 at 9:12 PM, Rahul Mishra <[email protected]> wrote:

> Thanks for the prompt reply Paritosh.
> Could you please explain it a bit further? How does it depend?
>
> Thanks & Regards,
> Rahul
>
>
> On Thu, Nov 1, 2012 at 8:44 PM, paritosh ranjan <[email protected]> wrote:
>
> > Each iteration will add a single point to the evolving list of
> > representative points for each cluster.
> > So, I think it depends on the number of vectors per cluster and also the
> > intra cluster distance.
> >
> > On Thu, Nov 1, 2012 at 8:13 PM, Rahul Mishra <[email protected]> wrote:
> >
> > > Hello Friends,
> > >
> > > Whats the heuristic for providing what number of iterations for
> > > RepresentativePointsDriver?
> > >
> > > I have run kmeans and fuzzy-kmeans algorithm on a dataset of size 500MB.
> > > Now, how do I obtain cluster quality?
> > >
> > > Does the following look Okay? :
> > > RepresentativePointsDriver.run(conf, new Path(clustersIn), new
> > > Path(clusteredPointsIn), new Path(outputDir), new
> > > EuclideanDistanceMeasure(), numIterations, runSequential);
> > > double interDis = clusterEval.interClusterDensity();
> > > double intraDis = clusterEval.intraClusterDensity();
> > > System.out.println("cluster evaluator: The inter distance: "+interDis);
> > > System.out.println("cluster evaluator: The intra distance: "+intraDis);
> > >
> > > --
> > > Regards,
> > > Rahul K Mishra,
> > > https://sites.google.com/site/reachrahulkmishra/
> >
>
> --
> Regards,
> Rahul K Mishra,
> https://sites.google.com/site/reachrahulkmishra/
