How high is the dimension? How is your data generated?
On Wed, Feb 27, 2013 at 1:38 PM, Matt Molek <[email protected]> wrote:
> I made a small modification to the KMeansDriver to call the
> ClusterClassificationDriver with an emitMostLikely value of false so that I
> could see what the pdf values of my points were for all k of my clusters.
>
> I was expecting the most likely cluster to have a much higher pdf than the
> other clusters in most cases, but in my results, all the values are pretty
> close to 1/(number of clusters).
>
> For example, when I ran with 50 clusters, most of my points had a pdf value
> of 0.02xx for nearly every cluster.
>
> I understand that to mean that for most of my points, none of my clusters
> are a good fit. Is that right? Or is it common for the most likely
> cluster to only deviate a tiny bit from all the others? (I wouldn't think so)
>
> Thanks for the advice,
> Matt
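
One possible reason the dimension matters, sketched below under assumptions that are not confirmed in this thread: suppose each cluster's score is 1 / (1 + distance to its center) and the scores are then scaled to sum to 1. With that kind of scoring, distances in high dimensions tend to be nearly equal, so every score ends up close to 1/k even when the nearest cluster is a reasonable fit. The function names, the uniform random data, and the scoring rule here are all illustrative, not taken from the Mahout code.

```python
import math
import random


def normalized_scores(point, centers):
    """Score each center as 1 / (1 + Euclidean distance), scaled to sum to 1.

    Assumption: this mimics a distance-based "pdf"; it is not a real density.
    """
    raw = [1.0 / (1.0 + math.dist(point, c)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def score_spread(dim, k=50, seed=0):
    """Return max - min of the normalized scores for one random point.

    Centers and the point are drawn uniformly from the unit cube (hypothetical data).
    """
    rng = random.Random(seed)
    centers = [[rng.random() for _ in range(dim)] for _ in range(k)]
    point = [rng.random() for _ in range(dim)]
    scores = normalized_scores(point, centers)
    return max(scores) - min(scores)


if __name__ == "__main__":
    # Compare how far the scores stray from 1/k as the dimension grows.
    for dim in (2, 10, 100, 1000):
        print(dim, score_spread(dim))
```

If the real data behaves like this, a flat set of values says more about the scoring function and the dimension than about cluster quality, so comparing raw distances to the nearest and second-nearest centers may be more informative.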
