How many dimensions does your data have?

How is your data generated?
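
The reason for asking about dimensionality can be shown with a rough sketch. It assumes the per-cluster value is 1/(1 + distance), normalized to sum to 1 across clusters; that is an assumption about what the classifier reports, not something checked against the Mahout source. The data is made-up uniform random points, and the names normalized_scores and score_spread are hypothetical helpers for illustration only.

```python
import math
import random


def normalized_scores(point, centers):
    """Score each center as 1 / (1 + Euclidean distance), normalized to sum to 1.

    ASSUMPTION: this stands in for the per-cluster pdf values; it is not
    taken from the Mahout source.
    """
    raw = [1.0 / (1.0 + math.dist(point, c)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def score_spread(dim, k=50, seed=0):
    """Return (min, max) normalized score for one random point and k random centers."""
    rng = random.Random(seed)
    # Synthetic data: centers and point drawn uniformly from the unit hypercube.
    centers = [[rng.random() for _ in range(dim)] for _ in range(k)]
    point = [rng.random() for _ in range(dim)]
    scores = normalized_scores(point, centers)
    return min(scores), max(scores)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        lo, hi = score_spread(dim)
        print(f"dim={dim:5d}  min={lo:.4f}  max={hi:.4f}  (1/k = {1 / 50:.4f})")
```

With random data like this, distances to all centers become nearly equal as the dimension grows, so the normalized values crowd around 1/k. If your real data behaves the same way, values near 0.02 for 50 clusters may reflect the scoring and the dimensionality rather than a poor clustering.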



On Wed, Feb 27, 2013 at 1:38 PM, Matt Molek <[email protected]> wrote:

> I made a small modification to the KMeansDriver to call the
> ClusterClassificationDriver with an emitMostLikely value of false so that I
> could see what the pdf values of my points were for all k of my clusters.
>
> I was expecting the most likely cluster to have a much higher pdf than the
> other clusters in most cases, but in my results, all the values are pretty
> close to 1/(number of clusters)
>
> For example, when I ran with 50 clusters, most of my points had a pdf value
> of 0.02xx for nearly every cluster.
>
> I understand that to mean that for most of my points, none of my clusters
> are a good fit. Is that right? Or is it common for the most likely
> cluster to deviate only a tiny bit from all the others? (I wouldn't think so)
>
> Thanks for the advice,
> Matt
>
