These numbers are hard to interpret without context.

It is easier to interpret the average squared distance within clusters
and between clusters.  Do you have those values?
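
If you don't have them handy, something along these lines would do. This
is only a minimal sketch in plain Java over double[] arrays, not the
Mahout API; points, assignment, and centers are placeholder names for
however you hold the k-means output.

    // Sketch: average squared distance within clusters vs. between centers.
    // Assumes points[i] is a point vector, assignment[i] is its cluster
    // index, and centers[c] is the centroid of cluster c (placeholder names).
    public class ClusterSpread {

        static double squaredDistance(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return sum;
        }

        // Mean squared distance from each point to its own cluster center.
        static double avgWithin(double[][] points, int[] assignment,
                                double[][] centers) {
            double total = 0.0;
            for (int i = 0; i < points.length; i++) {
                total += squaredDistance(points[i], centers[assignment[i]]);
            }
            return total / points.length;
        }

        // Mean squared distance over all pairs of distinct cluster centers.
        static double avgBetween(double[][] centers) {
            double total = 0.0;
            int pairs = 0;
            for (int c = 0; c < centers.length; c++) {
                for (int d = c + 1; d < centers.length; d++) {
                    total += squaredDistance(centers[c], centers[d]);
                    pairs++;
                }
            }
            return pairs == 0 ? 0.0 : total / pairs;
        }
    }

Roughly speaking, if the within-cluster average is not much smaller than
the between-center average, the clusters are not well separated, which
would be consistent with the near-uniform pdf values you are seeing.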

On Fri, Mar 1, 2013 at 9:38 AM, Matt Molek <[email protected]> wrote:

> My data is from ~4500 Wikipedia articles. I stripped out the wiki markup,
> ran them through seq2sparse, and then reduced to 100 dimensions with ssvd
> before running kmeans.
>
> I re-ran my test with some slightly tweaked parameters to see if I could
> improve the clustering. My pdf values for the most likely clusters improved
> a little bit, but not dramatically.
>
> Taking the most likely cluster's pdf value for each point, I got a minimum
> pdf of 0.0215, a maximum pdf of 0.0377, and a mean pdf value of 0.0282
>
> Looking at all 50 pdf values for each point, I got a minimum pdf of
> 0.0174 and a mean pdf value of 0.0200.
>
> Do these pdf values say anything about the fit or quality of my cluster
> results?
>
>
> On Fri, Mar 1, 2013 at 2:56 AM, Ted Dunning <[email protected]> wrote:
>
> > How high is the dimension?
> >
> > How is your data generated?
> >
> >
> >
> > On Wed, Feb 27, 2013 at 1:38 PM, Matt Molek <[email protected]> wrote:
> >
> > > I made a small modification to the KMeansDriver to call the
> > > ClusterClassificationDriver with an emitMostLikely value of false so
> > > that I could see what the pdf values of my points were for all k of
> > > my clusters.
> > >
> > > I was expecting the most likely cluster to have a much higher pdf
> > > than the other clusters in most cases, but in my results, all the
> > > values are pretty close to 1/(number of clusters).
> > >
> > > For example, when I ran with 50 clusters, most of my points had a
> > > pdf value of 0.02xx for nearly every cluster.
> > >
> > > I understand that to mean that for most of my points, none of my
> > > clusters are a good fit. Is that right? Or is it common for the most
> > > likely cluster to deviate only a tiny bit from all the others? (I
> > > wouldn't think so.)
> > >
> > > Thanks for the advice,
> > > Matt
> > >
> >
>
