I think you have the right idea about the clusterClassificationThreshold, but something just isn't working right in your case.
I know this answer won't be particularly helpful since I don't have any suggestions to fix your problem, but I did a test recently where I tried clusterClassificationThreshold values of 0.0, 0.1, and 0.5. With 0.0 and 0.1, all my points were clustered. With 0.5, none of them were clustered. So I assume there is some value for my test data between 0.1 and 0.5 where I would cluster some but not all of my data.

On Wed, Feb 20, 2013 at 12:07 PM, Chris Harrington <[email protected]> wrote:

> Hi all,
>
> I'm running kmeans to cluster some text docs and some docs that are
> seemingly unrelated to the cluster (i.e. noise) are getting clustered and I
> wish to leave them unclustered.
>
> I thought the clusterClassificationThreshold variable would do this for me
>
> from the java doc
>
> clusterClassificationThreshold
> * Is a clustering strictness / outlier removal parameter. Its
> value should be between 0 and 1. Vectors
> * having pdf below this value will not be clustered.
>
> but when ever I change this value no clustered points get written and
> there doesn't seem to be any change in the clusters, no matter what value I
> set (tried 0.00001 and 0.99999)
>
> Did I misunderstand what this variable does or am I missing here?
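To make the pattern from my test concrete, here is a minimal sketch in plain Java of how I read the javadoc ("vectors having pdf below this value will not be clustered"). It does not call Mahout at all. The class and method names are hypothetical, the scores are made up, and two things are assumptions on my part rather than anything the javadoc states: that the comparison is made against a point's largest per-cluster pdf, and that the pdfs for a point sum to 1.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not Mahout code.
final class ThresholdSweepSketch {

    // Assumption: a point is kept (clustered) when its largest
    // per-cluster pdf is not below the threshold.
    static boolean isClustered(double[] pdfPerCluster, double threshold) {
        double max = Arrays.stream(pdfPerCluster)
                .max()
                .orElse(Double.NEGATIVE_INFINITY); // no scores: never clustered
        return max >= threshold;
    }

    // Counts how many points survive a given threshold.
    static int countClustered(double[][] points, double threshold) {
        int kept = 0;
        for (double[] pdfPerCluster : points) {
            if (isClustered(pdfPerCluster, threshold)) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up pdfs for 4 points over 3 clusters; each row sums to 1.
        double[][] scores = {
            {0.40, 0.35, 0.25},
            {0.45, 0.30, 0.25},
            {0.34, 0.33, 0.33},
            {0.48, 0.32, 0.20},
        };
        // Sweep the same thresholds as in my test, plus one in between.
        for (double threshold : new double[] {0.0, 0.1, 0.42, 0.5}) {
            System.out.println("threshold " + threshold + ": "
                    + countClustered(scores, threshold) + " of "
                    + scores.length + " points clustered");
        }
    }
}
```

If those assumptions hold, every point's largest pdf is at least 1 divided by the number of clusters, so very small thresholds would keep everything, and the useful range would depend on how many clusters you have and how sharply your points separate.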
