I think you have the right idea about clusterClassificationThreshold,
but something isn't working as expected in your case.

I know this answer won't be particularly helpful, since I don't have a fix
for your problem, but I recently ran a test with clusterClassificationThreshold
values of 0.0, 0.1, and 0.5. With 0.0 and 0.1, all my points were clustered;
with 0.5, none of them were. So I assume there is some value between 0.1 and
0.5 at which some, but not all, of my test data would be clustered.
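To illustrate the kind of filtering the Javadoc describes, here is a minimal
Java sketch. It assumes (I haven't checked this against the Mahout source)
that each vector's per-cluster pdfs are normalized to sum to 1, and that a
vector is kept only if its largest pdf reaches the threshold. The class and
method names are hypothetical, not Mahout's API, and the scores are made up.

```java
// Hypothetical sketch of threshold-based outlier removal; not Mahout's API.
class ThresholdSketch {

    // Normalizes raw per-cluster scores so they sum to 1, then returns the
    // index of the most likely cluster, or -1 if that cluster's normalized
    // score is below the threshold (i.e. the point stays unclustered).
    static int classify(double[] scores, double threshold) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        double maxPdf = scores[best] / sum;
        return maxPdf >= threshold ? best : -1;
    }

    public static void main(String[] args) {
        // Made-up raw scores for three points against three clusters.
        double[][] points = {
            {0.9, 0.05, 0.05},  // clearly belongs to cluster 0
            {0.4, 0.35, 0.25},  // ambiguous
            {0.34, 0.33, 0.33}  // noise-like: nearly uniform
        };
        for (double threshold : new double[] {0.0, 0.1, 0.5}) {
            int clustered = 0;
            for (double[] p : points) {
                if (classify(p, threshold) >= 0) {
                    clustered++;
                }
            }
            System.out.println("threshold " + threshold + ": " + clustered
                + " of " + points.length + " clustered");
        }
    }
}
```

Under that assumption, the largest normalized pdf is always at least 1/k for
k clusters, so any threshold at or below 1/k would cluster every point. That
would be consistent with 0.0 and 0.1 clustering everything in my test.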


On Wed, Feb 20, 2013 at 12:07 PM, Chris Harrington <[email protected]> wrote:

> Hi all,
>
> I'm running kmeans to cluster some text docs, and some docs that seem
> unrelated to any cluster (i.e. noise) are getting clustered. I'd like to
> leave them unclustered.
>
> I thought the clusterClassificationThreshold variable would do this for me.
>
> From the Javadoc:
>
>    clusterClassificationThreshold
>        Is a clustering strictness / outlier removal parameter. Its value
>        should be between 0 and 1. Vectors having pdf below this value
>        will not be clustered.
>
> But whenever I change this value, no clustered points get written, and the
> clusters don't seem to change no matter what value I set (I tried 0.00001
> and 0.99999).
>
> Did I misunderstand what this variable does, or am I missing something here?
