The classification phase of Dirichlet clustering uses most-likely assignment of points to clusters by default. In the training phase, points are assigned statistically to likely clusters. Classification instead picks only the single best cluster for each point, so it can leave some clusters empty even though those clusters have nonzero counts in the final iteration. You can disable most-likely assignment and set a pdf threshold (check the documentation for the options). Points will then be classified to every cluster whose pdf is greater than the threshold.
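To illustrate why a cluster with nonzero training counts can still come out empty, here is a small standalone sketch. It is not Mahout code. The function names and the per-cluster pdf values are hypothetical, and `sample_assign` is a simplification that draws a cluster in proportion to its pdf only, ignoring the mixture weights and model resampling of the real training loop. It compares most-likely assignment with threshold-based assignment for points whose best match is never cluster 2, even though cluster 2 always scores reasonably well.

```python
import random
from typing import Dict, List, Sequence


def most_likely(pdfs: Sequence[float]) -> List[int]:
    """Assign the point only to the cluster with the highest pdf."""
    best = max(range(len(pdfs)), key=lambda k: pdfs[k])
    return [best]


def above_threshold(pdfs: Sequence[float], threshold: float) -> List[int]:
    """Assign the point to every cluster whose pdf exceeds the threshold."""
    return [k for k, p in enumerate(pdfs) if p > threshold]


def sample_assign(pdfs: Sequence[float], rng: random.Random) -> List[int]:
    """Simplified training-style assignment: draw one cluster with
    probability proportional to its pdf (hypothetical, ignores mixture weights)."""
    return rng.choices(range(len(pdfs)), weights=pdfs, k=1)


def cluster_sizes(assignments: List[List[int]], num_clusters: int) -> Dict[int, int]:
    """Count how many points ended up in each cluster."""
    sizes = {k: 0 for k in range(num_clusters)}
    for ks in assignments:
        for k in ks:
            sizes[k] += 1
    return sizes


# Hypothetical pdf values of four points against three clusters.
# Cluster 2 is never the single best match, but it is never negligible either.
points = [
    [0.50, 0.20, 0.30],
    [0.45, 0.25, 0.30],
    [0.20, 0.50, 0.30],
    [0.25, 0.45, 0.30],
]

rng = random.Random(42)
sampled = cluster_sizes([sample_assign(p, rng) for p in points], 3)
argmax = cluster_sizes([most_likely(p) for p in points], 3)
thresholded = cluster_sizes([above_threshold(p, 0.25) for p in points], 3)

print("sampled:", sampled)          # random; cluster 2 can receive points
print("most likely:", argmax)       # {0: 2, 1: 2, 2: 0}
print("threshold:", thresholded)    # {0: 2, 1: 2, 2: 4}
```

Under most-likely assignment cluster 2 gets nothing, while a threshold of 0.25 puts every point into cluster 2 as well as into its best match. This mirrors the difference between the two classification modes described above.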

On 11/28/12 8:53 AM, Christopher Laux wrote:
Hi all,

I've run Dirichlet Clustering but the clustered points output is empty.
Specifically clusteredPoints/part-m-00000 and -00001 exist but both files
are empty Sequence files (length 120 bytes). The clusters (directories
cluster-n) themselves are filled.

Any hints as to what caused this?

Thanks,
Chris

