Here's a response to a similar question from a couple of months ago:

The classification phase of Dirichlet clustering uses a most-likely assignment of points to clusters by default: each point goes only to the single cluster with the highest pdf. In the training phase, by contrast, points are assigned statistically to likely clusters. As a result, classification may leave some clusters empty even though those clusters have nonzero counts in the final iteration. You can disable most-likely assignment and set a pdf threshold (check the documentation); points will then be classified to every cluster whose pdf is greater than the threshold.
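
To make the difference concrete, here is a minimal sketch in plain Python. It is not Mahout code: the function names and the toy pdf values are made up for illustration, and it only mimics the two assignment rules described above.

```python
def classify_most_likely(pdfs):
    """Assign each point only to the cluster with the highest pdf."""
    return [[max(range(len(row)), key=row.__getitem__)] for row in pdfs]


def classify_by_threshold(pdfs, threshold):
    """Assign each point to every cluster whose pdf is greater than the threshold."""
    return [[k for k, p in enumerate(row) if p > threshold] for row in pdfs]


def empty_clusters(assignments, num_clusters):
    """Return the indices of clusters that received no points."""
    used = {k for clusters in assignments for k in clusters}
    return [k for k in range(num_clusters) if k not in used]


# Hypothetical pdf values: one row per point, one column per cluster (0..2).
pdfs = [
    [0.70, 0.25, 0.05],
    [0.60, 0.30, 0.10],
    [0.20, 0.45, 0.35],
]

most_likely = classify_most_likely(pdfs)
thresholded = classify_by_threshold(pdfs, 0.2)

print(empty_clusters(most_likely, 3))  # [2]: cluster 2 is never the top choice
print(empty_clusters(thresholded, 3))  # []: every cluster exceeds 0.2 for some point
```

In this toy data cluster 2 has a nonzero pdf for every point, yet it ends up empty under most-likely assignment because it is never the best match.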

Does this help? Did you turn off most-likely classification?
Jeff


On 12/24/12 11:57 PM, yoshihiro fujimoto wrote:
Hi all,


https://cwiki.apache.org/MAHOUT/dirichlet-process-clustering.html

According to this page, a threshold can be specified for the Dirichlet driver.
The page explains that a threshold of 0 will emit all clusters with their
associated probabilities for each vector.
So I ran Dirichlet clustering with a threshold of 0,
but the clusteredPoints/part-m-00000 sequence file is empty (its length is
120 bytes).

In Dirichlet process clustering, can the result be empty when the threshold is 0?

Thanks,

Yoshihiro

