Hi Jeff.

> Did you turn off most-likely classification?

Yes, I set the most-likely option to false.
In general, a pdf ranges between 0 and 1. So, if the pdf threshold is set to 0, all points should be classified to all of the clusters.
In fact, the sequence file is empty. That seems contradictory to me.
I may be wrong, but is this a bug?

Thanks,
Yoshihiro.

2012/12/26 Jeff Eastman <[email protected]>

> Here's a response to a similar question from a couple of months ago:
>
> The classification phase of Dirichlet uses a most-likely assignment of
> points to clusters by default. This means that, unlike the training phase
> where points are assigned statistically to likely clusters, the
> classification may result in empty clusters even though those clusters have
> nonzero counts in the final iteration. You can disable most-likely
> assignment and set a pdf threshold - check the documentation - and points
> will be classified to all of the clusters that have pdf greater than the
> threshold.
>
> Does this help? Did you turn off most-likely classification?
> Jeff
>
>
> On 12/24/12 11:57 PM, yoshihiro fujimoto wrote:
>
>> Hi all,
>>
>> https://cwiki.apache.org/MAHOUT/dirichlet-process-clustering.html
>>
>> According to this page, a threshold can be specified for the Dirichlet
>> driver. The page explains that a threshold of 0 will emit all clusters
>> with their associated probabilities for each vector.
>> So, I ran Dirichlet clustering with a threshold of 0.
>> But the clusteredPoints/part-m-00000 sequence file is empty (its length
>> is 120 bytes).
>>
>> In the Dirichlet process, is there a case where a threshold of 0 gives
>> an empty result?
>>
>> Thanks,
>>
>> Yoshihiro
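
For anyone following the thread, the two classification modes Jeff describes can be sketched roughly as below. This is only a minimal Python illustration of the described behavior, not Mahout's actual code: the function name `classify`, its parameters, and the step that normalizes the pdfs so they sum to 1 are all assumptions made for this example.

```python
from typing import List, Tuple


def classify(pdfs: List[float], emit_most_likely: bool,
             threshold: float) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (cluster_index, weight) pairs for one point.

    pdfs holds one non-negative density per cluster for this point.
    """
    total = sum(pdfs)
    if total <= 0:
        # No cluster assigns any density to the point, so nothing is emitted.
        return []
    # Assumption: densities are normalized so the weights sum to 1.
    weights = [p / total for p in pdfs]
    if emit_most_likely:
        # Most-likely mode: exactly one cluster is emitted per point.
        best = max(range(len(weights)), key=weights.__getitem__)
        return [(best, weights[best])]
    # Threshold mode: every cluster whose weight is strictly greater
    # than the threshold is emitted.
    return [(i, w) for i, w in enumerate(weights) if w > threshold]


if __name__ == "__main__":
    point_pdfs = [1.0, 3.0, 0.0]
    print(classify(point_pdfs, True, 0.0))   # [(1, 0.75)]
    print(classify(point_pdfs, False, 0.0))  # [(0, 0.25), (1, 0.75)]
```

Under these assumptions, a threshold of 0 with most-likely turned off emits every cluster that has a nonzero weight for the point, so an output with no records at all would not be explained by the threshold alone.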
