Re: Canopy Generation

Christoph Brücke Mon, 27 Jun 2011 02:13:13 -0700

Hi,

Usually, depending on the input data, there should be more than just one cluster.
You can use the cluster dumper utility to output the cluster data
(https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper).


It seems that your t1 and t2 thresholds for the canopies were chosen too large,
so all data points are assigned to just one canopy. Could you describe your
input data (number of dimensions, value range, distribution, ...) and list the
parameters you used for the clustering?
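To illustrate why overly large thresholds collapse everything into one canopy, here is a simplified, in-memory Java sketch of the textbook canopy algorithm. It is not Mahout's implementation. The class name `CanopySketch`, the Euclidean distance, and the sample points are assumptions chosen for illustration. Each remaining point within t2 of the current center is removed from the candidate list, so if t2 exceeds the largest distance between your points, the first center swallows every point and only one canopy results.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified canopy clustering sketch (not Mahout's code). */
public class CanopySketch {

    /** Euclidean distance; an assumption, since Mahout lets you configure the measure. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Textbook canopy pass: returns one member list per canopy (first entry is the center). */
    public static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        if (t1 < t2) {
            throw new IllegalArgumentException("expected t1 >= t2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<List<double[]>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] center = remaining.remove(0); // next unclaimed point starts a canopy
            List<double[]> members = new ArrayList<>();
            members.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) {
                    members.add(p); // loosely bound: joins this canopy
                }
                if (d < t2) {
                    it.remove();    // strongly bound: can no longer start its own canopy
                }
            }
            result.add(members);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
                new double[] {0.0, 0.0}, new double[] {0.1, 0.0},
                new double[] {5.0, 5.0}, new double[] {5.1, 5.0});
        System.out.println(canopies(points, 3.0, 1.0).size());   // prints 2: two separated groups
        System.out.println(canopies(points, 20.0, 10.0).size()); // prints 1: t2 is too large
    }
}
```

The number of canopies is driven mainly by t2, so shrinking t2 below the typical distance between your groups is the first thing to try.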

Regards,
Christoph

Am 27.06.2011 um 00:40 schrieb Mark:

> Is there an easy way to know how many canopies were generated after running
> the canopy generation tool?
> 
> I tried viewing the file clusters-0/part-r-00000 via seqdumper but it always 
> returns:
> 
> Key: C-0: Value: C-0: 
> {437:0.005630003188145648,478:0.006034746778989781,591:0.020761514762446885...
> Count: 1
> 
> Should there be multiple key value pairs or just this one?
> 
> Thanks
> 
>
