Hi,
Usually, depending on the input data, there should be more than just one cluster.
You may use the cluster dumper utility to output the cluster data.
(https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper)
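It seems that your t1 and t2 thresholds for the canopies were chosen too large, so
that all data points are assigned to just one canopy. A point within t2 of an
existing canopy center never starts a canopy of its own, so if t2 is larger than
the distances between your points, the first canopy absorbs all of them. Could you
describe your input data (number of dimensions, range, distribution, ...) and give
the parameters you used for the clustering?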
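To illustrate why overly large thresholds collapse everything into one canopy, here
is a minimal sketch of the classic canopy algorithm in Python. It assumes Euclidean
distance and processes points in input order; the function names are hypothetical,
and this is not Mahout's implementation, which may differ in details such as how the
final canopy centers are computed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: Euclidean distance; Mahout lets you choose other distance measures.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: List[Point], t1: float, t2: float) -> List[List[Point]]:
    """Hypothetical sketch of classic canopy clustering; returns each canopy's points."""
    if t1 < t2:
        raise ValueError("t1 must be greater than or equal to t2")
    remaining = list(points)
    canopies: List[List[Point]] = []
    while remaining:
        # Take the next remaining point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        kept = []
        for p in remaining:
            d = euclidean(center, p)
            if d < t1:
                # Within t1: the point joins this canopy
                # (points between t2 and t1 may also join later canopies).
                members.append(p)
            if d >= t2:
                # Only points at least t2 away can still become new centers.
                kept.append(p)
        remaining = kept
        canopies.append(members)
    return canopies


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    print(len(canopy(pts, t1=1.0, t2=0.5)))    # 2 canopies
    print(len(canopy(pts, t1=20.0, t2=10.0)))  # 1 canopy: t2 covers every point
```

With t1 = 20 and t2 = 10, every distance in this toy data set is below t2, so the
first center removes all other points and only one canopy remains, which matches
the symptom you describe. Choosing t2 below the typical distance between points that
should end up in different clusters avoids this.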
Regards,
Christoph
On 27.06.2011 at 00:40, Mark wrote:
> Is there an easy way to know how many canopies were generated after running
> the canopy generation tool?
>
> I tried viewing the file clusters-0/part-r-00000 via seqdumper, but it always
> returns:
>
> Key: C-0: Value: C-0:
> {437:0.005630003188145648,478:0.006034746778989781,591:0.020761514762446885...
> Count: 1
>
> Should there be multiple key value pairs or just this one?
>
> Thanks