Sorry for my English; I can't express myself very well :-(
Basically I want to do this:
I have some canopy clusters as the result of a canopy clustering pass. Now
I want to generate a "centroids" folder containing just the centroids
of these clusters.
Maybe it is so simple for anyone knowledgeable about Mahout that the
question goes under the radar.
-----Original Message-----
From: Ted Dunning <[email protected]>
To: user <[email protected]>
Sent: Thu, 3 Jan 2013 5:13 pm
Subject: Re: Seeding k-means with canopy clustering / Filter canopies
On Thu, Jan 3, 2013 at 8:08 AM, Stefan Kreuzer
<[email protected]> wrote:
But even with a small weight (I'm not sure how to apply that), I still
have the wrong number of centroids, i.e. the wrong k?
I didn't think so. I seem to be confused about what you want.
I imagined something like:
1. Run canopy clustering with the clusterFilter param => get a folder
   with x canopy clusters and a folder with x+n canopy centroids, where
   x represents a good value for k.
2. Remove the centroids that do not correspond to any of the canopy
   clusters.
3. Use this reduced set of canopy centroids as the seeds for k-means.
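Step 2 can be sketched outside of Mahout as a simple filter. This is a minimal illustration only, assuming the cluster centers and the canopy centroids have already been read out of the two folders into plain lists of float vectors, and assuming "corresponds to" means "lies within a small distance tol of some retained cluster center". The helper names filter_centroids and distance and the tol parameter are hypothetical, not Mahout API.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_centroids(centroids, cluster_centers, tol=1e-9):
    """Keep only the centroids lying within `tol` of some retained cluster center.

    Hypothetical helper: both inputs are plain lists of float vectors
    (assumed already extracted from the canopy output), not Mahout objects.
    """
    return [c for c in centroids
            if any(distance(c, center) <= tol for center in cluster_centers)]

# x = 2 clusters survived the clusterFilter step; x + n = 3 canopy centroids
cluster_centers = [[0.0, 0.0], [5.0, 5.0]]
centroids = [[0.0, 0.0], [2.5, 2.4], [5.0, 5.0]]

seeds = filter_centroids(centroids, cluster_centers)
print(seeds)  # [[0.0, 0.0], [5.0, 5.0]]
```

The surviving x vectors would then be written out as the initial clusters for k-means (step 3); how they are serialized for Mahout is not shown here.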
What about running a single k-means assignment pass where you assign each
of the x+n canopy centroids to one of the x clusters that you have?
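One way to read this suggestion is sketched below, treating the x+n canopy centroids as the data points and the x cluster centers as the current k-means centers. It is a plain-Python illustration under that assumption, not Mahout's k-means driver. Recomputing each center as the mean of the centroids assigned to it is an extra assumption made here to turn the x groups into x seed vectors; the names nearest and assignment_pass are hypothetical.

```python
def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))

def assignment_pass(points, centers):
    """One k-means iteration: assign every point to its nearest center, then
    recompute each center as the mean of its assigned points.
    An empty cluster keeps its old center."""
    labels = [nearest(p, centers) for p in points]
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return labels, new_centers

cluster_centers = [[0.0, 0.0], [5.0, 5.0]]               # the x clusters
canopy_centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # the x + n centroids

labels, seeds = assignment_pass(canopy_centroids, cluster_centers)
print(labels)  # [0, 0, 1]
print(seeds)   # [[0.5, 0.5], [5.0, 5.0]]
```

Every one of the x+n centroids ends up attached to exactly one of the x clusters, so the result always has k = x entries regardless of n.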