The knn code on GitHub can run with Mahout 0.7. You would have to pull back a few classes that have since been moved to Mahout, but that shouldn't be hard to do since the names and paths are identical.
I have no good answer for you about using canopy centroids. The normal way of doing this is to put a very small or zero weight on the seed centroids. That means that they start things going but have very little or no influence later.

On Thu, Jan 3, 2013 at 3:43 AM, Stefan Kreuzer <[email protected]> wrote:

> I fear I have to stick to 0.7. So there is no solution to get rid of the
> superfluous canopy centroids for the k-means seed?
>
> -----Original Message-----
> From: Ted Dunning <[email protected]>
> To: user <[email protected]>
> Sent: Thu, 3 Jan 2013 7:01 am
> Subject: Re: Seeding k-means with canopy clustering / Filter canopies
>
> Bitlets have come into Mahout so far, but the core is in
> https://github.com/tdunning/knn still.
>
> The quick summary is that this code can cluster 10-dimensional data at
> about 1 million points in 20 seconds on a single machine. It also can
> scale out horizontally using a single map-reduce pass maintaining about the
> same speed. Performance scales down essentially linearly with higher
> dimensionality.
>
> It works by making a fast, single pass through the data to produce a sketch
> of the data. This sketch is clustered in memory using a high quality ball
> k-means algorithm.
>
> The API is currently not compatible with the current clustering API. The
> algorithms are being tested for quality by Dan Filimon who is also doing
> the scaling work.
>
> On Wed, Jan 2, 2013 at 6:00 PM, Stefan Kreuzer <[email protected]> wrote:
>
> > Uhm no... where can I look? Sorry
>>
>> -----Original Message-----
>> From: Ted Dunning <[email protected]>
>> To: user <[email protected]>
>> Sent: Thu, 3 Jan 2013 2:12 am
>> Subject: Re: Seeding k-means with canopy clustering / Filter canopies
>>
>> Stefan,
>>
>> Have you looked at the k-means work that Dan Filimon and I are doing?
>>
>> On Wed, Jan 2, 2013 at 4:46 PM, Stefan Kreuzer <[email protected]> wrote:
>>
>> > I am trying to seed a k-means clustering with canopy clustering. Problem:
>> > Depending on the choice for t1 and t2, canopy clustering gives me too
>> > many canopies or just 1.
>> > I thought I could solve this with the clusterFilter parameter, but no
>> > luck. Although I can restrict the number of _canopy clusters_ with the
>> > clusterFilter parameter, leading to what would be a good value for k, this
>> > parameter has no effect on the _canopy centroids_ that are created, and
>> > these are the seed for k-means.
>> > Is there a way to get a seed for k-means that reflects the value given
>> > for the clusterFilter parameter in canopy clustering?
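
P.S. To make the low-weight seeding idea from the top of this reply concrete, here is a rough sketch in plain Python. It is not Mahout code and not the knn code. The function name kmeans_with_weighted_seeds and the seed_weight parameter are made up for illustration, and treating each seed as a pseudo-point of small weight in the centroid update is just one way to read "very small or zero weight".

```python
from math import dist


def kmeans_with_weighted_seeds(points, seeds, seed_weight=1e-3, iterations=10):
    """Lloyd-style k-means where each seed acts as a pseudo-point of small weight.

    Hypothetical helper for illustration only; not part of Mahout.
    """
    centroids = [tuple(s) for s in seeds]
    for _ in range(iterations):
        # Assign every point to its nearest current centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the weighted mean of its points
        # (weight 1 each) and its original seed (weight seed_weight).
        updated = []
        for seed, members, old in zip(seeds, clusters, centroids):
            total = seed_weight + len(members)
            if total == 0:
                # Zero-weight seed and no points: keep the previous centroid.
                updated.append(old)
                continue
            updated.append(tuple(
                (seed_weight * s + sum(m[d] for m in members)) / total
                for d, s in enumerate(seed)
            ))
        centroids = updated
    return centroids
```

With seed_weight=0 the seeds only decide the first assignment; after that each centroid is the plain mean of its points. A seed that never attracts any points simply stays where it was, so such centroids could be dropped afterwards if they are unwanted.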
