Re: Seeding k-means with canopy clustering / Filter canopies

Ted Dunning Thu, 03 Jan 2013 07:41:08 -0800

The knn stuff on github can run with 0.7.  You would have to pull a few
classes back that have been moved to Mahout, but it shouldn't be hard to do
since the names and paths are identical.


I have no good answer for you about using canopy centroids.  The normal way
of doing this is to put a very small or zero weight on the seed centroids.
That means that they start things going but have very little or no
influence later.
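
As a rough illustration of the weighting idea, here is a minimal sketch in
plain Python. It is not the Mahout or knn API: the function names, the
`seed_weight` parameter and the single-pass online update are all assumptions
made for the example. Each centroid carries a weight, seeds start with a tiny
(or zero) weight, and every assigned point pulls the centroid toward itself
by a weighted mean.

```python
import math


def nearest(centroids, point):
    """Return the index of the centroid closest to point (Euclidean)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(centroids[i][0], point))


def cluster_with_weak_seeds(seeds, points, seed_weight=1e-6):
    """One online pass over points; hypothetical helper, not a Mahout call.

    Each centroid is a [position, weight] pair. Seeds start with
    seed_weight, so they decide the first assignments but contribute
    almost nothing to the final positions.
    """
    centroids = [[list(s), seed_weight] for s in seeds]
    for p in points:
        i = nearest(centroids, p)
        pos, w = centroids[i]
        new_w = w + 1.0
        # Weighted mean: the old position counts with weight w,
        # the new point with weight 1.
        centroids[i][0] = [(w * c + x) / new_w for c, x in zip(pos, p)]
        centroids[i][1] = new_w
    return centroids


seeds = [(0, 0), (10, 10), (100, 100)]
points = [(1, 1), (1, 3), (9, 9), (11, 9)]
for position, weight in cluster_with_weak_seeds(seeds, points, seed_weight=0.0):
    print(position, weight)
```

In this sketch a seed that attracts no points keeps its initial weight, so
filtering on weight afterwards would be one conceivable way to discard it.
That is an assumption of the example, not something Mahout 0.7 is claimed to
do.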

On Thu, Jan 3, 2013 at 3:43 AM, Stefan Kreuzer <[email protected]> wrote:

> I fear I have to stick to 0.7. So there is no solution to get rid of the
> superfluous canopy centroids for the k-means seed?
>
>
> -----Original Message-----
> From: Ted Dunning <[email protected]>
> To: user <[email protected]>
> Sent: Thu, 3 Jan 2013 7:01 am
> Subject: Re: Seeding k-means with canopy clustering / Filter canopies
>
>
> Small pieces have come into Mahout so far, but the core is still at
> https://github.com/tdunning/knn.
>
> The quick summary is that this code can cluster 10-dimensional data at
> about 1 million points in 20 seconds on a single machine.  It also can
> scale out horizontally using a single map-reduce pass maintaining about the
> same speed.  Performance scales down essentially linearly with higher
> dimensionality.
>
> It works by making a fast, single pass through the data to produce a sketch
> of the data.  This sketch is clustered in memory using a high quality ball
> k-means algorithm.
>
> The API is not yet compatible with the current clustering API.  The
> algorithms are being tested for quality by Dan Filimon who is also doing
> the scaling work.
>
> On Wed, Jan 2, 2013 at 6:00 PM, Stefan Kreuzer <[email protected]> wrote:
>
>> Uhm no... where can I look? Sorry
>>
>>
>>
>>
>> -----Original Message-----
>> From: Ted Dunning <[email protected]>
>> To: user <[email protected]>
>> Sent: Thu, 3 Jan 2013 2:12 am
>> Subject: Re: Seeding k-means with canopy clustering / Filter canopies
>>
>>
>> Stefan,
>>
>> Have you looked at the k-means work that Dan Filimon and I are doing?
>>
>> On Wed, Jan 2, 2013 at 4:46 PM, Stefan Kreuzer <[email protected]> wrote:
>>
>> > I try to seed a k-means clustering with canopy clustering. Problem:
>> > Depending on the choice for t1 and t2, canopy clustering gives me too
>> > many canopies or just 1.
>> > I thought I could solve this with the clusterFilter parameter, but no
>> > luck. Although I can restrict the number of _canopy clusters_ with the
>> > clusterFilter parameter leading to what would be a good value for k, this
>> > parameter has no effect on the _canopy centroids_ that are created, and
>> > these are the seed for k-means.
>> > Is there a way to get a seed for k-means that reflects the value given
>> > for the clusterFilter parameter in canopy clustering?
>> >
>>
>>
>>
>>
>
