What if one were to use the Tanimoto distance measure with KMeans? Would the same reasoning apply?
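For reference, here is a minimal sketch of the usual Tanimoto (extended Jaccard) distance, d = 1 - a.b / (|a|^2 + |b|^2 - a.b). The function name `tanimoto_distance` is hypothetical, and this is the textbook definition, not necessarily identical to any particular library's implementation. It needs a dot product and two squared norms per pair, so it is not a "cheap" measure in the sense discussed below.

```python
def tanimoto_distance(a, b):
    """Tanimoto distance: 1 - a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        # The denominator is zero only when both vectors are all zeros;
        # treat them as identical.
        return 0.0
    return 1.0 - dot / denom
```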

On 6/26/11 1:54 PM, Christoph Brücke wrote:
Hi Mark,

you typically choose a somewhat cheaper distance metric for canopy 
clustering when it is used as a preprocessing step for KMeans. A simple example 
would be Manhattan distance (d = |x1 - x2| + |y1 - y2|) for Canopy clustering and 
Euclidean distance [d = sqrt((x1 - x2)^2 + (y1 - y2)^2)] for KMeans. This way 
you get a cheap approximation of your initial cluster centers.
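A minimal sketch of that mix-and-match idea, assuming 2-D points and hand-picked thresholds T1 >= T2: canopy generation uses the cheap Manhattan distance, and the resulting canopy centroids seed a plain KMeans loop that uses Euclidean distance. All function names and threshold values here are hypothetical and for illustration only; this is not Mahout's implementation.

```python
import math


def manhattan(a, b):
    """Cheap measure for canopy generation: |x1 - x2| + |y1 - y2| + ..."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Measure for KMeans: sqrt((x1 - x2)^2 + (y1 - y2)^2 + ...)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(pts):
    """Component-wise centroid of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))


def canopy_centers(points, t1, t2, distance=manhattan):
    """Return one centroid per canopy (t1 = loose, t2 = tight threshold)."""
    if t1 < t2:
        raise ValueError("t1 must be >= t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        # Every point within the loose threshold t1 belongs to this canopy.
        members = [p for p in points if distance(seed, p) <= t1]
        centers.append(mean(members))
        # Points within the tight threshold t2 can no longer start a canopy.
        remaining = [p for p in remaining if distance(seed, p) > t2]
    return centers


def kmeans(points, centers, distance=euclidean, max_iter=100):
    """Lloyd-style KMeans starting from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: distance(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
seeds = canopy_centers(points, t1=3.0, t2=2.0)
final = kmeans(points, seeds)
print(seeds, final)
```

Here the canopy step also decides how many clusters KMeans starts with (two for this toy data), and KMeans only has to refine those seeds with the more precise measure.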
I hope this was helpful.

Regards,
Christoph


On 26.06.2011 at 21:29, Mark wrote:

Should canopy generation and KMeans clustering typically use the same distance 
calculation, or is it possible to mix and match? Is there any reason why someone would mix them?

Thanks

Christoph Brücke


