What if one were to use the Tanimoto distance measure with KMeans? Would the
same reasoning apply?
On 6/26/11 1:54 PM, Christoph Brücke wrote:
Hi Mark,
you typically choose a somewhat cheaper distance metric for the canopy
clustering if it is used as a preprocessing step for KMeans. A simple example
would be Manhattan distance (d = |x1 - x2| + |y1 - y2|) for Canopy clustering
and Squared Euclidean distance [d = (x1 - x2)^2 + (y1 - y2)^2] for KMeans.
This way you get a cheap approximation of your initial cluster centers.
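
As a rough illustration of mixing the two measures, here is a minimal Python sketch. It is not Mahout's API: the function names (manhattan, squared_euclidean, canopy_centers, kmeans), the sample points, and the threshold value are all made up for the example, and the canopy pass is simplified to a single threshold (T2) instead of the usual T1/T2 pair.

```python
# Illustrative sketch only; all names here are hypothetical, not Mahout classes.

def manhattan(a, b):
    """Sum of absolute coordinate differences (the cheaper measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def squared_euclidean(a, b):
    """Sum of squared coordinate differences (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def canopy_centers(points, t2, distance=manhattan):
    """Simplified canopy pass: take a point as a center, then drop every
    remaining point that lies within t2 of it. Repeat until none are left."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if distance(center, p) > t2]
    return centers

def kmeans(points, centers, iterations=10, distance=squared_euclidean):
    """Plain Lloyd iterations, seeded with the given initial centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: distance(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members; keep the old
        # center if a cluster ended up empty.
        centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers

# Made-up sample data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

seeds = canopy_centers(points, t2=3.0)   # cheap measure picks the seeds
final_centers = kmeans(points, seeds)    # costlier measure refines them
```

The canopy pass only needs to be good enough to place seeds in roughly the right regions, which is why a cheaper measure is acceptable there; KMeans then refines those seeds with the measure you actually care about.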
I hope this was helpful.
Regards,
Christoph
On 26.06.2011 at 21:29, Mark wrote:
Should canopy generation and KMeans clustering typically use the same distance
calculation, or is it possible to mix and match? Is there any reason why
someone would mix them?
Thanks
Christoph Brücke