Hi Mark, you typically choose a somewhat cheaper distance metric for canopy clustering when it is used as a preprocessing step for KMeans. A simple example would be Manhattan distance (d = |x1 - x2| + |y1 - y2|) for canopy clustering and squared Euclidean distance (d = (x1 - x2)^2 + (y1 - y2)^2) for KMeans. This way you get a cheap approximation of your initial cluster centers. I hope this was helpful.
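To make the mix-and-match idea concrete, here is a minimal sketch in plain Python (not Mahout code). It uses Manhattan distance for the canopy pass and squared Euclidean distance for a single KMeans-style assignment step. The function names, the thresholds t1 and t2, and the sample points are all made up for illustration, and for simplicity the canopy center is just the point that was picked, which is an assumption of this sketch rather than what any particular library does.

```python
def manhattan(p, q):
    """Cheap metric: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_euclidean(p, q):
    """Sum of squared coordinate differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def canopies(points, t1, t2, distance=manhattan):
    """Return a list of (center, members) pairs.

    t1 is the loose threshold (canopy membership), t2 the tight one
    (points closer than t2 can no longer start a canopy of their own).
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    result = []
    while remaining:
        center = remaining.pop(0)
        # Everything within the loose threshold joins this canopy.
        members = [center] + [p for p in remaining if distance(center, p) < t1]
        result.append((center, members))
        # Points within the tight threshold are dropped as future centers.
        remaining = [p for p in remaining if distance(center, p) >= t2]
    return result


def assign_to_nearest(points, centers, distance=squared_euclidean):
    """One KMeans-style assignment step: index of the nearest center per point."""
    return [
        min(range(len(centers)), key=lambda i: distance(p, centers[i]))
        for p in points
    ]


# Hypothetical sample data: two well separated groups.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
initial_centers = [center for center, _ in canopies(points, t1=6, t2=3)]
labels = assign_to_nearest(points, initial_centers)
```

The canopy pass only has to be good enough to produce reasonable starting centers, so the cheaper metric is fine there; the KMeans step then refines the clusters with its own metric.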
Regards, Christoph

On 26.06.2011 at 21:29, Mark wrote:
> Should canopy generation and KMeans clustering typically use the same
> distance calculation or is it possible to mix and match? Any reason why some
> would mix?
>
> Thanks

Christoph Brücke [email protected]
