Hi Jeff,

After building this distance matrix, what would then be a good value for T2? The average distance in the matrix?

Frank

On Wed, Apr 27, 2011 at 10:57 PM, Jeff Eastman <[email protected]> wrote:
> Worth a try, but it ultimately boils down to the distance measure you've
> chosen, the distributions of input vectors and T2. As a pre-run experiment,
> you could sample some points from your data set (e.g. using
> RandomSeedGenerator as you would to prime k-means), then build a distance
> matrix using your chosen distance measure. That would give you a T2 starting
> point in a more systematic manner than grabbing it completely out of thin air.
>
> -----Original Message-----
> From: Paul Mahon [mailto:[email protected]]
> Sent: Wednesday, April 27, 2011 1:46 PM
> To: [email protected]
> Subject: Re: Finding thresholds for canopy
>
> If you have a guess at how many clusters you want you could take the
> total area of the space and divide by the number of clusters to get an
> initial guess of T2 or T1. That might work to get you started,
> depending on the distribution.
>
> On 04/27/2011 12:39 PM, Camilo Lopez wrote:
>> I'm using Canopy as first step for K-means clustering, is there any
>> algorithmic, or even a good heuristic to estimate good T1 and T2 from the
>> vectorized data?
>
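Jeff's pre-run experiment (sample some points, build a distance matrix with the chosen measure, read a T2 starting point off it) can be sketched outside Mahout as below. This is a minimal illustration under stated assumptions, not Mahout code: `sample_points`, `distance_matrix` and `summarize` are hypothetical helpers, plain uniform sampling stands in for RandomSeedGenerator, and Euclidean distance (`math.dist`) stands in for whatever measure you actually cluster with. Instead of returning a single number, it summarizes the spread of pairwise distances (min, 10th percentile, median, mean, max), since whether T2 should sit at the mean or somewhere lower is exactly the open question here. Keep in mind that canopy clustering requires T1 > T2.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def sample_points(data: List[Vector], k: int, seed: int = 42) -> List[Vector]:
    """Hypothetical helper: draw up to k points uniformly at random.
    Plain random sampling is an assumed stand-in for RandomSeedGenerator."""
    rng = random.Random(seed)
    return rng.sample(data, min(k, len(data)))


def distance_matrix(points: List[Vector],
                    measure: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Symmetric n x n matrix of pairwise distances under `measure`."""
    n = len(points)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = measure(points[i], points[j])
            m[i][j] = m[j][i] = d
    return m


def summarize(matrix: List[List[float]]) -> Dict[str, float]:
    """Summary statistics over the distinct pairwise distances."""
    n = len(matrix)
    # Upper triangle only: each unordered pair once, zero self-distances excluded.
    ds = sorted(matrix[i][j] for i in range(n) for j in range(i + 1, n))
    if len(ds) < 2:
        raise ValueError("need at least 3 sampled points for a summary")
    deciles = statistics.quantiles(ds, n=10)  # deciles[0] is the 10th percentile
    return {
        "min": ds[0],
        "p10": deciles[0],
        "median": statistics.median(ds),
        "mean": statistics.fmean(ds),
        "max": ds[-1],
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 2-D data (two separated blobs), purely for illustration.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(500)]
    stats = summarize(distance_matrix(sample_points(data, 100), math.dist))
    for name, value in stats.items():
        print(f"{name:>6}: {value:.3f}")
```

On data like this the distances are bimodal (small within a blob, large between blobs), so the mean lands between the two modes, while a low percentile reflects within-cluster spacing. Looking at the whole distribution rather than only the average makes that visible before committing to a T2.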

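Paul's suggestion, dividing the total area of the space by the expected number of clusters, yields an area (or volume) per cluster rather than a distance. One way to turn it into a threshold-sized number, and this conversion is an assumption rather than something stated in the thread, is to take the edge length of a hypercube with that volume: (bounding-box volume / k)^(1/d). The helpers `bounding_box_volume` and `per_cluster_length` below are hypothetical names for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def bounding_box_volume(data: Sequence[Vector]) -> float:
    """Volume (area in 2-D) of the axis-aligned box enclosing all points.
    Collapses to 0 if any dimension has zero range."""
    if not data:
        raise ValueError("data must be non-empty")
    volume = 1.0
    for d in range(len(data[0])):
        column = [p[d] for p in data]
        volume *= max(column) - min(column)
    return volume


def per_cluster_length(data: Sequence[Vector], k: int) -> float:
    """Assumed conversion: edge length of a hypercube whose volume is the
    bounding-box volume split evenly across k expected clusters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    dims = len(data[0])
    return (bounding_box_volume(data) / k) ** (1.0 / dims)


if __name__ == "__main__":
    data = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
    # 10 x 10 box -> area 100, / 4 clusters = 25 -> edge length 5.0
    print(per_cluster_length(data, k=4))
```

As Paul notes, this depends heavily on the distribution: it assumes clusters fill the space roughly evenly. In high dimensions the product of per-dimension ranges can easily collapse to zero or under/overflow, so the sampled distance matrix is likely the more robust of the two starting points there.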