Hi Jeff,

After building this distance matrix, what would then be a good value
for T2? The average distance in the matrix?

Frank

On Wed, Apr 27, 2011 at 10:57 PM, Jeff Eastman <[email protected]> wrote:
> Worth a try, but it ultimately comes down to the distance measure you've
> chosen, the distribution of your input vectors, and T2. As a pre-run
> experiment, you could sample some points from your data set (e.g., using
> RandomSeedGenerator as you would to prime k-means) and then build a distance
> matrix using your chosen distance measure. That would give you a starting
> point for T2 that is more systematic than pulling it out of thin air.
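Below is a minimal Python sketch of the pre-run experiment Jeff describes. It assumes Euclidean distance and uses plain random sampling as a stand-in for Mahout's RandomSeedGenerator. The helper names (`t2_candidates`, `pairwise_distances`, `percentile`) are hypothetical and are not Mahout API. The sketch builds the upper triangle of the sampled distance matrix and reports the mean plus a few low percentiles, which also bears on Frank's question about using the average:

```python
import math
import random
from statistics import mean


def euclidean(a, b):
    """Euclidean distance (assumed); swap in the measure you give Canopy."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(sample, dist=euclidean):
    """Upper triangle (i < j) of the distance matrix, skipping the zero diagonal."""
    return [
        dist(sample[i], sample[j])
        for i in range(len(sample))
        for j in range(i + 1, len(sample))
    ]


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    rank = max(1, math.ceil(q / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def t2_candidates(points, n=200, dist=euclidean, seed=42):
    """Hypothetical helper: summarize pairwise distances over a random sample."""
    rng = random.Random(seed)
    # Plain uniform sampling, standing in for RandomSeedGenerator.
    sample = rng.sample(points, min(n, len(points)))
    if len(sample) < 2:
        raise ValueError("need at least two points to measure distances")
    d = sorted(pairwise_distances(sample, dist))
    stats = {"mean": mean(d)}
    for q in (10, 25, 50):
        stats[f"p{q}"] = percentile(d, q)
    return stats


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: two well-separated blobs of 2-D points.
    data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
    data += [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(500)]
    print(t2_candidates(data))
```

One reason to look beyond the mean: if T2 were set to the median pairwise distance, half of all sampled pairs would by definition fall within it, which tends to produce a few very large canopies when the data contains several clusters. A lower percentile may therefore be a better first guess. Canopy also expects T1 > T2, so T1 can be picked as a larger value from the same summary.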
>
> -----Original Message-----
> From: Paul Mahon [mailto:[email protected]]
> Sent: Wednesday, April 27, 2011 1:46 PM
> To: [email protected]
> Subject: Re: Finding thresholds for canopy
>
> If you have a guess at how many clusters you want, you could take the
> total area of the space and divide it by the number of clusters to get an
> initial guess for T2 or T1. That might be enough to get you started,
> depending on the distribution.
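Paul's "total area divided by the number of clusters" yields a volume rather than a distance. One possible reading, which is an assumption and not stated in the thread, is to take the bounding-box volume V of the data, divide it by k, and use the edge length of a hypercube with that volume, (V/k)^(1/d), as the threshold scale. Here is a minimal Python sketch with hypothetical helper names:

```python
import math


def bounding_box(points):
    """Per-dimension (min, max) over all points."""
    dims = len(points[0])
    return [(min(p[i] for p in points), max(p[i] for p in points)) for i in range(dims)]


def area_per_cluster_threshold(points, k):
    """Hypothetical helper: edge of a hypercube whose volume is V / k.

    Returns (V / k) ** (1 / d), where V is the bounding-box volume and d is
    the number of dimensions with non-zero extent (flat dimensions are
    skipped so that V > 0).
    """
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    extents = [hi - lo for lo, hi in bounding_box(points) if hi > lo]
    if not extents:
        raise ValueError("all points are identical; no spread to divide up")
    d = len(extents)
    # Work in log space so products over many dimensions don't overflow.
    log_volume = sum(math.log(e) for e in extents)
    return math.exp((log_volume - math.log(k)) / d)


if __name__ == "__main__":
    # A 10 x 10 square split among 4 clusters gives (100 / 4) ** 0.5 = 5.
    print(area_per_cluster_threshold([[0, 0], [10, 10]], k=4))
```

In high-dimensional or sparse data, much of the bounding box is typically empty. As Paul says, the usefulness of this estimate therefore depends heavily on the distribution, so treat it only as a rough starting point.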
>
> On 04/27/2011 12:39 PM, Camilo Lopez wrote:
>> I'm using Canopy as a first step for k-means clustering. Is there an
>> algorithm, or even a good heuristic, for estimating good T1 and T2 values
>> from the vectorized data?
>
