There are no good answers here. T2 is the value that controls the number of 
clusters Canopy finds. Start with an initial value that seems reasonable, then 
do a binary search, halving or doubling the value, until you get a reasonable 
number of clusters. Increasing T2 gives you fewer clusters; decreasing it gives 
you more. If your initial value is far off, you will get either 1 cluster or 
numPoints clusters. T1 determines which points that are near a cluster, but 
farther from it than T2, still contribute to its final centroid. You can set 
T1=T2 during the binary search, then increase T1 incrementally to see how the 
centroids move.
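
To make the search concrete, here is a rough sketch in plain Python. It is not 
Mahout's code: canopy() is a simplified single-pass version of the algorithm 
using Euclidean distance, and search_t2(), its target count and its starting 
value are hypothetical names and parameters chosen for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2, distance=euclidean):
    """Simplified single-pass canopy sketch (assumes t1 >= t2).

    A point within t1 of a canopy center contributes to that canopy's
    centroid. A point within t2 of any center cannot start a new canopy.
    Returns one centroid per canopy.
    """
    canopies = []  # each entry: (center, list of contributing points)
    for p in points:
        strongly_bound = False
        for center, members in canopies:
            d = distance(center, p)
            if d < t1:
                members.append(p)
            if d < t2:
                strongly_bound = True
        if not strongly_bound:
            canopies.append((p, [p]))
    centroids = []
    for center, members in canopies:
        n = len(members)
        centroids.append(
            tuple(sum(m[i] for m in members) / n for i in range(len(center)))
        )
    return centroids


def search_t2(points, target, t2=1.0, max_iter=30):
    """Hypothetical helper: look for a T2 (with T1 = T2) giving `target` canopies.

    Doubles or halves T2 until the target is bracketed, then bisects.
    Returns the last value tried if the target is never hit exactly.
    """
    lo = hi = None  # lo gave too many canopies, hi gave too few
    for _ in range(max_iter):
        k = len(canopy(points, t2, t2))
        if k == target:
            return t2
        if k > target:
            lo = t2  # too many clusters, so T2 is too small
        else:
            hi = t2  # too few clusters, so T2 is too large
        if lo is None:
            t2 = hi / 2
        elif hi is None:
            t2 = lo * 2
        else:
            t2 = (lo + hi) / 2
    return t2


points = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
t2 = search_t2(points, target=2, t2=100.0)
tight = canopy(points, t2, t2)      # T1 = T2
loose = canopy(points, 20.0, 1.0)   # larger T1 pulls in more distant points
```

The result depends on the order of the points, and the number of canopies is 
not guaranteed to change smoothly with T2, so treat the value the search 
returns as a starting point rather than an answer. Comparing tight and loose 
shows how raising T1 moves the centroids without changing how many there are.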

-----Original Message-----
From: Camilo Lopez [mailto:[email protected]] On Behalf Of Camilo Lopez
Sent: Wednesday, April 27, 2011 12:39 PM
To: [email protected]
Subject: Finding thresholds for canopy

I'm using Canopy as a first step for K-means clustering. Is there an 
algorithm, or even a good heuristic, to estimate good T1 and T2 values from 
the vectorized data?
