No good answers here. T2 is the value that controls the number of clusters Canopy finds. Try an initial value that seems reasonable, then do a binary search, halving or doubling the value, until you get a reasonable number of clusters. Increasing T2 gives you fewer clusters; decreasing it gives you more. If your initial value is far off you will get either 1 cluster (T2 much too large) or numPoints clusters (T2 much too small). T1 determines which points that are near a cluster, but farther away than T2, still contribute to its final centroid. You can set T1=T2 during the binary search, then increase T1 incrementally to see how the centroids move.
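
The search described above can be sketched roughly as follows. This is a minimal, self-contained illustration in plain Python, not Mahout's implementation: the function names (canopy_centers, search_t2), the Euclidean distance measure, the target cluster count and the step limit are all assumptions made for the example. The cluster count is not guaranteed to change monotonically with T2 on every data set, so treat the result as a starting point.

```python
import math


def distance(a, b):
    """Euclidean distance (an assumption; any distance measure could be used)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Single-pass canopy clustering sketch; returns one centroid per canopy.

    A point within t1 of a canopy's center contributes to that canopy's centroid.
    A point within t2 of any canopy's center does not start a new canopy.
    """
    canopies = []  # list of (center, members)
    for p in points:
        strongly_bound = False
        for center, members in canopies:
            d = distance(p, center)
            if d < t1:
                members.append(p)
            if d < t2:
                strongly_bound = True
        if not strongly_bound:
            canopies.append((p, [p]))
    return [
        tuple(sum(coord) / len(members) for coord in zip(*members))
        for _, members in canopies
    ]


def search_t2(points, target, t2=1.0, max_steps=30):
    """Double or halve t2 until the target is bracketed, then bisect.

    Uses T1 = T2 during the search. Returns the last value tried if the
    target count is never hit exactly within max_steps.
    """
    lo = hi = None  # lo: gave too many clusters, hi: gave too few
    for _ in range(max_steps):
        k = len(canopy_centers(points, t2, t2))
        if k == target:
            return t2
        if k > target:  # too many clusters, so T2 must grow
            lo = t2
            t2 = t2 * 2 if hi is None else (lo + hi) / 2
        else:           # too few clusters, so T2 must shrink
            hi = t2
            t2 = t2 / 2 if lo is None else (lo + hi) / 2
    return t2
```

Once search_t2 settles on a T2, calling canopy_centers(points, t1, t2) with t1 raised in small steps above t2 shows how the centroids move as more distant points start to contribute.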

-----Original Message-----
From: Camilo Lopez [mailto:[email protected]] On Behalf Of Camilo Lopez
Sent: Wednesday, April 27, 2011 12:39 PM
To: [email protected]
Subject: Finding thresholds for canopy

I'm using Canopy as a first step for K-means clustering. Is there any algorithm, or even a good heuristic, to estimate good T1 and T2 from the vectorized data?
