Thanks Jeff, I guess the "art" part of it is the initial reasonable number.
On 2011-04-27, at 4:23 PM, Jeff Eastman wrote:

> No good answers here. The T2 value is the one which will control the number
> of clusters that Canopy finds. Try an initial value that seems reasonable
> then do a binary search, halving or doubling the value etc., until you get a
> reasonable number of clusters. Increasing T2 will give you fewer clusters,
> decreasing will give you more. If your initial value is off a lot you will
> get either 1 or numPoints clusters. T1 will affect which points that are near
> to a cluster but farther than T2 will contribute to its ultimate centroid.
> You can make T1=T2 in your binary search then increase T1 incrementally to
> see how the centroids move.
>
> -----Original Message-----
> From: Camilo Lopez [mailto:[email protected]] On Behalf Of Camilo Lopez
> Sent: Wednesday, April 27, 2011 12:39 PM
> To: [email protected]
> Subject: Finding thresholds for canopy
>
> I'm using Canopy as first step for K-means clustering, is there any
> algorithmic, or even a good heuristic to estimate good T1 and T2 from the
> vectorized data?
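
For anyone finding this thread later, here is a rough sketch of the halving/doubling search Jeff describes. It is not Mahout code: canopy() is a simplified textbook-style canopy pass in plain Python, and the names search_t2, min_k, max_k and the toy POINTS data are made up for illustration. It also assumes the canopy count shrinks as T2 grows, which holds roughly but can depend on point order.

```python
import math

# Toy data (made up): three well-separated groups of three 2-D points.
POINTS = [(0, 0), (1, 0), (0, 1),
          (10, 0), (11, 0), (10, 1),
          (0, 10), (1, 10), (0, 11)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy(points, t1, t2, distance=euclidean):
    """Simplified canopy pass (an assumption, not Mahout's implementation).

    Returns a list of (center, members). Points closer than t2 to a center
    are removed from further consideration; points closer than t1 are
    recorded as members and would contribute to the centroid.
    """
    if t1 < t2:
        raise ValueError("t1 must be >= t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        members = [p for p in remaining if distance(center, p) < t1]
        canopies.append((center, members))
        # Drop the center and everything within t2 of it.
        remaining = [p for p in remaining[1:] if distance(center, p) >= t2]
    return canopies

def search_t2(points, initial_t2, min_k, max_k, max_steps=30):
    """Double or halve T2 (with T1 = T2) until min_k <= clusters <= max_k.

    Once both a too-small and a too-large T2 are known, bisect between them.
    Returns (t2, k), or None if no acceptable value is found in max_steps.
    """
    too_small = None  # largest T2 seen that gave too many clusters
    too_large = None  # smallest T2 seen that gave too few clusters
    t2 = initial_t2
    for _ in range(max_steps):
        k = len(canopy(points, t2, t2))
        if min_k <= k <= max_k:
            return t2, k
        if k > max_k:
            # Too many clusters, so T2 needs to increase.
            too_small = t2
            t2 = (too_small + too_large) / 2 if too_large is not None else t2 * 2
        else:
            # Too few clusters, so T2 needs to decrease.
            too_large = t2
            t2 = (too_small + too_large) / 2 if too_small is not None else t2 / 2
    return None
```

Calling search_t2(POINTS, 100.0, 3, 3) halves T2 from a value that lumps everything into one canopy until three canopies appear. After that, T2 can be held fixed and T1 raised step by step, as Jeff suggests, to watch how many extra points each canopy picks up.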
