Thanks, Jeff. I guess the "art" part of it is picking the initial reasonable number.


On 2011-04-27, at 4:23 PM, Jeff Eastman wrote:

> No good answers here. The T2 value is the one which will control the number 
> of clusters that Canopy finds. Try an initial value that seems reasonable 
> then do a binary search, halving or doubling the value etc., until you get a 
> reasonable number of clusters. Increasing T2 will give you fewer clusters, 
> decreasing will give you more. If your initial value is off a lot you will 
> get either 1 or numPoints clusters. T1 determines which points that lie 
> near a cluster, but farther than T2 from it, still contribute to its final 
> centroid. You can set T1=T2 during the binary search, then increase T1 
> incrementally to see how the centroids move. 
> 
> -----Original Message-----
> From: Camilo Lopez [mailto:[email protected]] On Behalf Of Camilo Lopez
> Sent: Wednesday, April 27, 2011 12:39 PM
> To: [email protected]
> Subject: Finding thresholds for canopy
> 
> I'm using Canopy as the first step for K-means clustering. Is there any 
> algorithm, or even a good heuristic, to estimate good T1 and T2 values from 
> the vectorized data?
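
For anyone finding this thread later, the search Jeff describes above can be sketched roughly as follows. This is only an illustration in plain Python, not Mahout's implementation or API: the function names (canopy_centers, count_canopies, search_t2, canopy_centroids), the Euclidean distance, and the simplified single-pass canopy rule are all my own assumptions. The greedy canopy count is also not guaranteed to change monotonically with T2, so treat the result as a starting point.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2, distance=euclidean):
    """Simplified single pass: a point starts a new canopy only if it is
    farther than t2 from every canopy center chosen so far."""
    centers = []
    for p in points:
        if all(distance(p, c) > t2 for c in centers):
            centers.append(p)
    return centers


def count_canopies(points, t2, distance=euclidean):
    return len(canopy_centers(points, t2, distance))


def search_t2(points, initial_t2, min_k, max_k, max_steps=50):
    """Double or halve t2 until the target range is bracketed, then bisect.
    Returns (t2, number_of_canopies)."""
    lo = hi = None  # lo: a t2 that gave too many canopies; hi: too few
    t2 = initial_t2
    for _ in range(max_steps):
        k = count_canopies(points, t2)
        if min_k <= k <= max_k:
            return t2, k
        if k > max_k:
            lo = t2  # too many canopies, so t2 has to grow
        else:
            hi = t2  # too few canopies, so t2 has to shrink
        if lo is None:
            t2 = hi / 2
        elif hi is None:
            t2 = lo * 2
        else:
            t2 = (lo + hi) / 2
    return t2, count_canopies(points, t2)


def canopy_centroids(points, t1, t2, distance=euclidean):
    """Centroid of each canopy, averaging every point within t1 of its center.
    Raising t1 above t2 pulls in farther points and moves the centroids."""
    centroids = []
    for c in canopy_centers(points, t2, distance):
        members = [p for p in points if distance(p, c) <= t1]
        centroids.append(
            tuple(sum(m[i] for m in members) / len(members) for i in range(len(c)))
        )
    return centroids
```

Typical use would be to call search_t2(points, initial_t2, min_k, max_k) with a rough first guess, then call canopy_centroids with T1 equal to the returned T2 and again with gradually larger T1 values to watch how the centroids shift.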
