Re: Finding thresholds for canopy

Paul Mahon Wed, 27 Apr 2011 14:13:24 -0700

It certainly is 3D thinking. The idea generalizes to N dimensions, but I agree, it's unlikely to be effective, since most of the space at higher dimensions is empty in any application I've seen.
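
To make the "mostly empty" point concrete, here is a rough sketch, not a measurement from any real data set. It assumes uniformly random points in a unit cube and splits every axis in half, then reports what fraction of the resulting cells contain at least one point. The function name and the uniform distribution are illustrative assumptions only.

```python
import random

def occupied_fraction(n_points, dims, seed=0):
    """Fraction of the 2**dims half-axis cells that contain at least one point.

    Assumption for illustration: points are uniform in the unit cube.
    """
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_points):
        # Each coordinate lies in [0, 1); record which half of each axis it falls in.
        cell = tuple(int(rng.random() * 2) for _ in range(dims))
        occupied.add(cell)
    return len(occupied) / 2 ** dims

if __name__ == "__main__":
    for dims in (3, 10, 20):
        print(dims, occupied_fraction(1000, dims))
```

With 1000 points, the occupied fraction can never exceed 1000 / 2**dims, so at 20 dimensions at most about 0.1% of the cells hold any data, however the points are distributed.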


On 04/27/2011 02:04 PM, Ted Dunning wrote:

That sounds like 3-dimensional thinking.


High dimensional problems abound and have very different properties.

On Wed, Apr 27, 2011 at 1:55 PM, Paul Mahon<[email protected]>  wrote:

No, I mean the area. If all the vectors fit in an AxBxC sized box, and you
expect about 10 clusters, you could make an initial guess that the clusters
will be (A/10)xBxC in size and you could try T1=(A/10)*B*C. I've no idea how
well this would work in practice... probably not very well.
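
A minimal sketch of the box heuristic above, for anyone who wants to try it. The function names are made up for illustration. `per_cluster_volume` reproduces the (A/10)*B*C figure from the message; `threshold_guess` adds one assumption that is not in the message: because T1 and T2 are distances, it takes the d-th root of that volume to get a length.

```python
import math

def bounding_box_sides(vectors):
    """Side lengths of the axis-aligned box that encloses all vectors."""
    dims = len(vectors[0])
    return [
        max(v[i] for v in vectors) - min(v[i] for v in vectors)
        for i in range(dims)
    ]

def per_cluster_volume(vectors, k):
    """Box volume divided by the expected number of clusters k."""
    return math.prod(bounding_box_sides(vectors)) / k

def threshold_guess(vectors, k):
    """Assumption, not from the thread: turn the per-cluster volume into a
    length by taking its d-th root, so it can be compared with distances."""
    dims = len(vectors[0])
    return per_cluster_volume(vectors, k) ** (1.0 / dims)

if __name__ == "__main__":
    # Two corner points of a 10 x 4 x 2 box, with about 10 clusters expected.
    vecs = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0)]
    print(per_cluster_volume(vecs, 10))
    print(threshold_guess(vecs, 10))
```

Note that any dimension with zero spread makes the volume zero, which is one more reason to treat the result as a starting point only.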

On 04/27/2011 01:50 PM, Camilo Lopez wrote:

By area of the space you mean just the total number of vectors I'm using?

On 2011-04-27, at 4:46 PM, Paul Mahon wrote:

If you have a guess at how many clusters you want you could take the
total area of the space and divide by the number of clusters to get an
initial guess of T2 or T1. That might work to get you started, depending on
the distribution.

On 04/27/2011 12:39 PM, Camilo Lopez wrote:

I'm using Canopy as the first step for K-means clustering. Is there any
algorithm, or even a good heuristic, to estimate good T1 and T2 from the
vectorized data?
