Re: Canopy estimator

Jeff Eastman Thu, 10 May 2012 06:12:55 -0700

No, the issue was discussed but never reached critical mass. I typically do a binary search to find the best value, setting T1==T2, and then tweak T1 up a bit. For feeding k-means, this latter step is not so important.
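
To make the manual procedure concrete, here is a minimal sketch of that binary search in plain Python. It is not Mahout code: `count_canopies` and `search_t` are hypothetical helpers, the canopy pass is a simplified in-memory version with T1==T2, and it assumes the number of canopies shrinks roughly monotonically as T grows, which need not hold on every data set.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_canopies(points, t, distance=euclidean):
    """Count canopies from a simplified in-memory pass with T1 == T2 == t.

    Hypothetical helper for illustration, not Mahout's implementation.
    """
    remaining = list(points)
    count = 0
    while remaining:
        center = remaining.pop(0)
        count += 1
        # With T1 == T2, every point closer than t to the center is
        # absorbed by this canopy and removed from further consideration.
        remaining = [p for p in remaining if distance(center, p) >= t]
    return count


def search_t(points, target_k, lo, hi, iterations=20):
    """Binary search for a small T that yields at most target_k canopies.

    Assumes `hi` is large enough to give at most target_k canopies and
    that the canopy count is roughly non-increasing in T.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if count_canopies(points, mid) > target_k:
            lo = mid  # too many canopies: the threshold is too small
        else:
            hi = mid  # few enough canopies: try a smaller threshold
    return hi
```

Calling `search_t(points, target_k, lo, hi)` returns a starting T; tweaking T1 up from there is still a manual step.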

If you could figure out a way to automate this we would be interested. Conceptually, using the RandomSeedGenerator to sample a few vectors and comparing them with your chosen DistanceMeasure would give you a hint at the T-value to begin the search. A utility to do that would be a useful contribution.
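
A rough sketch of what such a utility might compute, again in plain Python rather than against the Mahout API: `random.sample` stands in for RandomSeedGenerator, a plain Euclidean function stands in for the chosen DistanceMeasure, and `estimate_t_range` is a hypothetical name. The minimum, median and maximum of the sampled pairwise distances could bound the range in which to begin the T search.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance; a stand-in for the chosen distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_t_range(vectors, sample_size=50, distance=euclidean, seed=None):
    """Return (min, median, max) pairwise distance over a random sample.

    Hypothetical helper: samples up to `sample_size` vectors and compares
    every pair, so the cost is quadratic in the sample size only.
    """
    rng = random.Random(seed)
    sample = rng.sample(vectors, min(sample_size, len(vectors)))
    dists = sorted(distance(a, b) for a, b in combinations(sample, 2))
    if not dists:
        raise ValueError("need at least two vectors to estimate distances")
    # Upper median for an even number of pairs; good enough for a hint.
    return dists[0], dists[len(dists) // 2], dists[-1]
```

The minimum and maximum give lower and upper bounds for the search, and the median is one plausible first guess for T.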


On 5/9/12 8:36 PM, Pat Ferrel wrote:

Some thoughts on https://issues.apache.org/jira/browse/MAHOUT-563
Did anything ever get done with this? Ted mentions limited usefulness. This may be true, but the cases he mentions as counterexamples are also not very good for using canopy ahead of k-means, no? That info would be a useful result. To use canopies I find myself running it over and over, trying to see some inflection in the number of clusters. Why not automate this? Even if the data shows nothing, that is itself an answer of value, and it would save a lot of hand work to find out the same thing.
