Some thoughts on https://issues.apache.org/jira/browse/MAHOUT-563

Did anything ever get done with this? Ted mentions limited usefulness. That may be true, but the cases he cites as counterexamples are also poor candidates for running canopy ahead of k-means, no? Knowing that would itself be a useful result. To use canopies I find myself running the job over and over, trying to spot an inflection in the number of clusters. Why not automate this? Even if the data shows nothing, that is itself an answer of value, and automating the search would save a lot of hand work in reaching it.
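
To make the idea concrete, here is a rough sketch of the kind of sweep I mean. It is plain Python, not Mahout code, and every name in it (canopy_count, sweep, elbow) is made up for illustration. It assumes Euclidean distance, evenly spaced T2 values, and a crude "largest second difference" rule for picking the inflection; other rules may well work better. Only T2 is varied, since T1 affects canopy membership but not the number of canopies.

```python
import math

def canopy_count(points, t2):
    """Count canopies for a given tight threshold t2 (hypothetical helper).

    Each pass picks the first remaining point as a canopy center and drops
    every point within t2 of it from further consideration.
    """
    remaining = list(points)
    count = 0
    while remaining:
        center = remaining.pop(0)
        count += 1
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return count

def sweep(points, t2_values):
    """Return (t2, canopy count) pairs for each threshold in t2_values."""
    return [(t2, canopy_count(points, t2)) for t2 in t2_values]

def elbow(curve):
    """Return the t2 with the largest discrete second difference in count.

    Assumes evenly spaced t2 values. Returns None if the curve has fewer
    than three points or shows no positive curvature anywhere, which is
    the "data shows nothing" answer.
    """
    best_t2, best_d2 = None, 0
    for i in range(1, len(curve) - 1):
        d2 = curve[i - 1][1] - 2 * curve[i][1] + curve[i + 1][1]
        if d2 > best_d2:
            best_t2, best_d2 = curve[i][0], d2
    return best_t2

# Two tight, well-separated groups of three points each.
points = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
curve = sweep(points, [0.05, 0.5, 0.95, 1.4, 1.85])
print(curve)
print(elbow(curve))
```

On real data the curve will be noisier than this toy case, so the elbow rule would need smoothing or a minimum-size cutoff before I would trust it. A None result is the useful negative answer mentioned above.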
