My own feeling is that the right answer is to look at the average squared
distance (from each point to its nearest centroid) on your training data
and on held-out data. As long as these values are nearly the same, your
value of k is likely smaller than (or equal to) the optimal one. When the
average squared distance is
significantly less on the
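A minimal sketch of the train versus held-out comparison described above, in plain Python. Everything here is an assumption made for illustration: the synthetic two-dimensional data, the candidate values of k, and the helper names (`kmeans`, `avg_sq_dist`, `make_data`) are hypothetical, and a basic Lloyd's-style k-means stands in for whatever large-scale implementation you actually use.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's iterations; initial centroids are sampled from the data."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def avg_sq_dist(points, centroids):
    """Average squared distance from each point to its nearest centroid."""
    total = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return total / len(points)

def make_data(n, seed):
    """Synthetic data (an assumption): three Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [tuple(rng.gauss(c, 0.5) for c in rng.choice(centers))
            for _ in range(n)]

train = make_data(300, seed=1)
held_out = make_data(300, seed=2)

# Fit on the training data only, then score both sets with the same centroids.
results = {}
for k in (1, 2, 3, 5, 10, 30):
    centroids = kmeans(train, k)
    results[k] = (avg_sq_dist(train, centroids),
                  avg_sq_dist(held_out, centroids))
    print(k, results[k])
```

On real data you would replace `make_data` with your own train/held-out split and scan the printed pairs for the range of k over which the two values stay close to each other.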
Hello,
I have some questions about large-scale clustering. I would like to
arrive at a methodology for determining an appropriate value of k
for k-means clustering (at least for my scenario, if
not in general). More details follow below (apologies for the
verbosity, but I