What do you mean by self-similarity? Power-law scaling of cluster sizes? Or that two successive clusterings give nearly the same answer?
Sent from my iPhone

On Jul 8, 2012, at 8:40 PM, Lance Norskog <[email protected]> wrote:

> Are there any measures of self-similarity?
>
> On Sun, Jul 8, 2012 at 6:07 PM, Ted Dunning <[email protected]> wrote:
>
>> I can't comment on the existing evaluators, but for me the only real
>> measure that I care about is average distance to nearest cluster for new or
>> held-out data. I will be building something of this sort for the
>> clustering part of the knn code I have been working on.
>>
>>
>> On Sun, Jul 8, 2012 at 5:44 PM, Pat Ferrel <[email protected]> wrote:
>>
>>> To use something like kmeans on any large and changing data set it seems
>>> a requirement that there be some means of evaluating the quality of
>>> clusters at different scales. The usual eyeballing breaks down quickly.
>>>
>>> Trying to use the cluster evaluators in Mahout with kmeans as the
>>> clustering method and cosine as the distance measure has proven
>>> problematic. The method is to iterate through the data using different ks
>>> and performing the evaluation at each point. What I find is that certain
>>> values are almost always in error. The Intra-cluster density from
>>> ClusterEvaluator is almost always NaN. The CDbw inter-cluster density is
>>> almost always 0. I have also seen several cases where CDbw fails to return
>>> any results but have not tracked down why yet.
>>>
>>> Given that the data for either evaluator is usually incomplete these
>>> methods are not very useful. Is Mahout dropping the evaluators? Is the
>>> general wisdom that they are not particularly useful? Should a newer method
>>> be pursued? This seems a fairly important question to me, am I missing
>>> something?
>>>
>>> Raw data for a sample crawl is given below:
>>>
>>
>
> --
> Lance Norskog
> [email protected]
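
For reference, the held-out measure Ted describes in the quoted thread (average distance from each new point to its nearest cluster) could be sketched roughly as below. This is only an illustration in plain Python, not Mahout code and not the knn implementation he mentions. The function names, the toy centroids and points, and the choice to treat an all-zero vector as distance 1.0 are all assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors.

    Assumption for this sketch: an all-zero vector is given distance 1.0
    so that the division below never divides by zero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_nearest_centroid_distance(points, centroids, distance=cosine_distance):
    """Average, over held-out points, of the distance to the closest centroid."""
    if not points or not centroids:
        raise ValueError("need at least one point and one centroid")
    total = sum(min(distance(p, c) for c in centroids) for p in points)
    return total / len(points)


# Hypothetical centroids from a finished clustering run (k = 2).
centroids = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical held-out points that were not used to fit the centroids.
held_out = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

score = mean_nearest_centroid_distance(held_out, centroids)
print(score)
```

Running this once per candidate k on the same held-out set gives one number per k to compare, which fits the iterate-over-k procedure Pat describes. Lower values mean the held-out points sit closer to some centroid.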
