Power law size scaling.

On Sun, Jul 8, 2012 at 11:39 PM, Ted Dunning <[email protected]> wrote:
> What do you mean by self similarity? Power law size scaling? Or that two
> successive clusterings get nearly the same answer?
>
> Sent from my iPhone
>
> On Jul 8, 2012, at 8:40 PM, Lance Norskog <[email protected]> wrote:
>
>> Are there any measures of self-similarity?
>>
>> On Sun, Jul 8, 2012 at 6:07 PM, Ted Dunning <[email protected]> wrote:
>>
>>> I can't comment on the existing evaluators, but for me the only real
>>> measure that I care about is average distance to nearest cluster for new or
>>> held-out data. I will be building something of this sort for the
>>> clustering part of the knn code I have been working on.
>>>
>>>
>>> On Sun, Jul 8, 2012 at 5:44 PM, Pat Ferrel <[email protected]> wrote:
>>>
>>>> To use something like kmeans on any large and changing data set it seems
>>>> a requirement that there be some means of evaluating the quality of
>>>> clusters at different scales. The usual eyeballing breaks down quickly.
>>>>
>>>> Trying to use the cluster evaluators in Mahout with kmeans as the
>>>> clustering method and cosine as the distance measure has proven
>>>> problematic. The method is to iterate through the data using different ks
>>>> and performing the evaluation at each point. What I find is that certain
>>>> values are almost always in error. The Intra-cluster density from
>>>> ClusterEvaluator is almost always NaN. The CDbw inter-cluster density is
>>>> almost always 0. I have also seen several cases where CDbw fails to return
>>>> any results but have not tracked down why yet.
>>>>
>>>> Given that the data for either evaluator is usually incomplete these
>>>> methods are not very useful. Is Mahout dropping the evaluators? Is the
>>>> general wisdom that they are not particularly useful? Should a newer method
>>>> be pursued? This seems a fairly important question to me, am I missing
>>>> something?
>>>>
>>>> Raw data for a sample crawl is given below:
>>>>
>>>>
>>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]
--
Lance Norskog
[email protected]
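
For reference, the measure Ted describes in the quoted thread (average distance to the nearest cluster for new or held-out data) can be sketched in a few lines. This is only an illustration under stated assumptions, not Mahout code: the function names `cosine_distance` and `mean_distance_to_nearest` are hypothetical, points and centroids are assumed to be plain lists of floats, and cosine distance is used because that is the measure Pat mentions. The sketch raises an error on zero vectors instead of letting a NaN propagate into the average.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Cosine distance is undefined here; fail loudly rather than return NaN.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def mean_distance_to_nearest(held_out, centroids, distance=cosine_distance):
    """Average, over held-out points, of the distance to the closest centroid."""
    if not held_out or not centroids:
        raise ValueError("need at least one held-out point and one centroid")
    total = 0.0
    for point in held_out:
        # Distance from this point to whichever centroid is nearest.
        total += min(distance(point, c) for c in centroids)
    return total / len(held_out)


if __name__ == "__main__":
    # Hypothetical toy data: two centroids along the axes, three held-out points.
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    held_out = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
    print(mean_distance_to_nearest(held_out, centroids))
```

Running this for each candidate k on the same held-out set gives one number per clustering that can be compared across k. It will generally keep shrinking as k grows, so the shape of the curve is more informative than any single value.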
