Re: Cluster Evaluation 0.8 style

Ted Dunning Sun, 08 Jul 2012 23:39:53 -0700

What do you mean by self-similarity? Power-law scaling of cluster sizes? Or that two
successive clusterings give nearly the same answer?
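If self-similarity is meant in the second sense, one common way to quantify it (not something proposed in this thread) is the Rand index: the fraction of point pairs on which two clusterings of the same points agree. The sketch below is a minimal illustration under two assumptions: both runs label the same points in the same order, and `rand_index` is a hypothetical helper, not a Mahout API.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two points in the same
    cluster, or both put them in different clusters. Cluster ids do not
    have to match between runs; only co-membership matters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    if len(labels_a) < 2:
        raise ValueError("need at least two points")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agree += 1
        total += 1
    return agree / total


if __name__ == "__main__":
    # Hypothetical labels from two runs over the same five points; run2
    # renames the ids and merges the last point into the third cluster.
    run1 = [0, 0, 1, 1, 2]
    run2 = [5, 5, 7, 7, 7]
    print(rand_index(run1, run2))  # 8 of 10 pairs agree -> 0.8
```

The plain Rand index tends to look high whenever there are many clusters, because most pairs are separated in both runs; the adjusted Rand index corrects for chance agreement and is usually the better choice for comparing successive runs.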



On Jul 8, 2012, at 8:40 PM, Lance Norskog wrote:

> Are there any measures of self-similarity?
> 
> On Sun, Jul 8, 2012 at 6:07 PM, Ted Dunning wrote:
> 
>> I can't comment on the existing evaluators, but for me the only real
>> measure that I care about is the average distance to the nearest cluster
>> for new or held-out data.  I will be building something of this sort for
>> the clustering part of the knn code I have been working on.
>> 
>> 
>> On Sun, Jul 8, 2012 at 5:44 PM, Pat Ferrel wrote:
>> 
>>> To use something like kmeans on any large and changing data set, it
>>> seems necessary to have some means of evaluating the quality of
>>> clusters at different scales. The usual eyeballing breaks down quickly.
>>> 
>>> Trying to use the cluster evaluators in Mahout with kmeans as the
>>> clustering method and cosine as the distance measure has proven
>>> problematic. The method is to cluster the data with a range of values
>>> of k and run the evaluation for each one. What I find is that certain
>>> values are almost always in error: the intra-cluster density from
>>> ClusterEvaluator is almost always NaN, and the CDbw inter-cluster density
>>> is almost always 0. I have also seen several cases where CDbw fails to
>>> return any results, but I have not tracked down why yet.
>>> 
>>> Given that the results from either evaluator are usually incomplete,
>>> these methods are not very useful. Is Mahout dropping the evaluators? Is
>>> the general wisdom that they are not particularly useful? Should a newer
>>> method be pursued? This seems a fairly important question to me; am I
>>> missing something?
>>> 
>>> Raw data for a sample crawl is given below:
>>> 
>>> 
>>> 
>>> 
>> 
> 
> 
> -- 
> Lance Norskog
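The measure Ted describes, the average distance from new or held-out points to the nearest cluster, can be sketched in a few lines. This is a minimal sketch under stated assumptions: centroids have already been fitted on training data, cosine distance (1 minus cosine similarity) stands in for the setup Pat describes, and `cosine_distance` and `mean_nearest_centroid_distance` are hypothetical helpers, not Mahout APIs. In Pat's procedure it would be computed once for the centroids produced at each value of k.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; undefined (division by zero) for all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_nearest_centroid_distance(held_out: List[Vector],
                                   centroids: List[Vector]) -> float:
    """Average, over held-out points, of the distance to the closest centroid."""
    if not held_out or not centroids:
        raise ValueError("need at least one held-out point and one centroid")
    total = 0.0
    for point in held_out:
        # Only the nearest centroid counts for each point.
        total += min(cosine_distance(point, c) for c in centroids)
    return total / len(held_out)


if __name__ == "__main__":
    # Hypothetical centroids, as if fitted on training data with k = 2.
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    # Hypothetical held-out points that were not used to fit the centroids.
    held_out = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
    print(round(mean_nearest_centroid_distance(held_out, centroids), 4))  # 0.0976
```

Because adding centroids tends to bring each point closer to one of them, this number will usually keep falling as k grows, so it is probably more informative to look at the whole curve over k (for example, where the improvement levels off) than at a single minimum. All-zero vectors would need to be filtered out first, since cosine distance is undefined for them.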
