Sorry, I'm not following this shorthand. Are you asking if the term weights of each centroid follow a power law, like they are supposed to?

On 7/9/12 12:34 AM, Lance Norskog wrote:
Power law size scaling.

On Sun, Jul 8, 2012 at 11:39 PM, Ted Dunning <[email protected]> wrote:
What do you mean by self similarity?  Power law size scaling?  Or that two 
successive clusterings get nearly the same answer?

Sent from my iPhone

On Jul 8, 2012, at 8:40 PM, Lance Norskog <[email protected]> wrote:

Are there any measures of self-similarity?

On Sun, Jul 8, 2012 at 6:07 PM, Ted Dunning <[email protected]> wrote:

I can't comment on the existing evaluators, but for me the only real
measure that I care about is average distance to nearest cluster for new or
held-out data.  I will be building something of this sort for the
clustering part of the knn code I have been working on.
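
A minimal sketch of that held-out measure, in plain Python. The function names, the choice of cosine distance (borrowed from elsewhere in this thread), the handling of zero vectors, and the toy data are all illustrative assumptions. This is not Mahout code and not the knn code mentioned above.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as maximally distant instead of
        # dividing by zero.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def mean_distance_to_nearest(centroids, held_out, distance=cosine_distance):
    """Average, over held-out points, of the distance to the closest centroid."""
    if not centroids or not held_out:
        raise ValueError("need at least one centroid and one held-out point")
    total = 0.0
    for point in held_out:
        total += min(distance(point, c) for c in centroids)
    return total / len(held_out)

# Toy data, made up for illustration only.
centroids = [[1.0, 0.0], [0.0, 1.0]]
held_out = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
score = mean_distance_to_nearest(centroids, held_out)
```

Lower is better. Because the points are held out, the value does not automatically improve just because k grows to fit the training data more tightly.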


On Sun, Jul 8, 2012 at 5:44 PM, Pat Ferrel <[email protected]> wrote:

To use something like kmeans on any large and changing data set it seems
a requirement that there be some means of evaluating the quality of
clusters at different scales. The usual eyeballing breaks down quickly.

Trying to use the cluster evaluators in Mahout with kmeans as the
clustering method and cosine as the distance measure has proven
problematic. The method is to iterate through the data using different ks
and performing the evaluation at each point. What I find is that certain
values are almost always in error. The Intra-cluster density from
ClusterEvaluator is almost always NaN. The CDbw inter-cluster density is
almost always 0. I have also seen several cases where CDbw fails to return
any results but have not tracked down why yet.
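
The sweep described above can be sketched as a small harness, in plain Python, that records each evaluator's value at every k and counts how often the value is unusable. `cluster_fn`, the evaluator callables, and the stub data are hypothetical stand-ins. Nothing here calls Mahout's ClusterEvaluator or CDbw evaluator.

```python
import math

def sweep_k(ks, cluster_fn, evaluators):
    """Run cluster_fn(k) for each k and apply every evaluator to the result.

    cluster_fn and the evaluator callables are hypothetical stand-ins; the
    harness only records what they return.
    """
    rows = []
    for k in ks:
        clusters = cluster_fn(k)
        row = {"k": k}
        for name, evaluate in evaluators.items():
            try:
                value = evaluate(clusters)
            except Exception:
                value = None  # the evaluator failed to return a result
            row[name] = value
        rows.append(row)
    return rows

def unusable(value):
    """True for missing results and NaN, which cannot be compared across k."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def unusable_fraction(rows, name):
    """Fraction of sweep points where the named evaluator gave nothing usable."""
    if not rows:
        return 0.0
    return sum(unusable(row[name]) for row in rows) / len(rows)

# Stub clusterer and evaluators, made up to exercise the harness.
def fake_cluster(k):
    return list(range(k))

def fails(clusters):
    raise RuntimeError("no result")

evaluators = {
    "size": lambda clusters: float(len(clusters)),
    "always_nan": lambda clusters: float("nan"),
    "fails": fails,
}
rows = sweep_k([2, 4, 8], fake_cluster, evaluators)
```

Reporting `unusable_fraction` per evaluator makes it explicit when a measure is NaN or missing at most values of k, which is easy to overlook when reading the raw output one run at a time.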

Given that the data for either evaluator is usually incomplete, these
methods are not very useful. Is Mahout dropping the evaluators? Is the
general wisdom that they are not particularly useful? Should a newer method
be pursued? This seems a fairly important question to me. Am I missing
something?

Raw data for a sample crawl is given below:





--
Lance Norskog
[email protected]



