Both H1 and H2 seek to balance inter-cluster and intra-cluster
similarity in arriving at their evaluation of a cluster solution. For
both criterion functions the goal is to maximize their value.

H1 = I1/E1

H2 = I2/E1

So, both rely on E1 to measure inter-cluster similarity, which consists
of minimizing the cosine of the angle between each cluster centroid and
the overall centroid (which will result in the greatest angles between
the centroids). H1 and H2 differ in how they measure intra-cluster
similarity: H1 relies on I1 (the ball of string) while H2 relies on I2
(the flower).
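
For reference, here is how I would write the three building blocks in
symbols. This is a sketch from memory of the usual CLUTO-style
formulation, not something quoted from the documentation, so treat the
notation as my assumption: k clusters, S_r is the set of contexts in
cluster r, n_r is its size, C_r is its centroid, and C is the centroid
of the whole collection. In particular, the n_r weighting in E1 and the
inclusion of self-pairs in I1 are assumptions.

```latex
\begin{align*}
I_1 &= \sum_{r=1}^{k} \frac{1}{n_r} \sum_{d_i, d_j \in S_r} \cos(d_i, d_j) \\
I_2 &= \sum_{r=1}^{k} \sum_{d_i \in S_r} \cos(d_i, C_r) \\
E_1 &= \sum_{r=1}^{k} n_r \cos(C_r, C) \\
H_1 &= \frac{I_1}{E_1}, \qquad H_2 = \frac{I_2}{E_1}
\end{align*}
```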

Put another way, H1 seeks to maximize the pairwise similarities between
the contexts in each cluster (I1) while minimizing the cosine between
the centroids of the clusters and the overall collection centroid (E1).

H2 seeks to maximize the similarities between the centroid of each
cluster and the contexts therein (I2), while minimizing the cosine
between the centroids of the clusters and the overall collection centroid
(E1).

In both cases (H1 and H2) the goal is quite simple: maximize
intra-cluster similarity, while maximizing inter-cluster differences. In
other words, find tight clusters that are far apart from each other.
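
To make the descriptions above concrete, here is a minimal sketch in
Python with NumPy. It is not the CLUTO or SenseClusters implementation;
the function name criterion_values, the toy contexts, and the two label
assignments are all hypothetical. It assumes every context vector is
non-zero, that E1 weights each cluster by its size, and that I1 counts
every pair within a cluster (including a context paired with itself).

```python
import numpy as np

def criterion_values(contexts, labels):
    """Return I1, I2, E1, H1 and H2 for one cluster solution (illustrative only)."""
    X = np.asarray(contexts, dtype=float)
    labels = np.asarray(labels)
    # Scale each context to unit length so a dot product equals a cosine.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    overall = X.mean(axis=0)  # centroid of the whole collection

    i1 = i2 = e1 = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        n_r = len(members)
        centroid = members.mean(axis=0)
        # I1: pairwise cosines between the contexts in the cluster,
        # divided by the cluster size (assumed normalization).
        i1 += (members @ members.T).sum() / n_r
        # I2: cosine between each context and its own cluster centroid.
        i2 += (members @ centroid).sum() / np.linalg.norm(centroid)
        # E1: cosine between the cluster centroid and the overall centroid,
        # weighted by cluster size (assumed weighting).
        cos_to_overall = (centroid @ overall) / (
            np.linalg.norm(centroid) * np.linalg.norm(overall)
        )
        e1 += n_r * cos_to_overall

    return {"I1": i1, "I2": i2, "E1": e1, "H1": i1 / e1, "H2": i2 / e1}

# Hypothetical toy data: two contexts near one axis, two near the other.
toy_contexts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
tight = criterion_values(toy_contexts, [0, 0, 1, 1])  # similar contexts together
mixed = criterion_values(toy_contexts, [0, 1, 0, 1])  # dissimilar contexts together
print(tight)
print(mixed)
```

On this toy data the solution that groups similar contexts should get
the larger H1 and H2 and the smaller E1, which is the "tight clusters
that are far apart" behavior described above.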

In many respects it seems like H2 might be a very good candidate for use
as a criterion function in general. Since it relies on centroid
computations only and does not do exhaustive pairwise comparisons it is a
bit more efficient than H1, and in principle it seems to make some sense.
So perhaps in addition to I2 it would make sense to try H2 from time to
time in experiments. H1 and I1 are also interesting, although I think both
have a bias towards finding clusters of the same size, which might not be
exactly what we want.

Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse


_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
