Hi all,
I am working on a cluster analysis project and want to implement a
stopping rule for it. For now I am starting with the Calinski-Harabasz
(C/H) stopping rule.
Currently I am computing the WGSS like this (in Java):
    public static double computeWGSS(DocumentGroup group) {
        Document[] docs = group.getDocuments();
        // A cluster with zero or one document has no within-group scatter.
        if (docs.length <= 1) {
            return 0.0;
        }
        double wgss = 0.0;
        // Sum the squared distances over every unordered pair in the cluster.
        for (int i = 0; i < docs.length; i++) {
            for (int j = i + 1; j < docs.length; j++) {
                wgss += computeSumOfSquares(docs[i].getProfile().getVector(),
                        docs[j].getProfile().getVector());
            }
        }
        // Divide the pairwise sum by the number of documents in the cluster.
        return wgss / docs.length;
    }
This is implemented according to Calinski and Harabasz's paper "A dendrite
method for cluster analysis".
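
As a sanity check on the pairwise form above, here is a minimal sketch that compares it with the more familiar centroid form (sum of squared distances to the cluster mean). It assumes computeSumOfSquares returns the squared Euclidean distance between two vectors, which is not confirmed above. The class WgssSketch and its method names are hypothetical, and plain double[] arrays stand in for the Document profiles.

```java
// Sketch only: plain double[] vectors stand in for Document profiles.
final class WgssSketch {

    // Assumed behaviour of computeSumOfSquares: squared Euclidean distance.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Pairwise form: sum over unordered pairs, divided by the cluster size.
    static double pairwiseWgss(double[][] points) {
        if (points.length <= 1) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                sum += squaredDistance(points[i], points[j]);
            }
        }
        return sum / points.length;
    }

    // Centroid form: sum of squared distances from each point to the mean.
    static double centroidWgss(double[][] points) {
        if (points.length == 0) {
            return 0.0;
        }
        double[] centroid = new double[points[0].length];
        for (double[] p : points) {
            for (int i = 0; i < centroid.length; i++) {
                centroid[i] += p[i] / points.length;
            }
        }
        double sum = 0.0;
        for (double[] p : points) {
            sum += squaredDistance(p, centroid);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] cluster = {{0, 0}, {2, 0}, {0, 2}};
        System.out.println(pairwiseWgss(cluster));
        System.out.println(centroidWgss(cluster));
    }
}
```

Under the squared-Euclidean assumption the two printed values should agree up to floating-point rounding, which is why dividing the pairwise sum by the cluster size yields the WGSS.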
However I have been unable to find the algorithm for computing BGSS. At the
moment I have implemented it like this:
    public static double computeBGSS(List<DocumentGroup> groupList) {
        if (groupList.size() == 1) {
            return 0.0;
        }
        double bgss = 0.0;
        for (int i = 0; i < groupList.size(); i++) {
            for (int j = i + 1; j < groupList.size(); j++) {
                DocumentGroup group1 = groupList.get(i);
                DocumentGroup group2 = groupList.get(j);
                bgss += computeBGSS(group1, group2);
            }
        }
        return bgss;
    }

    public static double computeBGSS(DocumentGroup group1, DocumentGroup group2) {
        double bgss = 0.0;
        for (Document d1 : group1.getDocuments()) {
            for (Document d2 : group2.getDocuments()) {
                bgss += computeSumOfSquares(d1.getProfile().getVector(),
                        d2.getProfile().getVector());
            }
        }
        return bgss;
    }
Is this implementation correct? When calculating WGSS, we divide the pooled
sum of squares by the number of documents in the cluster. Do we also have
to divide the pooled sum of squares in BGSS by something, such as the
number of clusters?
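
One route that avoids a separate pairwise formula for BGSS is the standard decomposition TSS = WGSS + BGSS, where TSS is the same pairwise quantity computed over all documents and divided by the total number of documents. The sketch below is a hedged illustration of that route, not a confirmed reading of the paper. The class ChSketch and its method names are hypothetical, double[] arrays stand in for the Document profiles, and squared Euclidean distance is assumed for computeSumOfSquares.

```java
// Sketch only: each cluster is a double[][] of point vectors.
final class ChSketch {

    // Assumed behaviour of computeSumOfSquares: squared Euclidean distance.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Pairwise sum of squared distances divided by the number of points.
    static double scatter(double[][] points) {
        if (points.length <= 1) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                sum += squaredDistance(points[i], points[j]);
            }
        }
        return sum / points.length;
    }

    // Pooled WGSS: the per-cluster scatter summed over all clusters.
    static double totalWgss(double[][][] clusters) {
        double sum = 0.0;
        for (double[][] cluster : clusters) {
            sum += scatter(cluster);
        }
        return sum;
    }

    static int countPoints(double[][][] clusters) {
        int n = 0;
        for (double[][] cluster : clusters) {
            n += cluster.length;
        }
        return n;
    }

    // TSS: the same scatter computed over all points, ignoring cluster labels.
    static double tss(double[][][] clusters) {
        double[][] all = new double[countPoints(clusters)][];
        int next = 0;
        for (double[][] cluster : clusters) {
            for (double[] point : cluster) {
                all[next++] = point;
            }
        }
        return scatter(all);
    }

    // BGSS obtained from the decomposition TSS = WGSS + BGSS.
    static double bgss(double[][][] clusters) {
        return tss(clusters) - totalWgss(clusters);
    }

    // C/H index: (BGSS / (k - 1)) / (WGSS / (n - k)).
    // Only meaningful when k >= 2, n > k and the pooled WGSS is non-zero.
    static double chIndex(double[][][] clusters) {
        int k = clusters.length;
        int n = countPoints(clusters);
        return (bgss(clusters) / (k - 1)) / (totalWgss(clusters) / (n - k));
    }

    public static void main(String[] args) {
        double[][][] clusters = {
            {{0, 0}, {0, 2}},
            {{10, 0}, {10, 2}}
        };
        System.out.println(bgss(clusters));
        System.out.println(chIndex(clusters));
    }
}
```

On this toy data the BGSS comes out as 100 and the index as 50, whereas the raw sum of cross-cluster pairwise squared distances (what computeBGSS above accumulates) would be 408. So, at least under these assumptions, the cross-cluster pairwise sum does not appear to be interchangeable with BGSS. In the index itself the divisors are k - 1 for BGSS and n - k for the pooled WGSS.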
Thanks in advance,
Behrang Saeedzadeh
-------------------------------
http://my.opera.com/behrangsa
http://twitter.com/behrangsa
http://www.linkedin.com/in/behrangsa
http://www.facebook.com/people/Behrang-Saeedzadeh/619892726
http://www.last.fm/user/behrangsa
_______________________________________________
senseclusters-users mailing list
https://lists.sourceforge.net/lists/listinfo/senseclusters-users