>>>>> "MikG" == m grum <[EMAIL PROTECTED]>
>>>>> on Mon, 4 Aug 2003 08:51:30 +0200 (MET DST) writes:
MikG> Anyone have a clue why hclust() and agnes() produce
MikG> different results in the example below when both use
MikG> method="average"?? I'm not able to reproduce the
MikG> problem with other datasets.
MikG> ereck <- read.table("Ereck.txt",header=TRUE,sep="\t")
MikG> emol <- subset(ereck,select=c(11:18,20:32))
MikG> library(cluster)
MikG> library(mva)
MikG> daisemol <- daisy(emol,type=list(asymm=c(1:21)))
The reason is that most of the distances/dissimilarities are tied: there
are only 20 distinct values among the 1326 distances (all pairs of 52 observations).
  > sort(table(daisemol), decreasing=TRUE)
starts as
  0.666666666666667               0.5               0.8 0.285714285714286
                387               284               251                94
i.e. the distance 2/3 appears 387 times, 1/2 appears 284 times, 4/5 appears 251 times, and so on.
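A minimal sketch of the same tie check. Since Ereck.txt is not available, it assumes made-up 0/1 data of the same shape (52 rows, 21 asymmetric binary columns, in the hypothetical `emol.fake`); with so few variables, the dissimilarities can only take a small number of distinct values, so ties are unavoidable.

```r
## Sketch: count distinct values and ties in a daisy() dissimilarity.
## Assumption: 'emol.fake' is made-up 0/1 data standing in for 'emol'
## (52 rows, 21 asymmetric binary columns); it is NOT the poster's data.
library(cluster)

set.seed(1)
emol.fake <- as.data.frame(matrix(rbinom(52 * 21, size = 1, prob = 0.5),
                                  nrow = 52))
d <- daisy(emol.fake, type = list(asymm = 1:21))

n.dist <- length(d)                      # 52 * 51 / 2 = 1326 pairs
n.uniq <- length(unique(as.vector(d)))   # number of distinct values
ties   <- sort(table(as.vector(d)), decreasing = TRUE)

cat(n.dist, "dissimilarities,", n.uniq, "distinct values\n")
head(ties, 4)                            # the most frequent (tied) values
```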
With so many tied distances, choosing the next pair of observations /
clusters to merge often means choosing among several equally close
pairs. How such ties are broken is arbitrary, and the two algorithms
break them differently, hence the different results.
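The sketch below compares the average-linkage trees from `hclust()` and `agnes()` on two made-up data sets: continuous data (where ties essentially never occur, so both should give the same tree) and binary data (many tied distances, where they may or may not agree). `compare.trees()` is a hypothetical helper; it compares merge heights and cophenetic distances. In current R, `hclust()` lives in package stats (the former mva package was merged into it), so no `library(mva)` is needed.

```r
## Sketch: compare average-linkage trees from stats::hclust() and
## cluster::agnes() on the same distances.
## Assumption: both toy data sets are made up for illustration only.
library(cluster)

compare.trees <- function(d) {
  hc <- hclust(d, method = "average")
  ag <- as.hclust(agnes(d, diss = TRUE, method = "average"))
  ## cophenetic(): for each pair, the height at which they first join
  c(same.heights = isTRUE(all.equal(sort(hc$height), sort(ag$height))),
    same.tree    = isTRUE(all.equal(as.vector(cophenetic(hc)),
                                    as.vector(cophenetic(ag)))))
}

set.seed(42)
## continuous data: tied distances have probability zero
d.cont <- dist(matrix(rnorm(20 * 3), ncol = 3))
## binary data: only a handful of distinct distances, hence many ties
d.bin  <- dist(matrix(rbinom(20 * 5, 1, 0.5), ncol = 5), method = "binary")

compare.trees(d.cont)
compare.trees(d.bin)   # result depends on how each algorithm breaks ties
```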
For your situation, you might be able to use some continuous
variable along with the factors and the many binary variables, so
that the distances no longer have ties.
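As a rough illustration of that suggestion, the sketch below assumes made-up binary data (`bin`) and a hypothetical continuous measurement (`size`), and counts distinct dissimilarities with and without the continuous column. With mixed variable types `daisy()` uses Gower's coefficient, and the continuous part makes almost every pairwise value distinct.

```r
## Sketch: adding a continuous variable breaks the ties.
## Assumptions: 'bin' is made-up 0/1 data and 'size' is a hypothetical
## continuous measurement; neither comes from the original data set.
library(cluster)

set.seed(7)
n   <- 52
bin <- as.data.frame(matrix(rbinom(n * 12, size = 1, prob = 0.5), nrow = n))

## binary variables only: few distinct values, many ties
d.bin <- daisy(bin, type = list(asymm = 1:12))

## same binary variables plus one continuous column
## (mixed types, so daisy() uses Gower's coefficient)
mixed <- cbind(bin, size = rnorm(n))
d.mix <- daisy(mixed, type = list(asymm = 1:12))

n.distinct <- c(binary.only  = length(unique(as.vector(d.bin))),
                with.numeric = length(unique(as.vector(d.mix))))
n.distinct
```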
NO bug! I.e., you should have posted to R-help (you did have a
good question!), not R-bugs.
Regards,
Martin Maechler <[EMAIL PROTECTED]> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><
______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel