Re: [Rd] hclust() and agnes() method="average" divergence (PR#3648)

maechler Thu, 14 Aug 2003 15:24:33 -0700

>>>>> "MikG" == m grum <[EMAIL PROTECTED]>
>>>>>     on Mon, 4 Aug 2003 08:51:30 +0200 (MET DST) writes:



    MikG> Anyone have a clue why hclust() and agnes() produce
    MikG> different results in the example below when both use
    MikG> method="average"??  I'm not able to reproduce the
    MikG> problem with other datasets.

    MikG> ereck <- read.table("Ereck.txt",header=TRUE,sep="\t")
    MikG> emol <- subset(ereck,select=c(11:18,20:32))
    MikG> library(cluster)
    MikG> library(mva)
    MikG> daisemol <- daisy(emol,type=list(asymm=c(1:21)))

The reason is that most of the distances/dissimilarities are the
same: there are only 20 different values in the 1326 distances.

> sort(table(daisemol), decreasing=TRUE)

starts as
>> 0.666666666666667               0.5               0.8 0.285714285714286 
>>               387               284               251                94 

i.e. the distance 2/3 appears 387 times,  1/2 does 284 times, etc.
With so many ties in the distances, choosing the next
observation / cluster for "merging" often means choosing among many
equally good candidates; hence the arbitrariness and the difference
between the two algorithms.
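
The effect of ties can be sketched without the original data. What
follows is only an illustration under assumptions: the 52 x 6 binary
matrix is simulated (hypothetical, not Ereck.txt), base R's
dist(method="binary") stands in for daisy() with asymmetric binaries,
and reordering the rows stands in for two implementations that meet
tied candidates in a different order. Whether the two trees actually
differ depends on the data, so the code reports the comparison rather
than assuming an outcome.

```r
# Hypothetical stand-in for the poster's data: 52 observations, 6 binary variables.
set.seed(1)
x <- matrix(rbinom(52 * 6, 1, 0.5), nrow = 52,
            dimnames = list(paste0("obs", 1:52), NULL))

# Jaccard-type distance for asymmetric binary variables (base R).
d1 <- dist(x, method = "binary")
n_pairs    <- length(d1)                     # 52 * 51 / 2 pairs
n_distinct <- length(unique(as.vector(d1)))  # few distinct values means many ties
print(sort(table(as.vector(d1)), decreasing = TRUE))

# Same observations, rows shuffled, so tied candidates are met in another order.
perm <- sample(nrow(x))
d2 <- dist(x[perm, ], method = "binary")

h1 <- hclust(d1, method = "average")
h2 <- hclust(d2, method = "average")

# Compare the two trees via cophenetic distances, aligned by observation label.
c1 <- as.matrix(cophenetic(h1))
c2 <- as.matrix(cophenetic(h2))[rownames(c1), rownames(c1)]
same_tree <- isTRUE(all.equal(c1, c2))

cat("distinct distances:", n_distinct, "of", n_pairs, "\n")
cat("identical trees after reordering:", same_tree, "\n")
```

If same_tree comes out FALSE, the only thing that changed between the
two runs is the order in which tied pairs were encountered.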

For your situation, you might be able to use some continuous
variable along with the factors and the many binary ones such
that the distances won't have ties.
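
A rough sketch of that suggestion, again on simulated data: the
column name "cont" and all values are made up for illustration, and
the cluster package is assumed to be installed. With mixed variable
types daisy() switches to Gower's coefficient, so the continuous
column contributes to every pairwise dissimilarity.

```r
library(cluster)  # provides daisy(); assumed to be installed

set.seed(1)
# Hypothetical data: 6 binary columns plus one continuous column "cont".
mixed <- as.data.frame(matrix(rbinom(52 * 6, 1, 0.5), nrow = 52))
mixed$cont <- rnorm(52)

# Columns 1:6 are treated as asymmetric binary, "cont" as interval-scaled.
d_mix <- daisy(mixed, type = list(asymm = 1:6))

n_pairs_mix    <- length(d_mix)
n_distinct_mix <- length(unique(as.vector(d_mix)))
cat("distinct dissimilarities:", n_distinct_mix, "of", n_pairs_mix, "\n")
```

The closer the number of distinct values is to the number of pairs,
the less room is left for arbitrary tie-breaking.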

NO bug! {i.e. you should have posted to R-help, not R-bugs
(you did have a good question!)}

Regards,
Martin Maechler <[EMAIL PROTECTED]>     http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16    Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich     SWITZERLAND
phone: x-41-1-632-3408          fax: ...-1228                   <><

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
