On Mon, 16 Nov 2009, Jopi Harri wrote:

I am doing cluster analysis [hclust(Dist, method="average")] on
data that potentially contains redundant objects. As expected,
the inclusion of redundant objects affects the clustering result,
i.e., the data a1 = a2 = a3, b, c, d, e1 = e2 is likely to
cluster differently from the same data without the redundancy,
i.e., a1, b, c, d, e1. This is apparent when the outcome is
visualized as a dendrogram.

Now, it seems that the clustering result for which the redundancy
has been eliminated is more robust for the present assignment
than that of the redundant data. Naturally, there is no problem
in the elimination: just exclude the redundant objects from Dist.

However, it would be very convenient to be able to include the
redundant objects in the *dendrogram* by attaching them as
0-level branches to the subtrees, i.e.:

1.0........-------........
0.5....___|__...._|_......
0.0.._|_..|..|..|.._|_....
....|.|.|.|..|..|.|...|...
...a1a2a3.b..c..d.e1.e2...

instead of

1.0........-------........
0.5....___|__...._|_......
0.0...|...|..|..|...|.....
......a1..b..c..d..e1.....

The question: Can this be accomplished in the *dendrogram plot*
by manipulating the resulting hclust data structure or by some
other means, and if yes, how?


Yes. Study

        ?hclust

particularly the 'Value' section, which describes the components (merge, height, order) that need modification.


Here is a very simple example:

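res <- hclust(dist(1-diag(3)*rnorm(3)))
plot(res)
res2 <- res
## give each of the 3 original leaves a twin (4:6), merged at height 0;
## then renumber the original merges: leaf i becomes new cluster i,
## and old cluster k becomes k + 3
res2$merge <- rbind(-cbind(1:3,4:6),
                    matrix(ifelse( res2$merge<0, -res2$merge,
                                   res2$merge+sum(res2$merge<0)), ncol=2))
res2$height <- c(rep(0,3), res2$height)
## place each twin right after its original in the plotting order
res2$order <- as.vector( rbind(res2$order,(4:6)[res2$order]) )
plot(res2)
str( res )
str( res2 )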
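
The same idea might be generalized to the situation in the question, where only some objects have duplicates and some have more than one. What follows is only a sketch under assumptions: attach_duplicates(), rep_of and dup_labels are hypothetical names invented here (nothing in base R provides them), and the toy data are made up. It assumes hclust() was run on the de-duplicated objects and that you know, for each redundant object, the index of the object it duplicates.

```r
## Hypothetical helper, not part of base R.
##   hc         : hclust() result for the n de-duplicated objects
##   rep_of     : for each redundant object, the index (1..n) of its original
##   dup_labels : labels for the redundant objects
attach_duplicates <- function(hc, rep_of, dup_labels) {
  n <- length(hc$order)
  m <- length(rep_of)
  stopifnot(m > 0, all(rep_of %in% seq_len(n)), length(dup_labels) == m)

  node <- -seq_len(n)          # current id of each original leaf (negative = leaf)
  zero <- matrix(0L, m, 2)     # the m new merges at height 0
  for (j in seq_len(m)) {
    i <- rep_of[j]
    zero[j, ] <- c(node[i], -(n + j))  # join original i (or its group) with duplicate n + j
    node[i] <- j                       # original i now lives in merge row j
  }

  old <- hc$merge
  ## old leaves point to their (possibly new) node; old cluster rows shift by m
  old[] <- ifelse(old < 0, node[as.vector(abs(old))], old + m)

  out <- hc
  out$merge  <- rbind(zero, old)
  out$height <- c(rep(0, m), hc$height)
  ## each original is followed directly by its duplicates in the plotting order
  out$order  <- unlist(lapply(hc$order, function(i) c(i, n + which(rep_of == i))))
  out$labels <- c(if (is.null(hc$labels)) as.character(seq_len(n)) else hc$labels,
                  dup_labels)
  out
}

## Toy data (made up): a1, b, c, d, e1 are unique; a2, a3 duplicate a1; e2 duplicates e1
set.seed(1)
x   <- matrix(rnorm(10), 5, dimnames = list(c("a1", "b", "c", "d", "e1"), NULL))
hc  <- hclust(dist(x), method = "average")
hc2 <- attach_duplicates(hc, rep_of = c(1, 1, 5), dup_labels = c("a2", "a3", "e2"))
plot(hc2)
```

The labels have to be extended along with merge, height and order, since plot() takes its leaf labels from the labels component when one is present.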


Alternatively, you could use as.dendrogram( res ) as the point of departure and manipulate the value.

HTH,

Chuck




Jopi Harri

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
