Re: [R] Colouring hclust() trees

2004-05-11 Thread Martin Keller-Ressel
How about this:

Here hc is your hclust object, colv is the colour vector in the same
order as the original data, and height is the height of the colour bar
as a fraction of the dendrogram height.

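colorplot.hclust <- function(hc, colv, height = .05) {
  # colv must supply one value per leaf, in the order of the original data
  stopifnot(length(hc$order) == length(colv))
  plot(hc, labels = FALSE, hang = 0)
  # the colour strip runs from just below zero up to zero on the height axis
  xy.mat <- list(x = seq_along(colv), y = c(-max(hc$height) * height, 0))
  # reorder the values to match the left-to-right leaf order of the plot
  image(xy.mat, z = matrix(colv[hc$order], ncol = 1), add = TRUE)
}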
## Example:

data(iris)
## method "ward.D" was called "ward" before R 3.1.0
hc1 <- hclust(dist(scale(iris[,1:4])), method="ward.D")
colorplot.hclust(hc1,as.numeric(iris[,5]))
hth,

Martin Keller-Ressel



--

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Colouring hclust() trees

2004-05-10 Thread Thomas Petzoldt
Richard A. O'Keefe wrote:
> I have a data set with 6 variables and 251 cases.
> The people who supplied me with this data set believe that it falls
> naturally into three groups, and have given me a rule for determining
> group number from these 6 variables.

One possibility is to extract the coordinates used by the dendrogram
using par("usr") and then to do annotations using ?text, but as a global
alternative in cases like this (many cases and known number of classes),
I would suggest a different cluster algorithm, e.g. ?kmeans. If you want
to get a visual idea you may try to apply an ordination method (e.g.
princomp or isoMDS, the latter from package MASS) and color the objects
according to their class found by kmeans.
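
A minimal sketch of that suggestion, using the built-in iris data as a stand-in. The data set, the choice of three centres, and the seed are assumptions for illustration, not part of the original question:

```r
# Stand-in data: the four numeric iris measurements, scaled
x <- scale(iris[, 1:4])

# kmeans starts from random centres, so fix the seed for repeatability
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 10)

# Ordination by principal components; plot the first two score columns,
# coloured by the class found by kmeans
pc <- princomp(x)
plot(pc$scores[, 1:2], col = km$cluster, pch = 19,
     xlab = "Comp.1", ylab = "Comp.2")
```

isoMDS from MASS could be swapped in for princomp in the same way; it takes a dist object rather than the data matrix.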

Hope it helps

Thomas P.



Re: [R] Colouring hclust() trees

2004-05-10 Thread Richard A. O'Keefe
I asked about putting some kind of coloured rug under a dendrogram.

Thomas Petzoldt [EMAIL PROTECTED] replied:
> One possibility is to extract the coordinates used by the dendrogram
> using par("usr") ...

Er, the documentation for par("usr") says
'usr' A vector of the form 'c(x1, x2, y1, y2)' giving the extremes
  of the user coordinates of the plotting region.  When a
  logarithmic scale is in use (i.e., 'par("xlog")' is true, see
  below), then the x-limits will be '10 ^ par("usr")[1:2]'.
  Similarly for the y-axis.
But I _know_ the (logical) coordinates of the plotting region; what I need
is the coordinates of the leaves of the dendrogram.
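
For what it's worth, plot() on an hclust object draws the leaves at x = 1, 2, ..., n from left to right, in the order given by the $order component, so the leaf coordinates can be computed directly. A sketch with iris standing in for the real data; the names grp, pal, leaf_x and obs_x are made up for illustration:

```r
hc  <- hclust(dist(scale(iris[, 1:4])))
grp <- as.integer(iris$Species)   # stand-in for the prespecified groups
pal <- c("red", "green", "blue")

# hang = -1 makes every leaf extend down to height 0
plot(hc, labels = FALSE, hang = -1)

# The leaf drawn at x position i is observation hc$order[i]
leaf_x <- seq_along(hc$order)
points(leaf_x, rep(0, length(leaf_x)), col = pal[grp[hc$order]], pch = 15)

# x position of each observation, in original data order
obs_x <- order(hc$order)
```

Since obs_x gives the x position of every case, it can also be used with text() or points() to annotate individual leaves.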

> but as a global alternative in cases like this (many cases and
> known number of classes), I would suggest a different cluster
> algorithm, e.g. ?kmeans.

That doesn't really help, amongst other things because kmeans is not
a hierarchical algorithm.  I *DON'T* know the true number of classes.
I know how many classes the person who collected the data thinks there
are, and I don't need to do any clustering to find them, he gave me a
simple rule.  What I want to know is how many clusters there OUGHT to be
and how similar these clusters are to the ones he thought there were.
From poking around, the right number of clusters is somewhere between
2 and 6.  (For the record, I _have_ tried kmeans and I've tabulated the
kmeans groups against the prespecified groups.)
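
The kind of comparison described here can be sketched as a cross-tabulation of tree cuts against the prespecified groups. The iris data and its Species column are stand-ins, since the real data and grouping rule are not shown:

```r
hc    <- hclust(dist(scale(iris[, 1:4])))
given <- iris$Species             # stand-in for the prespecified groups

# Cut the tree at each candidate number of clusters and tabulate the
# resulting membership against the given groups
for (k in 2:6) {
  found <- cutree(hc, k = k)
  print(table(found = found, given = given))
}

# The three-cluster cut on its own
tab3 <- table(found = cutree(hc, k = 3), given = given)
```

A table with one dominant cell per row suggests the cut agrees with the given groups; rows spread across several columns suggest it does not.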

> If you want to get a visual idea you may try to apply an
> ordination method (e.g. princomp or isoMDS the latter from
> package MASS) and color the objects according to their class
> found by kmeans.

I had already done that (using the prespecified classes, not classes found
by kmeans).  But it didn't solve my present problem, which was overlaying
the *prespecified* classes onto a dendrogram.

Two other people gave me answers that are spot on.
Unfortunately, I've now lost their messages, so I can't name them.

Suggestion 1:  use the RowSideColors (or ColSideColors) argument of heatmap().
This gives me two dendrograms (and I can suppress one if I want) and a heat
image of the data, and all things considered, it's *better* than what I wanted.
(I was aware of heatmap, but I'd failed to notice the relevance, or even the
existence, of the ???SideColors arguments.)  In this particular case, the
graph _beautifully_ displays what I want it to display.
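
A sketch of Suggestion 1, with iris as a stand-in for the real data and an arbitrary red/green/blue palette for the three groups:

```r
x    <- as.matrix(scale(iris[, 1:4]))
side <- c("red", "green", "blue")[as.integer(iris$Species)]

# RowSideColors takes one colour per row of x.
# Colv = NA suppresses the column dendrogram and keeps the column order;
# scale = "none" because x has already been scaled.
hm <- heatmap(x, Colv = NA, scale = "none", RowSideColors = side)
```

The value returned invisibly includes rowInd, the row permutation used in the plot, which is the same kind of ordering as the $order component of an hclust object.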

Suggestion 2:  use the draw.clust function from the maptree package.
I have now installed this package (which R makes *so* easy) and it does
exactly what I asked for.

Both of these approaches work with any dendrogram.

I'm beginning to suspect that if something isn't already available in R,
I'll never be able to imagine a need for it.  But then I'm a bear of
very little brain...



[R] Colouring hclust() trees

2004-05-09 Thread Richard A. O'Keefe
I have a data set with  6 variables and 251 cases.
The people who supplied me with this data set believe that it falls
naturally into three groups, and have given me a rule for determining
group number from these 6 variables.

If I do
scaled.stuff <- scale(stuff, TRUE, c(...the design ranges...))
stuff.dist <- dist(scaled.stuff)
stuff.hc <- hclust(stuff.dist)
plot(stuff.hc)
I get a dendrogram which looks sort of plausible, but

(a) with this many leaves, the leaf labels really aren't legible at any
plausible scaling, and would be best omitted.  I could figure out
which point was which if there were some way to use identify(), but
I'm just not seeing it.

(b) what I'd really like to do is to colour the leaves according to the
predicted group, or some other variable.  The obvious thing to try is
plot(stuff.hc, col=c("red","green","blue")[stuff.predicted.group])
but that doesn't work.  I read everything that seemed plausible, and
came across nodePar, but

col <- c("red","green","blue")[stuff.predicted.group]
plot(stuff.hc, nodePar=list(col=list("black",col)))

tells me repeatedly that

parameter "nodePar" couldn't be set in high-level plot() function

while 

plot(as.dendrogram(stuff.hc), nodePar=list(col=list("black",col)))

draws the dendrogram (_much_ slower than plot() does) and still gives
me no colouring at all.  Clearly I have misunderstood how to use
nodePar.

(c) The obvious fall-back is to use points() to draw the nodes again in
the colours I want, but if I could do that, I could use identify().

The frustrating thing is that

d <- dim(stuff)[1]
plot(1:d, 1:d, col=col[stuff.hc$order])

shows me that there _is_ a strong connection between the groups found by
hclust() and the predicted groups, albeit not a simple one.

I have looked at plot.dendrogram() and plotNode() -- using getAnywhere() --
and it looks to me as though what I want *should* be doable, but I've
clearly misunderstood the details of how to do it.
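
One way the nodePar route can be made to work is to set a nodePar attribute on each leaf with dendrapply(), rather than passing a single nodePar to plot(). As far as I can tell, a nodePar given to plot() applies to all nodes at once, with at most a pair of values for inner nodes and leaves. A sketch, with iris standing in for the real data; colour_leaf and leaf_col are names invented here:

```r
x <- scale(iris[, 1:4])
rownames(x) <- seq_len(nrow(x))   # explicit labels to look leaves up by
hc <- hclust(dist(x))

# One colour per observation, named by the observation's label
leaf_col <- setNames(c("red", "green", "blue")[as.integer(iris$Species)],
                     rownames(x))

# Attach a per-leaf nodePar; inner nodes are returned unchanged
colour_leaf <- function(node) {
  if (is.leaf(node)) {
    attr(node, "nodePar") <- list(pch = 19, cex = 0.6,
                                  col = leaf_col[[attr(node, "label")]])
  }
  node
}

dend <- dendrapply(as.dendrogram(hc), colour_leaf)
plot(dend, leaflab = "none")      # coloured points, no leaf labels
```

Looking colours up by label rather than by position avoids having to reason about the leaf order inside the dendrogram.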
