Hi

I'm trying to cluster a continuous dataset into a varying number of clusters, 
with the restriction that each cluster must have more than 'x' 
observations. 

I have tried the clara function, using silhouette to give me the neighbouring 
cluster medoid of each observation, then merging each observation from a cluster 
with fewer than 'x' obs. into its neighbour, but this comes unstuck if the 
neighbouring cluster also has fewer than 'x' obs.

So I'm fiddling with dendrogram objects.  Is there any way of using the 
'members' attribute to cut a dendrogram to only include branches with more than 
'x' members?
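
To make the question concrete, here is a minimal sketch of the kind of cut 
meant, under some assumptions: cut_min_members is a hypothetical helper (not 
an existing function in stats or cluster), the data are made up, and a node 
is split only when every one of its children would still have at least 
min_size members.

```r
# Hypothetical helper: descend the dendrogram and split a node only if all
# of its children have at least min_size members; otherwise keep it whole.
# For "more than x" observations, call it with min_size = x + 1.
cut_min_members <- function(node, min_size) {
  if (is.leaf(node)) return(list(node))
  # Index with [[ so each child keeps its "dendrogram" class
  children <- lapply(seq_along(node), function(i) node[[i]])
  sizes <- vapply(children,
                  function(ch) as.numeric(attr(ch, "members")),
                  numeric(1))
  # Splitting here would create a branch that is too small, so stop
  if (any(sizes < min_size)) return(list(node))
  do.call(c, lapply(children, cut_min_members, min_size = min_size))
}

# Made-up example data: 100 observations, 2 variables
set.seed(1)
dat <- matrix(rnorm(200), ncol = 2)
dend <- as.dendrogram(hclust(dist(dat)))

branches <- cut_min_members(dend, min_size = 5)
sapply(branches, attr, "members")   # size of each retained branch
lapply(branches, labels)            # observations in each branch
```

Note that if the whole tree has fewer than min_size members, the sketch just 
returns the root as a single branch.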

An example output from clara with a data set of 1000 obs. and 82 clusters:

> cl$clusinfo
      size   max_diss    av_diss isolation
 [1,]    1 0.00000000 0.00000000 0.0000000
 [2,]    3 1.19840221 0.40837142 5.0938561
 [3,]    4 0.16867940 0.07284916 0.5830662
 [4,]    2 0.13380551 0.06690276 0.5687456
 [5,]    3 0.21862177 0.13428115 1.0371933
 [6,]    5 0.10384573 0.05270335 0.5887887
 [7,]    2 0.08547020 0.04273510 0.4846024
 [8,]    4 0.18615254 0.09545067 0.7396865
 [9,]    7 0.15688781 0.08572887 0.6234016
.
.
.
[75,]   11 0.26963387 0.13985980 1.1447836
[76,]    6 0.21439705 0.11953365 0.5754212
[77,]    5 0.21131875 0.12920395 0.5567024
[78,]    3 0.17126227 0.09685930 0.7160261
[79,]    2 0.22622024 0.11311012 0.9457984
[80,]    2 0.10268536 0.05134268 0.5167766
[81,]    1 0.00000000 0.00000000 0.0000000
[82,]    2 0.10018837 0.05009419 0.2474480

Note that not all observations from cluster 1 are necessarily closest to 
cluster 2.

Cheers

Norm   

Norm Good
Statistician
CMIS/e-Health Research Centre
A joint venture between CSIRO and the Queensland Government
Lvl 20, 300 Adelaide Street BRISBANE QLD 4000
PO Box 10842 Adelaide Street BRISBANE QLD 4000
Ph: 07 3024 1640 Fx: 07 3024 1690 
Em: [EMAIL PROTECTED] Web: http://e-hrc.net/

