Re: [R] cluster size

Christian Hennig Fri, 11 Dec 2009 07:57:04 -0800

Dear Ms Karunambigai,

the kmeans algorithm depends on random initialisation.

There are two basic strategies that can be applied in order to make your
results reproducible:

1) Fix the random number generator by means of set.seed (see ?set.seed)
before you run kmeans. The problem with this is that your solution can
only be reproduced using the same random seed; technically it is still
random.

2) Specify fixed initial centers, using the centers argument in kmeans.

(Sensible initial centers may be obtained by running hclust using Ward's
method, obtaining the desired number of clusters using cutree, and computing
the centers of the resulting clusters; sorry that I don't have the time right
now to explain how to do that precisely; the help pages and hopefully some
understanding of what is going on may help you further.)
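A minimal sketch of the two strategies above, for illustration only. The matrix x is simulated stand-in data, not the poster's as1, and the object names are hypothetical. It assumes a recent R version in which Ward's criterion on Euclidean distances is requested with method = "ward.D2" (older versions only offered method = "ward").

```r
# Simulated stand-in data (hypothetical): 500 rows, 4 variables, 5 groups
set.seed(1)
x <- matrix(rnorm(500 * 4), ncol = 4) + rep(1:5, each = 100) * 6

# Strategy 1: set the same seed immediately before each kmeans call
set.seed(123)
km_a <- kmeans(x, centers = 5)
set.seed(123)
km_b <- kmeans(x, centers = 5)
identical(km_a$cluster, km_b$cluster)

# Strategy 2: fixed initial centers taken from a Ward hierarchical clustering
hc <- hclust(dist(x), method = "ward.D2")
groups <- cutree(hc, k = 5)            # cut the tree into 5 clusters
# Column means within each Ward cluster: a 5 x 4 matrix of initial centers
init_centers <- apply(x, 2, function(col) tapply(col, groups, mean))
km_c <- kmeans(x, centers = init_centers)
km_d <- kmeans(x, centers = init_centers)
identical(km_c$cluster, km_d$cluster)
table(km_c$cluster)
```

With strategy 2 no seed is needed, because nothing random is left once the initial centers are supplied as a matrix.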

An alternative strategy that will not absolutely guarantee reproducibility
but will make your results more stable is to use kmeansruns in package fpc,
which is a wrapper that runs kmeans several times and gives you the best
solution found. That should reproduce its outcome with higher probability
(though not precisely 1).

I don't know whether the default value runs=100 is sufficient to give a
stable solution for your data, but increasing the runs parameter may help.
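A rough sketch of the multiple-runs idea, again on simulated stand-in data (x is hypothetical). It assumes package fpc is installed; krange = 5 is used here to fix the number of clusters at five, and runs = 200 is just an arbitrary value larger than the default. The last lines show what I believe is the closest base R analogue, the nstart argument of kmeans, which also keeps the best of several random starts.

```r
# Simulated stand-in data (hypothetical): 500 rows, 4 variables, 5 groups
set.seed(1)
x <- matrix(rnorm(500 * 4), ncol = 4) + rep(1:5, each = 100) * 6

# kmeansruns from fpc: many random starts, best solution kept
if (requireNamespace("fpc", quietly = TRUE)) {
  best <- fpc::kmeansruns(x, krange = 5, runs = 200)
  print(table(best$cluster))
}

# Base R analogue: nstart random starts inside kmeans itself
best_base <- kmeans(x, centers = 5, nstart = 200)
table(best_base$cluster)
```

Cluster labels (1 to 5) can still be permuted between calls even when the partition itself is the same, so compare sorted cluster sizes rather than the raw table.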


Cheers,
Christian

On Fri, 11 Dec 2009, karuna m wrote:

hi r-help,
i am doing kmeans clustering in stats. i tried for five clusters clustering 
using:
kcl1 <- kmeans(as1[, c("contlife", "somlife", "agglife", "sexlife",
                       "rellife", "hordlife", "doutlife", "symtlife",
                       "washlife", "chcklife", "rptlife", "countlife",
                       "coltlife", "ordlife")],
               5, iter.max = 10, nstart = 1,
               algorithm = "Hartigan-Wong")
table(kcl1$cluster)
every time i am getting five clusters of different sizes like first time with 
cluster sizes
table(kcl1$cluster)
  1   2   3   4   5
140  72 105  98 112
second time with cluster sizes
table(kcl1$cluster)
  1   2   3   4   5
 91 149 106  76 105
and so on.
I wish to know whether there is any function to get the same cluster sizes
every time we do kmeans clustering.
Thanks in advance.
regards,
Ms.Karunambigai M
PhD Scholar
Dept. of Biostatistics
NIMHANS
Bangalore
India




*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
