Dear Ms Karunambigai,

The kmeans algorithm depends on random initialisation.
There are two basic strategies for making your results reproducible:
1) Fix the random number generator with set.seed (see ?set.seed) before you run kmeans. The drawback is that your solution can only be reproduced with the same random seed; technically it is still random.
2) Specify fixed initial centers, using the centers argument of kmeans.
(Sensible initial centers may be obtained by running hclust with Ward's method, cutting the tree into the desired number of clusters with cutree, and computing the centers of the resulting clusters. Sorry that I don't have the time right now to explain precisely how to do that; the help pages, and hopefully some understanding of what is going on, may help you further.)
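
A rough sketch of both strategies, on made-up data. The objects x and k are placeholders, not your as1 variables, and method = "ward.D2" is the name of Ward's method in recent versions of R (older versions used method = "ward"), so adjust as needed.

```r
# Toy data standing in for the real variables (assumption: a numeric matrix).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2),
           matrix(rnorm(100, mean = 8), ncol = 2))
k <- 3

# Strategy 1: set the same seed immediately before each kmeans call.
set.seed(12345)
fit_a <- kmeans(x, centers = k)
set.seed(12345)
fit_b <- kmeans(x, centers = k)
identical(fit_a$cluster, fit_b$cluster)

# Strategy 2: deterministic initial centers from Ward clustering.
hc <- hclust(dist(x), method = "ward.D2")
groups <- cutree(hc, k = k)
# One row per cluster, one column per variable: the cluster means.
init_centers <- apply(x, 2, function(col) tapply(col, groups, mean))
fit_c <- kmeans(x, centers = init_centers)
fit_d <- kmeans(x, centers = init_centers)
identical(fit_c$cluster, fit_d$cluster)
```

With fixed centers no random numbers are involved, so no seed is needed for the second strategy.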

An alternative strategy that will not absolutely guarantee reproducibility, but will make your results more stable, is to use kmeansruns in package fpc, a wrapper that runs kmeans several times and returns the best solution found. That should reproduce its outcome with higher probability (though not exactly 1). I don't know whether the default value runs=100 is sufficient to give a stable solution for your data, but increasing the runs parameter may help.
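
Something along these lines may serve as a starting point. It is only a sketch on made-up data: x is a placeholder for your variables, runs = 200 is an arbitrary choice, and passing a single value to krange to fix the number of clusters is my assumption about how you would want to call it. The code is skipped if fpc is not installed.

```r
# Toy data standing in for the real variables (assumption: a numeric matrix).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2),
           matrix(rnorm(100, mean = 8), ncol = 2))

if (requireNamespace("fpc", quietly = TRUE)) {
  # krange gives the number(s) of clusters; runs is the number of random starts.
  fit1 <- fpc::kmeansruns(x, krange = 3, runs = 200)
  fit2 <- fpc::kmeansruns(x, krange = 3, runs = 200)
  # Compare sorted cluster sizes, because the cluster labels may be permuted.
  print(sort(as.vector(table(fit1$cluster))))
  print(sort(as.vector(table(fit2$cluster))))
}
```

If the two sets of sizes still differ on your data, that would be a sign that runs needs to be larger.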

Cheers,
Christian

On Fri, 11 Dec 2009, karuna m wrote:

Hi r-help,
I am doing kmeans clustering with the stats package. I tried a five-cluster solution using:
kcl1 <- kmeans(as1[, c("contlife", "somlife", "agglife", "sexlife",
                       "rellife", "hordlife", "doutlife", "symtlife", "washlife",
                       "chcklife", "rptlife", "countlife", "coltlife", "ordlife")],
               5, iter.max = 10, nstart = 1,
               algorithm = "Hartigan-Wong")
table(kcl1$cluster)
Every time I get five clusters of different sizes, for example the first time with cluster sizes
table(kcl1$cluster)
  1   2   3   4   5
140  72 105  98 112
and the second time with cluster sizes
table(kcl1$cluster)
  1   2   3   4   5
 91 149 106  76 105
and so on.
I wish to know whether there is any function that gives the same cluster sizes every time we run kmeans clustering.
Thanks in advance.
regards,
Ms.Karunambigai M
PhD Scholar
Dept. of Biostatistics
NIMHANS
Bangalore
India



*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
