Thank you very much, Gavin. set.seed is exactly the function I need.
Now the kmeans result is stable and no longer changes every time I run it.

2007/3/30, Gavin Simpson <[EMAIL PROTECTED]>:
>
> On Fri, 2007-03-30 at 09:07 +0200, Sergio Della Franca wrote:
> > My simple problem is that when I run kmeans it gives me different
> > results, because if centers is a number, a random set of (distinct)
> > rows in x is chosen as the initial centres.
>
> You can stop this and make it reproducible by setting the seed for the
> random number generator before doing kmeans - this way the same
> (pseudo)random set of rows gets selected each time:
>
> dat <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
> set.seed(1234)
> km <- kmeans(dat, 2)
> set.seed(1234)
> km2 <- kmeans(dat, 2)
> all.equal(km, km2) ## TRUE
>
> But ask yourself whether this is helpful. Are the solutions similar each
> time you run the function (without setting the seed) and get different
> results? If the runs give very different results then it is likely that
> you are finding local minima, not an optimal solution - a common problem
> with iterative algorithms using random starts.
>
> One solution to this /is/ to use several random starts and see if you
> get similar results. Some samples may switch clusters, but if the bulk
> of samples are assigned to the same cluster (i.e. together, not
> necessarily in cluster "1", as the cluster number is random) then you
> can be happy with the result. That some samples switch clusters may just
> indicate that there isn't a clearly defined clustering of all your
> samples - some are intermediate between clusters.
>
> Another is to use a hierarchical cluster analysis (via hclust()). Cut it
> at the number of clusters you want and use the centers (sic) of those
> clusters as the starting points for kmeans. This way the hclust()
> results get you close to a good solution, which kmeans then updates as
> it is not constrained by having a hierarchical structure.
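
For anyone reading this thread later, the two suggestions quoted above might be sketched roughly as follows. This is only an illustration on made-up data, not the MASS script itself: the data frame, the choice of k = 2, the "average" linkage and nstart = 25 are all assumptions made for the example.

```r
# Hypothetical data, same shape as in the example quoted above
set.seed(1234)
dat <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
k <- 2

# 1. Several random starts: cross-tabulate two independent runs.
#    Cluster numbers are arbitrary, so look at which samples stay
#    together in the table rather than at matching labels.
run1 <- kmeans(dat, k)
run2 <- kmeans(dat, k)
agree <- table(run1 = run1$cluster, run2 = run2$cluster)
print(agree)

# nstart repeats the random start and keeps the fit with the
# smallest total within-cluster sum of squares
km_multi <- kmeans(dat, k, nstart = 25)

# 2. Start from a hierarchical clustering instead of random rows
hc <- hclust(dist(dat), method = "average")  # linkage is an assumption
grp <- cutree(hc, k = k)

# Mean of every variable within each hclust group: a k x 3 matrix
init <- apply(dat, 2, function(v) tapply(v, grp, mean))

# kmeans started from those centres; no random start is involved
km_hc <- kmeans(dat, centers = init)
print(table(hclust = grp, kmeans = km_hc$cluster))
```

If most of the counts in the first table fall into one cell per row, the two random starts agree apart from the arbitrary cluster numbering. The second table shows how far kmeans moved samples away from the hclust grouping it started from.
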
>
> There is an example of this in Modern Applied Statistics with S (2002 -
> Venables and Ripley, Springer), but if you don't have this book, you can
> see the MASS scripts for Chapter 11 of the book. The MASS scripts should
> have been provided with your copy of R, in
> RINSTALL/library/MASS/scripts/ where RINSTALL is where your version
> of R is installed. Then you want ch11.R in that directory. Look at
> section 11.2 Cluster Analysis in that file.
>
> > To me the problem is simple.
> >
> > The question I ask you is whether centers could be something other
> > than a number,
> > i.e. instead of indicating a number of centers, would it be possible
> > to indicate different character labels to identify the clusters I want
> > to obtain?
>
> No. And this is why, despite how clear and simple the problem is to you,
> you need to show us an example of your data! Surely, if you have
> information that exactly identifies the clusters you want to find, why
> do you need a clustering algorithm to find them for you?
>
> G
>
> > thank you
> >
> > 2007/3/29, Gavin Simpson <[EMAIL PROTECTED]>:
> > On Thu, 2007-03-29 at 15:02 +0200, Sergio Della Franca wrote:
> > > Dear R-Helpers,
> > >
> > > I read in the R documentation, about kmeans:
> > >
> > > centers
> > >
> > > Either the number of clusters or a set of initial (distinct) cluster
> > > centres. *If a number*, a random set of (distinct) rows in x is
> > > chosen as the initial centres.
> > >
> > > My question is: could it be possible that the centers are character
> > > and not number?
> >
> > I think you misunderstand - centers is the number of clusters you want
> > to partition your data into. How else would you specify the number of
> > clusters other than by a number? So no, it has to be numeric.
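
As a rough illustration of the two forms of centers described in the help text quoted above, the sketch below uses made-up data. The matrix x, the choice of three clusters, the starting rows and the label names "A", "B", "C" are all hypothetical. The last step is not something proposed in the thread; it only shows that names can be attached to the cluster codes after the fit, since centers itself has to be numeric.

```r
# Hypothetical numeric data: 100 rows, 2 variables
set.seed(1234)
x <- matrix(rnorm(200), ncol = 2)

# Form 1: a single number = how many clusters; starting rows are random
km_n <- kmeans(x, centers = 3)

# Form 2: a numeric matrix of starting centres,
# one row per cluster and one column per variable in x
start <- x[1:3, ]                  # arbitrary rows, for illustration only
km_m <- kmeans(x, centers = start)

# Names cannot go into centers, but they can be attached to the
# integer cluster codes afterwards (the names here are made up)
named <- factor(km_m$cluster, levels = 1:3, labels = c("A", "B", "C"))
print(table(named))
```
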
> >
> > The alternative use of centers is to provide known starting points for
> > the algorithm, such as from the results of a hierarchical cluster
> > analysis, that are the locations of the cluster centroids, for each
> > cluster, on each of the feature variables.
> >
> > Also, argument x to kmeans() is specific about requiring a numeric
> > matrix (or something coercible to one), so characters here are not
> > allowed either.
> >
> > But then again, I may not have understood what it is that you are
> > asking, but that is not surprising given that you have not provided an
> > example of what you are trying to do, and how you tried to do it but
> > failed.
> >
> > > and provide commented, minimal, self-contained, reproducible code.
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > G
> > --
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > Gavin Simpson                 [t] +44 (0)20 7679 0522
> > ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> > Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> > Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> > UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.