Re: [R] Kmeans centers

Sergio Della Franca Fri, 30 Mar 2007 00:30:07 -0800

Thank you very much Gavin,

set.seed() is exactly the function I need.

Now the kmeans results are reproducible and no longer change every time
I run it.

2007/3/30, Gavin Simpson <[EMAIL PROTECTED]>:
>
> On Fri, 2007-03-30 at 09:07 +0200, Sergio Della Franca wrote:
> > My simple problem is that when I run kmeans it gives me different
> > results because, if centers is a number, a random set of (distinct)
> > rows in x is chosen as the initial centres.
>
> You can stop this and make it reproducible by setting the seed for the
> random number generator before doing kmeans - this way the same
> (pseudo)random set of rows gets selected each time:
>
> dat <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
> set.seed(1234)
> km <- kmeans(dat, 2)
> set.seed(1234)
> km2 <- kmeans(dat, 2)
> all.equal(km, km2) ## TRUE
>
> But ask yourself: is this helpful? When you run the function without
> setting the seed, are the solutions similar each time, or quite
> different? If the runs give very different results then it is likely that
> you are finding local minima, not an optimal solution - a common problem
> with iterative algorithms using random starts.
>
> One solution to this /is/ to use several random starts and see if you
> get similar results. Some samples may switch clusters, but if the bulk
> of samples are assigned to the same cluster (i.e. together, not
> necessarily in cluster "1", as the cluster number is arbitrary) then you
> can be happy with the result. That some samples switch clusters may just
> indicate that there isn't a clearly defined clustering of all your
> samples - some are intermediate between clusters.
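
A minimal sketch of the several-random-starts check described above. The
data are made up for illustration (two loosely separated groups), and the
names dat, km_a, km_b, agreement and km_best are hypothetical. The nstart
argument of kmeans() is an alternative that is not mentioned in the
thread: it runs several random starts and keeps the best one.

```r
set.seed(42)  # only so that this illustration itself is repeatable

# Made-up data: 50 rows around 0 and 50 rows around 3, three variables
dat <- rbind(matrix(rnorm(150, mean = 0), ncol = 3),
             matrix(rnorm(150, mean = 3), ncol = 3))

# Two independent runs, each from its own random start
km_a <- kmeans(dat, centers = 2)
km_b <- kmeans(dat, centers = 2)

# Cross-tabulate memberships. Cluster numbers are arbitrary, so look for
# one dominant cell per row rather than agreement on the diagonal.
agreement <- table(run_a = km_a$cluster, run_b = km_b$cluster)
print(agreement)

# Alternatively, let kmeans() try 25 random starts and keep the best fit
km_best <- kmeans(dat, centers = 2, nstart = 25)
print(km_best$tot.withinss)
```

If most of the counts sit in one cell per row of the table, the two runs
found essentially the same grouping, whatever numbers the clusters were
given.
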
>
> Another is to use a hierarchical cluster analysis (via hclust()). Cut it
> at the number of clusters you want and use the centers (sic) of those
> clusters as the starting points for kmeans. This way the hclust()
> results get you close to a good solution, which kmeans then updates as
> it is not constrained by having a hierarchical structure.
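
A rough sketch of the hclust()-then-kmeans() idea just described, not a
copy of the MASS script. The data, the choice of average linkage, and the
names dat, k, hc, groups and init are assumptions made for illustration.

```r
set.seed(42)  # only so that the made-up data are repeatable

# Made-up data: 50 rows around 0 and 50 rows around 3, three variables
dat <- rbind(matrix(rnorm(150, mean = 0), ncol = 3),
             matrix(rnorm(150, mean = 3), ncol = 3))

k <- 2

# Hierarchical clustering on Euclidean distances, cut into k groups
hc <- hclust(dist(dat), method = "average")
groups <- cutree(hc, k = k)

# Mean of each variable within each hierarchical group: a k x p matrix
init <- apply(dat, 2, function(col) tapply(col, groups, mean))

# Use those group means as the starting centers; no random start involved
km <- kmeans(dat, centers = init)

# Compare the hierarchical groups with the refined kmeans clusters
print(table(hclust = groups, kmeans = km$cluster))
```

Because the starting centers are supplied explicitly, repeated runs on
the same data give the same result without calling set.seed().
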
>
> There is an example of this in Modern Applied Statistics with S (2002 -
> Venables and Ripley, Springer), but if you don't have this book, you can
> see the MASS scripts for Chapter 11 of the book. The MASS scripts should
> have been provided with your copy of R, in
> RINSTALL/library/MASS/scripts/ where RINSTALL is where your version
> of R is installed. Then you want ch11.R in that directory. Look at
> section 11.2 Cluster Analysis in that file.
>
> >
> > To me the problem is simple.
> >
> > The question I ask you is whether centers could be something other
> > than a number, i.e. instead of giving a number of centers, would it
> > be possible to give character labels identifying the clusters I want
> > to obtain?
>
> No. And this is why, despite how clear and simple the problem is to you,
> you need to show us an example of your data! Surely, if you have
> information that exactly identifies the clusters you want to find, why
> do you need a clustering algorithm to find them for you?
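
If the aim is only to have readable names on the result, one possibility
(an assumption about the intent, not something stated in the thread) is
to attach labels after clustering. In this sketch the names "groupA" and
"groupB" and the object cluster_label are hypothetical, and which cluster
receives which name is arbitrary because kmeans numbers its clusters
arbitrarily.

```r
set.seed(1234)
dat <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))

# centers must still be a number (or a numeric matrix of starting points)
km <- kmeans(dat, 2)

# Attach hypothetical character labels to the numeric cluster ids
cluster_label <- factor(km$cluster, levels = 1:2,
                        labels = c("groupA", "groupB"))
print(table(cluster_label))
```
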
>
> G
>
> >
> > Thank you
> >
> >
> >
> > 2007/3/29, Gavin Simpson <[EMAIL PROTECTED]>:
> >         On Thu, 2007-03-29 at 15:02 +0200, Sergio Della Franca wrote:
> >         > Dear R-Helpers,
> >         >
> >         > I read in the R documentation, about kmeans:
> >         >
> >         >   centers
> >         >
> >         > Either the number of clusters or a set of initial (distinct)
> >         > cluster centres. *If a number*, a random set of (distinct)
> >         > rows in x is chosen as the initial centres.
> >         >
> >         > My question is: is it possible for the centers to be
> >         > character rather than numeric?
> >
> >         I think you misunderstand - centers is the number of clusters
> >         you want to partition your data into. How else would you
> >         specify the number of clusters other than by a number? So no,
> >         it has to be numeric.
> >
> >         The alternative use of centers is to provide known starting
> >         points for the algorithm, such as from the results of a
> >         hierarchical cluster analysis, that are the locations of the
> >         cluster centroids, for each cluster, on each of the feature
> >         variables.
> >
> >         Also, argument x to kmeans() is specific about requiring a
> >         numeric matrix (or something coercible to one), so characters
> >         here are not allowed either.
> >
> >         But then again, I may not have understood what it is that you
> >         are asking, but that is not surprising given that you have not
> >         provided an example of what you are trying to do, and how you
> >         tried to do it but failed.
> >
> >         > and provide commented, minimal, self-contained, reproducible
> >         > code.
> >
> >         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >         G
> >
> >
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Gavin Simpson                     [t] +44 (0)20 7679 0522
> ECRC                              [f] +44 (0)20 7679 0565
> UCL Department of Geography
> Pearson Building                  [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street
> London, UK                        [w] http://www.ucl.ac.uk/~ucfagls/
> WC1E 6BT                          [w] http://www.freshwaters.org.uk/
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
>

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
