Re: [R] kmeans error (bug?)

2003-11-10 Thread Prof Brian Ripley
This is not a bug.  It just means that the algorithm sometimes finds an 
empty cluster, and as you asked for 34 clusters and it had 33 or less it 
stops.

What to do in this situation is currently under discussion, but the advice 
given is good: try another set of initial centres.

Please do read the description of a bug in the R FAQ, and do not misuse 
the term to mean `something I do not understand'.

On Mon, 10 Nov 2003, Murad Nayal wrote:

> I have been getting the following intermittent error from kmeans:
>
> > str(cavint.p.r)
>  num [1:1967, 1:13] 0.691 0.123 0.388 0.268 0.485 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:1967] "6" "49" "87" "102" ...
>   ..$ : chr [1:13] "HYD" "NEG" "POS" "OXY" ...
> > set.seed(34)
> > kmeans(cavint.p.r,centers=34)
> Error: empty cluster: try a better set of initial centers
>
> the seed being equal to the number of centers in this case is just a
> coincidence. I've encountered the same error with or without setting the
> seed at different numbers of clusters.
>
> there is nothing particularly unusual about cavint.p.r (no NAs, NULLs),
> except maybe for the fact that the rows sum to 1.
>
> > sum(is.na(cavint.p.r))
> [1] 0
> > sum(is.nan(cavint.p.r))
> [1] 0
>
> I thought kmeans should select initial centers from the data if not
> given explicitly! any idea what might be going wrong?

And what makes you think it did not?

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] kmeans error (bug?)

2003-11-10 Thread Murad Nayal



Prof Brian Ripley wrote:
 
> This is not a bug.  It just means that the algorithm sometimes finds an
> empty cluster, and as you asked for 34 clusters and it had 33 or less it
> stops.
>
> What to do in this situation is currently under discussion, but the advice
> given is good: try another set of initial centres.

I am running kmeans in a loop over a range of possible cluster numbers.
The error terminates the loop. Is there a mechanism by which I can
'trap' the error, so that I can rerun kmeans with another set of initial
centers and let the loop run to completion? Something like the
try {} catch() mechanism of C++, for example. A flag for kmeans that
makes it return, say, NULL rather than signal an error would also
help in this type of application.


In fact, I wonder if anyone can point me to research, or better still R
functions/packages/recipes, that help in choosing the best number of
clusters for the data. What I have tried so far is to do a manova using
the clustering result from kmeans, plot the approximate F statistic
and/or the p-value, and look for cluster numbers where a sharp increase
in F or -log(p-value) occurs. What I would like to do, but don't know how,
is to formally compare successive clustering models. I know you can
compare models using the R function anova, but anova does not seem to
work with mlm models?
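
The manova screening described above might look roughly like the sketch below. It runs on synthetic data rather than cavint.p.r, and it assumes a recent R in which kmeans() has the nstart argument. The names x and approx_f are made up for the illustration. It only tabulates the approximate F statistic per cluster number and does not attempt the formal model comparison asked about.

```r
# Synthetic stand-in data: three well-separated groups in two dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2))

ks <- 2:6

# For each candidate k: cluster, then fit a manova of the data on the
# cluster labels and pull out the approximate F statistic (Pillai test).
approx_f <- sapply(ks, function(k) {
  cl <- kmeans(x, centers = k, nstart = 10)$cluster
  fit <- manova(x ~ factor(cl))
  summary(fit)$stats[1, "approx F"]
})

# One row per candidate number of clusters.
print(data.frame(k = ks, approx_F = approx_f))
```

Plotting approx_f against ks then gives the curve in which one would look for a sharp increase.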


 
> Please do read the description of a bug in the R FAQ, and do not misuse
> the term to mean `something I do not understand'.

This wasn't really a declaration that this behavior is a bug; rather, it
was a question of whether it is (hence the question mark). I guess what
I found somewhat confusing is that if kmeans was selecting data points
at random as the initial cluster centers then, at least initially, none
of these clusters would start out empty. It wasn't immediately clear how
further refinement could result in clusters becoming empty.
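
One way a centre drawn from the data can still end up with no points is when the data contain duplicate rows and two of the sampled centres coincide. This is only an assumed scenario: the thread does not say whether cavint.p.r has duplicate rows (anyDuplicated(cavint.p.r) would tell). The sketch below does the nearest-centre assignment by hand on a toy matrix instead of calling kmeans, so it illustrates the mechanism and not the actual kmeans code path.

```r
# Four points, two of them identical (rows 1 and 2).
x <- rbind(c(0, 0), c(0, 0), c(1, 1), c(5, 5))

# Suppose the random draw picked rows 1, 2 and 4 as initial centres:
# the first two centres are then the same point.
centers <- x[c(1, 2, 4), , drop = FALSE]

# Squared distance from every point (rows) to every centre (columns).
d2 <- sapply(seq_len(nrow(centers)), function(j)
  rowSums(sweep(x, 2, centers[j, ])^2))

# which.min() returns the first minimum, so on a tie the duplicate
# centre in column 2 never wins a point.
assignment <- apply(d2, 1, which.min)
sizes <- tabulate(assignment, nbins = nrow(centers))
print(sizes)  # 3 0 1: the second cluster is empty
```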

thanks for the feedback


 

-- 
Murad Nayal M.D. Ph.D.
Department of Biochemistry and Molecular Biophysics
College of Physicians and Surgeons of Columbia University
630 West 168th Street. New York, NY 10032
Tel: 212-305-6884   Fax: 212-305-6926



Re: [R] kmeans error (bug?)

2003-11-10 Thread Jason Turner
Murad Nayal wrote:
> I am running kmeans in a loop for a range of possible cluster numbers.
> The error terminates the loop. is there a mechanism by which I can
> 'trap' the error so that I can rerun kmeans with another set of initial
> centers and hence allow the loop to run to completion. something like
> try {} catch() mechanism of C++ for example.

For R version < 1.8.0, ?try
For R version >= 1.8.0, see also ?tryCatch
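
A minimal sketch of how tryCatch might be used for the loop described above. The helper kmeans_retry and its arguments max_tries and fit are hypothetical names, not part of R. The data are a synthetic stand-in for cavint.p.r. When centers is a single number, kmeans draws a fresh random set of initial centres on every call, so simply calling it again amounts to trying another set of initial centres.

```r
# Hypothetical helper (not part of R): call a clustering function up to
# max_tries times and return the first fit that does not signal an error.
kmeans_retry <- function(x, k, max_tries = 10, fit = stats::kmeans) {
  for (attempt in seq_len(max_tries)) {
    # tryCatch() yields the fit on success, or NULL if an error was signalled.
    result <- tryCatch(fit(x, centers = k), error = function(e) NULL)
    if (!is.null(result)) return(result)
  }
  NULL  # every attempt failed; the caller decides what to do with NULL
}

set.seed(34)
x <- matrix(runif(200 * 3), ncol = 3)  # synthetic stand-in data

# The loop over cluster numbers now runs to completion even if
# individual kmeans calls fail.
fits <- lapply(2:6, function(k) kmeans_retry(x, k))
```

Returning NULL after the last failed attempt mirrors the "return a NULL value rather than an error" behaviour asked for in the quoted message.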
Jason
--
Indigo Industrial Controls Ltd.
http://www.indigoindustrial.co.nz
64-21-343-545
[EMAIL PROTECTED]


[R] kmeans error (bug?)

2003-11-09 Thread Murad Nayal

Hello,

I have been getting the following intermittent error from kmeans:

> str(cavint.p.r)
 num [1:1967, 1:13] 0.691 0.123 0.388 0.268 0.485 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:1967] "6" "49" "87" "102" ...
  ..$ : chr [1:13] "HYD" "NEG" "POS" "OXY" ...
> set.seed(34)
> kmeans(cavint.p.r,centers=34)
Error: empty cluster: try a better set of initial centers

the seed being equal to the number of centers in this case is just a
coincidence. I've encountered the same error with or without setting the
seed at different numbers of clusters.

there is nothing particularly unusual about cavint.p.r (no NAs, NULLs),
except maybe for the fact that the rows sum to 1.

> sum(is.na(cavint.p.r))
[1] 0
> sum(is.nan(cavint.p.r))
[1] 0

I thought kmeans should select initial centers from the data if not
given explicitly! any idea what might be going wrong?

I am running R 1.7.0

many thanks

Murad
