[R] determining optimal # of clusters for a given dataset (e.g. between 2 and K)

andrew mcsweeny Wed, 19 Apr 2006 15:39:37 -0700

Hi:
   
     I'm clustering a microarray dataset with a large number of samples, and I would
like your opinion on the best way to determine the optimal number of clusters
automatically. Currently I am using the "cluster" package: I cluster with clara()
and examine the average silhouette width at various numbers of clusters. Since the
algorithms in "cluster" were developed decades ago, I'd like to know whether any
newer packages offer a better way to determine the optimal number of clusters.
By the way, my dataset has a lot of missing values, coded as "NA", so some
software packages don't work.
   
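The following is a minimal sketch on simulated data (an assumption; it is not the dataset described above) showing that clara() itself accepts a matrix containing NA entries, so the missing values alone need not rule out this approach. The matrix size, group means and 5% missingness rate are arbitrary choices for illustration.

```r
library(cluster)

set.seed(42)
# Simulated stand-in data (hypothetical): 300 rows x 10 columns drawn from
# three well-separated groups of 100 rows each.
x <- rbind(matrix(rnorm(1000, mean = 0),  ncol = 10),
           matrix(rnorm(1000, mean = 8),  ncol = 10),
           matrix(rnorm(1000, mean = 16), ncol = 10))

# Set 5% of the entries to NA at random to mimic missing measurements.
x[sample(length(x), round(0.05 * length(x)))] <- NA

# clara() allows NA entries in its data matrix (see ?clara), so no imputation
# or row removal is done before clustering.
fit <- clara(x, k = 3, rngR = TRUE)

sum(is.na(x))           # number of missing entries (150)
length(fit$clustering)  # one cluster label per row, despite the NAs
fit$silinfo$avg.width   # average silhouette width of clara's best sample
```

Recent releases of "cluster" may print a warning about the distance formula used when NAs are present (see the correct.d argument in ?clara); it does not stop the computation.
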
     Here is the code I've been using:
   
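  library(cluster)

  # 'data': numeric matrix, one row per observation to cluster; NAs are allowed
  kseq   <- 2:10                    # candidate numbers of clusters (example range 2..K)
  avgsil <- numeric(length(kseq))   # average silhouette width for each k

  for (i in seq_along(kseq)) {
    k <- kseq[i]
    clarares  <- clara(data, k, rngR = TRUE)
    avgsil[i] <- clarares$silinfo$avg.width   # width for clara's best sample
    print(c(k, avgsil[i]))
  }

  plot(kseq, avgsil, type = "b",
       xlab = "number of clusters k", ylab = "average silhouette width")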
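
To pick k automatically rather than by eye, one option is to take the k with the largest average silhouette width. The sketch below wraps the loop above in a helper, choose_k_clara() (a hypothetical name, not part of the "cluster" package), and tries it on simulated data with three groups and 5% NAs; the data and the 2..8 range are assumptions for illustration only.

```r
library(cluster)

# Hypothetical helper (not part of 'cluster'): run clara() for each candidate k
# and return the k whose best sample has the largest average silhouette width.
choose_k_clara <- function(x, kseq = 2:10) {
  stopifnot(all(kseq >= 2))  # silhouette widths need at least two clusters
  avgsil <- vapply(kseq,
                   function(k) clara(x, k, rngR = TRUE)$silinfo$avg.width,
                   numeric(1))
  names(avgsil) <- kseq
  list(best_k = kseq[which.max(avgsil)], avgsil = avgsil)
}

# Simulated example data (assumption): three separated groups of 100 rows
# x 10 columns, with 5% of the entries set to NA.
set.seed(1)
x <- rbind(matrix(rnorm(1000, mean = 0),  ncol = 10),
           matrix(rnorm(1000, mean = 8),  ncol = 10),
           matrix(rnorm(1000, mean = 16), ncol = 10))
x[sample(length(x), round(0.05 * length(x)))] <- NA

res <- choose_k_clara(x, kseq = 2:8)
print(round(res$avgsil, 3))   # average silhouette width for each k
cat("selected k:", res$best_k, "\n")
```

Maximizing the average silhouette width is only one criterion. With clara() the widths come from the best subsample rather than from all rows, so on real data it may be worth re-running with different seeds or a larger sampsize before trusting a single maximum.
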
   
  Sincerely,
   
  Andrew McSweeny
  grad student
  Medical University of Ohio



______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
