Hi:
I'm clustering a microarray dataset with a large number of samples, and I
would like your opinion on the best way to automatically determine the optimal
number of clusters. Currently I am using the "cluster" package: I cluster with
"clara" and examine the average silhouette width at various numbers of
clusters. I'd like opinions on whether any newer packages offer a better way
to determine the optimal number of clusters, considering that the algorithms
in "cluster" were developed decades ago. By the way, I have a lot of missing
values in my dataset, coded as "NA", so some software packages don't work.
Here is the code I've been using:
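library(cluster)

# candidate numbers of clusters (example range only; adjust as needed)
kseq <- 2:10
avgsil <- numeric(0)
for (k in kseq) {
  # 'data' is the expression matrix (samples in rows, NAs allowed by clara)
  clarares <- clara(data, k, rngR = TRUE)
  # average silhouette width of the best sample for this k
  savg <- clarares$silinfo$avg.width
  print(c(k, savg))
  # stored at position k, so positions below min(kseq) stay NA
  avgsil[k] <- savg
}
# average silhouette width against number of clusters
plot(kseq, avgsil[kseq])
lines(kseq, avgsil[kseq])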
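
To make the "automatic" part concrete, here is a minimal sketch of picking k as the value with the largest average silhouette width, rather than reading it off the plot. The simulated matrix x, the range 2:8, and the name best_k are assumptions for illustration only, not my real data or a recommended setting. NAs are placed in a single column so that every pair of rows still shares observed variables.

```r
library(cluster)

# Hypothetical stand-in data: three shifted Gaussian groups, 50 rows each
set.seed(1)
x <- rbind(matrix(rnorm(200, mean = 0),  ncol = 4),
           matrix(rnorm(200, mean = 5),  ncol = 4),
           matrix(rnorm(200, mean = 10), ncol = 4))
# Inject some missing values into one column only
x[sample(nrow(x), 15), 1] <- NA

kseq <- 2:8  # assumed candidate range

# Average silhouette width of clara's best sample for each k
avgsil <- sapply(kseq, function(k) {
  clara(x, k, rngR = TRUE)$silinfo$avg.width
})

# Choose the k that maximises the average silhouette width
best_k <- kseq[which.max(avgsil)]
print(data.frame(k = kseq, avg.width = avgsil))
print(best_k)
```

Because clara works on subsamples, the widths depend on the random draws, so it may be worth repeating this with different seeds before trusting the chosen k.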
Sincerely,
Andrew McSweeny
grad student
Medical University of Ohio