andrew mcsweeny wrote:
>Hi:
>
> I'm clustering a microarray dataset with a large # of samples. I would
> like your opinion on the best way to automatically determine the optimal # of
> clusters. Currently I am using the "cluster" package, clustering with
> "clara", examining the average silhouette width at various numbers of
> clusters. I'd like opinions on whether any newer packages offer better
> determination of optimal # of clusters, considering the algorithms in
> "cluster" were developed decades ago. By the way, I have a lot of missing
> values in my dataset, coded as "NA", so some software packages don't work.
>
> Here is the code I've been using:
>
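> library(cluster)
>
> kseq <- 2:10  # candidate numbers of clusters (example range, not given in the post)
> avgsil <- numeric(0)  # average silhouette width, indexed by k
>
> for (k in kseq) {
>   clarares <- clara(data, k, rngR = TRUE)
>   savg <- clarares$silinfo$avg.width
>   print(c(k, savg))
>   avgsil[k] <- savg
> }
>
> # plot average silhouette width against k, with the points joined by lines
> plot(kseq, avgsil[kseq], type = "b", xlab = "k", ylab = "average silhouette width")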
>
> Sincerely,
>
> Andrew McSweeny
> grad student
> Medical University of Ohio
>
>______________________________________________
>[email protected] mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>
>
Following Fraley et al., I suggest using the Bayesian information
criterion (BIC). You can find it in the mclust package.
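
A minimal sketch of what that could look like, not a definitive recipe. The
built-in iris measurements stand in for the expression matrix, the range
G = 1:9 is an arbitrary choice, and dropping incomplete rows is only one
possible way to deal with the NAs (as far as I know, Mclust() expects
complete data, so some such step is assumed here).

```r
library(mclust)

# Illustrative data only: iris stands in for the real matrix
# (rows = samples, columns = features).
x <- as.matrix(iris[, 1:4])

# Assumption: incomplete rows are dropped before fitting.
x <- x[complete.cases(x), , drop = FALSE]

# Fit Gaussian mixture models for each number of components in 1:9 and
# each available covariance structure, keeping the fit with the best BIC.
fit <- Mclust(x, G = 1:9)

fit$G          # number of clusters selected by BIC
fit$modelName  # covariance structure selected by BIC

# BIC curves across G, one per covariance structure
plot(fit, what = "BIC")
```

The number of components where the BIC curve peaks plays the same role as
the k with the largest average silhouette width in the clara loop.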
HTH, Andrej