Hello all,
This might have been fixed in later versions (I am using R1.7.0), r-help archive contains messages reporting similar problems but no reports of codes fixes. I have encountered a couple of problems using the silhouette function. one occurs when the clustering contains clusters composed of 1 element (Martin Maechler posted code few months ago that fixes a similar problem that occurs when clusters have only 2 elements but not the case with 1 element). the other problem is due to silhouette's assumption that the clusters are numbered sequentially starting at 1. one of the clustering programs I use (snob) assigns more or less arbitrary integer ids to clusters starting from 3! (clusters 1 and 2 have special meaning in snob). the modified code fixing both problems is included below, changes are commented. best Murad silhouette.default <- function (x, dist, dmatrix, ...) { cll <- match.call() if (!is.null(cl <- x$clustering)) x <- cl n <- length(x) if (!all(x == round(x))) stop("`x' must only have integer codes") k <- length(clid <- sort(unique(x))) if (k <= 1 || k >= n) return(NA) if (missing(dist)) { if (missing(dmatrix)) stop("Need either a dissimilarity `dist' or diss.matrix `dmatrix'") if (is.null(dm <- dim(dmatrix)) || length(dm) != 2 || !all(n == dm)) stop("`dmatrix' is not a dissimilarity matrix compatible to `x'") } else { dist <- as.dist(dist) if (n != attr(dist, "Size")) stop("clustering `x' and dissimilarity `dist' are incompatible") dmatrix <- as.matrix(dist) } wds <- matrix(NA, n, 3, dimnames = list(names(x), c("cluster", "neighbor", "sil_width"))) for (j in 1:k) { Nj <- sum(iC <- x == clid[j]) # # the following line changed from wds[iC, "cluster"] <- j # wds[iC, "cluster"] <- clid[j] a.i <- if (Nj > 1) colSums(dmatrix[iC, iC])/(Nj - 1) else 0 # # the following line changed from # diC <- rbind(apply(dmatrix[!iC, iC], 2, function(r) tapply(r, # x[!iC], mean))) # diC <- rbind(apply(cbind(dmatrix[!iC, iC]), 2, function(r) tapply(r, x[!iC], mean))) minC <- max.col(-t(diC)) wds[iC, "neighbor"] <- clid[-j][minC] # # the following line changed from # b.i <- diC[cbind(minC, seq(minC))] # b.i <- diC[cbind(minC, seq(along=minC))] s.i <- (b.i - a.i)/pmax(b.i, a.i) wds[iC, "sil_width"] <- s.i } attr(wds, "Ordered") <- FALSE attr(wds, "call") <- cll class(wds) <- "silhouette" wds } -- Murad Nayal M.D. Ph.D. Department of Biochemistry and Molecular Biophysics College of Physicians and Surgeons of Columbia University 630 West 168th Street. New York, NY 10032 Tel: 212-305-6884 Fax: 212-305-6926 ______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html