Hello everyone! My data is composed of 277 individuals measured on 8 binary variables (1=yes, 2=no).
I did two similar cluster analyses, one in SPSS 18.0 and one in R 2.9.2. The objective is to obtain the mean of each variable for each retained cluster.

1) The R analysis ran as follows:

> call data
> dist=dist(data,method="euclidean")
> cluster=hclust(dist,method="ward")
> cluster

Call:
hclust(d = dist, method = "ward")

Cluster method   : ward
Distance         : euclidean
Number of objects: 277

> plot(cluster)
> rect.hclust(cluster, k=4, border="red")
> x=rect.hclust(cluster, k=4, border="red")
> sapply(x, function(i) colMeans(data[i,]))
> round(sapply(x, function(i) colMeans(data[i,])),2)

2) The SPSS analysis ran as follows:

Analysis --> Classify --> Hierarchical cluster analysis --> Cluster method = Ward's method and Distance measure = Interval: Squared Euclidean distance.

After that, I computed the mean of each variable for each cluster.

The problem is that the two analyses give different results (different clusters and different means). However, when I use the "Euclidean distance" (unsquared) in SPSS, I get the same results as in R!

I thought the R "euclidean" method meant the "usual square distance between the two vectors (2 norm)" as specified in the documentation, not the unsquared distance. Does it not?

Thanks for the comment!

Jeffrey
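
A quick check of what dist() actually returns may help with the question above. The sketch below is only illustrative and makes some assumptions: `pts` and `toy` are made-up objects (`toy` is a random stand-in, not the real data), and "ward.D" is the name that R 3.1.0 and later use for what R 2.9.2 called "ward". It shows that method="euclidean" gives the unsquared distance, sqrt(sum((x_i - y_i)^2)), and that squaring the dist object before calling hclust() is one way to pass squared Euclidean distances, which should be closer to the SPSS "Squared Euclidean distance" setting.

```r
# Two points whose straight-line distance is 5 (a 3-4-5 triangle).
pts <- rbind(a = c(0, 0), b = c(3, 4))

d_plain   <- dist(pts, method = "euclidean")  # sqrt(sum((x - y)^2))
d_squared <- d_plain^2                        # squared Euclidean distance

as.numeric(d_plain)    # 5
as.numeric(d_squared)  # 25

# Hypothetical stand-in for the real data: 277 rows, 8 variables coded 1/2.
set.seed(1)
toy <- matrix(sample(1:2, 277 * 8, replace = TRUE), ncol = 8)

# Assumption: R >= 3.1.0, where the old "ward" method is named "ward.D".
hc_plain   <- hclust(dist(toy),   method = "ward.D")  # unsquared distances
hc_squared <- hclust(dist(toy)^2, method = "ward.D")  # squared distances

# Cross-tabulate the two 4-cluster solutions to see whether they agree.
cl_plain   <- cutree(hc_plain,   k = 4)
cl_squared <- cutree(hc_squared, k = 4)
table(plain = cl_plain, squared = cl_squared)
```

If the cross-tabulation is not a clean one-to-one match, the choice between squared and unsquared distances alone is enough to change the clusters, which would be consistent with the SPSS and R results differing as described.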