[R] Cluster analysis: dissimilar results between R and SPSS

Jeoffrey Gaspard Mon, 26 Apr 2010 05:37:32 -0700

Hello everyone!

My data is composed of 277 individuals measured on 8 binary variables
(1=yes, 2=no).


I ran two similar cluster analyses, one in SPSS 18.0 and one in R 2.9.2. The
objective is to obtain the mean of each variable within each retained cluster.

1) The R analysis ran as follows:

> # 'data' holds the 277 x 8 data set (loading step not shown)
> dist=dist(data,method="euclidean")
> cluster=hclust(dist,method="ward")
> cluster

Call:
hclust(d = dist, method = "ward")

Cluster method   : ward
Distance         : euclidean
Number of objects: 277

> plot(cluster)
> rect.hclust(cluster, k=4, border="red")
> x=rect.hclust(cluster, k=4, border="red")
> sapply(x, function(i) colMeans(data[i,]))
> round(sapply(x, function(i) colMeans(data[i,])),2)
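
For anyone who wants to reproduce the steps above without my data, here is a
minimal sketch. It makes several assumptions: the data frame `sim` is
simulated and only stands in for the real data, `cutree()` replaces
`rect.hclust()` so that no plot is needed, and the method is written as
"ward.D", which is the name recent R versions use for what R 2.9.2 calls
"ward".

```r
set.seed(42)

# Simulated stand-in for the real data (assumption): 277 rows,
# 8 binary variables coded 1 = yes, 2 = no
sim <- as.data.frame(matrix(sample(1:2, 277 * 8, replace = TRUE), ncol = 8))

# Unsquared Euclidean distances, as in the analysis above
d <- dist(sim, method = "euclidean")

# "ward.D" in recent R corresponds to method = "ward" in R 2.9.2
hc <- hclust(d, method = "ward.D")

# Cluster membership for a 4-cluster solution (no plot required)
groups <- cutree(hc, k = 4)

# Mean of each variable within each cluster: 8 rows, 4 columns
cluster_means <- round(sapply(split(sim, groups), colMeans), 2)
print(cluster_means)
```

The cluster labels from `cutree()` may be ordered differently from the list
returned by `rect.hclust()`, so the columns should be matched by content
rather than by position.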

2) The SPSS analysis ran as follows:

Analyze --> Classify --> Hierarchical Cluster Analysis, with Cluster method =
Ward's method and Distance measure = Interval: Squared Euclidean distance.
After that, I computed the mean of each variable for each cluster.

The problem is that I get different results from the two analyses (different
clusters and means).

However, when I use the "Euclidean distance" (unsquared) in SPSS, I get the
same results as in R!
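
To see how much the choice between unsquared and squared distances matters
within R itself, something like the following sketch could be used. It is
only an illustration under assumptions: `sim` is simulated data, not my real
data, "ward.D" is the recent name for the "ward" method, and squaring the
`dist` object is my guess at how to mimic the SPSS "Squared Euclidean
distance" setting, not a confirmed equivalent.

```r
set.seed(1)

# Simulated stand-in for the real data (assumption): 277 rows,
# 8 binary variables coded 1 = yes, 2 = no
sim <- matrix(sample(1:2, 277 * 8, replace = TRUE), ncol = 8)

# Unsquared Euclidean distances, and the same distances squared
d_unsq <- dist(sim, method = "euclidean")
d_sq   <- d_unsq^2   # arithmetic keeps the "dist" class

# Same clustering method on both inputs
hc_unsq <- hclust(d_unsq, method = "ward.D")
hc_sq   <- hclust(d_sq,   method = "ward.D")

# Cross-tabulate the two 4-cluster solutions
groups_unsq <- cutree(hc_unsq, k = 4)
groups_sq   <- cutree(hc_sq,   k = 4)
agreement <- table(unsquared = groups_unsq, squared = groups_sq)
print(agreement)
```

If the two solutions were identical up to relabelling, each row of the table
would have a single non-zero cell.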

I thought the R "euclidean" method meant the "usual square distance between
the two vectors (2 norm)" as specified in the documentation, not the
unsquared distance. Does it not?
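
A small check of what `dist()` returns can be sketched as follows. The two
vectors `a` and `b` are made up for illustration; they use the same 1/2
coding as my data and differ in 3 of 8 positions, so the squared distance
should be 3 and the unsquared distance sqrt(3).

```r
# Two hypothetical respondents, coded 1 = yes, 2 = no, differing in 3 positions
a <- c(1, 1, 2, 2, 1, 2, 1, 1)
b <- c(1, 2, 2, 1, 1, 2, 2, 1)

# Distance as computed by dist() with method = "euclidean"
d_ab <- as.numeric(dist(rbind(a, b), method = "euclidean"))

# Manual versions of both candidate definitions
manual_unsquared <- sqrt(sum((a - b)^2))
manual_squared   <- sum((a - b)^2)

# Compare dist() against each definition
print(c(dist = d_ab, unsquared = manual_unsquared, squared = manual_squared))
print(isTRUE(all.equal(d_ab, manual_unsquared)))
print(isTRUE(all.equal(d_ab^2, manual_squared)))
```

If the first comparison holds, `dist()` is returning the unsquared distance,
and squared distances would have to be obtained explicitly, for example with
`dist(data)^2`.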

Thanks for your comments!

Jeffrey




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
