On Sun, Nov 25, 2012 at 6:32 PM, Charles Jones cnjo...@vt.edu wrote:
Professor Oksanen:
Thanks for the reply and sorry for the confusion! (I'm still trying to wrap
my head around the multivariate lingo.)
1.) The scores from my NMDS analysis are non-normal (tested using the
multivariate Shapiro-Wilk test).
2 and 3) I am using these scores as input for the Cluster Analysis (Ward's
Method) to define several different groups. One of the underlying
assumptions associated with Ward's algorithm is that the input is normal.
So, the question is, is it okay to ignore that assumption (normality of
input data) to define the groups? Since I used the MRPP test
(nonparametric) to show there is a difference between the groups, it seems
like this is reasonable. However, I wasn't quite sure.
Thanks Again!
Nate Jones
Nate,
IIRC Ward's method assumes multivariate normality of each cluster
formed by the multivariate observations (it treats cluster analysis as
an ANOVA problem, so it is sensitive to outliers). As this
method tends to create rather small clusters, I would suggest trying
some of the other algorithms available in R to test the stability
of the clustering obtained with Ward's linkage.
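One way to run such a stability check is sketched below. It is only an illustration: the built-in USArrests data stand in for the real dataset, the variables are scaled, and k = 4 is an arbitrary choice. The helper rand_index() is a hypothetical name defined here, not a function from any package.

```r
# Stand-in data (assumption): built-in USArrests, scaled so no variable dominates.
x <- scale(USArrests)
d <- dist(x, method = "euclidean")

# Cluster with several linkages and cut every tree into the same number of groups.
k <- 4  # assumed number of groups, for illustration only
linkages <- c("ward.D2", "average", "complete")
cuts <- sapply(linkages, function(m) cutree(hclust(d, method = m), k = k))

# Cross-tabulate Ward's groups against each alternative linkage.
print(table(ward = cuts[, "ward.D2"], average = cuts[, "average"]))
print(table(ward = cuts[, "ward.D2"], complete = cuts[, "complete"]))

# Hypothetical helper: share of pairs of cases on which two partitions agree
# (both together or both apart), i.e. the Rand index.
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  ut <- upper.tri(same_a)
  mean(same_a[ut] == same_b[ut])
}
print(rand_index(cuts[, "ward.D2"], cuts[, "average"]))
print(rand_index(cuts[, "ward.D2"], cuts[, "complete"]))
```

Values near 1 suggest the grouping does not depend much on the linkage; tables with many off-diagonal counts suggest it does.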
However, I wonder why you need to go through the NMDS step. Can't
you cluster the cases in your dataset directly? Something like:
distances <- dist(USArrests, method = "euclidean") # we need Euclidean distances here
fit <- hclust(distances, method = "ward.D2") # Ward's criterion; the old "ward" option is now "ward.D"
plot(fit) # plot dendrogram
grps <- cutree(fit, k = 5) # suppose you can interpret 5 clusters
rect.hclust(fit, k = 5, border = "red") # add red boxes around the 5 clusters in the dendrogram
Then you could experiment with mrpp() (in the vegan package) and different
grouping indices (obtained from different cuts through the dendrogram), but
I am not sure whether this would be regarded as circular reasoning.
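To make the circularity concern concrete, here is a rough base-R sketch of an MRPP-style permutation test applied to groups cut from the same tree. It is a hand-rolled illustration and not vegan's implementation. The function name mrpp_sketch(), the group-size weighting, the zero contribution of singleton groups, the scaled USArrests stand-in data and the range of k are all assumptions made for the example.

```r
# Hand-rolled MRPP-style test (illustration only, not vegan's mrpp()).
mrpp_sketch <- function(d, grp, nperm = 999) {
  m <- as.matrix(d)
  # delta: size-weighted mean of the within-group mean pairwise distances
  delta <- function(g) {
    parts <- split(seq_along(g), g)
    within <- sapply(parts, function(idx) {
      if (length(idx) < 2) return(0)  # assumption: singleton groups contribute 0
      sub <- m[idx, idx]
      mean(sub[upper.tri(sub)])
    })
    sum(lengths(parts) / length(g) * within)
  }
  observed <- delta(grp)
  permuted <- replicate(nperm, delta(sample(grp)))  # shuffle the group labels
  p <- (sum(permuted <= observed) + 1) / (nperm + 1)
  list(delta = observed, expected = mean(permuted), p.value = p)
}

set.seed(1)                            # reproducible permutations
d <- dist(scale(USArrests))            # stand-in data (assumption)
fit <- hclust(d, method = "ward.D2")
for (k in 2:5) {                       # different cuts through the dendrogram
  res <- mrpp_sketch(d, cutree(fit, k = k), nperm = 199)
  cat("k =", k, " delta =", round(res$delta, 3), " p =", res$p.value, "\n")
}
```

Because the groups were chosen to be compact on these very distances, small p-values are expected for almost any cut, so they say little about whether the groups are real.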
Cheers,
Ivailo
--
UBUNTU: a person is a person through other persons.
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology