(Ted Harding) <[EMAIL PROTECTED]> writes:

> For non-normal data, there's something of a question as to what is
> meant (or, perhaps more accurately, what is intended to be meant) by
> homogeneity of variance, as a test preliminary to an analysis of
> variance.
Yes... If you use the test as a preliminary to an ANOVA, which largely
depends on second-order properties, I think it is reasonable to assume
that you really mean to compare the variances. It's always been a
mystery to me why SPSS prefers the Levene test, which tests whether the
mean absolute deviations are identical, which is pretty obviously not
the same thing, unless you assume something like the distributions
being scaled versions of each other.

The Tukey procedure that you outline below would seem to have something
of the same issue: if two distributions have the same variance but
different kurtosis, you'll get the heavy-tailed one occurring before
the light-tailed one in that scheme, then a region where the
light-tailed distribution dominates, and finally a region where the
heavy-tailed distribution dominates again (think of a uniform
distribution and a normal distribution with the same variance). It is
hard to tell whether the resulting test is biased one way or the other,
but the statistic probably will not have the Mann-Whitney distribution
in that case.

Notice also, btw, that R has several dispersion tests in the standard
package "stats", including fligner.test() and ansari.test().

> It is possible to consider distribution-free approaches to this kind
> of question.
>
> One of Tukey's sneakiest inventions was the application of the
> Mann-Whitney test (usually seen as a test of identity of distribution
> against location-shift types of alternative, more accurately against
> alternatives like "P(X<u) > P(Y<u)") to test similarity of
> dispersion.
>
> The trick: given X1, ..., Xm and Y1, ..., Yn, pool them and sort the
> result as Z1 < Z2 < ... < ZN where N = m + n.
>
> Now take the Z's in the order
>
>   Z[1], Z[N], Z[2], Z[N-1], Z[3], Z[N-2], ...
>
> i.e. work inwards from the ends, alternately from each end.
>
> Note, as you proceed, whether each Z is an X or a Y. You thus get a
> sequence of Xs and Ys.
> Then sum the number of pairs (X,Y) in this sequence where the X
> occurs earlier than the Y.
>
> This sum, under the null hypothesis of identity of distribution, has
> the Mann-Whitney distribution (just like its usual version), and it
> is sensitive to differences of dispersion (e.g. if the distribution
> of X is more dispersed than the distribution of Y, then the Xs will
> be found earlier in the sequence, since they lie further out than the
> Ys and so will be counted in first by the above method).
>
> No doubt, just as there are distribution-free extensions of
> procedures like Mann-Whitney to several samples ("nonparametric
> ANOVA"), so such a procedure could be applied to test equality of
> "dispersions" for several samples, and no doubt it has been done.
>
> However, I've not made use of such a procedure myself, so I have to
> leave it to others to report details.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
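[Editor's note: the outside-in ranking quoted above (which resembles the
Siegel-Tukey idea) is straightforward to sketch in R. The function names
below are made up for illustration, not base-R functions; ties are
handled only crudely via order(), and the pairwise-alternation
refinement of the classical test is not attempted.]

```r
## Sketch of the quoted procedure: rank the pooled sample from the
## outside in, alternating ends, then apply the usual Wilcoxon /
## Mann-Whitney machinery to the resulting positions.

outside_in_order <- function(z) {
  n <- length(z)
  ord <- order(z)                  # indices of z in increasing order
  pick <- integer(n)
  lo <- 1; hi <- n; from_low <- TRUE
  for (i in seq_len(n)) {
    if (from_low) { pick[i] <- ord[lo]; lo <- lo + 1 }
    else          { pick[i] <- ord[hi]; hi <- hi - 1 }
    from_low <- !from_low
  }
  pick          # indices of z in the order Z[1], Z[N], Z[2], Z[N-1], ...
}

dispersion_wilcox <- function(x, y) {
  z   <- c(x, y)
  grp <- rep(c("x", "y"), c(length(x), length(y)))
  pos <- integer(length(z))
  pos[outside_in_order(z)] <- seq_along(z)   # position in the sequence
  ## Counting (X,Y) pairs with the X occurring earlier is exactly the
  ## Mann-Whitney statistic computed on these positions:
  wilcox.test(pos[grp == "x"], pos[grp == "y"], exact = FALSE)
}
```

For example, with x much more dispersed than y, say `x <- rnorm(50, sd = 3);
y <- rnorm(50)`, `dispersion_wilcox(x, y)` should report a small p-value, and
the result can be compared with `ansari.test(x, y)` from the stats package.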