On 8/16/09, Ted Harding <ted.hard...@manchester.ac.uk> wrote:
> > Oh, I had a slightly different H0 in mind. In the given example,
> > cor.test(..., met="kendall") would test "H0: x and y are independent",
> > but cor.test(..., met="pearson") would test: "H0: x and y are not
> > correlated (or `are linearly independent')".
>
> Ah, now you are playing with fire! What the Pearson, Kendall and
> Spearman coefficients in cor.test measure is *association*. OK, if
> the results clearly indicate association, then the variables are
> not independent. But it is possible to have two variables x, y
> which are definitely not independent (indeed one is a function of
> the other) which yield zero association by any of these measures.
>
> Example:
> x <- (-10:10) ; y <- x^2 - mean(x^2)
> cor.test(x,y,method="pearson")
> # Pearson's product-moment correlation
> # t = 0, df = 19, p-value = 1
> # alternative hypothesis: true correlation is not equal to 0
> # sample estimates: cor 0
> cor.test(x,y,method="kendall")
> # Kendall's rank correlation tau
> # z = 0, p-value = 1
> # alternative hypothesis: true tau is not equal to 0
> # sample estimates: tau 0
> cor.test(x,y,method="spearman")
> # Spearman's rank correlation rho
> # S = 1540, p-value = 1
> # alternative hypothesis: true rho is not equal to 0
> # sample estimates: rho 0
>
> If you wanted, for instance, that the "method=kendall" should
> announce that it is testing "H0: x and y are independent" then
> it would seriously mislead the reader!

I did take the null statement from the description of Kendall::Kendall()
("Computes the Kendall rank correlation and its p-value on a two-sided
test of H0: x and y are independent."). Here, perhaps "monotonically
independent" (as opposed to "functionally independent") would have been
more appropriate.
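For what it's worth, Ted's example pastes into a plain R session as-is;
a minimal self-contained sketch (same variable names as in the thread)
checks the zero association directly with cor():

```r
# y is an exact function of x, so the two are clearly not independent,
# yet all three association measures come out exactly zero: the
# relationship is symmetric about x = 0, not monotone or linear.
x <- -10:10
y <- x^2 - mean(x^2)

cor(x, y, method = "pearson")   # 0
cor(x, y, method = "kendall")   # 0
cor(x, y, method = "spearman")  # 0
```

Calling cor.test() on the same data reproduces the p-value = 1 output
quoted above.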
Still, this very example seems to support my original idea: users can
easily get confused about what the exact null of a test is. Does it
test for "association" or for "no association", for "normality" or for
"lack of normality"? Printing a precise and appropriate statement of
the null would prove helpful in interpreting the results, and in
avoiding misinterpreting them.

> > Here both "H0: x is normal" and "Ha: x is not normal" are missing. At
> > least to beginners, these things are not always perfectly clear (even
> > after reading the documentation), and when interpreting the results it
> > can prove useful to have on-screen information about the null.
>
> This is possibly a more discussable point, in that even if you know
> what the Shapiro-Wilk statistic is, it is not obvious what it is
> sensitive to, and hence what it might be testing for. But I doubt
> that someone would be led to try the Shapiro-Wilk test in the
> first place unless they were aware that it was a test for normality,
> and indeed this is announced in the first line of the response.
> The alternative, therefore, is "non-normality".

To be particularly picky, as statistics is, this is not so obvious from
the print-out. For the Shapiro-Wilk test one could indeed deduce that,
since it is a "test of normality", the null tested is "H0: data is
normal". This would not hold for, say, the Pearson correlation. In
loose language, it estimates and tests for "correlation"; in more
statistically appropriate language, it tests for "no correlation" (or
for "no association"). It feels to me that without appropriate
indicators, one can easily end up playing with fire.

> As to the contrast between absence of an "Ha" statement for the
> Shapiro-Wilk, and its presence in cor.test(), this comes back to
> the point I made earlier: cor.test() offers you three alternatives
> to choose from: "two.sided" (default), "greater", "less".
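The contrast is also visible in the htest objects the two functions
return: shapiro.test() carries no "alternative" component at all, so
its printout has no Ha line to show, while cor.test() always records
one. A small sketch (with made-up data, not from the thread):

```r
# shapiro.test() returns an htest object with no "alternative" element,
# which is why its printout shows no Ha line; cor.test() always stores
# one ("two.sided" by default) and prints it.
set.seed(1)                      # illustrative data only
z <- rnorm(30)

st <- shapiro.test(z)
ct <- cor.test(z[1:15], z[16:30])

is.null(st$alternative)          # TRUE: nothing to print as Ha
ct$alternative                   # "two.sided" (the default)
```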
> This distinction can be important, and when cor.test() reports "Ha"
> it tells you which one was used.
>
> On the other hand, as far as Shapiro-Wilk is concerned there is
> no choice of alternatives (nor of anything else except the data x).
> So there is nothing to tell you! And, further, departure from
> normality has so many "dimensions" that alternatives like "two
> sided", "greater" or "less" would make no sense. One can think of
> tests targeted at specific kinds of alternative such as "distribution
> is excessively skew", "distribution has excessive kurtosis",
> "distribution is bimodal", "distribution is multimodal", and so on.
> But any of these can be detected by Shapiro-Wilk, so it is not
> targeted at any specific alternative.

Thank you for these explanations.

Best
Liviu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel