Lucke, Joseph F wrote:
> Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data. The CLT kicks in fairly quickly. Testing
> for normality prior to choosing a test statistic is generally not a good
> idea.

I beg to differ, Joseph. I have had many datasets in which the CLT was
of no use whatsoever, i.e., where bootstrap confidence limits were
asymmetric because the data were so skewed, and where symmetric
normality-based confidence intervals had bad coverage in both tails
(though correct on the average). I see this the opposite way:
nonparametric tests work fine if normality holds.

Note that the CLT helps with type I error but not so much with type II
error.

Frank

>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy
> Sent: Friday, May 25, 2007 12:04 PM
> To: [EMAIL PROTECTED]; Frank E Harrell Jr
> Cc: r-help
> Subject: Re: [R] normality tests [Broadcast]
>
> From: [EMAIL PROTECTED]
>> On 25/05/07, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
>>> [EMAIL PROTECTED] wrote:
>>>> Hi all,
>>>>
>>>> apologies for seeking advice on a general stats question. I've run
>>>> normality tests using 8 different methods:
>>>> - Lilliefors
>>>> - Shapiro-Wilk
>>>> - Robust Jarque Bera
>>>> - Jarque Bera
>>>> - Anderson-Darling
>>>> - Pearson chi-square
>>>> - Cramer-von Mises
>>>> - Shapiro-Francia
>>>>
>>>> All show that the null hypothesis that the data come from a normal
>>>> distro cannot be rejected. Great. However, I don't think it looks
>>>> nice to report the values of 8 different tests on a report. One
>>>> note is that my sample size is really tiny (less than 20
>>>> independent cases). Without wanting to start a flame war, is there
>>>> any advice on which one/ones would be more appropriate and should
>>>> be reported (along with a Q-Q plot)? Thank you.
>>>>
>>>> Regards,
>>>>
>>> Wow - I have so many concerns with that approach that it's hard to
>>> know where to begin. But first of all, why care about normality?
>>> Why not use distribution-free methods?
>>>
>>> You should examine the power of the tests for n=20. You'll probably
>>> find it's not good enough to reach a reliable conclusion.
>>
>> And wouldn't it be even worse if I used non-parametric tests?
>
> I believe what Frank meant was that it's probably better to use a
> distribution-free procedure to do the real test of interest (if there is
> one) instead of testing for normality, and then use a test that assumes
> normality.
>
> I guess the question is, what exactly do you want to do with the outcome
> of the normality tests? If those are going to be used as basis for
> deciding which test(s) to do next, then I concur with Frank's
> reservation.
>
> Generally speaking, I do not find goodness-of-fit for distributions very
> useful, mostly for the reason that failure to reject the null is no
> evidence in favor of the null. It's difficult for me to imagine why
> "there's insufficient evidence to show that the data did not come from a
> normal distribution" would be interesting.
>
> Andy
>
>>> Frank
>>>
>>> --
>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>                      Department of Biostatistics   Vanderbilt University
>>
>> --
>> yianni
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
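
On the quoted remark about the power of normality tests at n=20: this can
be checked by simulation. What follows is only a minimal sketch in Python,
not anything posted in the thread. It uses the Jarque-Bera statistic (one
of the eight tests listed above) with a critical value simulated under
normality, since the asymptotic chi-square approximation is doubtful at
this sample size. The gamma(shape=4) alternative, the 5% level, and all
function names are assumptions chosen purely for illustration.

```python
import random

N = 20        # sample size mentioned in the thread
ALPHA = 0.05  # nominal significance level (an assumption)


def jarque_bera(x):
    """Jarque-Bera statistic built from sample skewness and kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def critical_value(rng, n_sims=4000):
    """Simulated (1 - ALPHA) quantile of the statistic for normal samples."""
    stats = sorted(
        jarque_bera([rng.gauss(0.0, 1.0) for _ in range(N)])
        for _ in range(n_sims)
    )
    return stats[int(round((1 - ALPHA) * n_sims))]


def rejection_rate(draw, crit, rng, n_sims=2000):
    """Fraction of simulated samples whose statistic exceeds crit."""
    hits = sum(
        jarque_bera([draw(rng) for _ in range(N)]) > crit
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    rng = random.Random(2007)
    crit = critical_value(rng)
    # Rejection rate when the data really are normal (should be near ALPHA).
    size = rejection_rate(lambda r: r.gauss(0.0, 1.0), crit, rng)
    # Rejection rate for a moderately right-skewed alternative (hypothetical).
    power = rejection_rate(lambda r: r.gammavariate(4.0, 1.0), crit, rng)
    print(f"simulated size:  {size:.3f}")
    print(f"simulated power: {power:.3f}")
```

The second number is the share of clearly non-normal samples of size 20
that the test flags. Other alternatives can be tried by swapping the
`draw` function.
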
-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
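
P.S. The point made in the reply at the top of this message, that
symmetric normality-based intervals can have poor coverage in each tail
when the data are skewed, can be explored by simulation. The following is
a minimal sketch under stated assumptions, not an analysis from the
thread: the lognormal(0, 1) population, n = 20, and the function name
`tail_miss_rates` are all illustrative choices. It counts how often the
usual 95% t interval for the mean lies entirely below, or entirely above,
the true mean. Each rate would be 0.025 if the interval behaved as
advertised.

```python
import math
import random
import statistics

N = 20             # sample size (illustrative, matching the thread's n)
T_CRIT_19 = 2.093  # two-sided 95% t critical value for N - 1 = 19 df


def tail_miss_rates(n_sims=4000, seed=1):
    """Return (rate of intervals entirely below the true mean,
    rate of intervals entirely above it) for lognormal(0, 1) samples."""
    rng = random.Random(seed)
    true_mean = math.exp(0.5)  # mean of a lognormal with mu=0, sigma=1
    too_low = too_high = 0
    for _ in range(n_sims):
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(N)]
        m = statistics.mean(x)
        half = T_CRIT_19 * statistics.stdev(x) / math.sqrt(N)
        if m + half < true_mean:
            too_low += 1    # upper limit falls short of the true mean
        elif m - half > true_mean:
            too_high += 1   # lower limit overshoots the true mean
    return too_low / n_sims, too_high / n_sims


if __name__ == "__main__":
    low, high = tail_miss_rates()
    print(f"interval entirely below true mean: {low:.3f} (nominal 0.025)")
    print(f"interval entirely above true mean: {high:.3f} (nominal 0.025)")
```

Comparing the two printed rates with 0.025 shows how unevenly the error
is split between the tails for this particular skewed population. Results
for other distributions or sample sizes may differ.
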
