Re: [R] normality tests [Broadcast]

2007-05-29 Thread Bert Gunter
False. Box proved, circa 1952, that standard inferences in the linear regression
model are robust to non-normality, at least for (nearly) balanced designs.
The **crucial** assumption is independence, which I suspect partially
motivated his time-series work on ARIMA modeling. More recently, work on
hierarchical models (e.g. repeated-measures/mixed-effects models) has also
dealt with lack of independence.
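[Editor's note] Bert's point that independence, not normality, is the crucial assumption can be checked with a small simulation. The sketch below is not from the original thread; it is pure Python (standard library only), uses an approximate two-sided 5% t critical value for df = 29, and compares the one-sample t-test's rejection rate under skewed-but-independent errors with its rate under AR(1)-dependent errors:

```python
# Sketch: the t-test's level survives non-normality but not dependence.
# Pure Python; 2.045 is the approximate two-sided 5% t cutoff for df = 29.
import math
import random
import statistics

def t_stat(x):
    """One-sample t statistic for H0: mean = 0."""
    n = len(x)
    return statistics.mean(x) / (statistics.stdev(x) / math.sqrt(n))

def ar1(n, rho, rng):
    """AR(1) series with standard normal innovations (mean 0)."""
    out, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out

rng = random.Random(1)
crit = 2.045
nsim, n = 2000, 30

# Skewed but independent errors (centered exponential): level roughly holds.
skewed = sum(abs(t_stat([rng.expovariate(1.0) - 1.0 for _ in range(n)])) > crit
             for _ in range(nsim)) / nsim
# Dependent errors (rho = 0.5): the nominal 5% level is badly inflated.
dependent = sum(abs(t_stat(ar1(n, 0.5, rng))) > crit
                for _ in range(nsim)) / nsim
print(skewed, dependent)
```

Skewness moves the level only modestly, while moderate positive autocorrelation inflates the variance of the sample mean (by roughly (1+rho)/(1-rho)) and pushes the rejection rate well above 10%.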


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of wssecn
Sent: Friday, May 25, 2007 2:59 PM
To: r-help
Subject: Re: [R] normality tests [Broadcast]

 The normality of the residuals is important in the inference procedures for
the classical linear regression model, and normality is very important in
correlation analysis (second moment)...

Washington S. Silva

 Thank you all for your replies, they have been more than useful... well
 in my case I have chosen to do some parametric tests (more precisely
 correlation and linear regressions among some variables)... so it
 would be nice if I had an extra bit of support on my decisions... If I
 understood well from all your replies... I shouldn't pay so much
 attention to the normality tests, so it wouldn't matter which one/ones
 I use to report... but rather focus on issues such as the power of the
 test...
 
 Thanks again.
 
 On 25/05/07, Lucke, Joseph F [EMAIL PROTECTED] wrote:
  Most standard tests, such as t-tests and ANOVA, are fairly resistant to
 non-normality for significance testing. It's the sample means that have
 to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
 for normality prior to choosing a test statistic is generally not a good
 idea.
 

Re: [R] normality tests [Broadcast]

2007-05-28 Thread Martin Maechler
>>>>> "LuckeJF" == Lucke, Joseph F [EMAIL PROTECTED]
>>>>>     on Fri, 25 May 2007 12:29:49 -0500 writes:

    LuckeJF>  Most standard tests, such as t-tests and ANOVA,
    LuckeJF> are fairly resistant to non-normality for
    LuckeJF> significance testing. It's the sample means that
    LuckeJF> have to be normal, not the data.  The CLT kicks in
    LuckeJF> fairly quickly.

Even though such statements appear in too many (text)books,
that's just plain wrong practically:

Even though the *level* of the t-test is resistant to non-normality,
the power is not at all!  And that makes the t-test NON-robust!
It's an easy exercise to see that the T-statistic tends to 1 when
one observation goes to infinity, i.e., the t-test will never
reject when you have one extreme outlier; simple proof with R:

> t.test(11:20)

        One Sample t-test

data:  c(11:20)
t = 16.1892, df = 9, p-value = 5.805e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 13.33415 17.66585
sample estimates:
mean of x
     15.5

##  ==>  unknown mean highly significantly different from 0
##  But

> t.test(c(11:20, 1000))

        One Sample t-test

data:  c(11:20, 1000)
t = 1.1731, df = 10, p-value = 0.2679
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -94.42776 304.42776
sample estimates:
mean of x
      105




    LuckeJF> Testing for normality prior to choosing a test
    LuckeJF> statistic is generally not a good idea.

Definitely. Or even: It's a very bad idea ...

Martin Maechler, ETH Zurich



Re: [R] normality tests [Broadcast]

2007-05-28 Thread Thomas Lumley
On Mon, 28 May 2007, Martin Maechler wrote:

> >>>>> "LuckeJF" == Lucke, Joseph F [EMAIL PROTECTED]
> >>>>>     on Fri, 25 May 2007 12:29:49 -0500 writes:

    LuckeJF>  Most standard tests, such as t-tests and ANOVA,
    LuckeJF> are fairly resistant to non-normality for
    LuckeJF> significance testing. It's the sample means that
    LuckeJF> have to be normal, not the data.  The CLT kicks in
    LuckeJF> fairly quickly.

 Even though such statements appear in too many (text)books,
 that's just plain wrong practically:

 Even though *level* of the t-test is resistant to non-normality,
 the power is not at all!!  And that makes the t-test NON-robust!

While it is true that this makes the t-test non-robust, it doesn't mean
that the statement is "just plain wrong practically".

The issue really is more complicated than a lot of people claim (not you 
specifically, Martin, but upthread and previous threads).

Starting with the demonstrable mathematical facts:
  - lots of rank tests are robust in the sense of Huber
  - rank tests are optimal for specific location-shift testing problems.
  - lots of rank tests have excellent power for location shift alternatives
 over a wide range of underlying distributions.
  - rank tests fail to be transitive when stochastic ordering is not
assumed (they are not consistent with any ordering on all distributions)
  - rank tests do not lead to confidence intervals unless a location shift
or similar one-dimensional family is assumed
  - No rank test is uniformly more powerful than any parametric test or
vice versa (if we rule out pathological cases)
  - there is no rank test that is consistent precisely against a difference
in means
  - the t-test (and essentially all tests) can be made distribution-free in
large samples (for small values of 'large', usually)
  - being distribution-free does not guarantee robustness of power (for the
t-test or for any other test)


Now, if we assume stochastic ordering, is the Wilcoxon rank-sum test more 
or less powerful than the t-test?  Everyone knows that this depends on the 
null hypothesis distribution.  Fewer people seem to know that it also 
depends on the alternative, especially in large samples.

Suppose the alternative of interest is not that the values are uniformly 
larger by 1 unit, but that 5% of them are about 20 units larger.  The 
Wilcoxon test -- precisely because it gives less weight to outliers -- 
will have lower power.  For example (ObR)

one.sim <- function(n, pct, delta){
    x <- rnorm(n)
    y <- rnorm(n) + delta*rbinom(n, 1, pct)
    list(x = x, y = y)
}

mean(replicate(100, {d <- one.sim(100, .05, 20); t.test(d$x, d$y)$p.value}) < 0.05)
mean(replicate(100, {d <- one.sim(100, .05, 20); wilcox.test(d$x, d$y)$p.value}) < 0.05)

mean(replicate(100, {d <- one.sim(100, .5, 1); t.test(d$x, d$y)$p.value}) < 0.05)
mean(replicate(100, {d <- one.sim(100, .5, 1); wilcox.test(d$x, d$y)$p.value}) < 0.05)


Since both relatively uniform shifts and large shifts of small fractions 
are genuinely important alternatives in real problems it is true in 
practice as well as in theory that neither the Wilcoxon nor the t-test is 
uniformly superior.

This is without even considering violations of stochastic ordering -- 
which are not just esoteric pathologies, since it is quite plausible for a 
treatment to benefit some people and harm others. For example, I've seen 
one paper in which a Wilcoxon test on medical cost data was statistically 
significant in the *opposite direction* to the difference in means.

This has been a long rant, but I keep encountering statisticians who think 
anyone who ever recommends a t-test just needs to have the number 0.955 
(the asymptotic relative efficiency of the Wilcoxon test under normality) 
quoted to them.

snip

    LuckeJF> Testing for normality prior to choosing a test
    LuckeJF> statistic is generally not a good idea.

 Definitely. Or even: It's a very bad idea ...


I think that's something we can all agree on.

-thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] normality tests [Broadcast]

2007-05-25 Thread Liaw, Andy
From: [EMAIL PROTECTED]
 
 On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] wrote:
   Hi all,
  
   apologies for seeking advice on a general stats question. I've run
   normality tests using 8 different methods:
   - Lilliefors
   - Shapiro-Wilk
   - Robust Jarque-Bera
   - Jarque-Bera
   - Anderson-Darling
   - Pearson chi-square
   - Cramer-von Mises
   - Shapiro-Francia
  
   All show that the null hypothesis that the data come from a normal
   distribution cannot be rejected. Great. However, I don't think it looks
   nice to report the values of 8 different tests in a report. One note is
   that my sample size is really tiny (less than 20 independent cases).
   Without wanting to start a flame war, is there any advice on which
   one/ones would be more appropriate and should be reported (along with
   a Q-Q plot). Thank you.
  
   Regards,
  
 
  Wow - I have so many concerns with that approach that it's 
 hard to know
  where to begin.  But first of all, why care about 
 normality?  Why not
  use distribution-free methods?
 
  You should examine the power of the tests for n=20.  You'll probably
  find it's not good enough to reach a reliable conclusion.
 
 And wouldn't it be even worse if I used non-parametric tests?

I believe what Frank meant was that it's probably better to use a
distribution-free procedure to do the real test of interest (if there is
one) instead of testing for normality, and then use a test that assumes
normality.

I guess the question is, what exactly do you want to do with the outcome
of the normality tests?  If those are going to be used as basis for
deciding which test(s) to do next, then I concur with Frank's
reservation.

Generally speaking, I do not find goodness-of-fit tests for distributions very
useful, mostly for the reason that failure to reject the null is no
evidence in favor of the null.  It's difficult for me to imagine why
"there's insufficient evidence to show that the data did not come from a
normal distribution" would be interesting.

Andy
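
[Editor's note] Frank's suggestion above, to examine the power of normality tests at n = 20, is easy to act on. The sketch below is not from the thread: it is pure Python (standard library only) implementing the Jarque-Bera statistic by hand, with the usual chi-squared(2) 5% cutoff of 5.99 (itself a rough approximation at n = 20). Even against data as skewed as the exponential, the test misses a substantial fraction of the time:

```python
# Sketch: power of a normality test (hand-rolled Jarque-Bera) at n = 20.
# 5.99 is the chi-squared(2) upper 5% point, a rough cutoff at this n.
import random

def jarque_bera(x):
    """Jarque-Bera statistic: n/6 * (skew^2 + (kurtosis - 3)^2 / 4)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = random.Random(7)
crit = 5.99
nsim, n = 2000, 20

# Power against a clearly non-normal alternative: exponential data.
power = sum(jarque_bera([rng.expovariate(1.0) for _ in range(n)]) > crit
            for _ in range(nsim)) / nsim
print(power)
```

The rejection rate lands well short of 1, so a non-rejection at n = 20 says very little, which is the thread's point.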

 
 


--
Notice:  This e-mail message, together with any attachments,...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] normality tests [Broadcast]

2007-05-25 Thread Lucke, Joseph F
 Most standard tests, such as t-tests and ANOVA, are fairly resistant to
non-normality for significance testing. It's the sample means that have
to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
for normality prior to choosing a test statistic is generally not a good
idea.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] normality tests [Broadcast]

2007-05-25 Thread gatemaze
Thank you all for your replies, they have been more than useful... well
in my case I have chosen to do some parametric tests (more precisely
correlation and linear regressions among some variables)... so it
would be nice if I had an extra bit of support on my decisions... If I
understood well from all your replies... I shouldn't pay so much
attention to the normality tests, so it wouldn't matter which one/ones
I use to report... but rather focus on issues such as the power of the
test...

Thanks again.




-- 
yianni

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] normality tests [Broadcast]

2007-05-25 Thread Frank E Harrell Jr
Lucke, Joseph F wrote:
  Most standard tests, such as t-tests and ANOVA, are fairly resistant to
 non-normality for significance testing. It's the sample means that have
 to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
 for normality prior to choosing a test statistic is generally not a good
 idea. 

I beg to differ, Joseph.  I have had many datasets in which the CLT was 
of no use whatsoever, i.e., where bootstrap confidence limits were 
asymmetric because the data were so skewed, and where symmetric 
normality-based confidence intervals had bad coverage in both tails 
(though correct on the average).  I see this the opposite way: 
nonparametric tests work fine if normality holds.

Note that the CLT helps with type I error but not so much with type II 
error.

Frank
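
[Editor's note] Frank's observation about bad coverage in both tails can be seen with skewed data: the symmetric t interval misses the true mean almost entirely on one side. The sketch below is not Frank's code; it is pure Python (standard library only) with the approximate two-sided 5% t critical value for df = 19, applied to lognormal samples:

```python
# Sketch: with strongly skewed (lognormal) data, the symmetric t interval
# misses the true mean almost entirely on one side.
# 2.093 is the approximate two-sided 5% t cutoff for df = 19.
import math
import random
import statistics

rng = random.Random(3)
true_mean = math.exp(0.5)   # mean of a lognormal(0, 1) is exp(sigma^2 / 2)
tcrit = 2.093
nsim, n = 4000, 20

miss_low = miss_high = 0
for _ in range(nsim):
    x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    m = statistics.mean(x)
    half = tcrit * statistics.stdev(x) / math.sqrt(n)
    if m - half > true_mean:
        miss_low += 1       # whole interval sits above the truth
    elif m + half < true_mean:
        miss_high += 1      # whole interval sits below the truth
print(miss_low, miss_high)
```

The misses pile up on the side where the sample happened to contain no large values, so the interval's overall error rate hides a badly lopsided tail behavior, exactly the failure the bootstrap limits reveal.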

 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] normality tests [Broadcast]

2007-05-25 Thread Frank E Harrell Jr
[EMAIL PROTECTED] wrote:
 Thank you all for your replies, they have been more than useful... well
 in my case I have chosen to do some parametric tests (more precisely
 correlation and linear regressions among some variables)... so it
 would be nice if I had an extra bit of support on my decisions... If I
 understood well from all your replies... I shouldn't pay so much
 attention to the normality tests, so it wouldn't matter which one/ones
 I use to report... but rather focus on issues such as the power of the
 test...

If doing regression I assume your normality tests were on residuals 
rather than raw data.

Frank

 
 Thanks again.
 

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] normality tests [Broadcast]

2007-05-25 Thread wssecn
 The normality of the residuals is important in the inference procedures for 
the classical linear regression model, and normality is very important in 
correlation analysis (second moment)...

Washington S. Silva

 Thank you all for your replies, they have been most useful... well
 in my case I have chosen to do some parametric tests (more precisely
 correlation and linear regressions among some variables)... so it
 would be nice if I had an extra bit of support on my decisions... If I
 understood well from all your replies... I shouldn't pay so much
 attention to the normality tests, so it wouldn't matter which one(s)
 I use to report... but rather focus on issues such as the power of the
 test...
 
 Thanks again.
 

Re: [R] normality tests [Broadcast]

2007-05-25 Thread Cody_Hamilton

You can also try validating your regression model via the bootstrap (the
validate() function in the Design library is very helpful).  To my mind
that would be much more reassuring than normality tests performed on twenty
residuals.

By the way, be careful with the correlation test - it's only good at
detecting linear relationships between two variables (i.e. not helpful for
detecting non-linear relationships).
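Cody's caveat is easy to demonstrate: a perfect but purely quadratic relationship has a Pearson correlation of essentially zero. Invented data:

```r
## A deterministic but non-linear relationship that Pearson
## correlation cannot detect
x <- seq(-3, 3, length.out = 50)
y <- x^2                  # y is completely determined by x
cor(x, y)                 # ~0: no *linear* association
cor.test(x, y)$p.value    # non-significant despite perfect dependence
```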

Regards,
   -Cody

Cody Hamilton, PhD
Edwards Lifesciences


   
 [EMAIL PROTECTED] 
 m 
 Sent by:   To 
 [EMAIL PROTECTED] Lucke, Joseph F   
 at.math.ethz.ch   [EMAIL PROTECTED]
cc 
   r-help r-help@stat.math.ethz.ch   
 05/25/2007 11:23  Subject 
 AMRe: [R] normality tests [Broadcast] 
   
   
   
   
   
   




Thank you all for your replies, they have been most useful... well
in my case I have chosen to do some parametric tests (more precisely
correlation and linear regressions among some variables)... so it
would be nice if I had an extra bit of support on my decisions... If I
understood well from all your replies... I shouldn't pay so much
attention to the normality tests, so it wouldn't matter which one(s)
I use to report... but rather focus on issues such as the power of the
test...

Thanks again.

Re: [R] normality tests [Broadcast]

2007-05-25 Thread Cody_Hamilton

Following up on Frank's thought, why is it that parametric tests are so
much more popular than their non-parametric counterparts?  As
non-parametric tests require fewer assumptions, why aren't they the
default?  The relative efficiency of the Wilcoxon test as compared to the
t-test is 0.955, and yet I still see t-tests in the medical literature all
the time.  Granted, the Wilcoxon still requires the assumption of symmetry
(I'm curious as to why the Wilcoxon is often used when asymmetry is
suspected, since the Wilcoxon assumes symmetry), but that's less stringent
than requiring normally distributed data.  In a similar vein, one usually
sees the mean and standard deviation reported as summary statistics for a
continuous variable - these are not very informative unless you assume the
variable is normally distributed.  However, clinicians often insist that I
include these figures in reports.
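The efficiency figure can be seen in a quick simulation under normality; the sample size and effect size below are arbitrary choices:

```r
## Power of the t-test vs the Wilcoxon on normal data with a modest shift
set.seed(123)
nsim <- 2000; n <- 30; delta <- 0.5
p_t <- p_w <- numeric(nsim)
for (i in seq_len(nsim)) {
  x <- rnorm(n)
  y <- rnorm(n, mean = delta)
  p_t[i] <- t.test(x, y)$p.value
  p_w[i] <- wilcox.test(x, y)$p.value
}
mean(p_t < 0.05)  # t-test power
mean(p_w < 0.05)  # Wilcoxon power: only slightly lower under normality
```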

Cody Hamilton, PhD
Edwards Lifesciences



   
Frank E Harrell Jr wrote on 05/25/2007 02:42 PM (To: Lucke, Joseph F; Cc: r-help; Subject: Re: [R] normality tests [Broadcast]):




Lucke, Joseph F wrote:
  Most standard tests, such as t-tests and ANOVA, are fairly resistant to
 non-normality for significance testing. It's the sample means that have
 to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
 for normality prior to choosing a test statistic is generally not a good
 idea.

I beg to differ Joseph.  I have had many datasets in which the CLT was
of no use whatsoever, i.e., where bootstrap confidence limits were
asymmetric because the data were so skewed, and where symmetric
normality-based confidence intervals had bad coverage in both tails
(though correct on the average).  I see this the opposite way:
nonparametric tests work fine even when normality holds.

Note that the CLT helps with type I error but not so much with type II
error.
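Frank's point shows up readily with skewed data: for a small lognormal sample, a percentile bootstrap interval for the mean is visibly asymmetric about the estimate, while the t interval is symmetric by construction. Illustration with invented data:

```r
## Skewed sample: bootstrap percentile CI vs normal-theory t CI for the mean
set.seed(99)
x <- rlnorm(20)  # heavily right-skewed
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))  # asymmetric about mean(x)
t.test(x)$conf.int                     # symmetric about mean(x)
```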

Frank



Re: [R] normality tests [Broadcast]

2007-05-25 Thread Frank E Harrell Jr
[EMAIL PROTECTED] wrote:
 Following up on Frank's thought, why is it that parametric tests are so
 much more popular than their non-parametric counterparts?  As
 non-parametric tests require fewer assumptions, why aren't they the
 default?  The relative efficiency of the Wilcoxon test as compared to the
 t-test is 0.955, and yet I still see t-tests in the medical literature all
 the time.  Granted, the Wilcoxon still requires the assumption of symmetry
 (I'm curious as to why the Wilcoxon is often used when asymmetry is
 suspected, since the Wilcoxon assumes symmetry), but that's less stringent
 than requiring normally distributed data.  In a similar vein, one usually
 sees the mean and standard deviation reported as summary statistics for a
 continuous variable - these are not very informative unless you assume the
 variable is normally distributed.  However, clinicians often insist that I
 include these figures in reports.
 
 Cody Hamilton, PhD
 Edwards Lifesciences

Well said Cody, just want to add that Wilcoxon does not assume symmetry 
if you are interested in testing for stochastic ordering and not just 
for a mean.
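A small illustration of that point with invented skewed data; the two-sample Wilcoxon/Mann-Whitney compares the whole distributions, with no symmetry assumption needed:

```r
## Two skewed samples, one stochastically larger than the other
set.seed(5)
x <- rexp(40, rate = 1)
y <- rexp(40, rate = 0.5)  # same shape, stochastically larger
wilcox.test(x, y)  # tests whether P(X > Y) differs from 1/2
```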

Frank

 
 
 
