Re: [R] Testing for normality of residuals in a regression model

2004-10-18 Thread Thomas Lumley
On Fri, 15 Oct 2004, Kjetil Brinchmann Halvorsen wrote:
Liaw, Andy wrote:
Also, I was told by someone very smart that fitting OLS to data with
heteroscedastic errors can make the residuals look `more normal' than they
really are...  Don't know how true that is, though. 
Certainly true, since the residuals will be a kind of average, so the CLT 
works.
[Inserting some R content into the discussion]
An example of this can be seen by running qqnorm on the residuals from the 
Anscombe quartet of data sets (data(anscombe)).
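A minimal sketch of that suggestion, using nothing beyond base R and the anscombe data (the four regressions have nearly identical coefficients, yet their residual diagnostics differ wildly):

```r
## Sketch: normal QQ plots of residuals for each Anscombe data set.
## The fits y_i ~ x_i are almost identical numerically, but qqnorm()
## on the residuals can look deceptively benign for some of them.
data(anscombe)
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
    fit <- lm(as.formula(paste0("y", i, " ~ x", i)), data = anscombe)
    qqnorm(resid(fit), main = paste("Anscombe data set", i))
    qqline(resid(fit))
}
par(op)
```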

-thomas
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Testing for normality of residuals in a regression model

2004-10-16 Thread Federico Gherardini
Prof Brian Ripley wrote:
However, stats 901 or some such tells you that if the distributions have 
even slightly longer tails than the normal you can get much better 
estimates than OLS, and this happens even before a test of normality 
rejects on a sample size of thousands.

Robustness of efficiency is much more important than robustness of 
distribution, and I believe robustness concepts should be in stats 101.
(I was teaching them yesterday in the third lecture of a basic course, 
albeit a graduate course.)
   

This is a very interesting discussion. So you are basically saying that 
it's better to use robust regression methods, without having to worry 
too much about the distribution of residuals, instead of using standard 
methods and doing a lot of tests to check for normality? Did I get your 
point?

Cheers,
Federico


RE: [R] Testing for normality of residuals in a regression model

2004-10-16 Thread Philippe Grosjean
 Prof Brian Ripley wrote:
 
 However, stats 901 or some such tells you that if the distributions 
 have even slightly longer tails than the normal you can get much 
 better estimates than OLS, and this happens even before a test of 
 normality rejects on a sample size of thousands.
 
 Robustness of efficiency is much more important than robustness of 
 distribution, and I believe robustness concepts should be 
 in stats 101.
 (I was teaching them yesterday in the third lecture of a 
 basic course, 
 albeit a graduate course.)
 
 

Federico Gherardini answered:
 This is a very interesting discussion. So you are basically 
 saying that it's better to use robust regression methods, 
 without having to worry too much about the distribution of 
 residuals, instead of using standard methods and doing a lot 
 of tests to check for normality? Did I get your point?

My feeling is that symmetry is more important than, let's say, kurtosis > 0
in the error. Is this correct? Now the problem is: the lower the number of
observations, the more severe the effect of non-normality (at least of
asymmetry?) could be on the regression, AND at the same time the power of
tests to detect non-normality drops. So I can easily imagine situations where
non-normality is not detected, yet asymmetry is such that the regression is
significantly biased... It is mainly a question of sample size from this
point of view... But not only:

Andy Liaw wrote:
 Also, I was told by someone very smart that fitting OLS to
 data with heteroscedastic errors can make the residuals look
 `more normal' than they really are...  Don't know how true
 that is, though. 

That very smart person is not me, but it happens that I also experimented a
little on this a while ago! Just experiment with artificial data, and
you will see what happens: the residuals often look more normal than the error
distribution you introduced in your artificial data... Another consequence
is a biased estimate of the parameters. Indeed, both come together: parameters
are biased in a direction that lowers the residual sum of squares, obviously,
but also, in some circumstances, in a direction that makes the residuals look
more normal... And that is not (how could it be?) taken into account in the
test of normality. That is, I believe, a second reason why non-normality of
the error might not be detected, yet have a major impact on the OLS
regression.
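
A quick artificial-data experiment along these lines might look as follows (a sketch, not Philippe's actual code; the skewed error distribution and the variance function are invented for illustration):

```r
## Sketch: skewed, heteroscedastic errors fitted by OLS.
## Compare the QQ plot of the true errors with that of the OLS residuals;
## the residuals often look closer to normal than the errors themselves.
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
err <- (rexp(n) - 1) * (1 + x)   # centred exponential, variance grows with x
y <- 2 + 3 * x + err
fit <- lm(y ~ x)
op <- par(mfrow = c(1, 2))
qqnorm(err, main = "True errors"); qqline(err)
qqnorm(resid(fit), main = "OLS residuals"); qqline(resid(fit))
par(op)
```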

And I am pretty sure there are other reasons, like the distribution of error
in both the dependent and the independent variables, another violation of
the assumptions made for OLS...

Best regards,

Philippe

..°}))
 ) ) ) ) )
( ( ( ( (Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons-Hainaut University, Pentagone
( ( ( ( (Academie Universitaire Wallonie-Bruxelles
 ) ) ) ) )   6, av du Champ de Mars, 7000 Mons, Belgium  
( ( ( ( (   
 ) ) ) ) )   phone: + 32.65.37.34.97, fax: + 32.65.37.33.12
( ( ( ( (email: [EMAIL PROTECTED]
 ) ) ) ) )  
( ( ( ( (web:   http://www.umh.ac.be/~econum
 ) ) ) ) )
..



RE: [R] Testing for normality of residuals in a regression model

2004-10-16 Thread Prof Brian Ripley
I am assuming everyone is on R-help and doesn't want two copies so have 
trimmed the Cc: list to R-help.

On Sat, 16 Oct 2004, Philippe Grosjean wrote:

  Prof Brian Ripley wrote:

[ Other contributions previously excised here without comment. ]

  However, stats 901 or some such tells you that if the distributions 
  have even slightly longer tails than the normal you can get much 
  better estimates than OLS, and this happens even before a test of 
  normality rejects on a sample size of thousands.
  
  Robustness of efficiency is much more important than robustness of 
  distribution, and I believe robustness concepts should be 
  in stats 101.
  (I was teaching them yesterday in the third lecture of a  basic course, 
  albeit a graduate course.)
 
 Federico Gherardini answered:
  This is a very interesting discussion. So you are basically 
  saying that it's better to use robust regression methods, 
  without having to worry too much about the distribution of 
  residuals, instead of using standard methods and doing a lot 
  of tests to check for normality? Did I get your point?
 
 My feeling is that symmetry is more important than, let's say, kurtosis > 0
 in the error. Is this correct? Now the problem is: the lower number of
 observations, the more severe an effect of non-normality (at least,
 asymmetry?) could be on the regression AND at the same time, power of tests
 to detect non normality drops. So, I can imagine easily situations where
 non-normality is not detected, yet asymmetry is such that regression is
 significantly biased... 

Before you can even talk about bias you have to agree what it is you are
trying to estimate.  For asymmetric error distributions it is unlikely to
be the population mean, but if it is then least-squares linear regression
is unbiased provided only that the error distribution has a finite first
moment.  (Part of the so-called Gauss-Markov Theorem.  This seems to
suggest that Philippe's `easy imagination' is of impossible things.)

For contaminated normal distributions it is possibly the mean of the
uncontaminated normal component, and the latter seems the commonest aim of
mainstream robust methods, which do often assume symmetry.  (This may not
affect interpretation of coefficients other than the intercept.)  The
(non-linear) robust regression estimators may be biased for the population
mean but have a (much) smaller variability for long-tailed distributions.

There is a lot of careful discussion about this in the statistical
literature, and I don't believe that it is profitable for people to be
discussing this without knowing the literature, and probably not _here_
even then.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Stefano Calza
What about shapiro.test(resid(fit.object))

Stefano

On Fri, Oct 15, 2004 at 02:44:18PM +0200, Federico Gherardini wrote:
 Hi all,
 
 Is it possible to have a test value for assessing the normality of 
 residuals from a linear regression model, instead of simply relying on 
 qqplots?
 I've tried to use fitdistr to try and fit the residuals with a normal 
 distribution, but fitdistr only returns the parameters of the 
 distribution and the standard errors, not the p-value. Am I missing 
 something?
 
 Cheers,
 
 Federico
 


Re: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Dimitris Rizopoulos
Hi Frederico,
take also a look at the package nortest:
help(package=nortest)
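For instance, a couple of the tests in that package applied to lm residuals might look like this (a sketch; install.packages("nortest") first, and the cars data set stands in for Federico's model):

```r
## Sketch: normality tests from package nortest on residuals of an lm fit.
library(nortest)
fit <- lm(dist ~ speed, data = cars)
ad.test(resid(fit))      # Anderson-Darling test
lillie.test(resid(fit))  # Lilliefors (Kolmogorov-Smirnov) test
```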
Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
- Original Message - 
From: Federico Gherardini [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, October 15, 2004 2:44 PM
Subject: [R] Testing for normality of residuals in a regression model


Hi all,
Is it possible to have a test value for assessing the normality of 
residuals from a linear regression model, instead of simply relying 
on qqplots?
I've tried to use fitdistr to try and fit the residuals with a 
normal distribution, but fitdistr only returns the parameters of the 
distribution and the standard errors, not the p-value. Am I missing 
something?

Cheers,
Federico


RE: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread John Fox
Dear Federico,

A problem with applying a standard test of normality to LS residuals is that
the residuals are correlated and heteroskedastic even if the standard
assumptions of the model hold. In a large sample, this is unlikely to be
problematic (unless there's an unusual data configuration), but in a small
sample the effect could be nontrivial.

One approach is to use BLUS residuals, which transform the LS residuals to a
smaller set of uncorrelated, homoskedastic residuals (assuming the
correctness of the model). A search of R resources didn't turn up anything
for BLUS, but they shouldn't be hard to compute. This is a standard topic
covered in many econometrics texts.

You might consider the alternative of generating a bootstrapped confidence
envelope for the QQ plot; the qq.plot() function in the car package will do
this for a linear model.
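
A sketch of that suggestion (argument names may have differed in 2004-era car, and in later versions the function was renamed qqPlot()):

```r
## Sketch: residual QQ plot with a simulated (bootstrap) confidence
## envelope, via the car package; cars stands in for the user's data.
library(car)
fit <- lm(dist ~ speed, data = cars)
qq.plot(fit, simulate = TRUE)  # envelope from simulated samples
```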

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Federico Gherardini
 Sent: Friday, October 15, 2004 7:44 AM
 To: [EMAIL PROTECTED]
 Subject: [R] Testing for normality of residuals in a regression model
 
 Hi all,
 
 Is it possible to have a test value for assessing the 
 normality of residuals from a linear regression model, 
 instead of simply relying on qqplots?
 I've tried to use fitdistr to try and fit the residuals with 
 a normal distribution, but fitdistr only returns the 
 parameters of the distribution and the standard errors, not 
 the p-value. Am I missing something?
 
 Cheers,
 
 Federico
 


Re: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Federico Gherardini
Thank you very much for your suggestions! The residuals come from a gls 
model, because I had to correct for heteroscedasticity using a weighted 
regression... can I simply apply one of these tests (like shapiro.test) 
to the standardized residuals from my gls model?

Cheers,
Federico


Re: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Kjetil Brinchmann Halvorsen
John Fox wrote:
Dear Federico,
A problem with applying a standard test of normality to LS residuals is that
the residuals are correlated and heteroskedastic even if the standard
assumptions of the model hold. In a large sample, this is unlikely to be
problematic (unless there's an unusual data configuration), but in a small
sample the effect could be nontrivial.
One approach is to use BLUS residuals, which transform the LS residuals to a
smaller set of uncorrelated, homoskedastic residuals (assuming the
correctness of the model).
I'm not sure if these are BLUS residuals, but the following function 
transforms to a
smaller set of independent, homoscedastic residuals and then calls 
shapiro.test:
I've proposed to make this a method for shapiro.test for lm objects, 
but it was
not accepted.

shapiro.test.lm <- function(obj)
{
    eff <- effects(obj)
    rank <- obj$rank
    df.r <- obj$df.residual
    if (df.r < 3)
        stop("Too few residual degrees of freedom for the test.")
    data.name <- deparse(substitute(obj))
    x <- eff[-(1:rank)]
    res <- shapiro.test(x)
    res$data.name <- data.name
    res$method <- paste(res$method, "for residuals of linear model")
    res
}
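
Called on an ordinary lm fit (assuming the function above has been entered as a proper definition, shapiro.test.lm <- function(obj) ...), this would look something like:

```r
## Sketch: Shapiro-Wilk on the n - rank orthogonal effects of an lm fit.
fit <- lm(dist ~ speed, data = cars)
shapiro.test.lm(fit)
```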
Kjetil

A search of R resources didn't turn up anything
for BLUS, but they shouldn't be hard to compute. This is a standard topic
covered in many econometrics texts.
You might consider the alternative of generating a bootstrapped confidence
envelope for the QQ plot; the qq.plot() function in the car package will do
this for a linear model.
I hope this helps,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of 
Federico Gherardini
Sent: Friday, October 15, 2004 7:44 AM
To: [EMAIL PROTECTED]
Subject: [R] Testing for normality of residuals in a regression model

Hi all,
Is it possible to have a test value for assessing the 
normality of residuals from a linear regression model, 
instead of simply relying on qqplots?
I've tried to use fitdistr to try and fit the residuals with 
a normal distribution, but fitdistr only returns the 
parameters of the distribution and the standard errors, not 
the p-value. Am I missing something?

Cheers,
Federico


--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
  --  Mahdi Elmandjra


Re: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Federico Gherardini
Berton Gunter wrote:
Quite right, John!
I have 2 additional questions:
1) Why test for normality of residuals? Suppose you reject -- then what?
(residual plots may give information on skewness, multi-modality, data
anomalies that can affect the data analysis).
 

Because I want to know if my model satisfies the basic assumptions of 
regression theory... in other words I want to know if I can trust my 
model.

Cheers,
Federico
2) Why test for normality? Is it EVER useful? Suppose you reject -- then
what?
(I am tempted to add a 3rd question -- why test at all? -- but that is
perhaps too iconoclastic and certainly off topic. Let the hounds remain
leashed for now.)
Cheers,
-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 



Re: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Federico Gherardini
Berton Gunter wrote:
Exactly! My point is that normality tests are useless for this purpose for
reasons that are beyond what I can take up here. 

Thanks for your suggestions, I understand that! Could you possibly give 
me some (not too complicated!)
links so that I can investigate this matter further?

Cheers,
Federico
Hints: Balanced designs are
robust to non-normality; independence (especially clustering of subjects
due to systematic effects), not normality, is usually the biggest real
statistical problem; hypothesis tests will always reject when samples are
large -- so what!; trust refers to prediction validity, which has to do
with study design and the validity/representativeness of the current data to
the future. 

I know that all the stats 101 texts say to test for normality, but they're
full of baloney!
Of course, this is free advice -- so caveat emptor!
Cheers,
Bert
 



RE: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Berton Gunter

 
 Berton Gunter wrote:
 
 Exactly! My point is that normality tests are useless for 
 this purpose for
 reasons that are beyond what I can take up here. 
 
 Thanks for your suggestions, I understand that! Could you 
 possibly give 
 me some (not too complicated!)
 links so that I can investigate this matter further?
 
 Cheers,
 
 Federico


1. This was meant as a private reply so I would not roil the list. In
future, when a reply takes a discussion off list, you should keep it off
list, please.

2. The writings of (and personal conversations with) John Tukey and George
Box are certainly primary influences, as are numerous other commentaries
over the years from folks like Leo Breiman, Jerry Friedman, David Freedman,
Persi Diaconis and many others. Box's original paper about robustness to
non-normality was around 1952, I think, but much of what I allude to is
statistical folklore, I think. Perhaps other list contributors might give
you some better specific references. 

Cheers,
Bert



RE: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Liaw, Andy
Let's see if I can get my stat 101 straight:

We learned that linear regression has a set of assumptions:

1. Linearity of the relationship between X and y.
2. Independence of errors.
3. Homoscedasticity (equal error variance).
4. Normality of errors.

Now, we should ask:  Why are they needed?  Can we get away with less?  What
if some of them are not met?

It should be clear why we need #1.

Without #2, I believe the least squares estimator is still unbiased, but the
usual estimates of the SEs for the coefficients are wrong, so the t-tests are
wrong.

Without #3, the coefficients are, again, still unbiased, but not as
efficient as can be.  Interval estimates for the prediction will surely be
wrong.

Without #4, well, it depends.  If the residual DF is sufficiently large, the
t-tests are still valid because of CLT.  You do need normality if you have
small residual DF.

The problem with normality tests, I believe, is that they usually have
fairly low power at small sample sizes, so that doesn't quite help.  There's
no free lunch:  A normality test with good power will usually have good
power against a fairly narrow class of alternatives, and almost no power
against others (directional test).  How do you decide what to use?
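
The low power at small sample sizes is easy to see by simulation (a sketch; the exact rejection rate will vary with the seed and the alternative chosen):

```r
## Sketch: power of the Shapiro-Wilk test at n = 15 against a
## moderately long-tailed alternative (t with 5 df).
set.seed(42)
n <- 15
pvals <- replicate(2000, shapiro.test(rt(n, df = 5))$p.value)
mean(pvals < 0.05)  # rejection rate; often barely above the nominal 5%
```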

Has anyone seen a data set where the normality test on the residuals was
crucial in coming up with the appropriate analysis?

Cheers,
Andy

 From: Federico Gherardini
 
 Berton Gunter wrote:
 
 Exactly! My point is that normality tests are useless for 
 this purpose for
 reasons that are beyond what I can take up here. 
 
 Thanks for your suggestions, I understand that! Could you 
 possibly give 
 me some (not too complicated!)
 links so that I can investigate this matter further?
 
 Cheers,
 
 Federico
 
 Hints: Balanced designs are
 robust to non-normality; independence (especially 
 clustering of subjects
 due to systematic effects), not normality is usually the 
 biggest real
 statistical problem; hypothesis tests will always reject 
 when samples are
 large -- so what!; trust refers to prediction validity 
 which has to do
 with study design and the validity/representativeness of 
 the current data to
 future. 
 
 I know that all the stats 101 texts say to test for 
 normality, but they're
 full of baloney!
 
 Of course, this is free advice -- so caveat emptor!
 
 Cheers,
 Bert
 
   
 
 


RE: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread John Fox
Dear Kjetil,

I don't believe that these are BLUS residuals, but since the last n - r
effects are projections onto an orthogonal basis for the residual
subspace, they should do just fine (as long as the basis vectors have the
same length, which I think is the case, but perhaps someone can confirm).
The general idea is to transform the LS residuals into an uncorrelated,
equal-variance set.

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: Kjetil Brinchmann Halvorsen [mailto:[EMAIL PROTECTED] 
 Sent: Friday, October 15, 2004 9:12 AM
 To: John Fox
 Cc: 'Federico Gherardini'; [EMAIL PROTECTED]
 Subject: Re: [R] Testing for normality of residuals in a 
 regression model
 
 John Fox wrote:
 
 Dear Federico,
 
 A problem with applying a standard test of normality to LS 
 residuals is 
 that the residuals are correlated and heteroskedastic even if the 
 standard assumptions of the model hold. In a large sample, this is 
 unlikely to be problematic (unless there's an unusual data 
 configuration), but in a small sample the effect could be nontrivial.
 
 One approach is to use BLUS residuals, which transform the 
 LS residuals 
 to a smaller set of uncorrelated, homoskedastic residuals 
 (assuming the 
 correctness of the model).
 
 I'm not sure if these are BLUS residuals, but the following 
 function transforms to a smaller set of independent, 
 homoscedastic residuals and then calls
 shapiro.test:
 I've proposed to make this a method for shapiro.test for lm 
 objects, but it was not accepted.
 
 shapiro.test.lm <- function(obj)
 {
     eff <- effects(obj)
     rank <- obj$rank
     df.r <- obj$df.residual
     if (df.r < 3)
         stop("Too few residual degrees of freedom for the test.")
     data.name <- deparse(substitute(obj))
     x <- eff[-(1:rank)]
     res <- shapiro.test(x)
     res$data.name <- data.name
     res$method <- paste(res$method, "for residuals of linear model")
     res
 }
 
 Kjetil
 
 
  A search of R resources didn't turn up anything for BLUS, but they 
 shouldn't be hard to compute. This is a standard topic 
 covered in many 
 econometrics texts.
 
 You might consider the alternative of generating a bootstrapped 
 confidence envelope for the QQ plot; the qq.plot() function 
 in the car 
 package will do this for a linear model.
 
 I hope this helps,
  John
 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox
 
 
   
 
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Federico 
 Gherardini
 Sent: Friday, October 15, 2004 7:44 AM
 To: [EMAIL PROTECTED]
 Subject: [R] Testing for normality of residuals in a 
 regression model
 
 Hi all,
 
 Is it possible to have a test value for assessing the normality of 
 residuals from a linear regression model, instead of simply 
 relying on 
 qqplots?
 I've tried to use fitdistr to try and fit the residuals 
 with a normal 
 distribution, but fitdistr only returns the parameters of the 
 distribution and the standard errors, not the p-value. Am I missing 
 something?
 
 Cheers,
 
 Federico
 
 
 
   
 
 
 
 -- 
 
 Kjetil Halvorsen.
 
 Peace is the most effective weapon of mass construction.
--  Mahdi Elmandjra
 
 




RE: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread John Fox
Dear Federico,

The problem is the same with GLS residuals -- even if the GLS transformation
produces homoskedastic errors, the residuals will be correlated and
heteroskedastic (with this problem tending to disappear in most instances as
n grows). The central point is that residuals don't behave quite the same as
errors.

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Federico Gherardini
 Sent: Friday, October 15, 2004 11:22 AM
 To: [EMAIL PROTECTED]
 Subject: Re: [R] Testing for normality of residuals in a 
 regression model
 
 Thank you very much for your suggestions! The residuals come 
 from a gls model, because I had to correct for 
 heteroscedasticity using a weighted regression... can I 
 simply apply one of these tests (like shapiro.test) to the 
 standardized residuals from my gls model?
 
 Cheers,
 Federico
 


RE: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread John Fox
Dear Andy,

At the risk of muddying the waters (and certainly without wanting to
advocate the use of normality tests for residuals), I believe that your
point #4 is subject to misinterpretation: that is, while it is true that t-
and F-tests for regression coefficients in large samples retain their
validity well when the errors are non-normal, the efficiency of the LS
estimates can (depending upon the nature of the non-normality) be seriously
compromised, not only absolutely but in relation to alternatives (e.g.,
robust regression).
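
For example, the efficiency point can be seen by comparing OLS with an M-estimator from MASS under long-tailed errors (a sketch; the t(3) error model and simulation sizes are invented for illustration):

```r
## Sketch: sampling variability of the OLS slope vs. a robust M-estimate
## (MASS::rlm) when errors are long-tailed (t with 3 df).
library(MASS)
set.seed(1)
x <- 1:30
slopes <- replicate(500, {
    y <- 1 + 2 * x + rt(length(x), df = 3)
    c(ols = coef(lm(y ~ x))[2], rob = coef(rlm(y ~ x))[2])
})
apply(slopes, 1, sd)  # the robust slope typically shows the smaller SD
```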

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy
 Sent: Friday, October 15, 2004 11:55 AM
 To: 'Federico Gherardini'; Berton Gunter
 Cc: R-help mailing list
 Subject: RE: [R] Testing for normality of residuals in a 
 regression model
 
 Let's see if I can get my stat 101 straight:
 
 We learned that linear regression has a set of assumptions:
 
 1. Linearity of the relationship between X and y.
 2. Independence of errors.
 3. Homoscedasticity (equal error variance).
 4. Normality of errors.
 
 Now, we should ask:  Why are they needed?  Can we get away 
 with less?  What if some of them are not met?
 
 It should be clear why we need #1.
 
 Without #2, I believe the least squares estimator is still 
 unbiased, but the usual estimates of the SEs for the coefficients 
 are wrong, so the t-tests are wrong.
 
 Without #3, the coefficients are, again, still unbiased, but 
 not as efficient as can be.  Interval estimates for the 
 prediction will surely be wrong.
 
 Without #4, well, it depends.  If the residual DF is 
 sufficiently large, the t-tests are still valid because of 
 CLT.  You do need normality if you have small residual DF.
 
 The problem with normality tests, I believe, is that they 
 usually have fairly low power at small sample sizes, so that 
 doesn't quite help.  There's no free lunch:  A normality test 
 with good power will usually have good power against a fairly 
 narrow class of alternatives, and almost no power against 
 others (directional test).  How do you decide what to use?
 
 Has anyone seen a data set where the normality test on the 
 residuals is crucial in coming up with an appropriate analysis?
 
 Cheers,
 Andy
 
  From: Federico Gherardini
  
  Berton Gunter wrote:
  
  Exactly! My point is that normality tests are useless for
  this purpose for
  reasons that are beyond what I can take up here. 
  
  Thanks for your suggestions, I understand that! Could you possibly 
  give me some (not too complicated!) links so that I can investigate 
  this matter further?
  
  Cheers,
  
  Federico
  
  Hints: Balanced designs are
  robust to non-normality; independence (especially
  clustering of subjects
  due to systematic effects), not normality is usually the
  biggest real
  statistical problem; hypothesis tests will always reject
  when samples are
  large -- so what!; trust refers to prediction validity
  which has to do
  with study design and the validity/representativeness of
  the current data to
  future. 
  
  I know that all the stats 101 tests say to test for
  normality, but they're
  full of baloney!
  
  Of course, this is free advice -- so caveat emptor!
  
  Cheers,
  Bert
  

  
  

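
Andy's point #2 (quoted above) can also be put in miniature R form: with positively autocorrelated errors the OLS slope stays essentially unbiased, but the standard error reported by lm() is far too small, so the t-tests are wrong. The AR(1) coefficient and sizes below are my own illustrative choices:

```r
## Sketch: OLS with AR(1) errors -- slope unbiased, nominal SE badly off.
set.seed(123)
n <- 50
x <- seq(0, 1, length = n)
nsim <- 500
bhat <- se <- numeric(nsim)
for (i in seq_len(nsim)) {
    e <- as.numeric(arima.sim(list(ar = 0.8), n))  # dependent errors
    y <- 1 + 2 * x + e
    fit <- lm(y ~ x)
    bhat[i] <- coef(fit)[2]
    se[i] <- coef(summary(fit))[2, 2]              # usual lm() SE
}
## Essentially unbiased, but the reported SE understates the truth:
c(mean.slope = mean(bhat), true.sd = sd(bhat), mean.reported.se = mean(se))
```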


RE: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Liaw, Andy
Hi John,

Your point is well taken.  I was only thinking about the shape of the
distribution, and neglected the cases of, say, symmetric long tailed
distributions.  However, I think I'd still argue that other tools are
probably more useful than normality tests (e.g., robust methods, as you
mentioned).

To take the point a bit further, let's say we test for normality and it's
rejected.  What do we do then?  Well, if the non-normality is caused by
outliers, we can try robust methods.  If not, what do we do?  We can try to
see if some sort of transformation would bring the residuals closer to
normally distributed, but if the interest is in inference on the
coefficients, those inferences on the `final' model are potentially invalid.
What's one to do then?

Also, I was told by someone very smart that fitting OLS to data with
heteroscedastic errors can make the residuals look `more normal' than they
really are...  Don't know how true that is, though.

Best,
Andy
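
The remark above about heteroscedastic errors producing "more normal"-looking residuals can be made concrete without any simulation. Since r = (I - H)e, each residual variance is a leverage-weighted mix of all the error variances, so the residual variances are more homogeneous than the error variances, and the pooled residuals look closer to a single normal sample. The toy design below is my own:

```r
## Why OLS residuals from heteroscedastic data can look "more normal":
## r = (I - H) e, so Var(r_i) = sum_j m_ij^2 sigma_j^2 mixes all the
## error variances, making them more homogeneous.  Toy design chosen
## arbitrarily for illustration.
n <- 20
x <- c(seq(0, 1, length = n - 2), 3, 4)      # two high-leverage points
X <- cbind(1, x)
H <- X %*% solve(crossprod(X), t(X))         # hat matrix
M <- diag(n) - H
sigma2 <- (1 + 4 * seq(0, 1, length = n))^2  # error variances, 1 to 25
var.resid <- drop(M^2 %*% sigma2)            # exact residual variances
## The max/min variance ratio shrinks from errors to residuals:
c(errors = max(sigma2) / min(sigma2),
  residuals = max(var.resid) / min(var.resid))
```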

 From: John Fox
 
 Dear Andy,
 
 At the risk of muddying the waters (and certainly without wanting to
 advocate the use of normality tests for residuals), I believe 
 that your
 point #4 is subject to misinterpretation: That is, while it 
 is true that t-
  and F-tests for regression coefficients in large samples retain their
 validity well when the errors are non-normal, the efficiency of the LS
 estimates can (depending upon the nature of the 
 non-normality) be seriously
 compromised, not only absolutely but in relation to 
 alternatives (e.g.,
 robust regression).
 
 Regards,
  John
 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox 
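
The low power of normality tests at small n, and Bert Gunter's flip-side point that they always reject at large n, fit in a few lines of R. The mildly non-normal t_5 error law and the two sample sizes are my own illustrative choices:

```r
## Rejection rate of shapiro.test() at the 5% level for mildly
## non-normal (t_5) samples, at a small and a large sample size.
set.seed(1)
reject.rate <- function(n, nsim = 200)
    mean(replicate(nsim, shapiro.test(rt(n, df = 5))$p.value < 0.05))
c(n20 = reject.rate(20), n1000 = reject.rate(1000))
```

At n = 20 the non-normality goes mostly undetected; at n = 1000 it is flagged nearly every time, even though it matters far less there thanks to the CLT.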
  
 
  

Re: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Spencer Graves
 OK, I'll expose myself: 

 I tend to do normal probability plots of residuals (usually deletion 
/ studentized residuals, as described by Venables and Ripley in Modern 
Applied Statistics with S, 4th ed., MASS4).  If the plots look strange, I 
do something.  I'll check apparent outliers for coding and data entry 
errors, and I often delete those points from the analysis even if I 
can't find a reason why.  Robust regression will usually handle this 
type of problem, and I am gradually migrating to increasing use of 
robust regression, especially the procedures recommended by MASS4. 

 However, I recently encountered a situation that would be masked 
by standard use of robust regression without examining residual plots:  
A normal probability plot looked like three parallel straight lines with 
gaps, suggesting a mixture of 3 normal distributions with different 
means and a common standard deviation.  Further investigation revealed 
that an important 3-level explanatory variable had been miscoded.  
When this was corrected, that variable entered the model and the gaps in 
the normal plot disappeared. 
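
This pattern is easy to reproduce with fake data (everything below is simulated; the group offsets are chosen to make the gaps obvious): leave a 3-level factor out of the model and the QQ plot of the studentized residuals shows three parallel lines; put it back in and the gaps vanish.

```r
## Simulated version of the miscoded 3-level factor story above.
set.seed(7)
n <- 150
g <- factor(rep(c("a", "b", "c"), each = n / 3))
x <- runif(n)
y <- 2 * x + c(-4, 0, 4)[as.integer(g)] + rnorm(n, sd = 0.5)
fit.wrong <- lm(y ~ x)       # the factor g is "miscoded" out of the model
qqnorm(rstudent(fit.wrong))  # three parallel straight lines with gaps
qqline(rstudent(fit.wrong))
fit.right <- lm(y ~ x + g)   # once g enters, the gaps disappear
qqnorm(rstudent(fit.right))
qqline(rstudent(fit.right))
```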

 I tend NOT to use tests of normality for the reasons Andy 
mentioned.  Instead, I do various kinds of diagnostic plots and modify 
my model or investigate the data in response to what I see. 

 Comments?
 hope this helps.  spencer graves
Liaw, Andy wrote:
The problem with normality tests, I believe, is that they usually have
fairly low power at small sample sizes, so that doesn't quite help.  There's
no free lunch:  A normality test with good power will usually have good
power against a fairly narrow class of alternatives, and almost no power
against others (directional test).  How do you decide what to use?
Has anyone seen a data set where the normality test on the residuals is
crucial in coming up with an appropriate analysis?
Cheers,
Andy

--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567


Re: [R] Testing for normality of residuals in a regression model

2004-10-15 Thread Kjetil Brinchmann Halvorsen
Liaw, Andy wrote:
[...]
Also, I was told by someone very smart that fitting OLS to data with
heteroscedastic errors can make the residuals look `more normal' than they
really are...  Don't know how true that is, though.  
 

Certainly true, since the residuals will be a kind of average, so the 
CLT works.
(Think that is in Seber, Linear Regression Analysis, 1977)

Kjetil
 

--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
  --  Mahdi Elmandjra