Re: [R] Testing for normality of residuals in a regression model

Spencer Graves Fri, 15 Oct 2004 11:33:41 -0700

OK, I'll expose myself:

I tend to do normal probability plots of residuals (usually deletion / studentized residuals as described by Venables and Ripley in Modern Applied Statistics with S, 4th ed, MASS4). If the plots look strange, I do something. I'll check apparent outliers for coding and data entry errors, and I often delete those points from the analysis even if I can't find a reason why. Robust regression will usually handle this type of problem, and I am gradually migrating to increasing use of robust regression, especially the procedures recommended by MASS4.
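As a rough illustration of the normal-plot workflow described above, here is a minimal sketch in plain Python (standard library only) for a straight-line fit. The data, the injected error at index 7, and the helper names are all made up for illustration. In R, rstudent() on an lm fit or studres() from MASS returns deletion residuals directly.

```python
import math
import random
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b


def deletion_residuals(x, y):
    """Externally studentized (deletion) residuals for a straight-line fit."""
    n, p = len(x), 2  # p = number of fitted coefficients
    a, b = fit_line(x, y)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    lever = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    rss = sum(e * e for e in resid)
    out = []
    for e, h in zip(resid, lever):
        # Residual variance re-estimated with observation i left out.
        s2_del = (rss - e * e / (1.0 - h)) / (n - p - 1)
        out.append(e / math.sqrt(s2_del * (1.0 - h)))
    return out


def normal_plot_points(values):
    """(theoretical normal quantile, ordered value) pairs for a normal plot."""
    n = len(values)
    nd = NormalDist()
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(nd.inv_cdf(q), v) for q, v in zip(probs, sorted(values))]


# Hypothetical data: a straight line plus noise, with one simulated
# data-entry error at index 7.
random.seed(1)
x = [float(i) for i in range(30)]
y = [2.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]
y[7] += 8.0

t = deletion_residuals(x, y)
worst = max(range(len(t)), key=lambda i: abs(t[i]))
print("largest |deletion residual| at index", worst, "value", round(t[worst], 2))

# Plot these pairs with any plotting tool; a point far off the line
# through the bulk of the data is a candidate for checking.
points = normal_plot_points(t)
```

The point flagged here is only a candidate for inspection; whether to correct it, delete it, or switch to a robust fit is the judgment call discussed above.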

However, I recently encountered a situation that would be masked by standard use of robust regression without examining residual plots: A normal probability plot looked like three parallel straight lines with gaps, suggesting a mixture of 3 normal distributions with different means and a common standard deviation. Further investigation revealed that an important 3-level explanatory variable had been miscoded. When this was corrected, that variable entered the model and the gaps in the normal plot disappeared.
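The three-line pattern can be reproduced with simulated data. The sketch below is only an illustration under made-up assumptions (three levels with means 0, 10 and 20, unit standard deviation, 20 observations per level); it is not the data set described above. It compares the largest jump between neighbouring ordered residuals with the factor omitted and with it included.

```python
import random
from statistics import mean

random.seed(2)
level_means = {"A": 0.0, "B": 10.0, "C": 20.0}  # made-up group effects
groups = [g for g in "ABC" for _ in range(20)]
y = [level_means[g] + random.gauss(0.0, 1.0) for g in groups]


def largest_gap(values):
    """Largest jump between neighbouring ordered values."""
    s = sorted(values)
    return max(b - a for a, b in zip(s, s[1:]))


# Model 1: the factor is left out (as if miscoded and dropped), so the
# fit is a single overall mean.
overall = mean(y)
resid_without = [yi - overall for yi in y]

# Model 2: the factor is included, so each level gets its own fitted mean.
group_mean = {
    g: mean([yi for yi, gi in zip(y, groups) if gi == g]) for g in level_means
}
resid_with = [yi - group_mean[g] for yi, g in zip(y, groups)]

gap_without = largest_gap(resid_without)
gap_with = largest_gap(resid_with)
print("largest gap, factor omitted :", round(gap_without, 2))
print("largest gap, factor included:", round(gap_with, 2))
```

With the factor omitted, the ordered residuals fall into three clusters separated by large gaps, which is what shows up as parallel line segments on a normal plot. A robust fit would not by itself reveal this, since none of the points is an outlier in the usual sense.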

I tend NOT to use tests of normality for the reasons Andy mentioned. Instead, I do various kinds of diagnostic plots and modify my model or investigate the data in response to what I see.
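The power argument can be checked by simulation. The sketch below uses the Jarque-Bera statistic only because its asymptotic p-value is easy to compute without extra libraries; it is a stand-in, not a test anyone in this thread recommended, and its chi-square approximation is itself rough at small n. The two error distributions, the sample sizes, and the function names are assumptions chosen for illustration.

```python
import math
import random


def jarque_bera_pvalue(values):
    """Asymptotic p-value of the Jarque-Bera normality test."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # JB is asymptotically chi-square with 2 df, whose upper tail is exp(-x/2).
    return math.exp(-jb / 2.0)


def rejection_rate(draw, n, reps, alpha=0.05):
    """Share of simulated samples of size n rejected at level alpha."""
    hits = sum(
        jarque_bera_pvalue([draw() for _ in range(n)]) < alpha
        for _ in range(reps)
    )
    return hits / reps


def skewed():
    """Clearly non-normal errors (exponential)."""
    return random.expovariate(1.0)


def mildly_skewed():
    """Normal errors plus a modest skewed component."""
    return random.gauss(0.0, 1.0) + 0.6 * random.expovariate(1.0)


random.seed(3)
skewed_small = rejection_rate(skewed, n=10, reps=500)
skewed_large = rejection_rate(skewed, n=200, reps=500)
mild_small = rejection_rate(mildly_skewed, n=20, reps=500)
mild_large = rejection_rate(mildly_skewed, n=2000, reps=200)

print("exponential errors, n=10   :", skewed_small)
print("exponential errors, n=200  :", skewed_large)
print("mild skew,          n=20   :", mild_small)
print("mild skew,          n=2000 :", mild_large)
```

The comparison of interest is within each pair: a strong departure is often missed at small n, where normality matters most, while a mild departure is flagged almost every time at large n, where it matters least.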

     Comments?
     hope this helps.  spencer graves

Liaw, Andy wrote:

Let's see if I can get my stat 101 straight:
We learned that linear regression has a set of assumptions:
1. Linearity of the relationship between X and y.
2. Independence of errors.
3. Homoscedasticity (equal error variance).
4. Normality of errors.
Now, we should ask:  Why are they needed?  Can we get away with less?  What
if some of them are not met?
It should be clear why we need #1.
Without #2, I believe the least squares estimator is still unbiased, but the
usual estimates of the SEs for the coefficients are wrong, so the t-tests are
wrong.
Without #3, the coefficients are, again, still unbiased, but not as
efficient as can be.  Interval estimates for the prediction will surely be
wrong.
Without #4, well, it depends.  If the residual DF is sufficiently large, the
t-tests are still approximately valid because of the CLT.  You do need
normality if you have small residual DF.
The problem with normality tests, I believe, is that they usually have
fairly low power at small sample sizes, so that doesn't quite help.  There's
no free lunch:  A normality test with good power will usually have good
power against a fairly narrow class of alternatives, and almost no power
against others (directional test).  How do you decide what to use?
Has anyone seen a data set where the normality test on the residuals is
crucial in coming up with an appropriate analysis?
Cheers,
Andy
From: Federico Gherardini
Berton Gunter wrote:
Exactly! My point is that normality tests are useless for this purpose for
reasons that are beyond what I can take up here.

Thanks for your suggestions, I understand that! Could you possibly give me some (not too complicated!) links so that I can investigate this matter further?
Cheers,
Federico
Hints: Balanced designs are robust to non-normality; independence (especially
"clustering" of subjects due to systematic effects), not normality is usually
the biggest real statistical problem; hypothesis tests will always reject
when samples are large -- so what!; "trust" refers to prediction validity
which has to do with study design and the validity/representativeness of
the current data to future.

I know that all the stats 101 tests say to test for normality, but they're
full of baloney!
Of course, this is "free" advice -- so caveat emptor!
Cheers,
Bert
______________________________________________ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567

