As a matter of fact, I would say both Bert and I encounter "designed experiments" a lot more often than "observational studies", yet we can say from experience that the problems Bert mentioned occur on a daily basis. When you talk to experimenters, ask your questions carefully and you will see these things crop up.

Andy

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Monday, August 02, 2010 3:35 PM
To: Bert Gunter
Cc: r-help@r-project.org; wwreith
Subject: Re: [R] Problems with normality req. for ANOVA

In a general situation of observational studies, your point is
undoubtedly true, and apparently you believe it to be true even in the
setting of designed experiments. Perhaps I should have confined myself
to my first sentence.

--
David.

On Aug 2, 2010, at 2:05 PM, Bert Gunter wrote:

> David et al.:
>
> I take issue with this. It is the lack of independence that is the
> major issue. In particular, clustering, split-plotting, and so forth
> due to "convenience order" experimentation, lack of randomization,
> and exogenous effects such as the systematic effects due to
> measurement method/location have the major effect in inducing bias
> and distorting inference. Normality and unequal variances typically
> pale to insignificance compared to this.
>
> Obviously, IMHO.
>
> Note 1: George Box noted this at least 50 years ago, in the early
> '60s, when he and Jenkins developed ARIMA modeling.
>
> Note 2: If you can, have a look at Jack Youden's classic paper
> "Enduring Values", which comments to some extent on these issues,
> here: http://www.jstor.org/pss/1266913
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> On Mon, Aug 2, 2010 at 10:32 AM, David Winsemius
> <dwinsem...@comcast.net> wrote:
>
> > On Aug 2, 2010, at 9:33 AM, wwreith wrote:
> >
> > > I am conducting an experiment with four independent variables,
> > > each of which has three or more factor levels. The sample size
> > > is quite large, i.e. several thousand. The dependent variable
> > > data does not pass a normality test but "visually" looks close
> > > to normal, so is there a way to compute the effect this would
> > > have on the p-value for ANOVA, or is there a way to perform a
> > > nonparametric test in R that will handle this many independent
> > > variables? Simply saying ANOVA is robust to small departures
> > > from normality is not going to be good enough for my client.
> >
> > The statistical assumption of normality for linear models does not
> > apply to the distribution of the dependent variable, but rather to
> > the residuals after a model is estimated. Furthermore, it is the
> > homoskedasticity assumption that is more commonly violated and is
> > also the greater threat to validity. (And if you don't already know
> > both of these points, then you desperately need to review your
> > basic modeling practices.)
> >
> > > I need to compute an error amount for ANOVA or find a
> > > nonparametric equivalent.
> >
> > You might get a better answer if you expressed the first part of
> > that question in unambiguous terminology. What is "error amount"?
> >
> > For the second part, there is an entire Task View on Robust
> > Statistical Methods.
> >
> > --
> >
> > David Winsemius, MD
> > West Hartford, CT

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
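
The point made in the thread, that the normality assumption concerns the residuals of the fitted model and not the raw dependent variable, can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not anything the posters ran: it is written in Python with the standard library rather than in R, it uses a single factor with three levels instead of the four factors in the original question, and the names excess_kurtosis and simulate, the level means, and the seed are all hypothetical choices made for illustration. The errors are drawn as independent normals, so the sketch says nothing about the independence problems Bert raises.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    m2 = statistics.fmean([c ** 2 for c in centered])
    m4 = statistics.fmean([c ** 4 for c in centered])
    return m4 / m2 ** 2 - 3.0


def simulate(seed=1, n_per_level=3000, level_means=(0.0, 10.0, 20.0), sd=1.0):
    """Simulate a one-factor design with independent normal errors.

    Returns the pooled response and the residuals from the cell-means
    model (each observation minus its own level's sample mean).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    response = []
    residuals = []
    for mu in level_means:
        group = [rng.gauss(mu, sd) for _ in range(n_per_level)]
        group_mean = statistics.fmean(group)
        response.extend(group)
        residuals.extend(y - group_mean for y in group)
    return response, residuals


response, residuals = simulate()
raw_kurtosis = excess_kurtosis(response)
residual_kurtosis = excess_kurtosis(residuals)

# The pooled response is a mixture of three well-separated normals, so it
# is far from normal; the residuals are what the assumption is about.
print("excess kurtosis, raw response:", round(raw_kurtosis, 3))
print("excess kurtosis, residuals:   ", round(residual_kurtosis, 3))
```

With the level means this far apart, the pooled response is clearly multimodal and would fail any normality test, while the residuals behave like a normal sample. In practice the same check would be made on the residuals of the actual fitted model, and it leaves the homoskedasticity and independence questions raised in the thread untouched.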