Re,

With this mail, I would like to praise you for taking such good care of ANOVA in the robust libraries in R. May I recall that in fields like psychology, more than 80% of the articles contain an ANOVA, whereas fewer than 10% of them contain a regression. Having done statistical consulting with psychologists for 10 years, I find the following points essential to convince psychologists to use R and robust procedures for ANOVA:
(A) If I understand correctly, lmrob will eventually supersede lmRob. I strongly suggest giving real thought to X variables that are factors. As I understand it, lmrob cannot handle such data, since the initial algorithm will very likely fail with such covariates. I suggest adding the possibility of an L1-type initial algorithm (or similar) for these covariates, or using separate initial algorithms for the continuous covariates and the factors, as in lmRob.

(B) Provide so-called "Type III sums of squares", i.e. effects tested marginally, in anova.lmrob and anova.lmRob (and implement anova.lmrob for a single model). I know it can be done by hand, but for an average user, having it as an optional argument to anova.lm(R/r)ob would be an important argument for using R and robust ANOVA. Since this is an extremely hot topic within the S/R community, I give below what I believe to be convincing arguments given by (other) prominent members of the statistics community. By the way, "marginal" or "Type III sums of squares" are available in several important R libraries, e.g. the car library (used by Rcommander; see the function Anova, with a capital A, and its type="III" argument) and the nlme library (see anova.lme with its type="marginal" argument).

(C) Since ANOVA is so widely used, why not write a small function aovrob that just calls lmrob with the appropriate arguments for the initial algorithm and returns anova.lmrob of the fitted object, with "marginal" as the default?

Arguments for "marginal" or "Type III sums of squares" (for part (B))

Some prominent members of the S/R community have over the years made many negative comments about "Type III sums of squares" and effects tested marginally. However, their examples were often from regression (e.g. polynomial regression). In the context of unbalanced ANOVA, other prominent members of the statistics community give extremely convincing arguments.
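The "by hand" route mentioned in (B) can already be sketched in base R alone. The snippet below is a minimal illustration on simulated data (the data frame d and all variable names are mine, not from any package): with sum-to-zero contrasts, dropping each term from the full model via drop1() gives the marginal ("Type III") tests, and on an unbalanced design these are invariant to the order of the terms, whereas the sequential (Type I) sums of squares are not.

```r
## Simulated 2x2 design with unbalanced cell counts (5, 3, 1, 3),
## purely for illustration.
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(8, 4))),
  B = factor(c(rep(c("b1", "b2"), times = c(5, 3)),
               rep(c("b1", "b2"), times = c(1, 3))))
)
d$y <- rnorm(nrow(d)) + ifelse(d$A == "a1", 0, 1)

## Sum-to-zero contrasts are essential for sensible Type III tests.
op <- options(contrasts = c("contr.sum", "contr.poly"))
fitAB <- lm(y ~ A * B, data = d)
fitBA <- lm(y ~ B * A, data = d)   # same model, terms in the other order

## Type I (sequential) SS for A: depends on the order of the terms.
ssI_A_first <- anova(fitAB)["A", "Sum Sq"]
ssI_A_last  <- anova(fitBA)["A", "Sum Sq"]

## Marginal ("Type III") SS for A via drop1(): order-invariant.
ssIII_A_1 <- drop1(fitAB, . ~ ., test = "F")["A", "Sum of Sq"]
ssIII_A_2 <- drop1(fitBA, . ~ ., test = "F")["A", "Sum of Sq"]
options(op)
```

On balanced data the two families of tests coincide; it is precisely the unbalanced case, discussed below, where they part ways, which is why the default matters for applied users.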
The big difference comes from the fact that in almost all real examples, if the design is unbalanced, this is due to (hopefully MAR) missing values, and not to an underlying population distribution that is unbalanced. In regression, on the contrary, the distribution of X is supposed to be fixed (or, loosely speaking, to reflect the population distribution, with computations conditioned on the sample values).

Suppose you have data with two factors but, unfortunately, an unbalanced design. You want to test the two main effects and the interaction. The model is

$Y_{ijk} = \mu + \gamma_i + \theta_j + (\gamma\theta)_{ij} + E_{ijk}$, with $i = 1, \ldots, a$, $j = 1, \ldots, b$, $k = 1, \ldots, n_{ij}$.

Your favorite software proposes several ANOVA tables, called Type I, II, III, etc. Which one should you choose? Let us concentrate on Type I, where terms are added sequentially, and Type III, where terms are tested marginally (against the full model). To decide:

* One might argue about uniquely explained variance and use this argument to favor one given Type.
* One might argue that, for testing a main effect, Type III makes no sense, since the "null" model contains the interaction but not the main effect.
* Searle (1987), Milliken & Johnson (1992) and others, however, simply argue that as statisticians we should not look at explained variances or philosophical arguments about what a model should contain, but simply at which null hypothesis each test corresponds to.

They clearly show that with the Type III SS, the corresponding H0 are exactly what we expect:

$\gamma_1 = \gamma_2 = \cdots = \gamma_a \;(=0)$,
$\theta_1 = \theta_2 = \cdots = \theta_b \;(=0)$, and
$(\gamma\theta)_{11} = (\gamma\theta)_{12} = \cdots = (\gamma\theta)_{ab} \;(=0)$,

whereas for the Type I SS, the corresponding H0 for the first factor is (see Searle, pp. 112 and 114, for an example)

$\rho'_1 = \rho'_2 = \cdots = \rho'_a$, where $\rho'_i = \sum_j n_{ij}\,\mu_{ij} / n_{i.}$,

and even more complex for the second factor, where we do not even test that some parameters are 0:

$\delta'_j = \sum_i n_{ij}\,\rho'_i / n_{.j} \;\;\forall j$, where $\delta'_j = \sum_i n_{ij}\,\mu_{ij} / n_{.j}$.

In 10 years of consulting, I have never seen a psychologist willing to test such an odd hypothesis! Just looking at the corresponding null hypotheses will, I hope, convince some of you that it is not so surprising that, for unbalanced ANOVA, "Type III" is recommended in many applied fields and used as the default by e.g. SAS and SPSS.

Finally, a technical detail: in the presence of interaction, the exact definition of Type III for the classical method is slightly more involved (from the help file of Statistica): "The Type III sums of squares attributable to an effect is computed as the sums of squares for the effect controlling for any effects of equal or lower degree and orthogonal to any higher-order interaction effects (if any) that contain it. The orthogonality to higher-order containing interaction is what gives Type III sums of squares the desirable properties associated with linear combinations of least squares means in ANOVA designs with no missing cells." Also, if programmed correctly, it is "invariant to the choice of the coding of effects for categorical predictor variables (e.g., the use of the sigma-restricted or overparameterized model) and to the choice of the particular g2 inverse of X'X used to solve the normal equations".

References

Searle, S. R. (1987). Linear models for unbalanced data. New York: Wiley.
Milliken, G. A., & Johnson, D. E. (1992). Analysis of messy data: Vol. I. Designed experiments. New York: Chapman & Hall.

Sorry for the long mail, but it is in the hope that more and more users will turn to robust procedures and to R.

Cheers,
Olivier

--
!!! New e-mail, please update your address book !!!
olivier.ren...@unige.ch
http://www.unige.ch/fapse/mad/
Methodology & Data Analysis - Psychology Dept - University of Geneva
UniMail, Office 4164 - 40, Bd du Pont d'Arve - CH-1211 Geneva 4

_______________________________________________
R-SIG-Robust@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-robust