Tian

It appears the attachment might not have worked, so I'll embed Bill's
message at the end.

Peter Alspach
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Peter Alspach
> Sent: Thursday, 17 August 2006 8:02 a.m.
> To: T Mu; R-Help
> Subject: Re: [R] confusing about contrasts concept
>
> Tian
>
> Bill Venables wrote an excellent explanation to the S list back in 1997.
> I saved it as a pdf file and attach it herewith ...
>
> Peter Alspach
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of T Mu
> > Sent: Thursday, 17 August 2006 3:23 a.m.
> > To: R-Help
> > Subject: [R] confusing about contrasts concept
> >
> > Hi all,
> >
> > Where can I find a thorough explanation of Contrasts and
> > Contrast Matrices? I read some help but am still confused.
> >
> > Thank you,
> > Tian

Date sent:  Sat, 29 Mar 1997 17:23:31 +1030
From:       Bill Venables <[EMAIL PROTECTED]>
To:         Wei Qian <[EMAIL PROTECTED]>
Copies to:  [EMAIL PROTECTED]
Subject:    Re: test contrasts [long]

Wei Qian writes:

  I am new to Splus, so please don't be offended if my question
  is too naive.

We're all friends here, Wei. It's not that kind of group ... mostly.

  Does anyone know how to test contrasts in Splus? To be specific,
  suppose I have a one-way layout experiment with 3 treatments, and
  I want to test the contrasts of treatment 1 vs. treatment 2,
  treatment 1 vs. treatment 3, etc. In SAS, I could use the
  following:

    proc glm;
      class trt;
      model y = trt;
      contrast '1 vs. 2' 1 -1 0;
      contrast '1 vs. 3' 1 0 -1;
    run;

  One way I can think of is to construct my design matrix and obtain
  the contrast sums of squares through a series of matrix operations,
  but is there any easy way to do it, or any built-in function in
  Splus that can do it?

The answer is 'yes', but hardly anyone seems to know how to do it.
The explanation in the 'white book', for example, seems a little
incomplete to me and not quite adequate to settle the case you raise.
(The explanation in the yellow book is also inadequate, but I hope
come July we will have fixed that.)  Since this is one of the most
frequent questions people ask me in direct email too, let me try
(again) to sort it out in some detail.

A formula such as y ~ f, where f is a factor, in principle generates
a single classification model of the form

  y_ij = mu + phi_i + e_ij

Write the design matrix in the form X = [1 Xf] where, assuming f has
p levels, Xf is the n x p dummy-variable (i.e. binary) matrix
corresponding to the phi_i's.  So in matrix terms the model is
written as

  y = 1 mu + Xf phi + e

(a) If you remove the intercept term, using y ~ f - 1, then Xf is the
    design matrix you get and the coefficients correspond to the
    class means.

(b) If the intercept term is left in, then the design matrix X does
    not have full column rank, so an adjustment has to be made to Xf
    to make it so.  The redundancy comes about because the columns of
    Xf add to the 1-vector, that is Xf 1_p = 1_n.  So let C be any
    p x (p-1) matrix such that [1 C] is nonsingular.  It can easily
    be seen that Xc = [1 (Xf C)] will have full column rank, and that
    fitting the model using this model matrix is equivalent to the
    original redundantly specified model.

The matrix C is called the *coding matrix* for the factor f.  The
linear model is actually fitted in the form

  y = 1 mu + (Xf C) alpha + e

where alpha has (p-1) components, of course.
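[As a minimal sketch of this construction in current R (the factor f
and the sample size below are invented purely for illustration), one
can check that the fitted design matrix is exactly [1 (Xf C)]:

  ## Check that model.matrix(~ f) equals [1 (Xf C)], where C is the
  ## coding matrix stored for f (contr.treatment by default in R).
  f  <- factor(rep(1:3, each = 2))     # a 3-level factor, n = 6
  Xf <- model.matrix(~ f - 1)          # n x p dummy-variable matrix
  C  <- contrasts(f)                   # p x (p-1) coding matrix
  max(abs(model.matrix(~ f) - cbind(1, Xf %*% C)))   # 0: same matrix
]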
In order to make sense of the alpha's we need to relate them back to
the phi's.  For any such C there will be a vector, v, such that
v'C = 0' (using ' for transpose).  (In fact v spans the orthogonal
complement of the column space of C.)  Clearly phi and alpha are
related by

  C alpha = phi

but since v'C = 0', it follows that an identification constraint
applies, namely v'phi = 0.  By multiplying both sides by
(C'C)^{-1} C', it also follows that

  alpha = (C'C)^{-1} C' phi

which provides an interpretation for the alpha's in terms of the
(constrained) phi's.

For example, take the Helmert contrasts:

  > contr.helmert(4)
    [,1] [,2] [,3]
  1   -1   -1   -1
  2    1   -1   -1
  3    0    2   -1
  4    0    0    3

The constraint vector is clearly v = (1,1,1,1), since the columns add
to zero.  In this case the columns are also mutually orthogonal, so
the matrix (C'C)^{-1} C' (the generalized inverse of C) has a similar
form apart from a few row divisors:

  > fractions(ginverse(contr.helmert(4)))
        [,1]  [,2]  [,3] [,4]
  [1,]  -1/2   1/2     0    0
  [2,]  -1/6  -1/6   1/3    0
  [3,] -1/12 -1/12 -1/12  1/4

(ginverse() will be available in S+4.0, and fractions(), now
available in the MASS2 library, simply displays numbers in fractional
form so that patterns are more obvious.)  Thus the phi's are
identified by requiring that they add to zero, and

  alpha_1 = (phi_2 - phi_1)/2,
  alpha_2 = [phi_3 - (phi_1 + phi_2)/2]/3, &c.

When the columns of C are not mutually orthogonal the story is not
quite so obvious, though.  For example, take a C matrix using
contr.sdif (available in the MASS2 library):

  > contr.sdif(4)
    [2-1] [3-2] [4-3]
  1 -0.75  -0.5 -0.25
  2  0.25  -0.5 -0.25
  3  0.25   0.5 -0.25
  4  0.25   0.5  0.75

The column sums are all zero, so the identification constraint is
still the same, but the columns are not mutually orthogonal.  The
coefficients do have an easy interpretation, though:

  > fractions(ginverse(contr.sdif(4)))
       [,1] [,2] [,3] [,4]
  [1,]   -1    1    0    0
  [2,]    0   -1    1    0
  [3,]    0    0   -1    1

Thus alpha_i = phi_(i+1) - phi_i, the "successive differences" of
phi's, or equally of class means.

Now (at last) returning to your case.  In specifying the contrasts
you want to have as the coefficients, you are specifying not C, but
(C'C)^{-1} C' (say M).  So what you need to do is work backwards,
noting that C is also the ginverse of M:

  > M <- rbind(c(1, -1, 0), c(1, 0, -1))
  > M
       [,1] [,2] [,3]
  [1,]    1   -1    0
  [2,]    1    0   -1
  > fractions(ginverse(M))
       [,1] [,2]
  [1,]  1/3  1/3
  [2,] -2/3  1/3
  [3,]  1/3 -2/3

This is the C matrix you need to specify as the coding matrix.  Note
that (of course) the columns add to zero, so the implied constraint
on the phi's is that they sum to zero.

Since ginverse() is not yet published code, here is a quick and dirty
version that is adequate for this kind of case (but don't push
it ...):

  ginv <- function(M)
    if(nrow(M) < ncol(M)) t(ginv(t(M)))
    else solve(crossprod(M), t(M))

So all you need to do is as follows:

  > M <- rbind(c(1, -1, 0), c(1, 0, -1))
  > Cmat <- ginv(M)
  > dimnames(Cmat) <- list(NULL, c('1 vs 2', '1 vs 3'))  # optional
  > contrasts(trt) <- Cmat
  > fm <- aov(y ~ trt, .....)

The easy way to see the single degree of freedom components is:

  > summary.lm(fm)$coef

and the partitioned anova table can be shown using:

  > summary(fm, split = list(trt = list('1 vs 2' = 1, '1 vs 3' = 2)))

Finally, you can see the complete phi-style representation of the
fitted model by

  > dummy.coef(fm)

(Unfortunately dummy.coef does not generate an effective variance
matrix for the coefficients, otherwise contrasts could be directly
tested after the fitting process.  This could be fixed, though.)
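[The same recipe still runs in current R.  Here is a self-contained
sketch, with data simulated purely for illustration (the treatment
means, sample sizes and seed are arbitrary):

  ## Worked example of the M -> C recipe above on simulated data.
  set.seed(1)
  trt <- factor(rep(1:3, each = 10))
  y   <- c(10, 12, 15)[trt] + rnorm(30)      # true means 10, 12, 15

  M    <- rbind(c(1, -1, 0), c(1, 0, -1))    # rows: the desired contrasts
  ginv <- function(M)                        # the quick-and-dirty ginverse
    if (nrow(M) < ncol(M)) t(ginv(t(M)))
    else solve(crossprod(M), t(M))
  Cmat <- ginv(M)                            # C = M'(MM')^{-1}
  dimnames(Cmat) <- list(NULL, c("1 vs 2", "1 vs 3"))

  contrasts(trt) <- Cmat
  fm <- aov(y ~ trt)
  summary.lm(fm)$coef      # rows '1 vs 2', '1 vs 3' estimate
                           # mu_1 - mu_2 and mu_1 - mu_3 respectively
  summary(fm, split = list(trt = list("1 vs 2" = 1, "1 vs 3" = 2)))
]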
Note that the standard contrast functions contr.helmert, contr.poly
and contr.sum (as well as our contr.sdif) all impose the same
identification constraint on the phi's, i.e. they add to zero, but
contr.treatment is different as it corresponds to phi_1 = 0.  So the
phi parameters mean different things in this case, but the same
principle applies.

If the rows of M all add to zero (making them "conventional
contrasts"), so will the columns of C = M'(MM')^{-1}, (obviously).

Bill Venables.

Bill Venables, Head, Dept of Statistics,   Tel.: +61 8 8303 5418
University of Adelaide,                    Fax.: +61 8 8303 3696
South AUSTRALIA. 5005.                     Email: [EMAIL PROTECTED]
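[Why that closing remark holds: if the rows of M sum to zero then
M 1 = 0, so the column sums of C are
1'C = 1'M'(MM')^{-1} = (M 1)'(MM')^{-1} = 0'.  A quick numerical
check, with an M invented purely for illustration:

  ## If the rows of M sum to zero, so do the columns of C = M'(MM')^{-1}.
  M <- rbind(c(1, -1,  0,  0),
             c(0,  1, -1,  0),
             c(0,  0,  1, -1))       # each row sums to zero
  C <- t(M) %*% solve(M %*% t(M))    # C = M'(MM')^{-1}
  colSums(C)                         # all zero (up to rounding)
]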
