Dear John,

Thank you for your detailed answer and help. Your answer encourages me to ask further: by choosing different contrasts, which hypotheses are being tested? (Put differently: should I prefer contr.sum over contr.poly or contr.helmert, or does this make no difference?) How should this question be approached or answered?
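As a small illustration of the question above, here is a base-R sketch that prints the four contrast codings for a three-level factor and checks which of them have columns that sum to zero, i.e. are orthogonal to the intercept column. The level names are borrowed from the treatment factor discussed in this thread; the variable names (lev, codings, col_sums, sums_to_zero) are only illustrative.

```r
# Contrast matrices for a 3-level factor, base R only.
lev <- c("control", "A", "B")

codings <- list(
  treatment = contr.treatment(lev),
  sum       = contr.sum(lev),
  helmert   = contr.helmert(lev),
  poly      = contr.poly(3)
)
print(codings)

# A column that sums to zero is orthogonal to the constant (intercept) column.
# Rounding guards against floating-point noise in contr.poly().
col_sums <- lapply(codings, function(m) round(colSums(m), 10))
sums_to_zero <- vapply(col_sums, function(s) all(s == 0), logical(1))
print(sums_to_zero)
```

Only contr.treatment fails the check, which is consistent with it being the one coding singled out in this thread as giving different type-III results.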
I see in ?contrasts in R that the referenced reading is: "Chambers, J. M. and Hastie, T. J. (1992) *Statistical models.* Chapter 2 of *Statistical Models in S* eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole." Yet I must admit I don't have this book readily available (not on the web, nor in my local library), so other recommended sources would be of great help.

For future reference, I add here some tinkering with the code to show how different contrasts result in different SS type III analysis results:

library(car)  # Anova() and the OBrienKaiser data come with the car package

phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
                levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)

# refit the same model four times, changing only the coding of treatment
contrasted.treatment <- C(OBrienKaiser$treatment, "contr.treatment")
mod.ok.contr.treatment <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                                   post.1, post.2, post.3, post.4, post.5,
                                   fup.1, fup.2, fup.3, fup.4, fup.5) ~ contrasted.treatment*gender,
                             data=OBrienKaiser)

contrasted.treatment <- C(OBrienKaiser$treatment, "contr.helmert")
mod.ok.contr.helmert <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                                 post.1, post.2, post.3, post.4, post.5,
                                 fup.1, fup.2, fup.3, fup.4, fup.5) ~ contrasted.treatment*gender,
                           data=OBrienKaiser)

contrasted.treatment <- C(OBrienKaiser$treatment, "contr.poly")
mod.ok.contr.poly <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                              post.1, post.2, post.3, post.4, post.5,
                              fup.1, fup.2, fup.3, fup.4, fup.5) ~ contrasted.treatment*gender,
                        data=OBrienKaiser)

contrasted.treatment <- C(OBrienKaiser$treatment, "contr.sum")
mod.ok.contr.sum <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                             post.1, post.2, post.3, post.4, post.5,
                             fup.1, fup.2, fup.3, fup.4, fup.5) ~ contrasted.treatment*gender,
                       data=OBrienKaiser)

# this is one result:
(Anova(mod.ok.contr.treatment, idata=idata, idesign=~phase*hour, type = "III"))

# all of the other contrasts now give the same outcome as each other:
# (does that mean there shouldn't be a preference for one over the other?)
(Anova(mod.ok.contr.helmert, idata=idata, idesign=~phase*hour, type = "III"))
(Anova(mod.ok.contr.poly, idata=idata, idesign=~phase*hour, type = "III"))
(Anova(mod.ok.contr.sum, idata=idata, idesign=~phase*hour, type = "III"))

With regards,
Tal

On Sat, Feb 14, 2009 at 7:09 PM, John Fox <j...@mcmaster.ca> wrote:

> Dear Tal,
>
> > -----Original Message-----
> > From: Tal Galili [mailto:tal.gal...@gmail.com]
> > Sent: February-14-09 10:23 AM
> > To: John Fox
> > Cc: Peter Dalgaard; Nils Skotara; r-help@r-project.org; Michael Friendly
> > Subject: Re: [R] Anova and unbalanced designs
> >
> > Hello John and other R mailing list members.
> >
> > I've been following your discussions regarding the Anova command for the SS
> > type 2/3 repeated measures Anova, and I have a question:
> >
> > I found that when I go from using type II to using type III, the summary
> > model is suddenly added with an "intercept" term (example in the end of the
> > e-mail). So my question is
> > 1) why is this "intercept" term added (in SS type "III" vs the type "II")?
>
> The computational approach taken in Anova() makes it simpler to include the
> intercept in the "type-III" tests and not to include it in the "type-II"
> tests.
>
> > 2) Can/should this "intercept" term be removed ? (or how should it be
> > interpreted ?)
>
> The test for the intercept is rarely of interest. A "type-II" test for the
> intercept would test that the unconditional mean of the response is 0; a
> "type-III" test for the intercept would test that the constant term in the
> full model fit to the data is 0. The latter depends upon the parametrization
> of the model (in the case of an ANOVA model, what kind of "contrasts" are
> used).
>
> You state that the example that you give is taken from ?Anova but
> there's a crucial detail that's omitted: The help file only gives the
> "type-II" tests; the "type-III" tests are also reasonable here, but they
> depend upon having used "contr.sum" (or another set of contrasts that's
> orthogonal in the row basis of the model matrix) for the between-subject
> factors, treatment and gender. This detail is in the data set:
>
> > OBrienKaiser$gender
>  [1] M M M F F M M F F M M M F F F F
> attr(,"contrasts")
> [1] contr.sum
> Levels: F M
>
> > OBrienKaiser$treatment
>  [1] control control control control control A       A       A       A       B       B
> [12] B       B       B       B       B
> attr(,"contrasts")
>         [,1] [,2]
> control   -2    0
> A          1   -1
> B          1    1
> Levels: control A B
>
> With proper contrast coding, the "type-III" test for the intercept tests
> that the mean of the cell means (the "grand mean") is 0.
>
> Had the default dummy-coded contrasts (from contr.treatment) been used, the
> tests would not have tested reasonable hypotheses. My advice, from the help
> file: "Be very careful in formulating the model for type-III tests, or the
> hypotheses tested will not make sense."
>
> I hope this helps,
>  John
>
> > My purpose is to be able to use the Anova for analyzing an experiment with a
> > 2 between and 3 within factors, where the between factors are not balanced,
> > and the within factors are (that is why I can't use the aov command).
> >
> > #---code start
> >
> > # (taken from the ?Anova help file)
> >
> > phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
> >     levels=c("pretest", "posttest", "followup"))
> > hour <- ordered(rep(1:5, 3))
> > idata <- data.frame(phase, hour)
> > idata
> > mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
> >                    post.1, post.2, post.3, post.4, post.5,
> >                    fup.1, fup.2, fup.3, fup.4, fup.5) ~ treatment*gender,
> >              data=OBrienKaiser)
> >
> > # now we have two options
> > # option one is to use type II:
> >
> > (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "II"))
> >
> > #output:
> > Type II Repeated Measures MANOVA Tests: Pillai test statistic
> >                             Df test stat approx F num Df den Df    Pr(>F)
> > treatment                    2    0.4809   4.6323      2     10 0.0376868 *
> > gender                       1    0.2036   2.5558      1     10 0.1409735
> > treatment:gender             2    0.3635   2.8555      2     10 0.1044692
> > phase                        1    0.8505  25.6053      2      9 0.0001930 ***
> > treatment:phase              2    0.6852   2.6056      4     20 0.0667354 .
> > gender:phase                 1    0.0431   0.2029      2      9 0.8199968
> > treatment:gender:phase       2    0.3106   0.9193      4     20 0.4721498
> > hour                         1    0.9347  25.0401      4      7 0.0003043 ***
> > treatment:hour               2    0.3014   0.3549      8     16 0.9295212
> > gender:hour                  1    0.2927   0.7243      4      7 0.6023742
> > treatment:gender:hour        2    0.5702   0.7976      8     16 0.6131884
> > phase:hour                   1    0.5496   0.4576      8      3 0.8324517
> > treatment:phase:hour         2    0.6637   0.2483     16      8 0.9914415
> > gender:phase:hour            1    0.6950   0.8547      8      3 0.6202076
> > treatment:gender:phase:hour  2    0.7928   0.3283     16      8 0.9723693
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > # option two is to use type III, and then get an added intercept term:
> > (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "III"))
> >
> > # here is the output:
> > Type III Repeated Measures MANOVA Tests: Pillai test statistic
> >                             Df test stat approx F num Df den Df    Pr(>F)
> > (Intercept)                  1     0.967  296.389      1     10 9.241e-09 ***
> > treatment                    2     0.441    3.940      2     10 0.0547069 .
> > gender                       1     0.268    3.659      1     10 0.0848003 .
> > treatment:gender             2     0.364    2.855      2     10 0.1044692
> > phase                        1     0.814   19.645      2      9 0.0005208 ***
> > treatment:phase              2     0.696    2.670      4     20 0.0621085 .
> > gender:phase                 1     0.066    0.319      2      9 0.7349696
> > treatment:gender:phase       2     0.311    0.919      4     20 0.4721498
> > hour                         1     0.933   24.315      4      7 0.0003345 ***
> > treatment:hour               2     0.316    0.376      8     16 0.9183275
> > gender:hour                  1     0.339    0.898      4      7 0.5129764
> > treatment:gender:hour        2     0.570    0.798      8     16 0.6131884
> > phase:hour                   1     0.560    0.478      8      3 0.8202673
> > treatment:phase:hour         2     0.662    0.248     16      8 0.9915531
> > gender:phase:hour            1     0.712    0.925      8      3 0.5894907
> > treatment:gender:phase:hour  2     0.793    0.328     16      8 0.9723693
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > #---code end
> >
> > Thanks in advance for your help!
> > Tal Galili
> >
> > On Sun, Jan 25, 2009 at 3:08 AM, John Fox <j...@mcmaster.ca> wrote:
> >
> >     Dear Peter and Nils,
> >
> >     In my initial message, I stated misleadingly that the contrast coding
> >     didn't matter for the "type-III" tests here since there is just one
> >     between-subjects factor, but that's not right: The between type-III SS
> >     is correct using contr.treatment(), but the within SS is not.
> >     As is generally the case, to get reasonable type-III tests (i.e., tests
> >     of reasonable hypotheses), it's necessary to have contrasts that are
> >     orthogonal in the row-basis of the design, such as contr.sum(),
> >     contr.helmert(), or contr.poly(). The "type-II" tests, however, are
> >     insensitive to the contrast parametrization. Anova() always uses an
> >     orthogonal parametrization for the within-subjects design.
> >
> >     The general advice in ?Anova is, "Be very careful in formulating the
> >     model for type-III tests, or the hypotheses tested will not make sense."
> >
> >     Thanks, Peter, for pointing this out.
> >
> >     John
> >
> >     ------------------------------
> >     John Fox, Professor
> >     Department of Sociology
> >     McMaster University
> >     Hamilton, Ontario, Canada
> >     web: socserv.mcmaster.ca/jfox
> >
> > > -----Original Message-----
> > > From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> > > Sent: January-24-09 6:31 PM
> > > To: Nils Skotara
> > > Cc: John Fox; r-help@r-project.org; 'Michael Friendly'
> > > Subject: Re: [R] Anova and unbalanced designs
> > >
> > > Nils Skotara wrote:
> > > > Dear John,
> > > >
> > > > thank you again! You replicated the type III result I got in SPSS!
> > > > When I calculate Anova() type II:
> > > >
> > > > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> > > >
> > > >                    SS num Df Error SS den Df      F  Pr(>F)
> > > > between        4.8000      1   9.0000      8 4.2667 0.07273 .
> > > > within         0.2000      1  10.6667      8 0.1500 0.70864
> > > > between:within 2.1333      1  10.6667      8 1.6000 0.24150
> > > > ---
> > > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > > >
> > > > I see the exact same values as you had written.
> > > > However, and now I am really lost, type III (I did not change anything
> > > > else) leads to the following:
> > > >
> > > > Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> > > >
> > > >                               SS num Df Error SS den Df       F    Pr(>F)
> > > > (Intercept)               72.000      1    9.000      8 64.0000 4.367e-05 ***
> > > > between                    4.800      1    9.000      8  4.2667   0.07273 .
> > > > as.factor(within)          2.000      1   10.667      8  1.5000   0.25551
> > > > between:as.factor(within)  2.133      1   10.667      8  1.6000   0.24150
> > > > ---
> > > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > > >
> > > > How is this possible?
> > >
> > > This looks like a contrast parametrization issue: If we look at the
> > > per-group mean within-differences and their SE, we get
> > >
> > > > summary(lm(within1-within2~between - 1))
> > > ..
> > > Coefficients:
> > >          Estimate Std. Error t value Pr(>|t|)
> > > between1  -1.0000     0.8165  -1.225    0.256
> > > between2   0.3333     0.6667   0.500    0.631
> > > ..
> > > > table(between)
> > > between
> > > 1 2
> > > 4 6
> > >
> > > Now, the type II F test is based on weighting the two means as you would
> > > after testing for no interaction
> > >
> > > > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
> > > [1] 0.1500205
> > >
> > > and type III is to weight them as if there had been equal counts
> > >
> > > > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
> > > [1] 0.400022
> > >
> > > However, the result above corresponds to looking at group1 only
> > >
> > > > (-1)^2/(0.8165^2)
> > > [1] 1.499987
> > >
> > > It helps if you choose orthogonal contrast parametrizations:
> > >
> > > > options(contrasts=c("contr.sum","contr.helmert"))
> > > > betweenanova <- lm(values ~ between)
> > > > Anova(betweenanova, idata=with, idesign= ~as.factor(within), type = "III" )
> > >
> > > Type III Repeated Measures MANOVA Tests: Pillai test statistic
> > >                           Df test stat approx F num Df den Df    Pr(>F)
> > > (Intercept)                1     0.963  209.067      1      8 5.121e-07 ***
> > > between                    1     0.348    4.267      1      8   0.07273 .
> > > as.factor(within)          1     0.048    0.400      1      8   0.54474
> > > between:as.factor(within)  1     0.167    1.600      1      8   0.24150
> > > ---
> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> > >
> > > --
> > >    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> > >   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> > >  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> > > ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > ----------------------------------------------
> >
> > My contact information:
> > Tal Galili
> > Phone number: 972-50-3373767
> > FaceBook: Tal Galili
> > My Blogs:
> > www.talgalili.com
> > www.biostatistics.co.il

--
----------------------------------------------

My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il

        [[alternative HTML version deleted]]
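The weighting arithmetic in Peter Dalgaard's quoted message can be re-run with a small helper. This is only a sketch: the two means and standard errors are copied from the summary() output quoted above, and the helper name f_stat is made up for illustration.

```r
# Per-group mean within-differences and their standard errors,
# copied from the quoted summary(lm(within1-within2~between - 1)) output.
m  <- c(-1.0000, 0.3333)
se <- c(0.8165, 0.6667)

# F statistic for a weighted combination of the two group means
# (f_stat is a hypothetical helper, not part of any package).
f_stat <- function(w) sum(w * m)^2 / sum(w^2 * se^2)

f_type2  <- f_stat(c(4, 6))  # weights = observed group sizes ("type II")
f_type3  <- f_stat(c(5, 5))  # equal weights ("type III")
f_group1 <- f_stat(c(1, 0))  # group 1 only, as obtained under contr.treatment

print(c(type2 = f_type2, type3 = f_type3, group1 = f_group1))
```

The three values correspond to the F statistics 0.15, 0.40 and 1.50 that appear for the within term in the tables quoted in this thread.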
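To make the point about the intercept concrete, here is a minimal base-R sketch on a made-up, unbalanced two-group data set (the data and all variable names are hypothetical, not taken from the thread). It fits the same one-factor model under contr.treatment and under contr.sum and compares what the intercept estimates in each case.

```r
# Hypothetical unbalanced data: 2 observations in group "a", 4 in group "b".
d <- data.frame(
  g = factor(rep(c("a", "b"), c(2, 4))),
  y = c(1, 3, 5, 7, 9, 11)
)

cell_means <- tapply(d$y, d$g, mean)

fit_trt <- lm(y ~ g, data = d, contrasts = list(g = "contr.treatment"))
fit_sum <- lm(y ~ g, data = d, contrasts = list(g = "contr.sum"))

int_trt <- unname(coef(fit_trt)["(Intercept)"])  # mean of the reference cell "a"
int_sum <- unname(coef(fit_sum)["(Intercept)"])  # unweighted mean of the cell means

print(c(treatment = int_trt, sum = int_sum,
        mean_of_cell_means = mean(cell_means), overall_mean = mean(d$y)))
```

Under contr.sum the intercept is the mean of the cell means, which is the hypothesis John Fox describes for the "type-III" intercept test; under contr.treatment it is the mean of the reference cell only, so a test of that term answers a different question.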