Re: [R] Anova and unbalanced designs

Tal Galili Sun, 15 Feb 2009 02:18:13 -0800

Dear John - thank you for your detailed answer and help.
Your answer encourages me to ask further: by choosing different contrasts,
what are the different hypothesis which are being tested? (or put
differently - should I prefer contr.sum over  contr.poly or contr.helmert,
or does this makes no difference ?)
How should this question be approached/answered ?


I see in the ?contrasts in R that the referenced reading is:
"Chambers, J. M. and Hastie, T. J. (1992) *Statistical models.* Chapter 2
of *Statistical Models in S* eds J. M. Chambers and T. J. Hastie, Wadsworth
& Brooks/Cole."
Yet I must admit I don't have this book readily available (not on the web,
nor in my local library), so other recommended sources would be of great
help.


For future reference I add here a some tinkering of the code to show how
implementing different contrasts will resort in different SS type III
analysis results:

 phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
     levels=c("pretest", "posttest", "followup"))
 hour <- ordered(rep(1:5, 3))
 idata <- data.frame(phase, hour)

 contrasted.treatment <- C(OBrienKaiser$treatment, "contr.treatment")
 mod.ok.contr.treatment <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                     post.1, post.2, post.3, post.4, post.5,
                      fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender,     data=OBrienKaiser)
 contrasted.treatment <- C(OBrienKaiser$treatment, "contr.helmert")
 mod.ok.contr.helmert <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                     post.1, post.2, post.3, post.4, post.5,
                      fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender,     data=OBrienKaiser)
 contrasted.treatment <- C(OBrienKaiser$treatment, "contr.poly")
 mod.ok.contr.poly <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                     post.1, post.2, post.3, post.4, post.5,
                      fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender,     data=OBrienKaiser)
 contrasted.treatment <- C(OBrienKaiser$treatment, "contr.sum")
 mod.ok.contr.sum <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                     post.1, post.2, post.3, post.4, post.5,
                      fup.1, fup.2, fup.3, fup.4, fup.5) ~
contrasted.treatment*gender,     data=OBrienKaiser)

# this is one result:
(Anova(mod.ok.contr.treatment, idata=idata, idesign=~phase*hour, type =
"III"))
# all of the other contrasts will now give the same outcome:  (does that
mean there shouldn't be a preference of using one over the other ?)
(Anova(mod.ok.contr.helmert, idata=idata, idesign=~phase*hour, type =
"III"))
(Anova(mod.ok.contr.poly, idata=idata, idesign=~phase*hour, type = "III"))
(Anova(mod.ok.contr.sum, idata=idata, idesign=~phase*hour, type = "III"))




With regards,
Tal




On Sat, Feb 14, 2009 at 7:09 PM, John Fox <j...@mcmaster.ca> wrote:

> Dear Tal,
>
> > -----Original Message-----
> > From: Tal Galili [mailto:tal.gal...@gmail.com]
> > Sent: February-14-09 10:23 AM
> > To: John Fox
> > Cc: Peter Dalgaard; Nils Skotara; r-help@r-project.org; Michael Friendly
> > Subject: Re: [R] Anova and unbalanced designs
> >
> > Hello John and other R mailing list members.
> >
> > I've been following your discussions regarding the Anova command for the
> SS
> > type 2/3 repeated measures Anova, and I have a question:
> >
> > I found that when I go from using type II to using type III, the summary
> > model is suddenly added with an "intercept" term (example in the end of
> the
> > e-mail). So my question is
> > 1) why is this "intercept" term added (in SS type "III" vs the type
> "II")?
>
> The computational approach taken in Anova() makes it simpler to include the
> intercept in the "type-III" tests and not to include it in the "type-II"
> tests.
>
> > 2) Can/should this "intercept" term be removed ? (or how should it be
> > interpreted ?)
>
> The test for the intercept is rarely of interest. A "type-II" test for the
> intercept would test that the unconditional mean of the response is 0; a
> "type-III" test for the intercept would test that the constant term in the
> full model fit to the data is 0. The latter depends upon the
> parametrization
> of the model (in the case of an ANOVA model, what kind of "contrasts" are
> used). You state that the example that you give is taken from ?Anova but
> there's a crucial detail that's omitted: The help file only gives the
> "type-II" tests; the "type-III" tests are also reasonable here, but they
> depend upon having used "contr.sum" (or another set of contrasts that's
> orthogonal in the row basis of the model matrix) for the between-subject
> factors, treatment and gender. This detail is in the data set:
>
> > OBrienKaiser$gender
>  [1] M M M F F M M F F M M M F F F F
> attr(,"contrasts")
> [1] contr.sum
> Levels: F M
>
> > OBrienKaiser$treatment
>  [1] control control control control control A       A       A       A
> B       B
> [12] B       B       B       B       B
> attr(,"contrasts")
>        [,1] [,2]
> control   -2    0
> A          1   -1
> B          1    1
> Levels: control A B
>
> With proper contrast coding, the "type-III" test for the intercept tests
> that the mean of the cell means (the "grand mean") is 0.
>
> Had the default dummy-coded contrasts (from contr.treatment) been used, the
> tests would not have tested reasonable hypotheses. My advice, from the help
> file: "Be very careful in formulating the model for type-III tests, or the
> hypotheses tested will not make sense."
>
> I hope this helps,
>  John
>
> >
> > My purpose is to be able to use the Anova for analyzing an experiment
> with
> a
> > 2 between and 3 within factors, where the between factors are not
> balanced,
> > and the within factors are (that is why I can't use the aov command).
> >
> >
> > #---code start
> >
> > #---code start
> >
> > #---code start
> >
> > # (taken from the ?Anova help file)
> >
> > phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
> >     levels=c("pretest", "posttest", "followup"))
> > hour <- ordered(rep(1:5, 3))
> > idata <- data.frame(phase, hour)
> > idata
> > mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
> >                      post.1, post.2, post.3, post.4, post.5,
> >                      fup.1, fup.2, fup.3, fup.4, fup.5) ~
> treatment*gender,
> >                 data=OBrienKaiser)
> >
> > # now we have two options
> > # option one is to use type II:
> >
> > (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "II"))
> >
> >
> > #output:
> > Type II Repeated Measures MANOVA Tests: Pillai test statistic
> >                             Df test stat approx F num Df den Df    Pr(>F)
> > treatment                    2    0.4809   4.6323      2     10 0.0376868
> *
> > gender                       1    0.2036   2.5558      1     10 0.1409735
> > treatment:gender             2    0.3635   2.8555      2     10 0.1044692
> > phase                        1    0.8505  25.6053      2      9 0.0001930
> ***
> > treatment:phase              2    0.6852   2.6056      4     20 0.0667354
> .
> > gender:phase                 1    0.0431   0.2029      2      9 0.8199968
> > treatment:gender:phase       2    0.3106   0.9193      4     20 0.4721498
> > hour                         1    0.9347  25.0401      4      7 0.0003043
> ***
> > treatment:hour               2    0.3014   0.3549      8     16 0.9295212
> > gender:hour                  1    0.2927   0.7243      4      7 0.6023742
> > treatment:gender:hour        2    0.5702   0.7976      8     16 0.6131884
> > phase:hour                   1    0.5496   0.4576      8      3 0.8324517
> > treatment:phase:hour         2    0.6637   0.2483     16      8 0.9914415
> > gender:phase:hour            1    0.6950   0.8547      8      3 0.6202076
> > treatment:gender:phase:hour  2    0.7928   0.3283     16      8 0.9723693
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> >
> > # option two is to use type III, and then get an added intercept term:
> >  (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "III"))
> >
> >
> > # here is the output:
> > Type III Repeated Measures MANOVA Tests: Pillai test statistic
> >                             Df test stat approx F num Df den Df    Pr(>F)
> > (Intercept)                  1     0.967  296.389      1     10 9.241e-09
> ***
> > treatment                    2     0.441    3.940      2     10 0.0547069
> .
> > gender                       1     0.268    3.659      1     10 0.0848003
> .
> > treatment:gender             2     0.364    2.855      2     10 0.1044692
> > phase                        1     0.814   19.645      2      9 0.0005208
> ***
> > treatment:phase              2     0.696    2.670      4     20 0.0621085
> .
> > gender:phase                 1     0.066    0.319      2      9 0.7349696
> > treatment:gender:phase       2     0.311    0.919      4     20 0.4721498
> > hour                         1     0.933   24.315      4      7 0.0003345
> ***
> > treatment:hour               2     0.316    0.376      8     16 0.9183275
> > gender:hour                  1     0.339    0.898      4      7 0.5129764
> > treatment:gender:hour        2     0.570    0.798      8     16 0.6131884
> > phase:hour                   1     0.560    0.478      8      3 0.8202673
> > treatment:phase:hour         2     0.662    0.248     16      8 0.9915531
> > gender:phase:hour            1     0.712    0.925      8      3 0.5894907
> > treatment:gender:phase:hour  2     0.793    0.328     16      8 0.9723693
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> >
> >
> > #---code end
> >
> > #---code end
> >
> > #---code end
> >
> >
> >
> > Thanks in advance for your help!
> > Tal Galili
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Sun, Jan 25, 2009 at 3:08 AM, John Fox <j...@mcmaster.ca> wrote:
> >
> >
> >       Dear Peter and Nils,
> >
> >       In my initial message, I stated misleadingly that the contrast
> coding
> > didn't
> >       matter for the "type-III" tests here since there is just one
> >       between-subjects factor, but that's not right: The between type-III
> SS
> > is
> >       correct using contr.treatment(), but the within SS is not. As is
> > generally
> >       the case, to get reasonable type-III tests (i.e., tests of
> reasonable
> >       hypotheses), it's necessary to have contrasts that are orthogonal
> in
> > the
> >       row-basis of the design, such as contr.sum(),  contr.helmert(), or
> >       contr.poly(). The "type-II" tests, however, are insensitive to the
> > contrast
> >       parametrization. Anova() always uses an orthogonal parametrization
> for
> > the
> >       within-subjects design.
> >
> >       The general advice in ?Anova is, "Be very careful in formulating
> the
> > model
> >       for type-III tests, or the hypotheses tested will not make sense."
> >
> >       Thanks, Peter, for pointing this out.
> >
> >
> >       John
> >
> >       ------------------------------
> >       John Fox, Professor
> >       Department of Sociology
> >       McMaster University
> >       Hamilton, Ontario, Canada
> >       web: socserv.mcmaster.ca/jfox
> >
> >
> >       > -----Original Message-----
> >
> >       > From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
> >       > Sent: January-24-09 6:31 PM
> >       > To: Nils Skotara
> >       > Cc: John Fox; r-help@r-project.org; 'Michael Friendly'
> >       > Subject: Re: [R] Anova and unbalanced designs
> >       >
> >
> >       > Nils Skotara wrote:
> >       > > Dear John,
> >       > >
> >       > > thank you again! You replicated the type III result I got in
> SPSS!
> > When
> >       I
> >       > > calculate Anova() type II:
> >       > >
> >       > > Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
> >       > >
> >       > >                     SS num Df Error SS den Df      F  Pr(>F)
> >       > > between         4.8000      1   9.0000      8 4.2667 0.07273 .
> >       > > within          0.2000      1  10.6667      8 0.1500 0.70864
> >       > > between:within  2.1333      1  10.6667      8 1.6000 0.24150
> >       > > ---
> >       > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >       > >
> >       > > I see the exact same values as you had written.
> >       > > However, and now I am really lost, type III (I did not change
> > anything
> >       > else)
> >       > > leads to the following:
> >       > >
> >       > > Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
> >       > >
> >       > >                               SS num Df Error SS den Df       F
> >       Pr(>F)
> >       > > (Intercept)               72.000      1    9.000      8 64.0000
> >       4.367e-05
> >       > ***
> >       > > between                    4.800      1    9.000      8  4.2667
> >       0.07273 .
> >       > > as.factor(within)          2.000      1   10.667      8  1.5000
> >       0.25551
> >       > > between:as.factor(within)  2.133      1   10.667      8  1.6000
> >       0.24150
> >       > > ---
> >       > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >       > >
> >       > > How is this possible?
> >       >
> >       > This looks like a contrast parametrization issue: If we look at
> the
> >       > per-group mean within-differences and their SE, we get
> >       >
> >       >  > summary(lm(within1-within2~between - 1))
> >       > ..
> >       > Coefficients:
> >       >           Estimate Std. Error t value Pr(>|t|)
> >       > between1  -1.0000     0.8165  -1.225    0.256
> >       > between2   0.3333     0.6667   0.500    0.631
> >       > ..
> >       >  > table(between)
> >       > between
> >       > 1 2
> >       > 4 6
> >       >
> >       > Now, the type II F test is based on weighting the two means as
> you
> > would
> >       > after testing for no interaction
> >       >
> >       >  > (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
> >       > [1] 0.1500205
> >       >
> >       > and type III is to weight them as if there had been equal counts
> >       >
> >       >  > (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
> >       > [1] 0.400022
> >       >
> >       > However, the result above corresponds to looking at group1 only
> >       >
> >       >  > (-1)^2/(0.8165^2)
> >       > [1] 1.499987
> >       >
> >       > It helps if you choose orhtogonal contrast parametrizations:
> >       >
> >       >  > options(contrasts=c("contr.sum","contr.helmert"))
> >       >  > betweenanova <- lm(values ~ between)> Anova(betweenanova,
> > idata=with,
> >       > idesign= ~as.factor(within), type = "III" )
> >       >
> >       > Type III Repeated Measures MANOVA Tests: Pillai test statistic
> >       >                            Df test stat approx F num Df den Df
> > Pr(>F)
> >       > (Intercept)                1     0.963  209.067      1      8
> 5.121e-
> > 07
> >       ***
> >       > between                    1     0.348    4.267      1      8
> > 0.07273 .
> >       > as.factor(within)          1     0.048    0.400      1      8
> > 0.54474
> >       > between:as.factor(within)  1     0.167    1.600      1      8
> > 0.24150
> >       > ---
> >       > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >       >
> >       >
> >       >
> >       >
> >       > --
> >       >     O__  ---- Peter Dalgaard             Øster Farimagsgade 5,
> Entr.B
> >       >    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >       >   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45)
> > 35327918
> >       > ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45)
> > 35327907
> >
> >       ______________________________________________
> >       R-help@r-project.org mailing list
> >       https://stat.ethz.ch/mailman/listinfo/r-help
> >       PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> >       and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> >
> >
> > --
> > ----------------------------------------------
> >
> >
> > My contact information:
> > Tal Galili
> > Phone number: 972-50-3373767
> > FaceBook: Tal Galili
> > My Blogs:
> > www.talgalili.com
> > www.biostatistics.co.il
> >
> >
>
>
>
>


-- 
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Anova and unbalanced designs

Reply via email to