Re: [R] Anova and unbalanced designs

2009-02-15 Thread Tal Galili
Dear John - thank you for your detailed answer and help.
Your answer encourages me to ask further: by choosing different contrasts,
what are the different hypotheses being tested? (Or, put differently:
should I prefer contr.sum over contr.poly or contr.helmert, or does this
make no difference?)
How should this question be approached/answered ?

I see in the ?contrasts in R that the referenced reading is:
Chambers, J. M. and Hastie, T. J. (1992) *Statistical models.* Chapter 2
of *Statistical Models in S* eds J. M. Chambers and T. J. Hastie, Wadsworth
 Brooks/Cole.
Yet I must admit I don't have this book readily available (not on the web,
nor in my local library), so other recommended sources would be of great
help.


For future reference, here is some tinkering with the code to show how
different contrasts affect the type-III SS analysis results:

library(car)  # provides Anova() and the OBrienKaiser data

# within-subject design: 3 phases x 5 hours
phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
                levels=c("pretest", "posttest", "followup"))
hour <- ordered(rep(1:5, 3))
idata <- data.frame(phase, hour)

# refit the same model under four different codings of treatment
contrasted.treatment <- C(OBrienKaiser$treatment, contr.treatment)
mod.ok.contr.treatment <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                                   post.1, post.2, post.3, post.4, post.5,
                                   fup.1, fup.2, fup.3, fup.4, fup.5) ~
                             contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, contr.helmert)
mod.ok.contr.helmert <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                                 post.1, post.2, post.3, post.4, post.5,
                                 fup.1, fup.2, fup.3, fup.4, fup.5) ~
                           contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, contr.poly)
mod.ok.contr.poly <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                              post.1, post.2, post.3, post.4, post.5,
                              fup.1, fup.2, fup.3, fup.4, fup.5) ~
                        contrasted.treatment*gender, data=OBrienKaiser)
contrasted.treatment <- C(OBrienKaiser$treatment, contr.sum)
mod.ok.contr.sum <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                             post.1, post.2, post.3, post.4, post.5,
                             fup.1, fup.2, fup.3, fup.4, fup.5) ~
                       contrasted.treatment*gender, data=OBrienKaiser)

# this is one result:
(Anova(mod.ok.contr.treatment, idata=idata, idesign=~phase*hour,
       type = "III"))
# all of the other contrasts will now give the same outcome: (does that
# mean there shouldn't be a preference for using one over the other?)
(Anova(mod.ok.contr.helmert, idata=idata, idesign=~phase*hour,
       type = "III"))
(Anova(mod.ok.contr.poly, idata=idata, idesign=~phase*hour, type = "III"))
(Anova(mod.ok.contr.sum, idata=idata, idesign=~phase*hour, type = "III"))




With regards,
Tal





Re: [R] Anova and unbalanced designs

2009-02-15 Thread John Fox
Dear Tal,

A complete explanation of this issue is too long for an email; I do address
it in my Applied Regression Analysis and Generalized Linear Models text. The
question seems to come up so frequently that Georges Monette and I are
writing a paper on it (and related issues) for the upcoming useR conference.

Briefly, any set of contrasts that are orthogonal in the row basis of the
model matrix (essentially, composed from what you see when you use the
contrasts() function in R) will produce the same sums of squares (or, in the
multivariate case, sums of squares and products). These include Helmert
(contr.helmert), sigma-constrained (contr.sum), and orthogonal-polynomial
(contr.poly) contrasts, but not dummy-coded contrasts (contr.treatment).
(Actually, if you look carefully, you'll see that the contrasts defined for
treatment in the OBrienKaiser data are custom orthogonal contrasts similar
to Helmert contrasts.) Consequently, if all you're concerned with is the
ANOVA table, it doesn't matter which of these you use. If, however, you're
interested in the individual contrasts, it does of course matter which you
use, and in particular the orthogonal polynomial contrasts are not sensible
if the levels of the factor aren't ordered.
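
One property that the Helmert, sum-to-zero, and polynomial codings share, and that dummy coding lacks, is that every contrast column sums to zero over the factor levels. The sketch below checks only that property in base R for a three-level factor; it is meant as an illustration of the distinction drawn above, not as a full statement of the orthogonality condition, and the names k, codings, and sums_to_zero are made up for the example.

```r
# Compare the built-in contrast codings for a factor with k = 3 levels.
k <- 3
codings <- list(
  treatment = contr.treatment(k),  # dummy coding
  helmert   = contr.helmert(k),
  sum       = contr.sum(k),
  poly      = contr.poly(k)
)

# A column that sums to zero is orthogonal to the constant (intercept) column.
col_sums <- lapply(codings, colSums)
sums_to_zero <- sapply(col_sums, function(s) all(abs(s) < 1e-12))
print(sums_to_zero)
```

Under these assumptions only the treatment coding fails the check, which is consistent with it being the one coding singled out as unsuitable for type-III tests.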

Regards,
 John

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox



Re: [R] Anova and unbalanced designs

2009-02-14 Thread Tal Galili
 tests, however, are insensitive to the contrast
 parametrization. Anova() always uses an orthogonal parametrization for the
 within-subjects design.

 The general advice in ?Anova is, Be very careful in formulating the model
 for type-III tests, or the hypotheses tested will not make sense.

 Thanks, Peter, for pointing this out.

 John

 --
 John Fox, Professor
 Department of Sociology
 McMaster University
 Hamilton, Ontario, Canada
 web: socserv.mcmaster.ca/jfox



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
--


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il



Re: [R] Anova and unbalanced designs

2009-02-14 Thread John Fox
Dear Tal,

 -Original Message-
 From: Tal Galili [mailto:tal.gal...@gmail.com]
 Sent: February-14-09 10:23 AM
 To: John Fox
 Cc: Peter Dalgaard; Nils Skotara; r-help@r-project.org; Michael Friendly
 Subject: Re: [R] Anova and unbalanced designs
 
 Hello John and other R mailing list members.
 
 I've been following your discussions regarding the Anova command for the
 SS type 2/3 repeated measures Anova, and I have a question:
 
 I found that when I go from using type II to using type III, the summary
 suddenly includes an intercept term (example at the end of the e-mail).
 So my question is
 1) why is this intercept term added (in SS type III vs the type II)?

The computational approach taken in Anova() makes it simpler to include the
intercept in the type-III tests and not to include it in the type-II
tests.

 2) Can/should this intercept term be removed ? (or how should it be
 interpreted ?)

The test for the intercept is rarely of interest. A type-II test for the
intercept would test that the unconditional mean of the response is 0; a
type-III test for the intercept would test that the constant term in the
full model fit to the data is 0. The latter depends upon the parametrization
of the model (in the case of an ANOVA model, what kind of contrasts are
used). You state that the example that you give is taken from ?Anova but
there's a crucial detail that's omitted: The help file only gives the
type-II tests; the type-III tests are also reasonable here, but they
depend upon having used contr.sum (or another set of contrasts that's
orthogonal in the row basis of the model matrix) for the between-subject
factors, treatment and gender. This detail is in the data set:

> OBrienKaiser$gender
 [1] M M M F F M M F F M M M F F F F
attr(,"contrasts")
[1] contr.sum
Levels: F M

> OBrienKaiser$treatment
 [1] control control control control control A       A       A       A       B       B
[12] B       B       B       B       B
attr(,"contrasts")
        [,1] [,2]
control   -2    0
A          1   -1
B          1    1
Levels: control A B

With proper contrast coding, the type-III test for the intercept tests
that the mean of the cell means (the grand mean) is 0.

Had the default dummy-coded contrasts (from contr.treatment) been used, the
tests would not have tested reasonable hypotheses. My advice, from the help
file: Be very careful in formulating the model for type-III tests, or the
hypotheses tested will not make sense.
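
To make the dependence of the constant term on the coding concrete, here is a minimal base-R sketch on a small made-up unbalanced one-way layout (the data g and y are hypothetical, chosen only so the numbers are easy to follow, and the one-factor setting is a simplification of the two-factor model discussed above). With contr.sum the intercept is the mean of the cell means; with contr.treatment it is the mean of the reference cell.

```r
# Hypothetical unbalanced one-way data: cell sizes 3, 2, 1.
g <- factor(c("a", "a", "a", "b", "b", "c"))
y <- c(1, 2, 3, 10, 12, 20)
cell_means <- tapply(y, g, mean)   # a = 2, b = 11, c = 20

# Same model, two parametrizations of the factor.
fit_sum <- lm(y ~ g, contrasts = list(g = "contr.sum"))
fit_trt <- lm(y ~ g, contrasts = list(g = "contr.treatment"))

int_sum <- unname(coef(fit_sum)["(Intercept)"])  # mean of the cell means
int_trt <- unname(coef(fit_trt)["(Intercept)"])  # mean of reference cell "a"

print(c(sum_coding = int_sum, treatment_coding = int_trt,
        mean_of_cell_means = mean(cell_means), overall_mean = mean(y)))
```

Neither intercept equals the overall (weighted) mean of y in this unbalanced example, so a test that "the intercept is 0" refers to a different quantity under each coding.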

I hope this helps,
 John

 
 My purpose is to be able to use the Anova for analyzing an experiment with
a
 2 between and 3 within factors, where the between factors are not
balanced,
 and the within factors are (that is why I can't use the aov command).
 
 
 #---code start
 
 # (taken from the ?Anova help file)
 
 library(car)
 
 phase <- factor(rep(c("pretest", "posttest", "followup"), c(5, 5, 5)),
                 levels=c("pretest", "posttest", "followup"))
 hour <- ordered(rep(1:5, 3))
 idata <- data.frame(phase, hour)
 idata
 mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                    post.1, post.2, post.3, post.4, post.5,
                    fup.1, fup.2, fup.3, fup.4, fup.5) ~ treatment*gender,
              data=OBrienKaiser)
 
 # now we have two options
 # option one is to use type II:
 
 (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "II"))
 
 
 #output:
 Type II Repeated Measures MANOVA Tests: Pillai test statistic
                             Df test stat approx F num Df den Df    Pr(>F)
 treatment                    2    0.4809   4.6323      2     10 0.0376868 *
 gender                       1    0.2036   2.5558      1     10 0.1409735
 treatment:gender             2    0.3635   2.8555      2     10 0.1044692
 phase                        1    0.8505  25.6053      2      9 0.0001930 ***
 treatment:phase              2    0.6852   2.6056      4     20 0.0667354 .
 gender:phase                 1    0.0431   0.2029      2      9 0.8199968
 treatment:gender:phase       2    0.3106   0.9193      4     20 0.4721498
 hour                         1    0.9347  25.0401      4      7 0.0003043 ***
 treatment:hour               2    0.3014   0.3549      8     16 0.9295212
 gender:hour                  1    0.2927   0.7243      4      7 0.6023742
 treatment:gender:hour        2    0.5702   0.7976      8     16 0.6131884
 phase:hour                   1    0.5496   0.4576      8      3 0.8324517
 treatment:phase:hour         2    0.6637   0.2483     16      8 0.9914415
 gender:phase:hour            1    0.6950   0.8547      8      3 0.6202076
 treatment:gender:phase:hour  2    0.7928   0.3283     16      8 0.9723693
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 
 # option two is to use type III, and then get an added intercept term:
 (av.ok <- Anova(mod.ok, idata=idata, idesign=~phase*hour, type = "III"))
 
 
 # here is the output:
 Type III Repeated Measures MANOVA Tests: Pillai test statistic
 Df

Re: [R] Anova and unbalanced designs

2009-01-24 Thread Skotara

Dear John,

thank you for your answer. You are right, I also would not have expected 
a divergent result.

I have double-checked it again. No, I got type-III tests.
When I use type II, I get the same results in SPSS as in 'Anova' (using 
also type-II tests).
My guess was that the somehow weighted means SPSS shows could be 
responsible for this difference.
Or that using 'Anova' would not be correct for unequal group n's, which 
was not the case I think.

Do you have any further ideas?

Thank you!
Nils

John Fox wrote:

Dear Nils,

This is a pretty simple design, and I wouldn't have thought that there was
much room for getting different results. More generally, but not here (since
there's only one between-subject factor), one shouldn't use
contr.treatment() with type-III tests, as you did. Is it possible that you
got type-II tests from SPSS:

-- snip --

  

> summary(Anova(betweenanova, idata=with, idesign= ~within, type = "II"))

Type II Repeated Measures MANOVA Tests:

--

Term: between

 Response transformation matrix:
   (Intercept)
w1           1
w2           1

Sum of squares and products for the hypothesis:
            (Intercept)
(Intercept)         9.6

Sum of squares and products for error:
            (Intercept)
(Intercept)          18

Multivariate Tests: between
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1  0.347826     4.27      1      8 0.072726 .
Wilks             1  0.652174     4.27      1      8 0.072726 .
Hotelling-Lawley  1  0.53         4.27      1      8 0.072726 .
Roy               1  0.53         4.27      1      8 0.072726 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

--

Term: within

 Response transformation matrix:
   within1
w1       1
w2      -1

Sum of squares and products for the hypothesis:
        within1
within1     0.4

Sum of squares and products for error:
        within1
within1    21.3

Multivariate Tests: within
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1 0.0184049    0.150      1      8 0.70864
Wilks             1 0.9815951    0.150      1      8 0.70864
Hotelling-Lawley  1 0.0187500    0.150      1      8 0.70864
Roy               1 0.0187500    0.150      1      8 0.70864

--

Term: between:within

 Response transformation matrix:
   within1
w1       1
w2      -1

Sum of squares and products for the hypothesis:
        within1
within1    4.27

Sum of squares and products for error:
        within1
within1    21.3

Multivariate Tests: between:within
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1     0.167    1.600      1      8 0.24150
Wilks             1     0.833    1.600      1      8 0.24150
Hotelling-Lawley  1     0.200    1.600      1      8 0.24150
Roy               1     0.200    1.600      1      8 0.24150

Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                   SS num Df Error SS den Df      F  Pr(>F)
between        4.8000      1   9.0000      8 4.2667 0.07273 .
within         0.2000      1  10.6667      8 0.1500 0.70864
between:within 2.1333      1  10.6667      8 1.6000 0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


-- snip --

I hope this helps,
 John

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox


  

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Skotara
Sent: January-23-09 12:16 PM
To: r-help@r-project.org
Subject: [R] Anova and unbalanced designs

Dear R-list!

My question is related to an Anova including within and between subject
factors and unequal group sizes.
Here is a minimal example of what I did:

library(car)
within1 <- c(1,2,3,4,5,6,4,5,3,2); within2 <- c(3,4,3,4,3,4,3,4,5,4)
values <- data.frame(w1 = within1, w2 = within2)
values <- as.matrix(values)
between <- factor(c(rep(1,4), rep(2,6)))
betweenanova <- lm(values ~ between)
with <- expand.grid(within = factor(1:2))
withinanova <- Anova(betweenanova, idata=with, idesign=
~as.factor(within), type = "III")

I do not know if this is the appropriate method to deal with unbalanced
designs.

I observed that SPSS calculates everything identically except the main
effect of the within factor; here, the SSQ and F-value are very different.
If I select the option "show means", the means for the levels of the
within factor in SPSS are the same as:
mean(c(mean(values[1:4, "w1"]), mean(values[5:10, "w1"]))) and
mean(c(mean(values[1:4, "w2"]), mean(values[5:10, "w2"]))).
In other words, they are calculated as if both groups would have the
same size.
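
The difference between the two kinds of marginal means can be sketched in base R with the data from the example above. The variable names group_means, unweighted, and weighted are introduced here for illustration only; "unweighted" is the mean of the two group means (each group counted equally), while "weighted" is the plain column mean over all ten subjects.

```r
within1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
within2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
values  <- cbind(w1 = within1, w2 = within2)
between <- factor(c(rep(1, 4), rep(2, 6)))

# 2 x 2 matrix of group means: rows = between groups, columns = w1, w2.
group_means <- apply(values, 2, function(col) tapply(col, between, mean))

unweighted <- colMeans(group_means)  # groups treated as if equally sized
weighted   <- colMeans(values)       # groups weighted by their sizes (4 and 6)

print(rbind(unweighted, weighted))
```

With group sizes 4 and 6 the two versions differ for both levels of the within factor, which is the kind of discrepancy an unbalanced design can produce in the within main effect.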

I wonder if this is a good solution and if so, how could I do the same
thing in R?
However, I think if this is treated in SPSS 

Re: [R] Anova and unbalanced designs

2009-01-24 Thread John Fox
 out of some list members).

Maybe you should send a message to the SPSS help list.

Regards,
 John

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox



Re: [R] Anova and unbalanced designs

2009-01-24 Thread Nils Skotara
, these agree with Anova():
 
 --- snip 
 
 Type III Repeated Measures MANOVA Tests: Pillai test statistic
                Df test stat approx F num Df den Df    Pr(>F)
 (Intercept)     1     0.963  209.067      1      8 5.121e-07 ***
 between         1     0.348    4.267      1      8   0.07273 .
 within          1     0.048    0.400      1      8   0.54474
 between:within  1     0.167    1.600      1      8   0.24150
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 
 Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
 
                     SS num Df Error SS den Df        F    Pr(>F)
 (Intercept)    235.200      1    9.000      8 209.0667 5.121e-07 ***
 between          4.800      1    9.000      8   4.2667   0.07273 .
 within           0.533      1   10.667      8   0.4000   0.54474
 between:within   2.133      1   10.667      8   1.6000   0.24150
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 --- snip 
 
 So, unless Anova() and SAS are making the same error, I guess SPSS is doing
 something strange (or perhaps you didn't do what you intended in SPSS). As I
 said before, this problem is so simple, that I find it hard to understand
 where there's room for error, but I wanted to check against SAS to test my
 sanity (a procedure that will likely get a rise out of some list members).
 
 Maybe you should send a message to the SPSS help list.
 
 Regards,
  John
 
 --
 John Fox, Professor
 Department of Sociology
 McMaster University
 Hamilton, Ontario, Canada
 web: socserv.mcmaster.ca/jfox
 
 

Re: [R] Anova and unbalanced designs

2009-01-24 Thread Peter Dalgaard

Nils Skotara wrote:
Dear John, 


thank you again! You replicated the type III result I got in SPSS! When I
calculate Anova() type II:

Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                   SS num Df Error SS den Df      F  Pr(>F)
between        4.8000      1   9.0000      8 4.2667 0.07273 .
within         0.2000      1  10.6667      8 0.1500 0.70864
between:within 2.1333      1  10.6667      8 1.6000 0.24150
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I see the exact same values as you had written. 
However, and now I am really lost, type III (I did not change anything else)
leads to the following: 


Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                              SS num Df Error SS den Df       F    Pr(>F)
(Intercept)               72.000      1    9.000      8 64.0000 4.367e-05 ***
between                    4.800      1    9.000      8  4.2667   0.07273 .
as.factor(within)          2.000      1   10.667      8  1.5000   0.25551
between:as.factor(within)  2.133      1   10.667      8  1.6000   0.24150
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

How is this possible? 


This looks like a contrast parametrization issue: If we look at the
per-group mean within-differences and their SE, we get

> summary(lm(within1-within2~between - 1))
..
Coefficients:
         Estimate Std. Error t value Pr(>|t|)
between1  -1.0000     0.8165  -1.225    0.256
between2   0.3333     0.6667   0.500    0.631
..
> table(between)
between
1 2
4 6

Now, the type II F test is based on weighting the two means as you would
after testing for no interaction

> (4*-1+6*.3333)^2/(4^2*0.8165^2+6^2*0.6667^2)
[1] 0.1500205

and type III is to weight them as if there had been equal counts

> (5*-1+5*.3333)^2/(5^2*0.8165^2+5^2*0.6667^2)
[1] 0.400022

However, the result above corresponds to looking at group1 only

> (-1)^2/(0.8165^2)
[1] 1.499987
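
As a minimal sketch (not part of the original post), the three statistics above can be recomputed from the raw data given in Nils's example rather than from rounded estimates. The names d, m, se, n and f_weighted are illustrative only; the sketch assumes the cell-means fit used above, in which the two group means are uncorrelated.

```r
# Data from the original post in this thread
within1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
within2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

d   <- within1 - within2               # per-subject within difference
fit <- lm(d ~ between - 1)             # cell-means fit: one coefficient per group
m   <- unname(coef(fit))               # group means of the differences
se  <- unname(sqrt(diag(vcov(fit))))   # their standard errors
n   <- as.vector(table(between))       # group sizes

# F statistic for a weighted combination of the two (uncorrelated) group means
f_weighted <- function(w) sum(w * m)^2 / sum(w^2 * se^2)

f_type2  <- f_weighted(n)         # weights proportional to group sizes
f_type3  <- f_weighted(c(1, 1))   # equal weights
f_group1 <- f_weighted(c(1, 0))   # group 1 only

print(round(c(typeII = f_type2, typeIII = f_type3, group1 = f_group1), 4))
```

Only the weights differ between the three calls, which is the point of the comparison: the same two group means give F values of about 0.15, 0.40 and 1.50 depending on how they are combined.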

It helps if you choose orthogonal contrast parametrizations:

> options(contrasts=c("contr.sum","contr.helmert"))
> betweenanova <- lm(values ~ between)
> Anova(betweenanova, idata=with, idesign= ~as.factor(within), type = "III")


Type III Repeated Measures MANOVA Tests: Pillai test statistic
                          Df test stat approx F num Df den Df    Pr(>F)
(Intercept)                1     0.963  209.067      1      8 5.121e-07 ***
between                    1     0.348    4.267      1      8   0.07273 .
as.factor(within)          1     0.048    0.400      1      8   0.54474
between:as.factor(within)  1     0.167    1.600      1      8   0.24150
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
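
The dependence of the within test on the coding of the between factor can also be seen without car. The following is a rough base-R sketch under the assumption that, with two within levels, the within main effect reduces to a test of the intercept in a regression of the difference w1 - w2 on between. The helper name f_intercept is hypothetical.

```r
# Data from the original post in this thread
within1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
within2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))
d <- within1 - within2   # per-subject within difference

# Squared t statistic of the intercept under a given coding of `between`
f_intercept <- function(coding) {
  fit <- lm(d ~ between, contrasts = list(between = coding))
  coef(summary(fit))["(Intercept)", "t value"]^2
}

f_treatment <- f_intercept("contr.treatment")  # intercept = mean of group 1
f_sum       <- f_intercept("contr.sum")        # intercept = unweighted mean of group means

print(round(c(contr.treatment = f_treatment, contr.sum = f_sum), 4))
```

With contr.treatment the intercept is the group 1 mean, which reproduces the F of 1.5 that puzzled Nils; with contr.sum it is the equally weighted average of the two group means, which gives the 0.400 shown in the table above.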




--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova and unbalanced designs

2009-01-24 Thread John Fox
Dear Peter and Nils,

In my initial message, I stated misleadingly that the contrast coding didn't
matter for the type-III tests here since there is just one
between-subjects factor, but that's not right: The between type-III SS is
correct using contr.treatment(), but the within SS is not. As is generally
the case, to get reasonable type-III tests (i.e., tests of reasonable
hypotheses), it's necessary to have contrasts that are orthogonal in the
row-basis of the design, such as contr.sum(),  contr.helmert(), or
contr.poly(). The type-II tests, however, are insensitive to the contrast
parametrization. Anova() always uses an orthogonal parametrization for the
within-subjects design.
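
One property that contr.sum(), contr.helmert() and contr.poly() share, and that contr.treatment() lacks, is that every contrast column sums to zero. The sketch below only checks that property for a three-level factor; it is an illustration, not a full statement of the orthogonality condition described above, and the names codings and zero_sum are made up for the example.

```r
k <- 3  # number of factor levels used for the illustration

codings <- list(
  contr.treatment = contr.treatment(k),
  contr.sum       = contr.sum(k),
  contr.helmert   = contr.helmert(k),
  contr.poly      = contr.poly(k)
)

# TRUE when every column of the contrast matrix sums to (numerically) zero
zero_sum <- sapply(codings, function(cm) all(abs(colSums(cm)) < 1e-12))
print(zero_sum)
```

Only contr.treatment fails the check, which is consistent with the advice not to combine it with type-III tests.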

The general advice in ?Anova is, "Be very careful in formulating the model
for type-III tests, or the hypotheses tested will not make sense."

Thanks, Peter, for pointing this out.

John

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox


 -Original Message-
 From: Peter Dalgaard [mailto:p.dalga...@biostat.ku.dk]
 Sent: January-24-09 6:31 PM
 To: Nils Skotara
 Cc: John Fox; r-help@r-project.org; 'Michael Friendly'
 Subject: Re: [R] Anova and unbalanced designs
 


Re: [R] Anova and unbalanced designs

2009-01-23 Thread John Fox
Dear Nils,

This is a pretty simple design, and I wouldn't have thought that there was
much room for getting different results. More generally, but not here (since
there's only one between-subject factor), one shouldn't use
contr.treatment() with type-III tests, as you did. Is it possible that you
got type-II tests from SPSS? Compare:

-- snip --

> summary(Anova(betweenanova, idata=with, idesign= ~within, type = "II"))

Type II Repeated Measures MANOVA Tests:

--
 
Term: between 

 Response transformation matrix:
   (Intercept)
w1   1
w2   1

Sum of squares and products for the hypothesis:
            (Intercept)
(Intercept)         9.6

Sum of squares and products for error:
            (Intercept)
(Intercept)          18

Multivariate Tests: between
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1  0.347826 4.266667      1      8 0.072726 .
Wilks             1  0.652174 4.266667      1      8 0.072726 .
Hotelling-Lawley  1  0.533333 4.266667      1      8 0.072726 .
Roy               1  0.533333 4.266667      1      8 0.072726 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

--
 
Term: within 

 Response transformation matrix:
   within1
w1   1
w2  -1

Sum of squares and products for the hypothesis:
        within1
within1     0.4

Sum of squares and products for error:
         within1
within1 21.33333

Multivariate Tests: within
                 Df test stat  approx F num Df den Df  Pr(>F)
Pillai            1 0.0184049 0.1500000      1      8 0.70864
Wilks             1 0.9815951 0.1500000      1      8 0.70864
Hotelling-Lawley  1 0.0187500 0.1500000      1      8 0.70864
Roy               1 0.0187500 0.1500000      1      8 0.70864

--
 
Term: between:within 

 Response transformation matrix:
   within1
w1   1
w2  -1

Sum of squares and products for the hypothesis:
         within1
within1 4.266667

Sum of squares and products for error:
         within1
within1 21.33333

Multivariate Tests: between:within
                 Df test stat  approx F num Df den Df  Pr(>F)
Pillai            1 0.1666667 1.6000000      1      8 0.24150
Wilks             1 0.8333333 1.6000000      1      8 0.24150
Hotelling-Lawley  1 0.2000000 1.6000000      1      8 0.24150
Roy               1 0.2000000 1.6000000      1      8 0.24150

Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                   SS num Df Error SS den Df      F  Pr(>F)
between        4.8000      1   9.0000      8 4.2667 0.07273 .
within         0.2000      1  10.6667      8 0.1500 0.70864
between:within 2.1333      1  10.6667      8 1.6000 0.24150
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

-- snip --

I hope this helps,
 John

--
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Skotara
 Sent: January-23-09 12:16 PM
 To: r-help@r-project.org
 Subject: [R] Anova and unbalanced designs
 
 Dear R-list!
 
 My question is related to an Anova including within and between subject
 factors and unequal group sizes.
 Here is a minimal example of what I did:
 
 library(car)
 within1 <- c(1,2,3,4,5,6,4,5,3,2); within2 <- c(3,4,3,4,3,4,3,4,5,4)
 values <- data.frame(w1 = within1, w2 = within2)
 values <- as.matrix(values)
 between <- factor(c(rep(1,4), rep(2,6)))
 betweenanova <- lm(values ~ between)
 with <- expand.grid(within = factor(1:2))
 withinanova <- Anova(betweenanova, idata=with, idesign=
 ~as.factor(within), type = "III")
 
 I do not know if this is the appropriate method to deal with unbalanced
 designs.
 
 I observed that SPSS calculates everything identically except the main
 effect of the within factor; here, the SSQ and F-value are very different.
 If the option "show means" is selected, the means for the levels of the
 within factor in SPSS are the same as:
 mean(c(mean(values[1:4, "w1"]), mean(values[5:10, "w1"]))) and
 mean(c(mean(values[1:4, "w2"]), mean(values[5:10, "w2"]))).
 In other words, they are calculated as if both groups had the
 same size.
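
As a small sketch of the distinction described here (added for illustration, with made-up object names cell_means, unweighted and weighted), the equally weighted means that SPSS reports can be compared with the ordinary sample means, which weight each group by its size:

```r
# Data from the example above
within1 <- c(1, 2, 3, 4, 5, 6, 4, 5, 3, 2)
within2 <- c(3, 4, 3, 4, 3, 4, 3, 4, 5, 4)
between <- factor(c(rep(1, 4), rep(2, 6)))

# 2 x 2 table of cell means: rows are groups, columns are within levels
cell_means <- sapply(list(w1 = within1, w2 = within2),
                     function(x) tapply(x, between, mean))

unweighted <- colMeans(cell_means)                    # each group counts equally
weighted   <- c(w1 = mean(within1), w2 = mean(within2))  # each subject counts equally

print(cell_means)
print(rbind(unweighted, weighted))
```

The two sets of means differ only because the groups have 4 and 6 subjects; with equal group sizes they would coincide.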
 
 I wonder if this is a good solution and, if so, how I could do the same
 thing in R.
 However, if SPSS treats the group sizes as identical, why does it not do
 the same for the interaction, which yields the same result as
 Anova()?
 
 Many thanks in advance for your time and help!
 

__
R-help@r-project.org