Re: [R] Nested ANOVA yields surprising results

2015-10-31 Thread peter dalgaard

> On 30 Oct 2015, at 18:46 , Daniel Wagenaar  wrote:
> 
> Dear R users:
> 
> All textbook references that I consult say that in a nested ANOVA (e.g., 
> A/B), the F statistic for factor A should be calculated as
> 
> F_A = MS_A / MS_(B within A).
> 

That would depend on which hypothesis you test in which model. If a reference 
tells you that you "should" do something without specifying the model, then you 
"should" look at a different reference.

In general, having anything other than the residual MS in the denominator 
indicates that you think it represents an additional source of random 
variation. I don't think that is invariably the case in nested designs (and, by 
the way, notice that "nested" is used differently by different books and 
software).

If you don't say otherwise, R assumes that there is only one source of random 
variation the model - a single error term if you like - and that all other 
terms represent systematic variations. In this mode of thinking, an A:B term 
represents an effect of B within A (additive and interaction effects combined), 
and you can test for its presence by comparing MS_A:B to MS_res. In its 
absence, you might choose to reduce the model and next look for an effect of A; 
purists would do this by comparing MS_A to the new MS_res obtained by pooling 
MS_A:B and MS_res, but lazy statisticians/programmers have found it more 
convenient to stick with the original MS_res denominator throughout (to get the 
pooling done, just fit the reduced model). 

If you want A:B to be a random term, then you need to say so, e.g. using

> summary(aov(Y~A+Error(A:B-1)))

Error: A:B
Df Sum Sq Mean Sq F value Pr(>F)
A2 0.4735  0.2367   0.4030.7
Residuals3 1.7635  0.5878   

Error: Within
  Df Sum Sq Mean Sq F value Pr(>F)
Residuals  6  4.993  0.8322   

(the -1 in the Error() term prevents an error message, which as far as I can 
tell is spurious).

Notice that you need aov() for this; lm() doesn't do Error() terms. This _only_ 
works in balanced designs.

-pd

> But when I run this simple example:
> 
> set.seed(1)
> A <- factor(rep(1:3, each=4))
> B <- factor(rep(1:2, 3, each=2))
> Y <- rnorm(12)
> anova(lm(Y ~ A/B))
> 
> I get this result:
> 
>  Analysis of Variance Table
> 
>  Response: Y
>Df Sum Sq Mean Sq F value Pr(>F)
>  A  2 0.4735 0.23675  0.2845 0.7620
>  A:B3 1.7635 0.58783  0.7064 0.5823
>  Residuals  6 4.9931 0.83218
> 
> Evidently, R calculates the F value for A as MS_A / MS_Residuals.
> 
> While it is straightforward enough to calculate what I think is the correct 
> result from the table, I am surprised that R doesn't give me that answer 
> directly. Does anybody know if R's behavior is intentional, and if so, why? 
> Equally importantly, is there a straightforward way to make R give the answer 
> I expect, that is:
> 
> Df Sum Sq Mean Sq F value Pr(>F)
>  A   2 0.4735 0.23675  0.4028 0.6999
> 
> The students in my statistics class would be much happier if they didn't have 
> to type things like
> 
>  a <- anova(...)
>  F <- a$`Sum Sq`[1] / a$`Sum Sq`[2]
>  P <- 1 - pf(F, a$Df[1], a$Df[2])
> 
> (They are not R programmers (yet).) And to be honest, I would find it easier 
> to read those results directly from the table as well.
> 
> Thanks,
> 
> Daniel Wagenaar
> 
> -- 
> Daniel A. Wagenaar, PhD
> Assistant Professor
> Department of Biological Sciences
> McMicken College of Arts and Sciences
> University of Cincinnati
> Cincinnati, OH 45221
> Phone: +1 (513) 556-9757
> Email: daniel.wagen...@uc.edu
> Web: http://www.danielwagenaar.net
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Nested ANOVA yields surprising results

2015-10-30 Thread Daniel Wagenaar

Dear R users:

All textbook references that I consult say that in a nested ANOVA (e.g., 
A/B), the F statistic for factor A should be calculated as


F_A = MS_A / MS_(B within A).

But when I run this simple example:

set.seed(1)
A <- factor(rep(1:3, each=4))
B <- factor(rep(1:2, 3, each=2))
Y <- rnorm(12)
anova(lm(Y ~ A/B))

I get this result:

  Analysis of Variance Table

  Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
  A  2 0.4735 0.23675  0.2845 0.7620
  A:B3 1.7635 0.58783  0.7064 0.5823
  Residuals  6 4.9931 0.83218

Evidently, R calculates the F value for A as MS_A / MS_Residuals.

While it is straightforward enough to calculate what I think is the 
correct result from the table, I am surprised that R doesn't give me 
that answer directly. Does anybody know if R's behavior is intentional, 
and if so, why? Equally importantly, is there a straightforward way to 
make R give the answer I expect, that is:


 Df Sum Sq Mean Sq F value Pr(>F)
  A   2 0.4735 0.23675  0.4028 0.6999

The students in my statistics class would be much happier if they didn't 
have to type things like


  a <- anova(...)
  F <- a$`Sum Sq`[1] / a$`Sum Sq`[2]
  P <- 1 - pf(F, a$Df[1], a$Df[2])

(They are not R programmers (yet).) And to be honest, I would find it 
easier to read those results directly from the table as well.


Thanks,

Daniel Wagenaar

--
Daniel A. Wagenaar, PhD
Assistant Professor
Department of Biological Sciences
McMicken College of Arts and Sciences
University of Cincinnati
Cincinnati, OH 45221
Phone: +1 (513) 556-9757
Email: daniel.wagen...@uc.edu
Web: http://www.danielwagenaar.net

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.