Re: [R] Nested ANOVA yields surprising results

2015-10-31 Thread peter dalgaard

> On 30 Oct 2015, at 18:46 , Daniel Wagenaar  wrote:
> 
> Dear R users:
> 
> All textbook references that I consult say that in a nested ANOVA (e.g., 
> A/B), the F statistic for factor A should be calculated as
> 
> F_A = MS_A / MS_(B within A).
> 

That would depend on which hypothesis you test in which model. If a reference 
tells you that you "should" do something without specifying the model, then you 
"should" look at a different reference.

In general, having anything other than the residual MS in the denominator 
indicates that you think it represents an additional source of random 
variation. I don't think that is invariably the case in nested designs (and, by 
the way, notice that "nested" is used differently by different books and 
software).

If you don't say otherwise, R assumes that there is only one source of random 
variation the model - a single error term if you like - and that all other 
terms represent systematic variations. In this mode of thinking, an A:B term 
represents an effect of B within A (additive and interaction effects combined), 
and you can test for its presence by comparing MS_A:B to MS_res. In its 
absence, you might choose to reduce the model and next look for an effect of A; 
purists would do this by comparing MS_A to the new MS_res obtained by pooling 
MS_A:B and MS_res, but lazy statisticians/programmers have found it more 
convenient to stick with the original MS_res denominator throughout (to get the 
pooling done, just fit the reduced model). 

If you want A:B to be a random term, then you need to say so, e.g. using

> summary(aov(Y~A+Error(A:B-1)))

Error: A:B
Df Sum Sq Mean Sq F value Pr(>F)
A2 0.4735  0.2367   0.4030.7
Residuals3 1.7635  0.5878   

Error: Within
  Df Sum Sq Mean Sq F value Pr(>F)
Residuals  6  4.993  0.8322   

(the -1 in the Error() term prevents an error message, which as far as I can 
tell is spurious).

Notice that you need aov() for this; lm() doesn't do Error() terms. This _only_ 
works in balanced designs.

-pd

> But when I run this simple example:
> 
> set.seed(1)
> A <- factor(rep(1:3, each=4))
> B <- factor(rep(1:2, 3, each=2))
> Y <- rnorm(12)
> anova(lm(Y ~ A/B))
> 
> I get this result:
> 
>  Analysis of Variance Table
> 
>  Response: Y
>Df Sum Sq Mean Sq F value Pr(>F)
>  A  2 0.4735 0.23675  0.2845 0.7620
>  A:B3 1.7635 0.58783  0.7064 0.5823
>  Residuals  6 4.9931 0.83218
> 
> Evidently, R calculates the F value for A as MS_A / MS_Residuals.
> 
> While it is straightforward enough to calculate what I think is the correct 
> result from the table, I am surprised that R doesn't give me that answer 
> directly. Does anybody know if R's behavior is intentional, and if so, why? 
> Equally importantly, is there a straightforward way to make R give the answer 
> I expect, that is:
> 
> Df Sum Sq Mean Sq F value Pr(>F)
>  A   2 0.4735 0.23675  0.4028 0.6999
> 
> The students in my statistics class would be much happier if they didn't have 
> to type things like
> 
>  a <- anova(...)
>  F <- a$`Sum Sq`[1] / a$`Sum Sq`[2]
>  P <- 1 - pf(F, a$Df[1], a$Df[2])
> 
> (They are not R programmers (yet).) And to be honest, I would find it easier 
> to read those results directly from the table as well.
> 
> Thanks,
> 
> Daniel Wagenaar
> 
> -- 
> Daniel A. Wagenaar, PhD
> Assistant Professor
> Department of Biological Sciences
> McMicken College of Arts and Sciences
> University of Cincinnati
> Cincinnati, OH 45221
> Phone: +1 (513) 556-9757
> Email: daniel.wagen...@uc.edu
> Web: http://www.danielwagenaar.net
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested anova-R chrashing

2011-09-12 Thread Ben Bolker
joerg stephan stephanj at rhrk.uni-kl.de writes:

 
 Hi,
 
 I tried to do a nested Anova with the attached Data. My response 
 variable is survivors and I would like to know the effect of 
 (insect-egg clutch) size, position (of clutch on twig) and clone 
 (/plant genotype) on the survival of eggs (due to predation). Each plant 
 was provided with three different sizes of clutches (45,15,5) and had 
 pseudo-replications of size 15 and 5 on it.

This may not be what you wanted at all, but here is my take on
what you could try with these data.

You have binomial data with lots of zeros and ones (i.e. 0 or 100%
survival), so they are unlikely to be transformable to normality
however you try.  You *might* be able to get away with modeling
these as normally distributed and then rely on permutation tests
to get your significance levels right, but (tricky as it is) I
actually think GLMMs (Zuur et al 2009, Bolker et al 2009) are
the best way to go ...

## read in the data
x - read.table(aovmisc.dat,header=TRUE)

## take a look
summary(x)
## with(x,table(clone,as.factor(size),as.factor(position)))

## make categorical versions of size  plant variables,
##  and reconstitute the total number surviving variable
x - transform(x,
   fsize=factor(size),
   fplant=factor(plant),
   tsurv=round(survivors*size)
   )

library(ggplot2)

## trying to plot everything ...
zspace - opts(panel.margin=unit(0,lines))  ## squash panels together
theme_update(theme_bw())  ## white background

## plot all data: survival vs position, with all plants represented;
##  separate subplots for each clone
## overlay GLM fit
ggplot(x,aes(x=position,y=survivors,size=size,colour=fplant))+
   geom_point(alpha=0.5)+
  facet_wrap(~clone)+xlim(0,15)+geom_line(size=0.5,alpha=0.2)+zspace+
  geom_smooth(aes(group=1,weight=size),
  colour=black,method=glm,family=binomial)

## it looks like there is a positive effect of position, and also
##  a positive effect of size (large bubbles toward high-survival)

## there are a few outliers in 'position' -- are these typos?
subset(x,position15)


## focus on size and clone instead, suppress position:

ggplot(x,aes(x=fsize,y=survivors,colour=clone))+stat_sum(aes(size=..n..))+
  facet_grid(.~clone)+
  geom_smooth(method=glm,family=binomial,aes(weight=size,
 x=as.numeric(fsize)))
## OR
## +  stat_summary(fun.y=mean,geom=line,aes(x=as.numeric(fsize)))+zspace

library(mgcv)
ggplot(x,aes(x=position,y=survivors,colour=clone))+stat_sum(aes(size=..n..))+
  facet_grid(fsize~clone)+
  geom_smooth(method=gam,family=binomial)+xlim(0,15)+zspace

## appears to be an effect of size, and a clone*size interaction.
## possible bimodal distribution (0/1) at size=5?

with(x,table(clone,plant)) ## each plant is only in one clone 
## (explicit nesting)
library(lme4)

## fit with size*clone interaction, main effect of position
## no need to nest plant in clone because they are explicitly nested
##  (i.e. unique plant IDs)
g1 - glmer(cbind(tsurv,size)~fsize*clone+position+(1|fplant),
data=x,
family=binomial)

## doesn't LOOK like overdispersion ...
sum(residuals(g1)^2)  ##  residual df
nrow(x)-length(fixef(g1))
pchisq(778,df=537)

## ... but try fitting observation-level random effect anyway
x - transform(x,obs=seq(nrow(x)))
g2 - update(g1,.~.+(1|obs))
summary(g2)
## among-observation variance 2x among-plant variance
anova(g1,g2)  ## looks much better (10 AIC units lower / p  0.05

## 
drop1(g2,test=Chisq)
## looks like we can drop the size*clone interaction?

g3 - update(g2,.~.-clone:fsize)
drop1(g3,test=Chisq)
## strong effects of size, position;
## weak effects of clone
fixef(g3)



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested anova

2010-10-22 Thread Joshua Wiley
Hi Rick,

Whenever I hear my instant association with post hoc and ANOVA would
be ?TukeyHSD  However, if you are not comfortable interpreting the
model you ran, this suggests that you may benefit more from learning
more statistical theory or finding someone to consult with who can
help.  You might also try graphing your data.  Something like this
would be one (of very many and not necessarily the best for your data)
option:

library(lattice)
xyplot(catchrate ~ habitat | site, data = pat)
xyplot(catchrate ~ site | Gear, data = pat)
xyplot(catchrate ~ habitat | Gear, data = pat)

I tend to prefer using regression and then either accepting the
default contrasts or (if I have some theory/hypothesis) specifying my
own contrast matrices and using those.  However, it is not really
feasible for members of this list to help you interpret your results
or suggest appropriate statistical techniques without knowing your
actual data.  If you have specific questions related to R (e.g., How
can I calculate some test or statistic?  Here is a representative,
but small sample dataset.), you will get a lot more advice and help.

Best regards,

Josh


On Thu, Oct 21, 2010 at 4:13 PM, mirick miri...@yahoo.com wrote:

 Hello all,
 Can any of you R gurus help me out?  I’m not all that great at stats to
 begin with, and I’m also learning the R ropes (former SAS user).
 Here’s what I need help with…  I have a nested sample design and ran a
 nested anova, but I don’t know how to interpret the results
 habitat (four different types) is nested in site (three types), and site is
 nested in gear (two types)
  My  code:    pat2-aov(catchrate~habitat/site/gear, data=pat)
 This created the following outcome:
                               Df               Sum Sq           Mean Sq
 F value                      Pr(F)
 habitat                   3                2.932              0.9774
 0.9543               0.4155656
 habitat:site           8                18.716            2.3395
 2.2842                0.0235207
 habitat:site:Gear  12             39.244            3.2704
 3.1930                0.0003546
 Residuals              186 190.505  1.0242

  What exactly does this outcome mean?   It looks like there are differences
 between gears and sites, but not among habitats.  Which gear is better,
 which site is better, which gear works better in each site, etc.?
 I’ve looked for some post hoc code to do this, but I can’t find anything and
 I am at wits end.
 Thanks,
 Rick M.

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/nested-anova-tp3006469p3006469.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested anova

2010-10-22 Thread Dennis Murphy
Hi:

On Thu, Oct 21, 2010 at 4:13 PM, mirick miri...@yahoo.com wrote:


 Hello all,
 Can any of you R gurus help me out?  I’m not all that great at stats to
 begin with, and I’m also learning the R ropes (former SAS user).


Sounds like you need a support group :)


 Here’s what I need help with…  I have a nested sample design and ran a
 nested anova, but I don’t know how to interpret the results
 habitat (four different types) is nested in site (three types), and site is
 nested in gear (two types)
  My  code:pat2-aov(catchrate~habitat/site/gear, data=pat)
 This created the following outcome:
   Df   Sum Sq   Mean Sq
 F value  Pr(F)
 habitat   32.932  0.9774
 0.9543   0.4155656
 habitat:site   818.7162.3395
 2.28420.0235207
 habitat:site:Gear  12 39.2443.2704
 3.19300.0003546
 Residuals  186 190.505  1.0242

  What exactly does this outcome mean?   It looks like there are differences
 between gears and sites, but not among habitats.  Which gear is better,
 which site is better, which gear works better in each site, etc.?
 I’ve looked for some post hoc code to do this, but I can’t find anything
 and
 I am at wits end.
 Thanks,
 Rick M.


Firstly, in a nested design, one often treats nested factors as random. You
appear to want to treat them as fixed, which means that you are only
interested in comparisons among the 24 habitats in your study, which are
nested within the six sites which in turn are nested within the two types of
gear.  Is that correct?

Secondly, the degrees of freedom allocation should clue you in that
something is amiss, which would be your model specification. In R, nesting
works top-down, so your model should be
aov(catchrate ~ gear/site/habitat, data = pat)

BTW, 186 df for residual? Do you have nine observations in each habitat? Are
the data balanced within habitat (i.e., no missing data)?  Your ANOVA should
look like

Gears  1
Gears/Sites 4
Gears/Sites/Habitat18

if your description is correct. According to your ANOVA, you have 210
observations. If you had nine observations per habitat, there would be 216
observations total, so is it reasonable to conclude you either have missing
responses or unbalanced data within habitat? If so, how severe is the
imbalance?

Thirdly, as far as multiple comparisons go, you need to be very careful to
use the correct variance estimate at each level. However, it seems to me
that comparisons would only make sense within level (assuming they make
sense at all); e.g., comparing the four habitats in a particular site, and
repeating the comparison for each site.

I'd consider investing some time deriving the means and variances of each
set of level means in the hierarchy. That should help sort out some of your
questions. For example, there is only one comparison of gears, and the two
gear means are obtained by averaging the 105 or so observations they each
contain. This averages out site and habitat effects, so asking which gear
works better in each site is an impossible question to answer since each
site is associated with exactly one gear (by the definition of nested
factors). You can compare the three sites within a particular gear, but you
can't compare sites across both gears, because each site is associated with
only one gear. Similarly, each habitat is associated with exactly one site,
and hence one gear. This is what I mean when suggesting that you sit down
and work out the algebra, to understand which effects can be compared and
which can't, along with an understanding of how the variances of the means
at each level of the hierarchy are derived and how they differ.

HTH,
Dennis

 --
 View this message in context:
 http://r.789695.n4.nabble.com/nested-anova-tp3006469p3006469.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-06 Thread John Fox
Dear Anita and Joris,

Please see https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html,
posted to r-help in March.

Regards,
 John


John Fox
Senator William McMaster 
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On
 Behalf Of Anita Narwani
 Sent: June-03-10 8:49 PM
 To: Joris Meys
 Cc: r-help@r-project.org
 Subject: Re: [R] Nested ANOVA with covariate using Type III sums of
squares
 
 Yes I understood the strangeness of removing a main effect without
 interactions that contain it because I did this during my efforts using
 model simplification. I had checked out the link you sent a couple of days
 ago. It was useful. So does Type II SS remove both the factor and any
 interactions containing it when comparing models? i.e. to test for the
main
 effect of B you compare A + B + A:B against A?
 
 On Thu, Jun 3, 2010 at 4:59 PM, Joris Meys jorism...@gmail.com wrote:
 
  Hi Anita,
 
  I have to correct myself too, I've been rambling a bit. Off course you
  don't delete the variable out of the interaction term when you test the
 main
  effect. What I said earlier didn't really make any sense.
 
  That testing a main effect without removing the interaction term is has
a
  tricky interpretation. By removing a main effect you test full model  A
+ B
  + A:B against the model A + A:B.  If you remove the main effect Zoop
for
  example, you basically nest Zoop within Diversity and test whether
that's
  not worse than the full model. This explains it very well:
 
  https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html
 
  I'd go for type II, but you're free to test any hypothesis you want.
 
  Cheers
  Joris
 
 
 
  On Thu, Jun 3, 2010 at 9:59 PM, Anita Narwani
 anitanarw...@gmail.comwrote:
 
  Thanks for your response Joris.
 
  I was aware of the potential for aliasing, although I thought that this
  was only a problem when you have missing cell means. It was interesting
to
  read the vehement argument regarding the Type III sums of squares, and
  although I knew that there were different positions on the topic, I had
no
  idea how divisive it was. Nevertheless, Type III SS are generally
  recommended by statistical texts in ecology for my type of experimental
  design. Interestingly, despite the aliasing, SPSS has no problems
  calculating Type III SS for this data set. This is simply because I am
  entering a co-variate, which causes non-orthogonality. I would be
happier
  using R and the default Type I SS, which are the same as the Type III
SS
  anyway when I omit the co-variate of Mean.richness, except that these
  results are very sensitive to the order in which I add the variables
into
  the model when I do enter the co-variate. I understand that the order
is
  very important relates back to the scientific hypothesis, but I am
equally
  interested in the main effects of Zoop, Diversity, and the nested
effect
 of
  Phyto, so entering either of these variables before the other does not
 make
  sense from an ecological perspective, and because the results do
change,
 the
  order cannot be ignored from a statistical perspective.
  Finally, I have tried using the Type II SS and received similar
warnings.
 
  Do you have a recommendations?
  Anita.
 
 
 
 
  --
  Joris Meys
  Statistical Consultant
 
 
  Ghent University
  Faculty of Bioscience Engineering
  Department of Applied mathematics, biometrics and process control
 
  Coupure Links 653
  B-9000 Gent
 
  tel : +32 9 264 59 87
  joris.m...@ugent.be
  ---
  Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Anita Narwani
I would just like to add that when I remove the co-variate of Mean.richness
from the model (i.e. eliminating the non-orthogonality), the aliasing
warning is replaced by the following error message:
Error in t(Z) %*% ip : non-conformable arguments

That is when I enter this model:
carbonmean-lm(C.Mean~ Diversity + Zoop + Diversity/Phyto +
Zoop*Diversity/Phyto)






On Wed, Jun 2, 2010 at 6:05 PM, Joris Meys jorism...@gmail.com wrote:

 that's diversity/phyto, zoop or phyto twice in the formula.


 On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote:

 That's what one would expect with type III sum of squares. You have Phyto
 twice in your model, but only as a nested factor. To compare the full model
 with a model without diversity of zoop, you have either the combination
 diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

 Depending on how you stand on type III sum of squares, you could call that
 a bug. Personally, I'd just not use them.

 https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

 Cheers
 Joris


 On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote:

 Hello,

 I have been trying to get an ANOVA table for a linear model containing a
 single nested factor, two fixed factors and a covariate:

  carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop +
 Diversity/Phyto +
 Zoop*Diversity/Phyto)



 where, *Mean.richness* is a covariate*, Zoop* is a categorical variable
 (the
 species), *Diversity* is a categorical variable (Low or High), and
 *Phyto*(community composition) is also categorical but is nested
 within the level
 of *Diversity*. Quinn  Keough's statistics text recommends using Type
 III
 SS for a nested ANOVA with a covariate.

 I get the following output using the Type I SS ANOVA:



 Analysis of Variance Table
 Response: C.Mean
DfSum Sq
 Mean
 Sq  F valuePr(F)
 Mean.richness1  5638532656385326
 23.5855   3.239e-05 ***
 Diversity 1  14476593
  14476593
  6.0554 0.019634 *
 Zoop1  13002135
 13002135
  5.4387 0.026365 *
 Diversity:Phyto  6  126089387  21014898
 8.7904 1.257e-05 ***
 Diversity:Zoop   1  263036
 263036
 0.1100  0.742347
 Diversity:Zoop:Phyto 6  6171014510285024
 4.3021
   0.002879 **
 Residuals3174110911
 2390675

 I have tried using both the drop1() command and the Anova() command in
 the
 car package.

 When I use the Anova command I get the following error message:

 Anova(carbonmean,type=III)

 “Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,:
 One
 or more terms aliased in model.”



 I am not sure why this is aliased. There are no missing cells, and the
 cells
 are balanced (aside from for the covariate). Each Phyto by Zoop cross is
 replicated 3 times, and there are four Phyto levels within each level of
 Diversity. When I remove the nested factor (Phyto), I am able to get the
 Type III SS output.



 Then when I use drop1(carbonmean,.~.,Test=”F”) I get the following
 output:

  drop1(carbonmean,.~.,Test=F)

 Single term deletions



 Model:

 C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop *
 Diversity/Phyto

DfSum of Sq
 RSS AIC

 none74110911   718

 Mean.richness1  49790403123901314
 741

 Diversity 0 0
 74110911718

 Zoop0 0
 74110911718

 Diversity:Phyto  6  118553466  192664376
 752

 Diversity:Zoop   0  -1.49e-0874110911
 718

 Diversity:Zoop:Phyto 6  61710145135821055
 735



 There are zero degrees of freedom for Diversity, Zoop and their
 interaction,
 and zero sums of sq for Diversity and Zoop. This cannot be correct,
 however
 when I do the model simplification by dropping terms from the models
 manually and comparing them using anova(), I get virtually the same
 results.



 I would appreciate any suggestions for things to try or pointers as to
 what
 I may be doing incorrectly.



 Thank you.

 Anita Narwani.

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical Consultant

 Ghent 

Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Joris Meys
Could you copy the data?

Data - data.frame(C.Mean,Mean.richness,Zoop,Diversity,Phyto)
dput(Data)

I have the feeling something's wrong there. I believe you have 48
observations (47df + 1 for the intercept), 2 levels of Diversity, 4 of Phyto
and 48/(3*4)=4 levels of Zoop. But you don't have 3df for Zoop. Either I'm
way off, or what goes in the lm is not what you think it is.

I tried a small sample with the datastructure I believe you have, but I
couldn't reproduce your error.

## Run
Phyto - as.factor(rep(rep(c(A,B,C,D),each=6),2))
Diversity - as.factor(rep(c(High,Low),each=24))
Zoop - rep(c(1,2,3,4),times=12)

C.Mean - rnorm(48)
Mean.richness -rnorm(48)

test - lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
Zoop*Diversity/Phyto)

Anova(test,type=III)

Zoop - as.factor(Zoop)
Anova(test,type=III)
## End Run

Cheers
Joris

On Thu, Jun 3, 2010 at 10:26 PM, Anita Narwani anitanarw...@gmail.comwrote:

 I would just like to add that when I remove the co-variate of Mean.richness
 from the model (i.e. eliminating the non-orthogonality), the aliasing
 warning is replaced by the following error message:
 Error in t(Z) %*% ip : non-conformable arguments

 That is when I enter this model:
 carbonmean-lm(C.Mean~ Diversity + Zoop + Diversity/Phyto +
 Zoop*Diversity/Phyto)







 On Wed, Jun 2, 2010 at 6:05 PM, Joris Meys jorism...@gmail.com wrote:

 that's diversity/phyto, zoop or phyto twice in the formula.


 On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote:

 That's what one would expect with type III sum of squares. You have Phyto
 twice in your model, but only as a nested factor. To compare the full model
 with a model without diversity of zoop, you have either the combination
 diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

 Depending on how you stand on type III sum of squares, you could call
 that a bug. Personally, I'd just not use them.

 https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

 Cheers
 Joris


 On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote:

 Hello,

 I have been trying to get an ANOVA table for a linear model containing a
 single nested factor, two fixed factors and a covariate:

  carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop +
 Diversity/Phyto +
 Zoop*Diversity/Phyto)



 where, *Mean.richness* is a covariate*, Zoop* is a categorical variable
 (the
 species), *Diversity* is a categorical variable (Low or High), and
 *Phyto*(community composition) is also categorical but is nested
 within the level
 of *Diversity*. Quinn  Keough's statistics text recommends using Type
 III
 SS for a nested ANOVA with a covariate.

 I get the following output using the Type I SS ANOVA:



 Analysis of Variance Table
 Response: C.Mean
DfSum Sq
   Mean
 Sq  F valuePr(F)
 Mean.richness1  5638532656385326
 23.5855   3.239e-05 ***
 Diversity 1  14476593
  14476593
  6.0554 0.019634 *
 Zoop1  13002135
 13002135
  5.4387 0.026365 *
 Diversity:Phyto  6  126089387  21014898
 8.7904 1.257e-05 ***
 Diversity:Zoop   1  263036
 263036
 0.1100  0.742347
 Diversity:Zoop:Phyto 6  6171014510285024
 4.3021
   0.002879 **
 Residuals3174110911
 2390675

 I have tried using both the drop1() command and the Anova() command in
 the
 car package.

 When I use the Anova command I get the following error message:

 Anova(carbonmean,type=III)

 “Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,:
 One
 or more terms aliased in model.”



 I am not sure why this is aliased. There are no missing cells, and the
 cells
 are balanced (aside from for the covariate). Each Phyto by Zoop cross is
 replicated 3 times, and there are four Phyto levels within each level of
 Diversity. When I remove the nested factor (Phyto), I am able to get the
 Type III SS output.



 Then when I use drop1(carbonmean,.~.,Test=”F”) I get the following
 output:

  drop1(carbonmean,.~.,Test=F)

 Single term deletions



 Model:

 C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop *
 Diversity/Phyto

DfSum of Sq
 RSS AIC

 none74110911   718

 Mean.richness1  49790403
  123901314
 741

 Diversity 0 0
 74110911718

 Zoop0 0
 74110911718

 Diversity:Phyto  6  118553466  192664376
 752

 Diversity:Zoop   0  

Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Joris Meys
I see where my confusion comes from. I counted 4 levels of Phyto, but
you have 8, being 4 in every level of Diversity. There's your
aliasing.

 table(Diversity,Phyto)
 Phyto
Diversity M1 M2 M3 M4 P1 P2 P3 P4
    H  0  0  0  0  6  6  6  6
    L  6  6  6  6  0  0  0  0

There's no need to code them differently for every level of Diversity.
If you don't, all is fine :

 Phyto - gsub(M,P,as.character(Phyto))
 Phyto - as.factor(Phyto)

 test - lm(C.Mean~  Mean.richness + Diversity + Zoop + Diversity/Phyto +
+ Zoop*Diversity/Phyto)

 Anova(test,type=III)
Anova Table (Type III tests)

Response: C.Mean
    Sum Sq Df F value    Pr(F)
(Intercept)   23935609  1 10.0121 0.0034729 **
Mean.richness 49790385  1 20.8269 7.471e-05 ***
Diversity 35807205  1 14.9779 0.0005234 ***
Zoop  10794614  1  4.5153 0.0416688 *
Diversity:Phyto  118553464  6  8.2650 2.184e-05 ***
Diversity:Zoop  261789  1  0.1095 0.7429356
Diversity:Zoop:Phyto  61710162  6  4.3021 0.0028790 **
Residuals 74110938 31
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


You can check with summary(test) that the model is fitted correctly.

On Fri, Jun 4, 2010 at 12:48 AM, Anita Narwani anitanarw...@gmail.com wrote:

 You have everything right except that there are only 2 zooplankton species (C 
  D, which stand for Ceriodaphnia and Daphnia).


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Joris Meys
Hi Anita,

I have to correct myself too, I've been rambling a bit. Off course you don't
delete the variable out of the interaction term when you test the main
effect. What I said earlier didn't really make any sense.

That testing a main effect without removing the interaction term is has a
tricky interpretation. By removing a main effect you test full model  A + B
+ A:B against the model A + A:B.  If you remove the main effect Zoop for
example, you basically nest Zoop within Diversity and test whether that's
not worse than the full model. This explains it very well:

https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html

I'd go for type II, but you're free to test any hypothesis you want.

Cheers
Joris


On Thu, Jun 3, 2010 at 9:59 PM, Anita Narwani anitanarw...@gmail.comwrote:

 Thanks for your response Joris.

 I was aware of the potential for aliasing, although I thought that this was
 only a problem when you have missing cell means. It was interesting to read
 the vehement argument regarding the Type III sums of squares, and although I
 knew that there were different positions on the topic, I had no idea how
 divisive it was. Nevertheless, Type III SS are generally recommended by
 statistical texts in ecology for my type of experimental design.
 Interestingly, despite the aliasing, SPSS has no problems calculating Type
 III SS for this data set. This is simply because I am entering a co-variate,
 which causes non-orthogonality. I would be happier using R and the default
 Type I SS, which are the same as the Type III SS anyway when I omit the
 co-variate of Mean.richness, except that these results are very sensitive to
 the order in which I add the variables into the model when I do enter the
 co-variate. I understand that the order is very important relates back to
 the scientific hypothesis, but I am equally interested in the main effects
 of Zoop, Diversity, and the nested effect of Phyto, so entering either of
 these variables before the other does not make sense from an ecological
 perspective, and because the results do change, the order cannot be ignored
 from a statistical perspective.
 Finally, I have tried using the Type II SS and received similar warnings.

 Do you have a recommendations?
 Anita.




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Anita Narwani
Hi Joris,
That seems to have worked and the contrasts look correct.
I have tried comparing the results to what SPSS produces for the same model.
The two programs produce very different results, although the model F
statistics, R squared and adjusted R squared values are identical. The
results are so different that I don't know what to trust.

For the same model you coded I got:
test - lm(C.Mean~  Mean.richness + Diversity + Zoop + Diversity/Phyto +
+ Zoop*Diversity/Phyto)
 Anova(test,type=III)
Anova Table (Type III tests)

Response: C.Mean
   Sum Sq Df F valuePr(F)
(Intercept)  28223311  1 11.8056  0.001701 **
Mean.richness49790403  1 20.8269 7.471e-05 ***
Diversity31055477  1 12.9903  0.001082 **
Zoop  2736238  1  1.1445  0.292953
Diversity:Phyto  27943313  6  1.9481  0.104103
Diversity:Zoop 168184  1  0.0703  0.792584
Diversity:Zoop:Phyto 61710145  6  4.3021  0.002879 **
Residuals74110911 31
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Also sightly different from your result)

and

 summary(test)

Call:
lm(formula = C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
+Zoop * Diversity/Phyto)

Residuals:
 Min   1Q   Median   3Q  Max
-3555.26  -479.5349.94   423.49  4073.20

Coefficients:
 Estimate Std. Error t value Pr(|t|)
(Intercept)   -8562.9 2492.2  -3.436  0.00170 **
Mean.richness  4605.7 1009.2   4.564 7.47e-05 ***
DiversityL 6576.9 1824.8   3.604  0.00108 **
ZoopD -1414.4 1322.1  -1.070  0.29295
DiversityH:PhytoP2-4307.5 1824.8  -2.361  0.02472 *
DiversityL:PhytoP2 -268.4 1262.5  -0.213  0.83300
DiversityH:PhytoP3-2233.4 1393.0  -1.603  0.11900
DiversityL:PhytoP3-1571.4 1262.5  -1.245  0.22257
DiversityH:PhytoP4-7914.8 2647.2  -2.990  0.00543 **
DiversityL:PhytoP4-1612.8 1262.5  -1.277  0.21092
DiversityL:ZoopD484.9 1828.0   0.265  0.79258
DiversityH:ZoopD:PhytoP2683.9 1855.3   0.369  0.71493
DiversityL:ZoopD:PhytoP2   6346.4 1785.4   3.555  0.00124 **
DiversityH:ZoopD:PhytoP3   4922.8 1786.3   2.756  0.00971 **
DiversityL:ZoopD:PhytoP3   1085.4 1785.4   0.608  0.54766
DiversityH:ZoopD:PhytoP4   3261.8 1985.6   1.643  0.11055
DiversityL:ZoopD:PhytoP4681.9 1785.4   0.382  0.70513
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1546 on 31 degrees of freedom
Multiple R-squared: 0.7858, Adjusted R-squared: 0.6753
F-statistic: 7.109 on 16 and 31 DF,  p-value: 1.810e-06

From SPSS I got
Tests of Between-Subjects Effects




 Dependent Variable:C Mean




 Source Type III Sum of Squares df Mean Square F Sig.  Corrected Model
2.719E+08 16 1.700E+07 7.109 .000  Intercept 2.394E+07 1 2.394E+07 10.012
.003  Meanrichness 4.979E+07 1 4.979E+07 20.827 .000  Diversity 3.581E+07 1
3.581E+07 14.978 .001  Zoop 1.079E+07 1 1.079E+07 4.515 .042  Diversity *
Zoop 261789.172 1 261789.172 .110 .743  Phyto(Diversity) 1.186E+08 6
1.976E+07 8.265 .000  Phyto * Zoop(Diversity) 6.171E+07 6 1.029E+07 4.302
.003  Error 7.411E+07 31 2.391E+06

 Total 7.959E+08 48


 Corrected Total 3.460E+08 47




Which, gives some similar results, but a completely different F statistic
and P-value for the main effect of Zoop and the nested effect of Phyto.
Obviously SPSS is not necessarily the perfect reference, but when using the
Type I SS, the results did agree. Any thoughts on why this might be? Could
the two programs be calculating the Type III SS differently? Might it be
wise to stick to Type I SS?

Thanks very much for your time and effort. It has been very helpful.
Anita.

On Thu, Jun 3, 2010 at 4:25 PM, Joris Meys jorism...@gmail.com wrote:

 I see where my confusion comes from. I counted 4 levels of Phyto, but
 you have 8, being 4 in every level of Diversity. There's your
 aliasing.

  table(Diversity,Phyto)
  Phyto
 Diversity M1 M2 M3 M4 P1 P2 P3 P4
 H  0  0  0  0  6  6  6  6
 L  6  6  6  6  0  0  0  0

 There's no need to code them differently for every level of Diversity.
 If you don't, all is fine :

  Phyto - gsub(M,P,as.character(Phyto))
  Phyto - as.factor(Phyto)
 
  test - lm(C.Mean~  Mean.richness + Diversity + Zoop + Diversity/Phyto +
 + Zoop*Diversity/Phyto)
 
  Anova(test,type=III)
 Anova Table (Type III tests)

 Response: C.Mean
 Sum Sq Df F valuePr(F)
 (Intercept)   23935609  1 10.0121 0.0034729 **
 Mean.richness 49790385  1 20.8269 7.471e-05 ***
 Diversity 35807205  1 14.9779 0.0005234 ***
 Zoop  10794614  1  4.5153 0.0416688 *
 Diversity:Phyto  118553464  6  8.2650 2.184e-05 ***
 Diversity:Zoop  261789  1  0.1095 0.7429356
 Diversity:Zoop:Phyto  61710162  6  4.3021 0.0028790 **
 Residuals  

Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Joris Meys
SPSS uses a different calculation. As far as I understood, they test main
effects without the covariate. Regarding the difference between my and your
results, did you use sum contrasts?
options(contrasts=c(contr.sum,contr.poly))

On Fri, Jun 4, 2010 at 2:19 AM, Anita Narwani anitanarw...@gmail.comwrote:

 Hi Joris,
 That seems to have worked and the contrasts look correct.
 I have tried comparing the results to what SPSS produces for the same
 model. The two programs produce very different results, although the model F
 statistics, R squared and adjusted R squared values are identical. The
 results are so different that I don't know what to trust.

 For the same model you coded I got:

 test - lm(C.Mean~  Mean.richness + Diversity + Zoop + Diversity/Phyto +
 + Zoop*Diversity/Phyto)
  Anova(test,type=III)
 Anova Table (Type III tests)

 Response: C.Mean
Sum Sq Df F valuePr(F)
 (Intercept)  28223311  1 11.8056  0.001701 **
 Mean.richness49790403  1 20.8269 7.471e-05 ***
 Diversity31055477  1 12.9903  0.001082 **
 Zoop  2736238  1  1.1445  0.292953
 Diversity:Phyto  27943313  6  1.9481  0.104103
 Diversity:Zoop 168184  1  0.0703  0.792584
 Diversity:Zoop:Phyto 61710145  6  4.3021  0.002879 **
 Residuals74110911 31
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 (Also sightly different from your result)

 and

  summary(test)

 Call:
 lm(formula = C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
 +Zoop * Diversity/Phyto)

 Residuals:
  Min   1Q   Median   3Q  Max
 -3555.26  -479.5349.94   423.49  4073.20

 Coefficients:
  Estimate Std. Error t value Pr(|t|)
 (Intercept)   -8562.9 2492.2  -3.436  0.00170 **
 Mean.richness  4605.7 1009.2   4.564 7.47e-05 ***
 DiversityL 6576.9 1824.8   3.604  0.00108 **
 ZoopD -1414.4 1322.1  -1.070  0.29295
 DiversityH:PhytoP2-4307.5 1824.8  -2.361  0.02472 *
 DiversityL:PhytoP2 -268.4 1262.5  -0.213  0.83300
 DiversityH:PhytoP3-2233.4 1393.0  -1.603  0.11900
 DiversityL:PhytoP3-1571.4 1262.5  -1.245  0.22257
 DiversityH:PhytoP4-7914.8 2647.2  -2.990  0.00543 **
 DiversityL:PhytoP4-1612.8 1262.5  -1.277  0.21092
 DiversityL:ZoopD484.9 1828.0   0.265  0.79258
 DiversityH:ZoopD:PhytoP2683.9 1855.3   0.369  0.71493
 DiversityL:ZoopD:PhytoP2   6346.4 1785.4   3.555  0.00124 **
 DiversityH:ZoopD:PhytoP3   4922.8 1786.3   2.756  0.00971 **
 DiversityL:ZoopD:PhytoP3   1085.4 1785.4   0.608  0.54766
 DiversityH:ZoopD:PhytoP4   3261.8 1985.6   1.643  0.11055
 DiversityL:ZoopD:PhytoP4681.9 1785.4   0.382  0.70513
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 1546 on 31 degrees of freedom
 Multiple R-squared: 0.7858, Adjusted R-squared: 0.6753
 F-statistic: 7.109 on 16 and 31 DF,  p-value: 1.810e-06

 From SPSS I got
 Tests of Between-Subjects Effects




  Dependent Variable:C Mean




  Source Type III Sum of Squares df Mean Square F Sig.  Corrected Model
 2.719E+08 16 1.700E+07 7.109 .000  Intercept 2.394E+07 1 2.394E+07 10.012
 .003  Meanrichness 4.979E+07 1 4.979E+07 20.827 .000  Diversity 3.581E+07
 1 3.581E+07 14.978 .001  Zoop 1.079E+07 1 1.079E+07 4.515 .042  Diversity
 * Zoop 261789.172 1 261789.172 .110 .743  Phyto(Diversity) 1.186E+08 6
 1.976E+07 8.265 .000  Phyto * Zoop(Diversity) 6.171E+07 6 1.029E+07 4.302
 .003  Error 7.411E+07 31 2.391E+06

  Total 7.959E+08 48


  Corrected Total 3.460E+08 47




 Which, gives some similar results, but a completely different F statistic
 and P-value for the main effect of Zoop and the nested effect of Phyto.
 Obviously SPSS is not necessarily the perfect reference, but when using the
 Type I SS, the results did agree. Any thoughts on why this might be? Could
 the two programs be calculating the Type III SS differently? Might it be
 wise to stick to Type I SS?

 Thanks very much for your time and effort. It has been very helpful.
 Anita.


 On Thu, Jun 3, 2010 at 4:25 PM, Joris Meys jorism...@gmail.com wrote:

 I see where my confusion comes from. I counted 4 levels of Phyto, but
 you have 8, being 4 in every level of Diversity. There's your
 aliasing.

  table(Diversity,Phyto)
  Phyto
 Diversity M1 M2 M3 M4 P1 P2 P3 P4
 H  0  0  0  0  6  6  6  6
 L  6  6  6  6  0  0  0  0

 There's no need to code them differently for every level of Diversity.
 If you don't, all is fine :

  Phyto - gsub(M,P,as.character(Phyto))
  Phyto - as.factor(Phyto)
 
  test - lm(C.Mean~  Mean.richness + Diversity + Zoop + Diversity/Phyto +
 + Zoop*Diversity/Phyto)
 
  Anova(test,type=III)
 Anova Table (Type III tests)

 Response: C.Mean
 Sum Sq Df F valuePr(F)
 (Intercept)   

Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-03 Thread Anita Narwani
Yes I understood the strangeness of removing a main effect without
interactions that contain it because I did this during my efforts using
model simplification. I had checked out the link you sent a couple of days
ago. It was useful. So does Type II SS remove both the factor and any
interactions containing it when comparing models? i.e. to test for the main
effect of B you compare A + B + A:B against A?

On Thu, Jun 3, 2010 at 4:59 PM, Joris Meys jorism...@gmail.com wrote:

 Hi Anita,

 I have to correct myself too, I've been rambling a bit. Off course you
 don't delete the variable out of the interaction term when you test the main
 effect. What I said earlier didn't really make any sense.

 That testing a main effect without removing the interaction term is has a
 tricky interpretation. By removing a main effect you test full model  A + B
 + A:B against the model A + A:B.  If you remove the main effect Zoop for
 example, you basically nest Zoop within Diversity and test whether that's
 not worse than the full model. This explains it very well:

 https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html

 I'd go for type II, but you're free to test any hypothesis you want.

 Cheers
 Joris



 On Thu, Jun 3, 2010 at 9:59 PM, Anita Narwani anitanarw...@gmail.comwrote:

 Thanks for your response Joris.

 I was aware of the potential for aliasing, although I thought that this
 was only a problem when you have missing cell means. It was interesting to
 read the vehement argument regarding the Type III sums of squares, and
 although I knew that there were different positions on the topic, I had no
 idea how divisive it was. Nevertheless, Type III SS are generally
 recommended by statistical texts in ecology for my type of experimental
 design. Interestingly, despite the aliasing, SPSS has no problems
 calculating Type III SS for this data set. This is simply because I am
 entering a co-variate, which causes non-orthogonality. I would be happier
 using R and the default Type I SS, which are the same as the Type III SS
 anyway when I omit the co-variate of Mean.richness, except that these
 results are very sensitive to the order in which I add the variables into
 the model when I do enter the co-variate. I understand that the order is
 very important relates back to the scientific hypothesis, but I am equally
 interested in the main effects of Zoop, Diversity, and the nested effect of
 Phyto, so entering either of these variables before the other does not make
 sense from an ecological perspective, and because the results do change, the
 order cannot be ignored from a statistical perspective.
 Finally, I have tried using the Type II SS and received similar warnings.

 Do you have a recommendations?
 Anita.




 --
 Joris Meys
 Statistical Consultant


 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 Coupure Links 653
 B-9000 Gent

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-02 Thread Joris Meys
That's what one would expect with type III sum of squares. You have Phyto
twice in your model, but only as a nested factor. To compare the full model
with a model without diversity of zoop, you have either the combination
diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

Depending on how you stand on type III sum of squares, you could call that a
bug. Personally, I'd just not use them.

https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

Cheers
Joris


On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote:

 Hello,

 I have been trying to get an ANOVA table for a linear model containing a
 single nested factor, two fixed factors and a covariate:

  carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto
 +
 Zoop*Diversity/Phyto)



 where, *Mean.richness* is a covariate*, Zoop* is a categorical variable
 (the
 species), *Diversity* is a categorical variable (Low or High), and
 *Phyto*(community composition) is also categorical but is nested
 within the level
 of *Diversity*. Quinn  Keough's statistics text recommends using Type III
 SS for a nested ANOVA with a covariate.

 I get the following output using the Type I SS ANOVA:



 Analysis of Variance Table
 Response: C.Mean
DfSum Sq
 Mean
 Sq  F valuePr(F)
 Mean.richness1  5638532656385326
 23.5855   3.239e-05 ***
 Diversity 1  14476593
  14476593
  6.0554 0.019634 *
 Zoop1  13002135
 13002135
  5.4387 0.026365 *
 Diversity:Phyto  6  126089387  21014898
 8.7904 1.257e-05 ***
 Diversity:Zoop   1  263036
 263036
 0.1100  0.742347
 Diversity:Zoop:Phyto 6  6171014510285024
 4.3021
   0.002879 **
 Residuals3174110911
 2390675

 I have tried using both the drop1() command and the Anova() command in the
 car package.

 When I use the Anova command I get the following error message:

 Anova(carbonmean,type=III)

 “Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,: One
 or more terms aliased in model.”



 I am not sure why this is aliased. There are no missing cells, and the
 cells
 are balanced (aside from for the covariate). Each Phyto by Zoop cross is
 replicated 3 times, and there are four Phyto levels within each level of
 Diversity. When I remove the nested factor (Phyto), I am able to get the
 Type III SS output.



 Then when I use drop1(carbonmean,.~.,Test=”F”) I get the following output:

  drop1(carbonmean,.~.,Test=F)

 Single term deletions



 Model:

 C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop *
 Diversity/Phyto

DfSum of Sq
 RSS AIC

 none74110911   718

 Mean.richness1  49790403123901314
 741

 Diversity 0 0
 74110911718

 Zoop0 0
 74110911718

 Diversity:Phyto  6  118553466  192664376
 752

 Diversity:Zoop   0  -1.49e-0874110911
 718

 Diversity:Zoop:Phyto 6  61710145135821055
 735



 There are zero degrees of freedom for Diversity, Zoop and their
 interaction,
 and zero sums of sq for Diversity and Zoop. This cannot be correct, however
 when I do the model simplification by dropping terms from the models
 manually and comparing them using anova(), I get virtually the same
 results.



 I would appreciate any suggestions for things to try or pointers as to what
 I may be doing incorrectly.



 Thank you.

 Anita Narwani.

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested ANOVA with covariate using Type III sums of squares

2010-06-02 Thread Joris Meys
that's diversity/phyto, zoop or phyto twice in the formula.

On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote:

 That's what one would expect with type III sum of squares. You have Phyto
 twice in your model, but only as a nested factor. To compare the full model
 with a model without diversity of zoop, you have either the combination
 diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

 Depending on how you stand on type III sum of squares, you could call that
 a bug. Personally, I'd just not use them.

 https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

 Cheers
 Joris


 On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote:

 Hello,

 I have been trying to get an ANOVA table for a linear model containing a
 single nested factor, two fixed factors and a covariate:

  carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto
 +
 Zoop*Diversity/Phyto)



 where, *Mean.richness* is a covariate*, Zoop* is a categorical variable
 (the
 species), *Diversity* is a categorical variable (Low or High), and
 *Phyto*(community composition) is also categorical but is nested
 within the level
 of *Diversity*. Quinn  Keough's statistics text recommends using Type III
 SS for a nested ANOVA with a covariate.

 I get the following output using the Type I SS ANOVA:



 Analysis of Variance Table
 Response: C.Mean
DfSum Sq
 Mean
 Sq  F valuePr(F)
 Mean.richness1  5638532656385326
 23.5855   3.239e-05 ***
 Diversity 1  14476593
  14476593
  6.0554 0.019634 *
 Zoop1  13002135
 13002135
  5.4387 0.026365 *
 Diversity:Phyto  6  126089387  21014898
 8.7904 1.257e-05 ***
 Diversity:Zoop   1  263036
 263036
 0.1100  0.742347
 Diversity:Zoop:Phyto 6  6171014510285024
 4.3021
   0.002879 **
 Residuals3174110911
 2390675

 I have tried using both the drop1() command and the Anova() command in the
 car package.

 When I use the Anova command I get the following error message:

 Anova(carbonmean,type=III)

 “Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,:
 One
 or more terms aliased in model.”



 I am not sure why this is aliased. There are no missing cells, and the
 cells
 are balanced (aside from for the covariate). Each Phyto by Zoop cross is
 replicated 3 times, and there are four Phyto levels within each level of
 Diversity. When I remove the nested factor (Phyto), I am able to get the
 Type III SS output.



 Then when I use drop1(carbonmean,.~.,Test=”F”) I get the following output:

  drop1(carbonmean,.~.,Test=F)

 Single term deletions



 Model:

 C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop *
 Diversity/Phyto

DfSum of Sq
 RSS AIC

 none74110911   718

 Mean.richness1  49790403123901314
 741

 Diversity 0 0
 74110911718

 Zoop0 0
 74110911718

 Diversity:Phyto  6  118553466  192664376
 752

 Diversity:Zoop   0  -1.49e-0874110911
 718

 Diversity:Zoop:Phyto 6  61710145135821055
 735



 There are zero degrees of freedom for Diversity, Zoop and their
 interaction,
 and zero sums of sq for Diversity and Zoop. This cannot be correct,
 however
 when I do the model simplification by dropping terms from the models
 manually and comparing them using anova(), I get virtually the same
 results.



 I would appreciate any suggestions for things to try or pointers as to
 what
 I may be doing incorrectly.



 Thank you.

 Anita Narwani.

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical Consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 Coupure Links 653
 B-9000 Gent

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control


Re: [R] Nested ANOVA residuals error

2009-07-16 Thread Bert Gunter
Hint: You have 27 observations fit by 27 fixed effects (including the mean).


You need to consult a statistician, as you seem not to have the basic
statistical understanding required. Person should be a random effect. Was
this a homework problem, perchance?

Bert Gunter
Genentech Nonclinical Biostatistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Jeff DaCosta
Sent: Thursday, July 16, 2009 12:37 PM
To: r-help@r-project.org
Subject: [R] Nested ANOVA residuals error

I am having trouble setting up a nested anova model.  Here is a  
truncated version of my data:

  data
  Ind Treatment PC1
1  PER14SC  1.14105282
2  PER14SH  1.45348615
3  PER14AC  2.45096904
4  PER25SC  1.23687887
5  PER25SH  4.54797450
6  PER25AC -0.73391131
7  PER30SC  1.15899433
8  PER30SH  2.83273736
9  PER30AC -0.86272186
10 PER31SC  1.90495285
11 PER31SH  5.68514429
12 PER31AC -2.25110314
13 PER35SC  2.16463279
14 PER35SH  1.69053389
15 PER35AC  1.87528942
16 PER36SC -2.44795781
17 PER36SH -3.59415298
18 PER36AC  1.90756192
19 PER41SC -2.83086573
20 PER41SH  4.26803170
21 PER41AC  2.42548862
22 PER48SC -1.86403368
23 PER48SH -0.08713013
24 PER48AC -1.98056853
25 PER52SC  2.35012436
26 PER52SH -2.19384500
27 PER52AC  1.38686223

  str(data)
'data.frame':   27 obs. of  3 variables:
  $ Ind  : Factor w/ 9 levels PER14,PER25,..: 1 1 1 2 2 2 3 3  
3 4 ...
  $ Treatment: Factor w/ 3 levels AC,SC,SH: 2 3 1 2 3 1 2 3 1  
2 ...
  $ PC1  : num  1.14 1.45 2.45 1.24 4.55 ...

PC1 is my response variable, and Treatment is my factor with three  
levels.  Each  of the nine individuals (Ind) in the study was tested  
for all three factor levels, so I want to run the model with Ind  
nested in Treatment.  I think that I want to set up the model like this:

  m1-aov(PC1~Treatment/Ind))

but I get the following error with anova(m1), and am not sure why I  
get zero for all the residuals.
  anova(m1)
Analysis of Variance Table

Response: PC1
   Df  Sum Sq Mean Sq F value Pr(F)
Treatment  2   9.215   4.607
Treatment:Ind 24 142.140   5.923
Residuals  0   0.000
Warning message:
In anova.lm(m1) :
   ANOVA F-tests on an essentially perfect fit are unreliable

Any input would be appreciated, thanks for your time and consideration.
-Jeff
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested ANOVA residuals error

2009-07-16 Thread Daniel Malter
It is because the nesting structure perfectly explains the data (i.e., there
is only one observation and, therefore, no variation for each Ind in each
Treatment).

e=rnorm(90,0,1)
x=rep(1:3,30)
y=rep(1:30,each=3)
z=x+y+e

ano=aov(z~factor(y)/factor(x))
ano
residuals(ano)

Best,
Daniel


-
cuncta stricte discussurus
-

-Ursprüngliche Nachricht-
Von: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Im
Auftrag von Jeff DaCosta
Gesendet: Thursday, July 16, 2009 3:37 PM
An: r-help@r-project.org
Betreff: [R] Nested ANOVA residuals error

I am having trouble setting up a nested anova model.  Here is a truncated
version of my data:

  data
  Ind Treatment PC1
1  PER14SC  1.14105282
2  PER14SH  1.45348615
3  PER14AC  2.45096904
4  PER25SC  1.23687887
5  PER25SH  4.54797450
6  PER25AC -0.73391131
7  PER30SC  1.15899433
8  PER30SH  2.83273736
9  PER30AC -0.86272186
10 PER31SC  1.90495285
11 PER31SH  5.68514429
12 PER31AC -2.25110314
13 PER35SC  2.16463279
14 PER35SH  1.69053389
15 PER35AC  1.87528942
16 PER36SC -2.44795781
17 PER36SH -3.59415298
18 PER36AC  1.90756192
19 PER41SC -2.83086573
20 PER41SH  4.26803170
21 PER41AC  2.42548862
22 PER48SC -1.86403368
23 PER48SH -0.08713013
24 PER48AC -1.98056853
25 PER52SC  2.35012436
26 PER52SH -2.19384500
27 PER52AC  1.38686223

  str(data)
'data.frame':   27 obs. of  3 variables:
  $ Ind  : Factor w/ 9 levels PER14,PER25,..: 1 1 1 2 2 2 3 3  
3 4 ...
  $ Treatment: Factor w/ 3 levels AC,SC,SH: 2 3 1 2 3 1 2 3 1
2 ...
  $ PC1  : num  1.14 1.45 2.45 1.24 4.55 ...

PC1 is my response variable, and Treatment is my factor with three levels.
Each  of the nine individuals (Ind) in the study was tested for all three
factor levels, so I want to run the model with Ind nested in Treatment.  I
think that I want to set up the model like this:

  m1-aov(PC1~Treatment/Ind))

but I get the following error with anova(m1), and am not sure why I get zero
for all the residuals.
  anova(m1)
Analysis of Variance Table

Response: PC1
   Df  Sum Sq Mean Sq F value Pr(F)
Treatment  2   9.215   4.607
Treatment:Ind 24 142.140   5.923
Residuals  0   0.000
Warning message:
In anova.lm(m1) :
   ANOVA F-tests on an essentially perfect fit are unreliable

Any input would be appreciated, thanks for your time and consideration.
-Jeff
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested ANOVA with fixed and random variables missing data

2008-09-08 Thread Adam D. I. Kramer


On Mon, 8 Sep 2008, [EMAIL PROTECTED] wrote:


I have a nested ANOVA, with a fixed factor tmt nested within site
(random). There are missing values in the data set.

aeucs, tmt and site have been defined as objects

I have tried:

model1=lme(aeucs~tmt,random=~1|tmt/site)


I think you want lme(aeucs~tmt,random=~tmt|site)

...as tmt is a fixed factor, you don't want to use it for grouping. Instead,
you want to estimate a separate tmt effect for each site...so estimate (~) a
fixed tmt effect (tmt) within (|) sites (site) = ~tmt|site.

--Adam



I get the following error message

Error in na.fail.default(list(aeucs = c(0.8, 1, 1, 1, 1, 1, 
0.7,  :
 missing values in object

I need to know where I am going wrong.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nested anova and multiple comparisons

2008-04-26 Thread Kingsford Jones
Hi Stephen,

On Sat, Apr 26, 2008 at 10:29 AM, Stephen Cole [EMAIL PROTECTED] wrote:
...snip...

  I have managed to accomplish my first two goals by analyzing the data
  as a 3 level nested Anova with the following code

  mod1 - aov(Count~Region/Location/Site, data=data)

  This allows me to get the MS for a Anova table.   However R does not
  compute the correct F-statistics (because it uses the residual MS for
  the denominator in all calculations which it shouldnt) so i manually
  computed the F-stats and variance components by hand.


R's just doing what it was asked it to do.  The fact that it used the
residual MS isn't a surprise given your code.

  From reading the help guide i learned about and tried using the
  Error(Region/Location/Site) command but then i can only get MS and no
  F-statistics and still hand to compute the rest by hand.

Yes, if you want to use Method-of-Moments estimation to solve for
expected mean squares for the variance components and calculate
F-stats w/ the 'correct' MS, then then specifying error strata is
reasonable for balanced data (assuming you've met model assumptions).
However, I believe most statisticians these days are more inclined to
use (Restricted) Maximum Likelihood estimation (e.g. using lme or
lmer) because of the added fliexibility it provides (also it won't
produce negative variance estimates when the mean squared error is
larger than the mean square between the groups of interest)

As for why you are only getting MS and no F-stats, it's hard to say
without a reproducible example (see the posting guide).  In my
experience this will occur when there are insufficient degrees of
freedom for calculating an F-stat at a given level.


  My problem now is that i would liek to use TukeyHSD for multiple
  comparisons.  Howeber since R is unable to compute the correct F
  statistics in this nested design i also think it is using the wrong MS
  and df in calculating tukeys q. for example when i use


Again, it's not that R is unable to, it's that you asked R not to.

  TukeyHSD(mod1, Region)  i will get values however i do not think
  they have been calculated correctly.

  Furthermore when i use the Error(Region/Location/Site) term i can then
  no longer use TukeyHSD as i get the error message that there is no
  applicable use of tukey on this object.

Looking at the methods available for TukeyHSD shows that it will work
for an object of class aov

 methods(TukeyHSD)
[1] TukeyHSD.aov

But ?aov states:

Value:

 An object of class 'c(aov, lm)' or for multiple responses of
 class 'c(maov, aov, mlm, lm)' or for multiple error strata
 of class 'aovlist'.  There are 'print' and 'summary' methods
 available for these.

So your model with error strata is of class aovlist not aov.


  i am just wondering if there is any way to use Multiple comparisons
  with R in a nested design like mine.


I'm not sure but the package 'multcomp' might be of help.

  I have thought about using lme or lmer but if i understand them right
  with a balanced design i should be able to get the same result using
  aov


Even with balanced data, you don't get the exact same thing with aov.
As mentioned above lme and lmer use different estimation methods.  One
of the advantages of using  (RE)ML is a lot more flexibility in how
you specifiy the structure of random effects covariance matrices, and
(at least with lme) you have a lot of flexibility in how you structure
the residual covariance matrix as well.  This is particularly useful
when you are not meeting assumptions of constant variance and/or
independence of observations (which seems to be the rule rather than
the exception with ecological data with a spatial component, such as
you appear to have).

The theory behind analyses you want to do are quite complex with many
potential pitfalls.  If at all possible, I highly recommend finding
the help of someone who is an expert in these types of models (known
as mixed, hierarchical, multilevel, random effects, variance
components, nested ANOVA, etc).

best,

Kingsford Jones


  Thanks

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nested ANOVA models in R

2008-02-05 Thread Bill.Venables
Here is a short example:

 dat - data.frame(x = rnorm(20), 
a = rep(letters[1:4], 5), 
b = rep(letters[1:5], each = 4))
 summary(aov(x ~ a*b, dat))
Df  Sum Sq Mean Sq
a3  0.8021  0.2674
b4  3.7175  0.9294
a:b 12 10.5416  0.8785
 summary(aov(x ~ a/b, dat))
Df  Sum Sq Mean Sq
a3  0.8021  0.2674
a:b 16 14.2590  0.8912
  

So in your nested case you should not get a mean square for 'Female' at
all.  The interaction sum of squares in the nested case is the sum of
the main effect and interaction in the crossed model case, (as are the
degrees of freedom).

Although you think of them as different models, in a mathematical sense
they are equivalent - you just parcel the degrees of freedom and SSQ a
bit differently in the sequential anova.

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Daniel Bolnick
Sent: Wednesday, 6 February 2008 2:28 PM
To: r-help@r-project.org
Subject: [R] Nested ANOVA models in R

Hi,

I'm trying to work through a Nested ANOVA for the following scenario: 
20 males were used to fertilize eggs of 4 females per male, so that 
female is nested within male (80 females used total). Spine length 
was measured on 11 offspring per family, resulting in 880 
measurements on 80 families.

I used the following two commands:

summary(aov(Spinelength ~ Male*Female))

and
summary(aov(Spinelength ~ Male/Female))

I get the same mean squares either way, which doesn't seem right to 
me. In the former case, the mean square for females should be 
calculated around the overall mean across all females, whereas the 
mean square in the latter case should be calculated using deviations 
from the set of 4 females nested within a given male, right?

Of course, it is more appropriate for me to treat each of these as 
random effects. I know Bates has objections to the SAS-style 
partitioning variances to obtain F statistics and p-values, and I 
have read relevant parts of Pinhero and Bates, but how can a specify 
a nested random effects model that yields p-values for both the males 
(tested against MS for females) and females nested within males?

Thanks,

Dan Bolnick
Section of Integrative Biology
University of Texas at Austin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.