Re: [R] Nested ANOVA yields surprising results
> On 30 Oct 2015, at 18:46 , Daniel Wagenaarwrote: > > Dear R users: > > All textbook references that I consult say that in a nested ANOVA (e.g., > A/B), the F statistic for factor A should be calculated as > > F_A = MS_A / MS_(B within A). > That would depend on which hypothesis you test in which model. If a reference tells you that you "should" do something without specifying the model, then you "should" look at a different reference. In general, having anything other than the residual MS in the denominator indicates that you think it represents an additional source of random variation. I don't think that is invariably the case in nested designs (and, by the way, notice that "nested" is used differently by different books and software). If you don't say otherwise, R assumes that there is only one source of random variation the model - a single error term if you like - and that all other terms represent systematic variations. In this mode of thinking, an A:B term represents an effect of B within A (additive and interaction effects combined), and you can test for its presence by comparing MS_A:B to MS_res. In its absence, you might choose to reduce the model and next look for an effect of A; purists would do this by comparing MS_A to the new MS_res obtained by pooling MS_A:B and MS_res, but lazy statisticians/programmers have found it more convenient to stick with the original MS_res denominator throughout (to get the pooling done, just fit the reduced model). If you want A:B to be a random term, then you need to say so, e.g. using > summary(aov(Y~A+Error(A:B-1))) Error: A:B Df Sum Sq Mean Sq F value Pr(>F) A2 0.4735 0.2367 0.4030.7 Residuals3 1.7635 0.5878 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Residuals 6 4.993 0.8322 (the -1 in the Error() term prevents an error message, which as far as I can tell is spurious). Notice that you need aov() for this; lm() doesn't do Error() terms. This _only_ works in balanced designs. -pd > But when I run this simple example: > > set.seed(1) > A <- factor(rep(1:3, each=4)) > B <- factor(rep(1:2, 3, each=2)) > Y <- rnorm(12) > anova(lm(Y ~ A/B)) > > I get this result: > > Analysis of Variance Table > > Response: Y >Df Sum Sq Mean Sq F value Pr(>F) > A 2 0.4735 0.23675 0.2845 0.7620 > A:B3 1.7635 0.58783 0.7064 0.5823 > Residuals 6 4.9931 0.83218 > > Evidently, R calculates the F value for A as MS_A / MS_Residuals. > > While it is straightforward enough to calculate what I think is the correct > result from the table, I am surprised that R doesn't give me that answer > directly. Does anybody know if R's behavior is intentional, and if so, why? > Equally importantly, is there a straightforward way to make R give the answer > I expect, that is: > > Df Sum Sq Mean Sq F value Pr(>F) > A 2 0.4735 0.23675 0.4028 0.6999 > > The students in my statistics class would be much happier if they didn't have > to type things like > > a <- anova(...) > F <- a$`Sum Sq`[1] / a$`Sum Sq`[2] > P <- 1 - pf(F, a$Df[1], a$Df[2]) > > (They are not R programmers (yet).) And to be honest, I would find it easier > to read those results directly from the table as well. > > Thanks, > > Daniel Wagenaar > > -- > Daniel A. Wagenaar, PhD > Assistant Professor > Department of Biological Sciences > McMicken College of Arts and Sciences > University of Cincinnati > Cincinnati, OH 45221 > Phone: +1 (513) 556-9757 > Email: daniel.wagen...@uc.edu > Web: http://www.danielwagenaar.net > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nested anova-R chrashing
joerg stephan stephanj at rhrk.uni-kl.de writes: Hi, I tried to do a nested Anova with the attached Data. My response variable is survivors and I would like to know the effect of (insect-egg clutch) size, position (of clutch on twig) and clone (/plant genotype) on the survival of eggs (due to predation). Each plant was provided with three different sizes of clutches (45,15,5) and had pseudo-replications of size 15 and 5 on it. This may not be what you wanted at all, but here is my take on what you could try with these data. You have binomial data with lots of zeros and ones (i.e. 0 or 100% survival), so they are unlikely to be transformable to normality however you try. You *might* be able to get away with modeling these as normally distributed and then rely on permutation tests to get your significance levels right, but (tricky as it is) I actually think GLMMs (Zuur et al 2009, Bolker et al 2009) are the best way to go ... ## read in the data x - read.table(aovmisc.dat,header=TRUE) ## take a look summary(x) ## with(x,table(clone,as.factor(size),as.factor(position))) ## make categorical versions of size plant variables, ## and reconstitute the total number surviving variable x - transform(x, fsize=factor(size), fplant=factor(plant), tsurv=round(survivors*size) ) library(ggplot2) ## trying to plot everything ... zspace - opts(panel.margin=unit(0,lines)) ## squash panels together theme_update(theme_bw()) ## white background ## plot all data: survival vs position, with all plants represented; ## separate subplots for each clone ## overlay GLM fit ggplot(x,aes(x=position,y=survivors,size=size,colour=fplant))+ geom_point(alpha=0.5)+ facet_wrap(~clone)+xlim(0,15)+geom_line(size=0.5,alpha=0.2)+zspace+ geom_smooth(aes(group=1,weight=size), colour=black,method=glm,family=binomial) ## it looks like there is a positive effect of position, and also ## a positive effect of size (large bubbles toward high-survival) ## there are a few outliers in 'position' -- are these typos? subset(x,position15) ## focus on size and clone instead, suppress position: ggplot(x,aes(x=fsize,y=survivors,colour=clone))+stat_sum(aes(size=..n..))+ facet_grid(.~clone)+ geom_smooth(method=glm,family=binomial,aes(weight=size, x=as.numeric(fsize))) ## OR ## + stat_summary(fun.y=mean,geom=line,aes(x=as.numeric(fsize)))+zspace library(mgcv) ggplot(x,aes(x=position,y=survivors,colour=clone))+stat_sum(aes(size=..n..))+ facet_grid(fsize~clone)+ geom_smooth(method=gam,family=binomial)+xlim(0,15)+zspace ## appears to be an effect of size, and a clone*size interaction. ## possible bimodal distribution (0/1) at size=5? with(x,table(clone,plant)) ## each plant is only in one clone ## (explicit nesting) library(lme4) ## fit with size*clone interaction, main effect of position ## no need to nest plant in clone because they are explicitly nested ## (i.e. unique plant IDs) g1 - glmer(cbind(tsurv,size)~fsize*clone+position+(1|fplant), data=x, family=binomial) ## doesn't LOOK like overdispersion ... sum(residuals(g1)^2) ## residual df nrow(x)-length(fixef(g1)) pchisq(778,df=537) ## ... but try fitting observation-level random effect anyway x - transform(x,obs=seq(nrow(x))) g2 - update(g1,.~.+(1|obs)) summary(g2) ## among-observation variance 2x among-plant variance anova(g1,g2) ## looks much better (10 AIC units lower / p 0.05 ## drop1(g2,test=Chisq) ## looks like we can drop the size*clone interaction? g3 - update(g2,.~.-clone:fsize) drop1(g3,test=Chisq) ## strong effects of size, position; ## weak effects of clone fixef(g3) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nested anova
Hi Rick, Whenever I hear my instant association with post hoc and ANOVA would be ?TukeyHSD However, if you are not comfortable interpreting the model you ran, this suggests that you may benefit more from learning more statistical theory or finding someone to consult with who can help. You might also try graphing your data. Something like this would be one (of very many and not necessarily the best for your data) option: library(lattice) xyplot(catchrate ~ habitat | site, data = pat) xyplot(catchrate ~ site | Gear, data = pat) xyplot(catchrate ~ habitat | Gear, data = pat) I tend to prefer using regression and then either accepting the default contrasts or (if I have some theory/hypothesis) specifying my own contrast matrices and using those. However, it is not really feasible for members of this list to help you interpret your results or suggest appropriate statistical techniques without knowing your actual data. If you have specific questions related to R (e.g., How can I calculate some test or statistic? Here is a representative, but small sample dataset.), you will get a lot more advice and help. Best regards, Josh On Thu, Oct 21, 2010 at 4:13 PM, mirick miri...@yahoo.com wrote: Hello all, Can any of you R gurus help me out? I’m not all that great at stats to begin with, and I’m also learning the R ropes (former SAS user). Here’s what I need help with… I have a nested sample design and ran a nested anova, but I don’t know how to interpret the results habitat (four different types) is nested in site (three types), and site is nested in gear (two types) My code: pat2-aov(catchrate~habitat/site/gear, data=pat) This created the following outcome: Df Sum Sq Mean Sq F value Pr(F) habitat 3 2.932 0.9774 0.9543 0.4155656 habitat:site 8 18.716 2.3395 2.2842 0.0235207 habitat:site:Gear 12 39.244 3.2704 3.1930 0.0003546 Residuals 186 190.505 1.0242 What exactly does this outcome mean? It looks like there are differences between gears and sites, but not among habitats. Which gear is better, which site is better, which gear works better in each site, etc.? I’ve looked for some post hoc code to do this, but I can’t find anything and I am at wits end. Thanks, Rick M. -- View this message in context: http://r.789695.n4.nabble.com/nested-anova-tp3006469p3006469.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nested anova
Hi: On Thu, Oct 21, 2010 at 4:13 PM, mirick miri...@yahoo.com wrote: Hello all, Can any of you R gurus help me out? Im not all that great at stats to begin with, and Im also learning the R ropes (former SAS user). Sounds like you need a support group :) Heres what I need help with I have a nested sample design and ran a nested anova, but I dont know how to interpret the results habitat (four different types) is nested in site (three types), and site is nested in gear (two types) My code:pat2-aov(catchrate~habitat/site/gear, data=pat) This created the following outcome: Df Sum Sq Mean Sq F value Pr(F) habitat 32.932 0.9774 0.9543 0.4155656 habitat:site 818.7162.3395 2.28420.0235207 habitat:site:Gear 12 39.2443.2704 3.19300.0003546 Residuals 186 190.505 1.0242 What exactly does this outcome mean? It looks like there are differences between gears and sites, but not among habitats. Which gear is better, which site is better, which gear works better in each site, etc.? Ive looked for some post hoc code to do this, but I cant find anything and I am at wits end. Thanks, Rick M. Firstly, in a nested design, one often treats nested factors as random. You appear to want to treat them as fixed, which means that you are only interested in comparisons among the 24 habitats in your study, which are nested within the six sites which in turn are nested within the two types of gear. Is that correct? Secondly, the degrees of freedom allocation should clue you in that something is amiss, which would be your model specification. In R, nesting works top-down, so your model should be aov(catchrate ~ gear/site/habitat, data = pat) BTW, 186 df for residual? Do you have nine observations in each habitat? Are the data balanced within habitat (i.e., no missing data)? Your ANOVA should look like Gears 1 Gears/Sites 4 Gears/Sites/Habitat18 if your description is correct. According to your ANOVA, you have 210 observations. If you had nine observations per habitat, there would be 216 observations total, so is it reasonable to conclude you either have missing responses or unbalanced data within habitat? If so, how severe is the imbalance? Thirdly, as far as multiple comparisons go, you need to be very careful to use the correct variance estimate at each level. However, it seems to me that comparisons would only make sense within level (assuming they make sense at all); e.g., comparing the four habitats in a particular site, and repeating the comparison for each site. I'd consider investing some time deriving the means and variances of each set of level means in the hierarchy. That should help sort out some of your questions. For example, there is only one comparison of gears, and the two gear means are obtained by averaging the 105 or so observations they each contain. This averages out site and habitat effects, so asking which gear works better in each site is an impossible question to answer since each site is associated with exactly one gear (by the definition of nested factors). You can compare the three sites within a particular gear, but you can't compare sites across both gears, because each site is associated with only one gear. Similarly, each habitat is associated with exactly one site, and hence one gear. This is what I mean when suggesting that you sit down and work out the algebra, to understand which effects can be compared and which can't, along with an understanding of how the variances of the means at each level of the hierarchy are derived and how they differ. HTH, Dennis -- View this message in context: http://r.789695.n4.nabble.com/nested-anova-tp3006469p3006469.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested ANOVA with covariate using Type III sums of squares
Dear Anita and Joris, Please see https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html, posted to r-help in March. Regards, John John Fox Senator William McMaster Professor of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Anita Narwani Sent: June-03-10 8:49 PM To: Joris Meys Cc: r-help@r-project.org Subject: Re: [R] Nested ANOVA with covariate using Type III sums of squares Yes I understood the strangeness of removing a main effect without interactions that contain it because I did this during my efforts using model simplification. I had checked out the link you sent a couple of days ago. It was useful. So does Type II SS remove both the factor and any interactions containing it when comparing models? i.e. to test for the main effect of B you compare A + B + A:B against A? On Thu, Jun 3, 2010 at 4:59 PM, Joris Meys jorism...@gmail.com wrote: Hi Anita, I have to correct myself too, I've been rambling a bit. Off course you don't delete the variable out of the interaction term when you test the main effect. What I said earlier didn't really make any sense. That testing a main effect without removing the interaction term is has a tricky interpretation. By removing a main effect you test full model A + B + A:B against the model A + A:B. If you remove the main effect Zoop for example, you basically nest Zoop within Diversity and test whether that's not worse than the full model. This explains it very well: https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html I'd go for type II, but you're free to test any hypothesis you want. Cheers Joris On Thu, Jun 3, 2010 at 9:59 PM, Anita Narwani anitanarw...@gmail.comwrote: Thanks for your response Joris. I was aware of the potential for aliasing, although I thought that this was only a problem when you have missing cell means. It was interesting to read the vehement argument regarding the Type III sums of squares, and although I knew that there were different positions on the topic, I had no idea how divisive it was. Nevertheless, Type III SS are generally recommended by statistical texts in ecology for my type of experimental design. Interestingly, despite the aliasing, SPSS has no problems calculating Type III SS for this data set. This is simply because I am entering a co-variate, which causes non-orthogonality. I would be happier using R and the default Type I SS, which are the same as the Type III SS anyway when I omit the co-variate of Mean.richness, except that these results are very sensitive to the order in which I add the variables into the model when I do enter the co-variate. I understand that the order is very important relates back to the scientific hypothesis, but I am equally interested in the main effects of Zoop, Diversity, and the nested effect of Phyto, so entering either of these variables before the other does not make sense from an ecological perspective, and because the results do change, the order cannot be ignored from a statistical perspective. Finally, I have tried using the Type II SS and received similar warnings. Do you have a recommendations? Anita. -- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested ANOVA with covariate using Type III sums of squares
I would just like to add that when I remove the co-variate of Mean.richness from the model (i.e. eliminating the non-orthogonality), the aliasing warning is replaced by the following error message: Error in t(Z) %*% ip : non-conformable arguments That is when I enter this model: carbonmean-lm(C.Mean~ Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) On Wed, Jun 2, 2010 at 6:05 PM, Joris Meys jorism...@gmail.com wrote: that's diversity/phyto, zoop or phyto twice in the formula. On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote: That's what one would expect with type III sum of squares. You have Phyto twice in your model, but only as a nested factor. To compare the full model with a model without diversity of zoop, you have either the combination diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing. Depending on how you stand on type III sum of squares, you could call that a bug. Personally, I'd just not use them. https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html Cheers Joris On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote: Hello, I have been trying to get an ANOVA table for a linear model containing a single nested factor, two fixed factors and a covariate: carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) where, *Mean.richness* is a covariate*, Zoop* is a categorical variable (the species), *Diversity* is a categorical variable (Low or High), and *Phyto*(community composition) is also categorical but is nested within the level of *Diversity*. Quinn Keough's statistics text recommends using Type III SS for a nested ANOVA with a covariate. I get the following output using the Type I SS ANOVA: Analysis of Variance Table Response: C.Mean DfSum Sq Mean Sq F valuePr(F) Mean.richness1 5638532656385326 23.5855 3.239e-05 *** Diversity 1 14476593 14476593 6.0554 0.019634 * Zoop1 13002135 13002135 5.4387 0.026365 * Diversity:Phyto 6 126089387 21014898 8.7904 1.257e-05 *** Diversity:Zoop 1 263036 263036 0.1100 0.742347 Diversity:Zoop:Phyto 6 6171014510285024 4.3021 0.002879 ** Residuals3174110911 2390675 I have tried using both the drop1() command and the Anova() command in the car package. When I use the Anova command I get the following error message: Anova(carbonmean,type=III) Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,: One or more terms aliased in model. I am not sure why this is aliased. There are no missing cells, and the cells are balanced (aside from for the covariate). Each Phyto by Zoop cross is replicated 3 times, and there are four Phyto levels within each level of Diversity. When I remove the nested factor (Phyto), I am able to get the Type III SS output. Then when I use drop1(carbonmean,.~.,Test=F) I get the following output: drop1(carbonmean,.~.,Test=F) Single term deletions Model: C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop * Diversity/Phyto DfSum of Sq RSS AIC none74110911 718 Mean.richness1 49790403123901314 741 Diversity 0 0 74110911718 Zoop0 0 74110911718 Diversity:Phyto 6 118553466 192664376 752 Diversity:Zoop 0 -1.49e-0874110911 718 Diversity:Zoop:Phyto 6 61710145135821055 735 There are zero degrees of freedom for Diversity, Zoop and their interaction, and zero sums of sq for Diversity and Zoop. This cannot be correct, however when I do the model simplification by dropping terms from the models manually and comparing them using anova(), I get virtually the same results. I would appreciate any suggestions for things to try or pointers as to what I may be doing incorrectly. Thank you. Anita Narwani. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical Consultant Ghent
Re: [R] Nested ANOVA with covariate using Type III sums of squares
Could you copy the data? Data - data.frame(C.Mean,Mean.richness,Zoop,Diversity,Phyto) dput(Data) I have the feeling something's wrong there. I believe you have 48 observations (47df + 1 for the intercept), 2 levels of Diversity, 4 of Phyto and 48/(3*4)=4 levels of Zoop. But you don't have 3df for Zoop. Either I'm way off, or what goes in the lm is not what you think it is. I tried a small sample with the datastructure I believe you have, but I couldn't reproduce your error. ## Run Phyto - as.factor(rep(rep(c(A,B,C,D),each=6),2)) Diversity - as.factor(rep(c(High,Low),each=24)) Zoop - rep(c(1,2,3,4),times=12) C.Mean - rnorm(48) Mean.richness -rnorm(48) test - lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) Anova(test,type=III) Zoop - as.factor(Zoop) Anova(test,type=III) ## End Run Cheers Joris On Thu, Jun 3, 2010 at 10:26 PM, Anita Narwani anitanarw...@gmail.comwrote: I would just like to add that when I remove the co-variate of Mean.richness from the model (i.e. eliminating the non-orthogonality), the aliasing warning is replaced by the following error message: Error in t(Z) %*% ip : non-conformable arguments That is when I enter this model: carbonmean-lm(C.Mean~ Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) On Wed, Jun 2, 2010 at 6:05 PM, Joris Meys jorism...@gmail.com wrote: that's diversity/phyto, zoop or phyto twice in the formula. On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote: That's what one would expect with type III sum of squares. You have Phyto twice in your model, but only as a nested factor. To compare the full model with a model without diversity of zoop, you have either the combination diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing. Depending on how you stand on type III sum of squares, you could call that a bug. Personally, I'd just not use them. https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html Cheers Joris On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote: Hello, I have been trying to get an ANOVA table for a linear model containing a single nested factor, two fixed factors and a covariate: carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) where, *Mean.richness* is a covariate*, Zoop* is a categorical variable (the species), *Diversity* is a categorical variable (Low or High), and *Phyto*(community composition) is also categorical but is nested within the level of *Diversity*. Quinn Keough's statistics text recommends using Type III SS for a nested ANOVA with a covariate. I get the following output using the Type I SS ANOVA: Analysis of Variance Table Response: C.Mean DfSum Sq Mean Sq F valuePr(F) Mean.richness1 5638532656385326 23.5855 3.239e-05 *** Diversity 1 14476593 14476593 6.0554 0.019634 * Zoop1 13002135 13002135 5.4387 0.026365 * Diversity:Phyto 6 126089387 21014898 8.7904 1.257e-05 *** Diversity:Zoop 1 263036 263036 0.1100 0.742347 Diversity:Zoop:Phyto 6 6171014510285024 4.3021 0.002879 ** Residuals3174110911 2390675 I have tried using both the drop1() command and the Anova() command in the car package. When I use the Anova command I get the following error message: Anova(carbonmean,type=III) Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,: One or more terms aliased in model. I am not sure why this is aliased. There are no missing cells, and the cells are balanced (aside from for the covariate). Each Phyto by Zoop cross is replicated 3 times, and there are four Phyto levels within each level of Diversity. When I remove the nested factor (Phyto), I am able to get the Type III SS output. Then when I use drop1(carbonmean,.~.,Test=F) I get the following output: drop1(carbonmean,.~.,Test=F) Single term deletions Model: C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop * Diversity/Phyto DfSum of Sq RSS AIC none74110911 718 Mean.richness1 49790403 123901314 741 Diversity 0 0 74110911718 Zoop0 0 74110911718 Diversity:Phyto 6 118553466 192664376 752 Diversity:Zoop 0
Re: [R] Nested ANOVA with covariate using Type III sums of squares
I see where my confusion comes from. I counted 4 levels of Phyto, but you have 8, being 4 in every level of Diversity. There's your aliasing. table(Diversity,Phyto) Phyto Diversity M1 M2 M3 M4 P1 P2 P3 P4 H 0 0 0 0 6 6 6 6 L 6 6 6 6 0 0 0 0 There's no need to code them differently for every level of Diversity. If you don't, all is fine : Phyto - gsub(M,P,as.character(Phyto)) Phyto - as.factor(Phyto) test - lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + + Zoop*Diversity/Phyto) Anova(test,type=III) Anova Table (Type III tests) Response: C.Mean Sum Sq Df F value Pr(F) (Intercept) 23935609 1 10.0121 0.0034729 ** Mean.richness 49790385 1 20.8269 7.471e-05 *** Diversity 35807205 1 14.9779 0.0005234 *** Zoop 10794614 1 4.5153 0.0416688 * Diversity:Phyto 118553464 6 8.2650 2.184e-05 *** Diversity:Zoop 261789 1 0.1095 0.7429356 Diversity:Zoop:Phyto 61710162 6 4.3021 0.0028790 ** Residuals 74110938 31 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 You can check with summary(test) that the model is fitted correctly. On Fri, Jun 4, 2010 at 12:48 AM, Anita Narwani anitanarw...@gmail.com wrote: You have everything right except that there are only 2 zooplankton species (C D, which stand for Ceriodaphnia and Daphnia). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested ANOVA with covariate using Type III sums of squares
Hi Anita, I have to correct myself too, I've been rambling a bit. Off course you don't delete the variable out of the interaction term when you test the main effect. What I said earlier didn't really make any sense. That testing a main effect without removing the interaction term is has a tricky interpretation. By removing a main effect you test full model A + B + A:B against the model A + A:B. If you remove the main effect Zoop for example, you basically nest Zoop within Diversity and test whether that's not worse than the full model. This explains it very well: https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html I'd go for type II, but you're free to test any hypothesis you want. Cheers Joris On Thu, Jun 3, 2010 at 9:59 PM, Anita Narwani anitanarw...@gmail.comwrote: Thanks for your response Joris. I was aware of the potential for aliasing, although I thought that this was only a problem when you have missing cell means. It was interesting to read the vehement argument regarding the Type III sums of squares, and although I knew that there were different positions on the topic, I had no idea how divisive it was. Nevertheless, Type III SS are generally recommended by statistical texts in ecology for my type of experimental design. Interestingly, despite the aliasing, SPSS has no problems calculating Type III SS for this data set. This is simply because I am entering a co-variate, which causes non-orthogonality. I would be happier using R and the default Type I SS, which are the same as the Type III SS anyway when I omit the co-variate of Mean.richness, except that these results are very sensitive to the order in which I add the variables into the model when I do enter the co-variate. I understand that the order is very important relates back to the scientific hypothesis, but I am equally interested in the main effects of Zoop, Diversity, and the nested effect of Phyto, so entering either of these variables before the other does not make sense from an ecological perspective, and because the results do change, the order cannot be ignored from a statistical perspective. Finally, I have tried using the Type II SS and received similar warnings. Do you have a recommendations? Anita. -- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested ANOVA with covariate using Type III sums of squares
Hi Joris, That seems to have worked and the contrasts look correct. I have tried comparing the results to what SPSS produces for the same model. The two programs produce very different results, although the model F statistics, R squared and adjusted R squared values are identical. The results are so different that I don't know what to trust. For the same model you coded I got: test - lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + + Zoop*Diversity/Phyto) Anova(test,type=III) Anova Table (Type III tests) Response: C.Mean Sum Sq Df F valuePr(F) (Intercept) 28223311 1 11.8056 0.001701 ** Mean.richness49790403 1 20.8269 7.471e-05 *** Diversity31055477 1 12.9903 0.001082 ** Zoop 2736238 1 1.1445 0.292953 Diversity:Phyto 27943313 6 1.9481 0.104103 Diversity:Zoop 168184 1 0.0703 0.792584 Diversity:Zoop:Phyto 61710145 6 4.3021 0.002879 ** Residuals74110911 31 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 (Also sightly different from your result) and summary(test) Call: lm(formula = C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + +Zoop * Diversity/Phyto) Residuals: Min 1Q Median 3Q Max -3555.26 -479.5349.94 423.49 4073.20 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) -8562.9 2492.2 -3.436 0.00170 ** Mean.richness 4605.7 1009.2 4.564 7.47e-05 *** DiversityL 6576.9 1824.8 3.604 0.00108 ** ZoopD -1414.4 1322.1 -1.070 0.29295 DiversityH:PhytoP2-4307.5 1824.8 -2.361 0.02472 * DiversityL:PhytoP2 -268.4 1262.5 -0.213 0.83300 DiversityH:PhytoP3-2233.4 1393.0 -1.603 0.11900 DiversityL:PhytoP3-1571.4 1262.5 -1.245 0.22257 DiversityH:PhytoP4-7914.8 2647.2 -2.990 0.00543 ** DiversityL:PhytoP4-1612.8 1262.5 -1.277 0.21092 DiversityL:ZoopD484.9 1828.0 0.265 0.79258 DiversityH:ZoopD:PhytoP2683.9 1855.3 0.369 0.71493 DiversityL:ZoopD:PhytoP2 6346.4 1785.4 3.555 0.00124 ** DiversityH:ZoopD:PhytoP3 4922.8 1786.3 2.756 0.00971 ** DiversityL:ZoopD:PhytoP3 1085.4 1785.4 0.608 0.54766 DiversityH:ZoopD:PhytoP4 3261.8 1985.6 1.643 0.11055 DiversityL:ZoopD:PhytoP4681.9 1785.4 0.382 0.70513 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 1546 on 31 degrees of freedom Multiple R-squared: 0.7858, Adjusted R-squared: 0.6753 F-statistic: 7.109 on 16 and 31 DF, p-value: 1.810e-06 From SPSS I got Tests of Between-Subjects Effects Dependent Variable:C Mean Source Type III Sum of Squares df Mean Square F Sig. Corrected Model 2.719E+08 16 1.700E+07 7.109 .000 Intercept 2.394E+07 1 2.394E+07 10.012 .003 Meanrichness 4.979E+07 1 4.979E+07 20.827 .000 Diversity 3.581E+07 1 3.581E+07 14.978 .001 Zoop 1.079E+07 1 1.079E+07 4.515 .042 Diversity * Zoop 261789.172 1 261789.172 .110 .743 Phyto(Diversity) 1.186E+08 6 1.976E+07 8.265 .000 Phyto * Zoop(Diversity) 6.171E+07 6 1.029E+07 4.302 .003 Error 7.411E+07 31 2.391E+06 Total 7.959E+08 48 Corrected Total 3.460E+08 47 Which, gives some similar results, but a completely different F statistic and P-value for the main effect of Zoop and the nested effect of Phyto. Obviously SPSS is not necessarily the perfect reference, but when using the Type I SS, the results did agree. Any thoughts on why this might be? Could the two programs be calculating the Type III SS differently? Might it be wise to stick to Type I SS? Thanks very much for your time and effort. It has been very helpful. Anita. On Thu, Jun 3, 2010 at 4:25 PM, Joris Meys jorism...@gmail.com wrote: I see where my confusion comes from. I counted 4 levels of Phyto, but you have 8, being 4 in every level of Diversity. There's your aliasing. table(Diversity,Phyto) Phyto Diversity M1 M2 M3 M4 P1 P2 P3 P4 H 0 0 0 0 6 6 6 6 L 6 6 6 6 0 0 0 0 There's no need to code them differently for every level of Diversity. If you don't, all is fine : Phyto - gsub(M,P,as.character(Phyto)) Phyto - as.factor(Phyto) test - lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + + Zoop*Diversity/Phyto) Anova(test,type=III) Anova Table (Type III tests) Response: C.Mean Sum Sq Df F valuePr(F) (Intercept) 23935609 1 10.0121 0.0034729 ** Mean.richness 49790385 1 20.8269 7.471e-05 *** Diversity 35807205 1 14.9779 0.0005234 *** Zoop 10794614 1 4.5153 0.0416688 * Diversity:Phyto 118553464 6 8.2650 2.184e-05 *** Diversity:Zoop 261789 1 0.1095 0.7429356 Diversity:Zoop:Phyto 61710162 6 4.3021 0.0028790 ** Residuals
Re: [R] Nested ANOVA with covariate using Type III sums of squares
SPSS uses a different calculation. As far as I understood, they test main effects without the covariate. Regarding the difference between my and your results, did you use sum contrasts? options(contrasts=c(contr.sum,contr.poly)) On Fri, Jun 4, 2010 at 2:19 AM, Anita Narwani anitanarw...@gmail.comwrote: Hi Joris, That seems to have worked and the contrasts look correct. I have tried comparing the results to what SPSS produces for the same model. The two programs produce very different results, although the model F statistics, R squared and adjusted R squared values are identical. The results are so different that I don't know what to trust. For the same model you coded I got: test - lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + + Zoop*Diversity/Phyto) Anova(test,type=III) Anova Table (Type III tests) Response: C.Mean Sum Sq Df F valuePr(F) (Intercept) 28223311 1 11.8056 0.001701 ** Mean.richness49790403 1 20.8269 7.471e-05 *** Diversity31055477 1 12.9903 0.001082 ** Zoop 2736238 1 1.1445 0.292953 Diversity:Phyto 27943313 6 1.9481 0.104103 Diversity:Zoop 168184 1 0.0703 0.792584 Diversity:Zoop:Phyto 61710145 6 4.3021 0.002879 ** Residuals74110911 31 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 (Also sightly different from your result) and summary(test) Call: lm(formula = C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + +Zoop * Diversity/Phyto) Residuals: Min 1Q Median 3Q Max -3555.26 -479.5349.94 423.49 4073.20 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) -8562.9 2492.2 -3.436 0.00170 ** Mean.richness 4605.7 1009.2 4.564 7.47e-05 *** DiversityL 6576.9 1824.8 3.604 0.00108 ** ZoopD -1414.4 1322.1 -1.070 0.29295 DiversityH:PhytoP2-4307.5 1824.8 -2.361 0.02472 * DiversityL:PhytoP2 -268.4 1262.5 -0.213 0.83300 DiversityH:PhytoP3-2233.4 1393.0 -1.603 0.11900 DiversityL:PhytoP3-1571.4 1262.5 -1.245 0.22257 DiversityH:PhytoP4-7914.8 2647.2 -2.990 0.00543 ** DiversityL:PhytoP4-1612.8 1262.5 -1.277 0.21092 DiversityL:ZoopD484.9 1828.0 0.265 0.79258 DiversityH:ZoopD:PhytoP2683.9 1855.3 0.369 0.71493 DiversityL:ZoopD:PhytoP2 6346.4 1785.4 3.555 0.00124 ** DiversityH:ZoopD:PhytoP3 4922.8 1786.3 2.756 0.00971 ** DiversityL:ZoopD:PhytoP3 1085.4 1785.4 0.608 0.54766 DiversityH:ZoopD:PhytoP4 3261.8 1985.6 1.643 0.11055 DiversityL:ZoopD:PhytoP4681.9 1785.4 0.382 0.70513 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 1546 on 31 degrees of freedom Multiple R-squared: 0.7858, Adjusted R-squared: 0.6753 F-statistic: 7.109 on 16 and 31 DF, p-value: 1.810e-06 From SPSS I got Tests of Between-Subjects Effects Dependent Variable:C Mean Source Type III Sum of Squares df Mean Square F Sig. Corrected Model 2.719E+08 16 1.700E+07 7.109 .000 Intercept 2.394E+07 1 2.394E+07 10.012 .003 Meanrichness 4.979E+07 1 4.979E+07 20.827 .000 Diversity 3.581E+07 1 3.581E+07 14.978 .001 Zoop 1.079E+07 1 1.079E+07 4.515 .042 Diversity * Zoop 261789.172 1 261789.172 .110 .743 Phyto(Diversity) 1.186E+08 6 1.976E+07 8.265 .000 Phyto * Zoop(Diversity) 6.171E+07 6 1.029E+07 4.302 .003 Error 7.411E+07 31 2.391E+06 Total 7.959E+08 48 Corrected Total 3.460E+08 47 Which, gives some similar results, but a completely different F statistic and P-value for the main effect of Zoop and the nested effect of Phyto. Obviously SPSS is not necessarily the perfect reference, but when using the Type I SS, the results did agree. Any thoughts on why this might be? Could the two programs be calculating the Type III SS differently? Might it be wise to stick to Type I SS? Thanks very much for your time and effort. It has been very helpful. Anita. On Thu, Jun 3, 2010 at 4:25 PM, Joris Meys jorism...@gmail.com wrote: I see where my confusion comes from. I counted 4 levels of Phyto, but you have 8, being 4 in every level of Diversity. There's your aliasing. table(Diversity,Phyto) Phyto Diversity M1 M2 M3 M4 P1 P2 P3 P4 H 0 0 0 0 6 6 6 6 L 6 6 6 6 0 0 0 0 There's no need to code them differently for every level of Diversity. If you don't, all is fine : Phyto - gsub(M,P,as.character(Phyto)) Phyto - as.factor(Phyto) test - lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + + Zoop*Diversity/Phyto) Anova(test,type=III) Anova Table (Type III tests) Response: C.Mean Sum Sq Df F valuePr(F) (Intercept)
Re: [R] Nested ANOVA with covariate using Type III sums of squares
Yes I understood the strangeness of removing a main effect without interactions that contain it because I did this during my efforts using model simplification. I had checked out the link you sent a couple of days ago. It was useful. So does Type II SS remove both the factor and any interactions containing it when comparing models? i.e. to test for the main effect of B you compare A + B + A:B against A? On Thu, Jun 3, 2010 at 4:59 PM, Joris Meys jorism...@gmail.com wrote: Hi Anita, I have to correct myself too, I've been rambling a bit. Off course you don't delete the variable out of the interaction term when you test the main effect. What I said earlier didn't really make any sense. That testing a main effect without removing the interaction term is has a tricky interpretation. By removing a main effect you test full model A + B + A:B against the model A + A:B. If you remove the main effect Zoop for example, you basically nest Zoop within Diversity and test whether that's not worse than the full model. This explains it very well: https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html I'd go for type II, but you're free to test any hypothesis you want. Cheers Joris On Thu, Jun 3, 2010 at 9:59 PM, Anita Narwani anitanarw...@gmail.comwrote: Thanks for your response Joris. I was aware of the potential for aliasing, although I thought that this was only a problem when you have missing cell means. It was interesting to read the vehement argument regarding the Type III sums of squares, and although I knew that there were different positions on the topic, I had no idea how divisive it was. Nevertheless, Type III SS are generally recommended by statistical texts in ecology for my type of experimental design. Interestingly, despite the aliasing, SPSS has no problems calculating Type III SS for this data set. This is simply because I am entering a co-variate, which causes non-orthogonality. I would be happier using R and the default Type I SS, which are the same as the Type III SS anyway when I omit the co-variate of Mean.richness, except that these results are very sensitive to the order in which I add the variables into the model when I do enter the co-variate. I understand that the order is very important relates back to the scientific hypothesis, but I am equally interested in the main effects of Zoop, Diversity, and the nested effect of Phyto, so entering either of these variables before the other does not make sense from an ecological perspective, and because the results do change, the order cannot be ignored from a statistical perspective. Finally, I have tried using the Type II SS and received similar warnings. Do you have a recommendations? Anita. -- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested ANOVA with covariate using Type III sums of squares
That's what one would expect with type III sum of squares. You have Phyto twice in your model, but only as a nested factor. To compare the full model with a model without diversity of zoop, you have either the combination diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing. Depending on how you stand on type III sum of squares, you could call that a bug. Personally, I'd just not use them. https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html Cheers Joris On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote: Hello, I have been trying to get an ANOVA table for a linear model containing a single nested factor, two fixed factors and a covariate: carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) where, *Mean.richness* is a covariate*, Zoop* is a categorical variable (the species), *Diversity* is a categorical variable (Low or High), and *Phyto*(community composition) is also categorical but is nested within the level of *Diversity*. Quinn Keough's statistics text recommends using Type III SS for a nested ANOVA with a covariate. I get the following output using the Type I SS ANOVA: Analysis of Variance Table Response: C.Mean DfSum Sq Mean Sq F valuePr(F) Mean.richness1 5638532656385326 23.5855 3.239e-05 *** Diversity 1 14476593 14476593 6.0554 0.019634 * Zoop1 13002135 13002135 5.4387 0.026365 * Diversity:Phyto 6 126089387 21014898 8.7904 1.257e-05 *** Diversity:Zoop 1 263036 263036 0.1100 0.742347 Diversity:Zoop:Phyto 6 6171014510285024 4.3021 0.002879 ** Residuals3174110911 2390675 I have tried using both the drop1() command and the Anova() command in the car package. When I use the Anova command I get the following error message: Anova(carbonmean,type=III) Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,: One or more terms aliased in model. I am not sure why this is aliased. There are no missing cells, and the cells are balanced (aside from for the covariate). Each Phyto by Zoop cross is replicated 3 times, and there are four Phyto levels within each level of Diversity. When I remove the nested factor (Phyto), I am able to get the Type III SS output. Then when I use drop1(carbonmean,.~.,Test=F) I get the following output: drop1(carbonmean,.~.,Test=F) Single term deletions Model: C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop * Diversity/Phyto DfSum of Sq RSS AIC none74110911 718 Mean.richness1 49790403123901314 741 Diversity 0 0 74110911718 Zoop0 0 74110911718 Diversity:Phyto 6 118553466 192664376 752 Diversity:Zoop 0 -1.49e-0874110911 718 Diversity:Zoop:Phyto 6 61710145135821055 735 There are zero degrees of freedom for Diversity, Zoop and their interaction, and zero sums of sq for Diversity and Zoop. This cannot be correct, however when I do the model simplification by dropping terms from the models manually and comparing them using anova(), I get virtually the same results. I would appreciate any suggestions for things to try or pointers as to what I may be doing incorrectly. Thank you. Anita Narwani. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested ANOVA with covariate using Type III sums of squares
that's diversity/phyto, zoop or phyto twice in the formula. On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote: That's what one would expect with type III sum of squares. You have Phyto twice in your model, but only as a nested factor. To compare the full model with a model without diversity of zoop, you have either the combination diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing. Depending on how you stand on type III sum of squares, you could call that a bug. Personally, I'd just not use them. https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html Cheers Joris On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote: Hello, I have been trying to get an ANOVA table for a linear model containing a single nested factor, two fixed factors and a covariate: carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) where, *Mean.richness* is a covariate*, Zoop* is a categorical variable (the species), *Diversity* is a categorical variable (Low or High), and *Phyto*(community composition) is also categorical but is nested within the level of *Diversity*. Quinn Keough's statistics text recommends using Type III SS for a nested ANOVA with a covariate. I get the following output using the Type I SS ANOVA: Analysis of Variance Table Response: C.Mean DfSum Sq Mean Sq F valuePr(F) Mean.richness1 5638532656385326 23.5855 3.239e-05 *** Diversity 1 14476593 14476593 6.0554 0.019634 * Zoop1 13002135 13002135 5.4387 0.026365 * Diversity:Phyto 6 126089387 21014898 8.7904 1.257e-05 *** Diversity:Zoop 1 263036 263036 0.1100 0.742347 Diversity:Zoop:Phyto 6 6171014510285024 4.3021 0.002879 ** Residuals3174110911 2390675 I have tried using both the drop1() command and the Anova() command in the car package. When I use the Anova command I get the following error message: Anova(carbonmean,type=III) Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,: One or more terms aliased in model. I am not sure why this is aliased. There are no missing cells, and the cells are balanced (aside from for the covariate). Each Phyto by Zoop cross is replicated 3 times, and there are four Phyto levels within each level of Diversity. When I remove the nested factor (Phyto), I am able to get the Type III SS output. Then when I use drop1(carbonmean,.~.,Test=F) I get the following output: drop1(carbonmean,.~.,Test=F) Single term deletions Model: C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop * Diversity/Phyto DfSum of Sq RSS AIC none74110911 718 Mean.richness1 49790403123901314 741 Diversity 0 0 74110911718 Zoop0 0 74110911718 Diversity:Phyto 6 118553466 192664376 752 Diversity:Zoop 0 -1.49e-0874110911 718 Diversity:Zoop:Phyto 6 61710145135821055 735 There are zero degrees of freedom for Diversity, Zoop and their interaction, and zero sums of sq for Diversity and Zoop. This cannot be correct, however when I do the model simplification by dropping terms from the models manually and comparing them using anova(), I get virtually the same results. I would appreciate any suggestions for things to try or pointers as to what I may be doing incorrectly. Thank you. Anita Narwani. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php -- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control
Re: [R] Nested ANOVA residuals error
Hint: You have 27 observations fit by 27 fixed effects (including the mean). You need to consult a statistician, as you seem not to have the basic statistical understanding required. Person should be a random effect. Was this a homework problem, perchance? Bert Gunter Genentech Nonclinical Biostatistics -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff DaCosta Sent: Thursday, July 16, 2009 12:37 PM To: r-help@r-project.org Subject: [R] Nested ANOVA residuals error I am having trouble setting up a nested anova model. Here is a truncated version of my data: data Ind Treatment PC1 1 PER14SC 1.14105282 2 PER14SH 1.45348615 3 PER14AC 2.45096904 4 PER25SC 1.23687887 5 PER25SH 4.54797450 6 PER25AC -0.73391131 7 PER30SC 1.15899433 8 PER30SH 2.83273736 9 PER30AC -0.86272186 10 PER31SC 1.90495285 11 PER31SH 5.68514429 12 PER31AC -2.25110314 13 PER35SC 2.16463279 14 PER35SH 1.69053389 15 PER35AC 1.87528942 16 PER36SC -2.44795781 17 PER36SH -3.59415298 18 PER36AC 1.90756192 19 PER41SC -2.83086573 20 PER41SH 4.26803170 21 PER41AC 2.42548862 22 PER48SC -1.86403368 23 PER48SH -0.08713013 24 PER48AC -1.98056853 25 PER52SC 2.35012436 26 PER52SH -2.19384500 27 PER52AC 1.38686223 str(data) 'data.frame': 27 obs. of 3 variables: $ Ind : Factor w/ 9 levels PER14,PER25,..: 1 1 1 2 2 2 3 3 3 4 ... $ Treatment: Factor w/ 3 levels AC,SC,SH: 2 3 1 2 3 1 2 3 1 2 ... $ PC1 : num 1.14 1.45 2.45 1.24 4.55 ... PC1 is my response variable, and Treatment is my factor with three levels. Each of the nine individuals (Ind) in the study was tested for all three factor levels, so I want to run the model with Ind nested in Treatment. I think that I want to set up the model like this: m1-aov(PC1~Treatment/Ind)) but I get the following error with anova(m1), and am not sure why I get zero for all the residuals. anova(m1) Analysis of Variance Table Response: PC1 Df Sum Sq Mean Sq F value Pr(F) Treatment 2 9.215 4.607 Treatment:Ind 24 142.140 5.923 Residuals 0 0.000 Warning message: In anova.lm(m1) : ANOVA F-tests on an essentially perfect fit are unreliable Any input would be appreciated, thanks for your time and consideration. -Jeff [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested ANOVA residuals error
It is because the nesting structure perfectly explains the data (i.e., there is only one observation and, therefore, no variation for each Ind in each Treatment). e=rnorm(90,0,1) x=rep(1:3,30) y=rep(1:30,each=3) z=x+y+e ano=aov(z~factor(y)/factor(x)) ano residuals(ano) Best, Daniel - cuncta stricte discussurus - -Ursprüngliche Nachricht- Von: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Im Auftrag von Jeff DaCosta Gesendet: Thursday, July 16, 2009 3:37 PM An: r-help@r-project.org Betreff: [R] Nested ANOVA residuals error I am having trouble setting up a nested anova model. Here is a truncated version of my data: data Ind Treatment PC1 1 PER14SC 1.14105282 2 PER14SH 1.45348615 3 PER14AC 2.45096904 4 PER25SC 1.23687887 5 PER25SH 4.54797450 6 PER25AC -0.73391131 7 PER30SC 1.15899433 8 PER30SH 2.83273736 9 PER30AC -0.86272186 10 PER31SC 1.90495285 11 PER31SH 5.68514429 12 PER31AC -2.25110314 13 PER35SC 2.16463279 14 PER35SH 1.69053389 15 PER35AC 1.87528942 16 PER36SC -2.44795781 17 PER36SH -3.59415298 18 PER36AC 1.90756192 19 PER41SC -2.83086573 20 PER41SH 4.26803170 21 PER41AC 2.42548862 22 PER48SC -1.86403368 23 PER48SH -0.08713013 24 PER48AC -1.98056853 25 PER52SC 2.35012436 26 PER52SH -2.19384500 27 PER52AC 1.38686223 str(data) 'data.frame': 27 obs. of 3 variables: $ Ind : Factor w/ 9 levels PER14,PER25,..: 1 1 1 2 2 2 3 3 3 4 ... $ Treatment: Factor w/ 3 levels AC,SC,SH: 2 3 1 2 3 1 2 3 1 2 ... $ PC1 : num 1.14 1.45 2.45 1.24 4.55 ... PC1 is my response variable, and Treatment is my factor with three levels. Each of the nine individuals (Ind) in the study was tested for all three factor levels, so I want to run the model with Ind nested in Treatment. I think that I want to set up the model like this: m1-aov(PC1~Treatment/Ind)) but I get the following error with anova(m1), and am not sure why I get zero for all the residuals. anova(m1) Analysis of Variance Table Response: PC1 Df Sum Sq Mean Sq F value Pr(F) Treatment 2 9.215 4.607 Treatment:Ind 24 142.140 5.923 Residuals 0 0.000 Warning message: In anova.lm(m1) : ANOVA F-tests on an essentially perfect fit are unreliable Any input would be appreciated, thanks for your time and consideration. -Jeff [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nested ANOVA with fixed and random variables missing data
On Mon, 8 Sep 2008, [EMAIL PROTECTED] wrote: I have a nested ANOVA, with a fixed factor tmt nested within site (random). There are missing values in the data set. aeucs, tmt and site have been defined as objects I have tried: model1=lme(aeucs~tmt,random=~1|tmt/site) I think you want lme(aeucs~tmt,random=~tmt|site) ...as tmt is a fixed factor, you don't want to use it for grouping. Instead, you want to estimate a separate tmt effect for each site...so estimate (~) a fixed tmt effect (tmt) within (|) sites (site) = ~tmt|site. --Adam I get the following error message Error in na.fail.default(list(aeucs = c(0.8, 1, 1, 1, 1, 1, 0.7, : missing values in object I need to know where I am going wrong. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nested anova and multiple comparisons
Hi Stephen, On Sat, Apr 26, 2008 at 10:29 AM, Stephen Cole [EMAIL PROTECTED] wrote: ...snip... I have managed to accomplish my first two goals by analyzing the data as a 3 level nested Anova with the following code mod1 - aov(Count~Region/Location/Site, data=data) This allows me to get the MS for a Anova table. However R does not compute the correct F-statistics (because it uses the residual MS for the denominator in all calculations which it shouldnt) so i manually computed the F-stats and variance components by hand. R's just doing what it was asked it to do. The fact that it used the residual MS isn't a surprise given your code. From reading the help guide i learned about and tried using the Error(Region/Location/Site) command but then i can only get MS and no F-statistics and still hand to compute the rest by hand. Yes, if you want to use Method-of-Moments estimation to solve for expected mean squares for the variance components and calculate F-stats w/ the 'correct' MS, then then specifying error strata is reasonable for balanced data (assuming you've met model assumptions). However, I believe most statisticians these days are more inclined to use (Restricted) Maximum Likelihood estimation (e.g. using lme or lmer) because of the added fliexibility it provides (also it won't produce negative variance estimates when the mean squared error is larger than the mean square between the groups of interest) As for why you are only getting MS and no F-stats, it's hard to say without a reproducible example (see the posting guide). In my experience this will occur when there are insufficient degrees of freedom for calculating an F-stat at a given level. My problem now is that i would liek to use TukeyHSD for multiple comparisons. Howeber since R is unable to compute the correct F statistics in this nested design i also think it is using the wrong MS and df in calculating tukeys q. for example when i use Again, it's not that R is unable to, it's that you asked R not to. TukeyHSD(mod1, Region) i will get values however i do not think they have been calculated correctly. Furthermore when i use the Error(Region/Location/Site) term i can then no longer use TukeyHSD as i get the error message that there is no applicable use of tukey on this object. Looking at the methods available for TukeyHSD shows that it will work for an object of class aov methods(TukeyHSD) [1] TukeyHSD.aov But ?aov states: Value: An object of class 'c(aov, lm)' or for multiple responses of class 'c(maov, aov, mlm, lm)' or for multiple error strata of class 'aovlist'. There are 'print' and 'summary' methods available for these. So your model with error strata is of class aovlist not aov. i am just wondering if there is any way to use Multiple comparisons with R in a nested design like mine. I'm not sure but the package 'multcomp' might be of help. I have thought about using lme or lmer but if i understand them right with a balanced design i should be able to get the same result using aov Even with balanced data, you don't get the exact same thing with aov. As mentioned above lme and lmer use different estimation methods. One of the advantages of using (RE)ML is a lot more flexibility in how you specifiy the structure of random effects covariance matrices, and (at least with lme) you have a lot of flexibility in how you structure the residual covariance matrix as well. This is particularly useful when you are not meeting assumptions of constant variance and/or independence of observations (which seems to be the rule rather than the exception with ecological data with a spatial component, such as you appear to have). The theory behind analyses you want to do are quite complex with many potential pitfalls. If at all possible, I highly recommend finding the help of someone who is an expert in these types of models (known as mixed, hierarchical, multilevel, random effects, variance components, nested ANOVA, etc). best, Kingsford Jones Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested ANOVA models in R
Here is a short example: dat - data.frame(x = rnorm(20), a = rep(letters[1:4], 5), b = rep(letters[1:5], each = 4)) summary(aov(x ~ a*b, dat)) Df Sum Sq Mean Sq a3 0.8021 0.2674 b4 3.7175 0.9294 a:b 12 10.5416 0.8785 summary(aov(x ~ a/b, dat)) Df Sum Sq Mean Sq a3 0.8021 0.2674 a:b 16 14.2590 0.8912 So in your nested case you should not get a mean square for 'Female' at all. The interaction sum of squares in the nested case is the sum of the main effect and interaction in the crossed model case, (as are the degrees of freedom). Although you think of them as different models, in a mathematical sense they are equivalent - you just parcel the degrees of freedom and SSQ a bit differently in the sequential anova. Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: +61 4 8819 4402 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Bolnick Sent: Wednesday, 6 February 2008 2:28 PM To: r-help@r-project.org Subject: [R] Nested ANOVA models in R Hi, I'm trying to work through a Nested ANOVA for the following scenario: 20 males were used to fertilize eggs of 4 females per male, so that female is nested within male (80 females used total). Spine length was measured on 11 offspring per family, resulting in 880 measurements on 80 families. I used the following two commands: summary(aov(Spinelength ~ Male*Female)) and summary(aov(Spinelength ~ Male/Female)) I get the same mean squares either way, which doesn't seem right to me. In the former case, the mean square for females should be calculated around the overall mean across all females, whereas the mean square in the latter case should be calculated using deviations from the set of 4 females nested within a given male, right? Of course, it is more appropriate for me to treat each of these as random effects. I know Bates has objections to the SAS-style partitioning variances to obtain F statistics and p-values, and I have read relevant parts of Pinhero and Bates, but how can a specify a nested random effects model that yields p-values for both the males (tested against MS for females) and females nested within males? Thanks, Dan Bolnick Section of Integrative Biology University of Texas at Austin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.