Dear Thierry, Chris, Peter, and the list, Based on these suggestions and some others, I realize now that tree id is clearly the grouping variable. A couple people had mentioned in a previous letter that there is likely to be temporal autocorrelation within trees, and I am particularly interested in this aspect of the data, which prompted me to look into the autocorrelation structures described in Pinheiro and Bates.
I am still a little confused about the two different approaches for accounting for non-independence within trees among years (i.e. specifying tree.id as a random effect vs. specifying autocorrelation in the residuals. I am currently considering these alternative models: f.gls <- gls(growth ~ diameter*vines, corr = corAR1(form = ~ yr|tree.id) f.lme <- lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id) f.lme.ar1 <- lme(fixed = growth ~ diameter*vines, random = ~ 1 | tree.id, correlation = corAR1(form = ~ yr | tree.id)) For f.gls, phi is 0.67, which pretty closely matches the value I determined empirically by correlating residuals from different years. plot(ACF(f.gls)) also reveals highly signficant autocorrelation which decays smoothly with lag. This approach is very appealing, because it closely matches what I see if I make a correlation matrix among years empirically from the residuals. However, f.lme.ar1 also has significant correlation, and a lower AIC than f.gls (and f.lme). But now phi is only 0.26, and plot(ACF(f.lme.ar1)) does not look like AR1, but is rather increasingly negative with longer lags. I don't quite understand this change in phi, or the biological interpretation of the correlation structure. So, even though f.gls has a higher AIC, I'd like to use that one because it seems more biologically interpretable. Does this adequately account for the pseudoreplication across years, or do I really need that random effect of tree.id in there? Again, many many thanks for all your help. This makes a lot more sense to me now than it did. Matt >-----Original Message----- >From: ONKELINX, Thierry [mailto:[EMAIL PROTECTED] >Sent: Thursday, May 22, 2008 11:15 AM >To: Christian A. Parker; Landis, R Matthew >Cc: r-sig-ecology@r-project.org >Subject: RE: [R-sig-eco] nlme model specification > >Matthew, > >I support Chris' point of view. The random effects should model the >dependence structure between the measurements. The parameters of your >fixed effect will be better when you have specified the random effect >correctly. Define the most complex model that are willing to >accept, run >it with different random effects and compare those models (by LRT or >AIC, both under REML). Take the best one and refine your fixed effects >(under ML). > >A few suggestions: > >M0 <- gls(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, method = 'REML') >Ma <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = ~1|year, >method = 'REML') >M1 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = ~1|id, >method = 'REML') >M2 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = >~year|id, >method = 'REML') >M3 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = >~factor(year)|id, method = 'REML') >M4 <- lme(fixed = growth ~ (x1 + x2 + x3+ x4 + x5)^2, random = >list(id = >pdCompSymm(~factor(year))), method = 'REML') > >anova(M0, Ma) >anova(M0, M1, M2) >anova(M0, M1, M3) >anova(M0, M1, M4) > >Warning: M3 may take some time to calculate as it will have estimate 45 >parameters for the random effect. > >A random slope by year when you group by year makes no sense since you >have in each group only data from one year. Hence it is impossible to >have a slope by year in each group. > >Thierry > > >--------------------------------------------------------------- >--------- >---- >ir. Thierry Onkelinx >Instituut voor natuur- en bosonderzoek / Research Institute for Nature >and Forest >Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, >methodology and quality assurance >Gaverstraat 4 >9500 Geraardsbergen >Belgium >tel. + 32 54/436 185 >[EMAIL PROTECTED] >www.inbo.be > >To call in the statistician after the experiment is done may be no more >than asking him to perform a post-mortem examination: he may be able to >say what the experiment died of. >~ Sir Ronald Aylmer Fisher > >The plural of anecdote is not data. >~ Roger Brinner > >The combination of some data and an aching desire for an >answer does not >ensure that a reasonable answer can be extracted from a given body of >data. >~ John Tukey > >-----Oorspronkelijk bericht----- >Van: [EMAIL PROTECTED] >[mailto:[EMAIL PROTECTED] Namens Christian A. Parker >Verzonden: donderdag 22 mei 2008 16:39 >Aan: Landis, R Matthew >CC: 'r-sig-ecology@r-project.org' >Onderwerp: Re: [R-sig-eco] nlme model specification > >Matthew >Please correct me if I am wrong (anyone) but because your observations >are not independent across your desired groups (years) your error terms >will be biased which will then influence your significant tests. So >regardless of the factor that you are interested you would >still want to > >account for the fact that all measurements were taken on the same trees >each year by doing a repeated measures model of some sort. >Hope this helps, >-Chris > >Landis, R Matthew wrote: >> Dear R-sig-eco: >> >> Many thanks to all of those who took the time to reply to my >question. >The diversity of replies has made me go back and try to clarify my >question. Apologies for the length of the e-mail. Thanks in >advance to >anyone willing to plow through this and understand it. If you're ever >in Middlebury I'll buy you a beer. >> >> To repeat, I have 300 trees, ranging in size from 10 - 150 >cm diameter >(big trees). To simplify my original question, let's say I want to >understand the relationship between growth and two variables, diameter >(continuous) and vine load (ordinal index from 1-4). I'd also like to >know the relative importance of diameter vs. vine load, e.g. by partial >R2. If I had one year of data, this would be a simple regression. >> >> However, I have 9 years of annual measurements on the trees. It's as >if I have the above analysis repeated 9 times. There was no initial >treatment, so I view these 9 years as a random sample of the years in >the life of the tree, and unlike most examples of repeated measures I >have read, the time effect is of no interest whatsoever. That is, I am >not interested in viewing xyplot(growth ~ time|id). I don't expect to >see any consistent directional response to time. In a way, it's as if >the 9 years represent blocks, (except that it's the same 300 trees in >each block) -- this is why I view the yr as a random effect, and as the >grouping variable. >> >> If I were to graph the data, I would use xyplot(growth ~ diameter|yr) >to see what I am most interested in. Grouping by individual doesn't >make sense to me here because each individual only represents a very >small slice of the full range of measurements - e.g. over the >ten years, >each tree only grows from 10 cm - 14 cm, so I can't really estimate the >growth vs. diameter relationship for each tree. xyplot(growth ~ >diameter|id) would not be useful. This is why I don't consider the >individual to be the grouping variable, but perhaps I am wrong on this. >> >> So, now, as before, I am back to >> >> fit <- lme(fixed = growth ~ diameter * vines, random = ~ 1|year) >> >> I'm expecting that this will estimate separate intercepts for each >year. Which is what I want (I would like to fit separate >slopes by year >too, but that model didn't converge). >> >> I guess what I'm most concerned about is whether the significance >tests obtained for each term use the appropriate error term and the >appropriate degrees of freedom. I'm currently using something like the >following command to test the effect of diameter >> >> anova(fit.full.model, update(fit.full.model, . ~ vines)) >> >> But maybe I'm way off base there. >> >> Thanks very much! >> >> Matt Landis > _______________________________________________ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology