Philip,

I agree that the simpler analysis has its merits. In fact, this is where we 
started our analysis, and these are the results we present in the current draft 
of the paper.  It is reassuring to know that they are both statistically valid 
and conservative; we could leave the analysis there. However, I think this form 
of analysis is limited (as discussed by Zuur pg.105, and others) and we have 
had problems publishing results such as this before.

I believe it is better to model the data directly rather than a regression 
parameter, as given our current computer power, there isn't anything stopping 
us except our own statistical knowledge and understanding.  But more 
importantly, grouping the data into, say, seasonal means throws away 
information that could be significant in our analysis.  That is, our mixed 
effects model would have no way of knowing whether we are using a single 
temperature reading taken on a single date, or whether our datapoint is the 
result of many readings (in our case 92 days x 15 minute readings = 8832 
readings per plot).  Surely the latter is a more reliable measure of the summer 
temperature?  How can we capture this extra certainty in our final model?  Our 
current dataset has a strong enough signal-to-noise ratio to get a result this 
way, but we will extend this analysis over several years and for other 
parameters that may not have such a clear signal.  Also, this is a longstanding 
problem in our temperature data, and previous datasets have had very poor signal-to-noise 
ratios indeed; we need to extract every bit of information from those as our 
current (conservative) model estimates are not significant.

Also, if we work with modelled regression parameters we cannot pool _all_ our 
information into a single analysis; that is, we cannot account for everything 
at once, allowing each parameter to interact with the others in a continuous 
way.  We can compute mean temperature per plot, then from that model treatment 
effect (and perhaps a plot random effect) but then we can't speak about overall 
variance.  We could compute variance in each plot, but then we cannot speak 
about mean effect or plot effect (and what is the mean of a variance anyway?).  
We could subtract the mean effect off first, but then we are merely walking 
down the path of doing the analysis we _really_ want the hard way.  I also 
believe we get onto shaky ground as to whether these parameters might 
interact, or pull significance from each other.  I don't profess to understand 
the subtlety of the math enough to determine this a priori.

Another limitation of analysis of regression parameters is that we have not 
accounted for the autocorrelation in the time-series data. If we look at the 
response residuals (raw residuals, i.e. the ones that do not remove the AR1 
signal) from the mod.gls model, we get fairly large variances, and each plot 
mean looks approximately the same -- these are the results we get from a simple 
computation of mean or sd on each time-series. The problem is seasonal 
fluctuation (which is random, and autocorrelated) is overwhelming the sd 
calculation and masking the actual plot level response. That is, the variance 
over the course of the season is bigger than the plot variance for any given 
time snapshot.  We could model the seasonal fluctuation using a gam (I have 
done this) but a) it will be different for each year and b) it really isn't a 
parameter of interest anyway.  That is, I am not trying to predict what the 
temperature in a plot will be given treatment _and_ time of year, just 
treatment, because each year is going to be different.  (For the record, there 
is no apparent long-term seasonal effect, linear or otherwise.) Hence we DO 
need to consider the autocorrelation effect, because even simple computations 
such as mean and variance assume independence!  In this case an AR1 model (or the ARMA(2,0) 
model I have running) leaves cleaner residuals than the gam, as well as being 
more appropriate in a philosophical sense.
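For concreteness, a minimal sketch of the kind of fit I mean (the data frame 
`temps` and its columns `temp`, `treatment`, `plot` are placeholder names, not 
our actual objects; I am not claiming this is exactly how mod.gls was specified):

```r
library(nlme)

## Hypothetical data frame: one row per 15-minute reading, ordered in
## time within each plot, with columns temp, treatment, plot (factor)
mod.gls <- gls(temp ~ treatment,
               correlation = corAR1(form = ~ 1 | plot),
               data = temps)

## Raw residuals still carry the AR1 signal; normalized residuals
## have it removed, so comparing the two shows what the correlation
## structure is absorbing
plot(residuals(mod.gls, type = "response"))
plot(residuals(mod.gls, type = "normalized"))
```

Note that `corAR1(form = ~ 1 | plot)` assumes equally spaced, ordered readings 
within each plot, which our 15-minute logger data should satisfy.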

Regarding your discussion of variance structure: I agree that there is a mixed 
model that includes both AR1 and compound symmetric components (a fully ME 
model with intercept and slope terms). I need to spend more time thinking about 
this as my intuition had proposed a model that looked more like a product than 
a sum; however, I'm thinking you might be right that the covariance structure 
is additive.  Zuur talks about the implied correlation structure from a full 
random model (pg.113, but it is not in an easy-to-picture form).  He does not 
discuss a slope only model, but when I look at his example and pump my own math 
through, I do not see it decreasing with "distance" between readings, i.e. 
there is no covariance term on the readings.
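If the covariance structure is indeed additive, I believe the model would look 
something like this (again using the placeholder data frame `temps`): the 
per-plot random intercept induces the compound-symmetric (constant within-plot) 
correlation, and the corAR1 term adds the component that decays with time 
separation.

```r
library(nlme)

## Random intercept per plot => compound symmetry component;
## corAR1 on the residuals => decaying component.
## The implied within-plot correlation is then CS + AR1.
mod.lme <- lme(temp ~ treatment,
               random = ~ 1 | plot,
               correlation = corAR1(form = ~ 1 | plot),
               data = temps)
```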

So, I'm not sure how to model this.  Do you have suggestions?  Currently the 
two-stage analysis I present (final code block, discussion question 3) comes 
closest to the final product I would like, and may in the end be valid.  Do you 
have thoughts on the validity here?  Functionally it is modelling the treatment 
effect and AR1 process first, removing those, then computing the plot effect 
afterwards.
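In skeleton form, the two-stage procedure is roughly the following (a sketch 
only, with the same placeholder `temps` data frame, not our actual code):

```r
library(nlme)

## Stage 1: model treatment effect and AR1 process on the full series,
## then keep the normalized residuals (treatment and AR1 removed)
stage1 <- gls(temp ~ treatment,
              correlation = corAR1(form = ~ 1 | plot),
              data = temps)
temps$res <- residuals(stage1, type = "normalized")

## Stage 2: estimate the plot effect on what remains
stage2 <- lme(res ~ 1, random = ~ 1 | plot, data = temps)
VarCorr(stage2)
```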

Thanks for your help,
Jonathan

________________________________________
From: Dixon, Philip M [STAT] [pdi...@iastate.edu]
Sent: July 16, 2013 8:34 AM
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Autocorrelation structures and simplicity of analysis

Jonathan,

The AR(1) + compound symmetry model does make sense.  It is a model where the 
correlation over time declines with the separation between time points, but it 
does not decline to zero.  There is a constant non-zero (at least potentially) 
correlation arising from the compound symmetry (constant correlation between 
all pairs of observations on the same plot) part of the model.  I have found it 
fits many data sets better than either AR(1) only or CS only.

However, I suggest you completely rethink the analysis.  I am a great fan of 
the approach recommended by Paul Murtaugh a few years ago.  The paper is 
"Simplicity and complexity in ecological data analysis" Author(s): Murtaugh, 
Paul A.  Source: ECOLOGY  Volume: 88   Issue: 1   Pages: 56-62   Published: JAN 
2007

Your situation is almost identical to one he considers in his paper.  His 
advice: remember that you have experimental units randomly assigned to warming 
treatments.  Your analysis currently ignores time, except for accounting for 
the autocorrelation.  You seem to care only about the treatment effect.  If 
that's the case, you don't need to worry about days.  Just focus the analysis 
on plots and the treatment effect.  Specifically, figure out an informative 
summary statistic for each plot (e.g., average over time, median over time), 
calculate those 12 numbers, then analyze those.  No need to worry about the 
autocorrelation structure then!
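A minimal sketch of this summary-statistic approach, assuming a data frame 
`temps` with one row per reading and columns `temp`, `plot`, `treatment` 
(placeholder names):

```r
## One summary number per plot, then an ordinary linear model;
## within-plot autocorrelation no longer enters the analysis
plot.means <- aggregate(temp ~ plot + treatment, data = temps, FUN = mean)
summary(lm(temp ~ treatment, data = plot.means))

## A different question, e.g. day-to-day variability, just swaps
## the summary statistic (log sd per plot)
plot.sds <- aggregate(temp ~ plot + treatment, data = temps,
                      FUN = function(x) log(sd(x)))
summary(lm(temp ~ treatment, data = plot.sds))
```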

The beauty of this approach is that if you want to ask a different question, 
e.g. about day-to-day variability instead of the average, you just change your 
summary statistic to something that measures variability, e.g. log sd.

Best wishes,
Philip Dixon

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
