Dear Gavin,

On Mon, Mar 11, 2013 at 1:58 PM, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
> Hi Jay,
>
> What you describe is similar to research conducted at UCL by myself and colleagues, and also as part of an EU project that finished a few years ago now.
>
> Since then, my thoughts on this have expanded a little, and Sarah's point about path analysis was where I would have gone next if I were continuing this line of investigation.
>
> In the work I did at UCL, I went with a time series approach. I decomposed the species data into a set of ordination axes (I used PCA, but on Hellinger-transformed data to account for non-linear responses in the species, which PCA doesn't handle well). Then I fitted an additive model with, say, PCA axis 1 (PC1) as the response variable and one or more covariates entering as smooth functions as the predictors. The nice things about the additive model are that the terms combine additively and that the non-linear effects allow the effect of a variable to change along its range (e.g. only cold temperatures induce an "effect" on the response). As these were time series data, I controlled for autocorrelation by using a continuous-time AR(1) process for the residuals.
>
> Here are some refs on these approaches:
>
> http://www.aslo.org/lo/toc/vol_54/issue_6_part_2/2529.html
> http://dx.doi.org/10.1111/j.1365-2427.2012.02860.x
> http://dx.doi.org/10.1111/j.1365-2427.2011.02651.x
> http://dx.doi.org/10.1111/j.1365-2427.2011.02670.x
>
> and there is more in the Special Issue section of FWBiol:
>
> http://onlinelibrary.wiley.com/doi/10.1111/fwb.2012.57.issue-10/issuetoc
>
> Now, this doesn't look at cascades of effects; one key aspect of all of the above was the construction of appropriate time series for the environmental data. Ideally, I'd take the variable that is most closely related to diatom physiology as my predictor. However, those variables do not always exist or are not available. Instead, surrogates can be used: the amount of agricultural land use in catchments, or historical records of fertilizer use, will be highly correlated with nutrient loading from a catchment to the lake/reservoir, so these could be used instead. Of course, you do need to be wary of spurious correlations with time series data.
>
> As these models use smooth functions, you are only going to be able to include one or a few covariates unless you have *lots* of samples. In any case, I would always advise thinking first from the biological or ecological viewpoint, formulating a hypothesis there and then fitting that with the stats, rather than throwing lots of variables into an analysis to see what pops out (which seems to be what a lot of palaeoecologists do!).
>
> As regards envfit(), it isn't symmetric in the variables, at least as far as I see it; it fits a model of
>
>     varZ = \beta_1 axis_1 + \beta_2 axis_2 + \varepsilon
>
> In other words, it uses the 2d "axis" scores (PC1 and PC2, or the nMDS1 and nMDS2 coordinates) to predict the values of the response using a linear model. As each environmental variable is modelled separately (individually), one is not favouring a set of variables etc. Perhaps this is not what you meant, but it is worth pointing out.
>
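The pipeline described in this exchange (Hellinger-transform the species data, ordinate it with PCA, then regress a single environmental variable on the first two axis scores, as in the envfit() model above) can be sketched outside R. What follows is a minimal Python/NumPy illustration under stated assumptions, not vegan's code: the function names `hellinger`, `pca_scores` and `fit_vector` and the simulated counts are hypothetical. In R the transformation itself is available as vegan's `decostand(x, method = "hellinger")`; `envfit()` additionally rescales the fitted vectors and assesses r² by permutation, which this sketch does not reproduce.

```python
import numpy as np


def hellinger(counts):
    """Hellinger transformation: square roots of row-wise relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return np.sqrt(counts / counts.sum(axis=1, keepdims=True))


def pca_scores(data, n_axes=2):
    """Site scores on the first `n_axes` principal components, from the SVD of
    the column-centred data (axis signs are arbitrary, as in any PCA)."""
    centred = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_axes] * s[:n_axes]


def fit_vector(env, scores):
    """envfit-style vector fit of ONE environmental variable (hypothetical helper):
    env = beta_1 * axis_1 + beta_2 * axis_2 + error, fitted by least squares on
    centred values, so no intercept is needed. Returns (beta, r_squared)."""
    y = np.asarray(env, dtype=float)
    y = y - y.mean()
    x = scores - scores.mean(axis=0)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    fitted = x @ beta
    return beta, float(fitted @ fitted / (y @ y))


# Hypothetical example: 20 samples x 6 taxa of simulated counts.
rng = np.random.default_rng(42)
counts = rng.poisson(5, size=(20, 6)) + 1  # +1 so that no sample is empty
scores = pca_scores(hellinger(counts))
env = 2.0 * scores[:, 0] - 0.5 * scores[:, 1] + rng.normal(0.0, 0.01, size=20)
beta, r2 = fit_vector(env, scores)
print(beta, r2)
```

Because each variable gets its own regression, calling `fit_vector` once per environmental variable mirrors the "each variable is modelled separately" point: the axis scores are the predictors and the environmental variable is the response, never the other way round.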
Yes, you are right, that is not what I meant, and you've said it better than I did (or knew how to).

> Also, envfit presumes a linear relationship between the variables and the ordination coordinates. If that is too strong an assumption, see ordisurf(), which fits GAM-based surfaces rather than linear-regression surfaces.
>
Yes, there is evidence of nonlinearity in our data, and we've done work with ordisurf, too.

> A fairly standard way of looking at this sort of data might be to group variables that are related and then decompose the variance in the data into that which can be explained by each group of variables uniquely, that which can be explained jointly by two or more groups, and the unexplained variance. The vegan package has a function varpart() which can do all of this for you if you are willing to use RDA to analyse the data (unbiased estimates of the variance explained are not available for CCA, and nMDS is not a constrained technique). Note that you can use principal coordinates analysis to embed your original dissimilarity matrix in a metric space and then take the PCoA axis scores as the input "data" for the RDA, so that the RDA works in the dissimilarity of your choice and is not linear in the original data.
>
> Steve Juggins has adapted the hierarchical partitioning approach from package hier.part to the multivariate multiple regression setting of RDA (possibly CCA too?), which is related to, but somewhat different from, the variance partitioning described above. I don't believe Steve has released this code yet, so if you are interested I'd email him for it; he is the author of the rioja package, so contact details can be found on CRAN.
>
> Neither variance partitioning nor hierarchical partitioning directly does exactly what you ask, namely model the directed dependence or pathways of effects. They are, however, far simpler methods that would be familiar to the applied community that will see these results/papers etc.
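The variance-partitioning idea above, with PCoA used to embed a dissimilarity matrix before the RDA-style regression, can be sketched as follows. This is a hedged Python/NumPy illustration, not vegan's `varpart()`: the helper names (`bray_curtis`, `pcoa`, `adj_r2`, `varpart2`) and the simulated "land-use" and "nutrient" tables are hypothetical, negative PCoA eigenvalues are simply dropped rather than corrected, and the adjusted R² counts the columns of X rather than its rank. The fractions follow the usual two-table scheme: [a] and [c] are the variation explained uniquely by each group, [b] the variation explained jointly, and [d] the residual. (`varpart()` itself handles up to four explanatory tables.)

```python
import numpy as np


def bray_curtis(counts):
    """Bray-Curtis dissimilarity matrix between the rows (samples) of `counts`."""
    x = np.asarray(counts, dtype=float)
    diff = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    total = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return diff / total


def pcoa(dist):
    """Principal coordinates analysis (classical scaling). Axes with
    non-positive eigenvalues, which non-Euclidean dissimilarities such as
    Bray-Curtis can produce, are simply dropped (no correction applied)."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centring @ (dist ** 2) @ centring
    values, vectors = np.linalg.eigh(gower)
    order = np.argsort(values)[::-1]
    values, vectors = values[order], vectors[:, order]
    keep = values > 1e-10 * values.max()
    return vectors[:, keep] * np.sqrt(values[keep])


def adj_r2(y, x):
    """Adjusted R^2 of an RDA-style multivariate least-squares fit of Y on X."""
    y = y - y.mean(axis=0)
    x = x - x.mean(axis=0)
    n, p = x.shape
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    r2 = ((x @ beta) ** 2).sum() / (y ** 2).sum()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def varpart2(y, x1, x2):
    """Two-group variance partitioning into fractions [a] (x1 alone),
    [b] (shared), [c] (x2 alone) and [d] (unexplained), from adjusted R^2."""
    r_x1 = adj_r2(y, x1)
    r_x2 = adj_r2(y, x2)
    r_both = adj_r2(y, np.hstack([x1, x2]))
    return {"a": r_both - r_x2, "b": r_x1 + r_x2 - r_both,
            "c": r_both - r_x1, "d": 1.0 - r_both}


# Hypothetical data: 30 samples, 8 taxa, two groups of explanatory variables.
rng = np.random.default_rng(0)
x_land = rng.normal(size=(30, 2))  # stand-in for catchment land-use surrogates
x_nutr = x_land @ rng.normal(size=(2, 2)) + rng.normal(size=(30, 2))  # correlated "nutrients"
signal = np.exp(0.5 * np.hstack([x_land, x_nutr]) @ rng.normal(size=(4, 8)))
counts = rng.poisson(5 * signal) + 1  # +1 so that no sample is empty
y = pcoa(bray_curtis(counts))  # embed Bray-Curtis dissimilarities in a metric space
print(varpart2(y, x_land, x_nutr))
```

With adjusted R², the shared fraction [b] can come out slightly negative; that is a known property of the method rather than a bug in the arithmetic.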
>
> In writing this, I have pondered whether you and/or the ecologists are making it too complex. As you have all the variables of interest, I might model the variables that physiologically affect the diatoms and their effect on diatom composition (using constrained ordination or the additive model time series approach I used). Then I would model the relationship between the higher-level factor (land use, etc.) and the lower-level, physiologically relevant variable whose changes I hypothesise it to be driving.
>
That's interesting - I'll ask him about that.

> Whilst path analysis might be the ideal tool here, I suspect you'll have plenty of complications to address, not least the compositional aspects (though Sarah points you to some ideas there), but also the temporal autocorrelation prevalent in palaeo data, which would need to be accounted for to yield appropriate p-values. If you do address these issues and use path analysis, I for one would be very interested to hear how you got on, as this would be a very useful contribution to the palaeolimnological literature!
>
> Wow, this got long! Anyway, hopefully some of the above will be of use and interest, and should you wish to chat about this off-list some more (I'm certainly very interested if you use path analysis for this), then please do so. (Note I am still sending via my old UCL address, as I have moved to Canada and U Regina; contact details are in my email signature.)
>
> All the best,
>
> Gavin
>
> On Wed, 2013-03-06 at 22:12 -0500, Jay Kerns wrote:

This reply is spectacular; thank you very much. There's a lot to think about, and I'll be chewing on it for a while. Thank you for the references and thoughtful comments.

Regards,
Jay

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology