Dear Gavin,

On Mon, Mar 11, 2013 at 1:58 PM, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
> Hi Jay,
>
> What you describe is similar to research conducted at UCL by myself and
> colleagues and also as part of an EU project that finished a few years
> ago now.
>
> Since then, my thoughts on this have expanded a little, and Sarah's point
> about path analysis was where I would have gone next if I were continuing
> this line of investigation.
>
> In the work I did at UCL, I went with a time series approach. I
> decomposed the species data into a set of ordination axes (I used PCA,
> but on Hellinger-transformed data to account for non-linear responses in
> the species, which PCA doesn't handle well). Then I fitted an additive
> model with, say, PCA axis 1 (PC1) as the response variable and one or
> more covariates entering as smooth functions as the predictors. The nice
> things about the additive model are that the terms combine additively
> and the non-linear effects allow the effect of a variable to change
> across its range (e.g. only cold temperatures might induce an "effect" on
> the response). As these were time series data, I controlled for
> autocorrelation by using a continuous-time AR(1) process for the
> residuals.
>
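For readers who want the two ingredients of that approach spelled out, here is a minimal sketch in Python (standard library only; it is not Gavin's code). It shows the Hellinger transformation, i.e. the square root of each taxon's relative abundance within a sample, and the correlation matrix implied by a continuous-time AR(1) residual process, where residuals a time gap dt apart have correlation phi^|dt|. The function names, toy counts, sample ages and phi value are illustrative assumptions. In R the corresponding steps would usually be vegan's decostand(x, method = "hellinger") and an nlme corCAR1() correlation structure (usable from mgcv's gamm()).

```python
import math


def hellinger(counts):
    """Hellinger-transform a samples-by-species table: the square root of
    each species' relative abundance within its sample (row)."""
    out = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("sample with zero total abundance")
        out.append([math.sqrt(y / total) for y in row])
    return out


def car1_correlation(times, phi):
    """Correlation matrix of a continuous-time AR(1) process observed at
    (possibly irregular) times: corr(e_i, e_j) = phi ** |t_i - t_j|.
    phi is restricted here to [0, 1)."""
    if not 0 <= phi < 1:
        raise ValueError("phi must lie in [0, 1)")
    return [[phi ** abs(ti - tj) for tj in times] for ti in times]


# Hypothetical toy data: 3 samples x 3 taxa, and unevenly spaced sample ages.
counts = [[10, 30, 60], [0, 50, 50], [25, 25, 50]]
H = hellinger(counts)
R = car1_correlation([0.0, 1.0, 3.5], phi=0.6)
```

Each transformed row has unit length, so the Euclidean distances that PCA works with on the transformed table are Hellinger distances between samples. The continuous-time form is what makes the AR(1) usable when sample ages are unevenly spaced, as is usual in sediment cores.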
> Here are some refs on these approaches:
>
> http://www.aslo.org/lo/toc/vol_54/issue_6_part_2/2529.html
> http://dx.doi.org/10.1111/j.1365-2427.2012.02860.x
> http://dx.doi.org/10.1111/j.1365-2427.2011.02651.x
> http://dx.doi.org/10.1111/j.1365-2427.2011.02670.x
>
> and there is more in the Special Issue section of FWBiol:
>
> http://onlinelibrary.wiley.com/doi/10.1111/fwb.2012.57.issue-10/issuetoc
>
> Now this doesn't look at cascades of effects; one key aspect of all of
> the above was the construction of appropriate time series for the
> environmental data. Ideally, I'd take the variable that is most closely
> related to diatom physiology as my predictor. However, those variables do
> not always exist or are not available. Instead, surrogates can be used:
> the amount of agricultural land use in catchments, or historical records
> of fertilizer use, will be highly correlated with nutrient loading from
> a catchment to the lake/reservoir, so these could be used instead. Of
> course, you do need to be wary of spurious correlations with time series
> data.
>
> As these models use smooth functions, you are only going to be
> able to include one or a few covariates unless you have *lots* of
> samples; in any case, I would always advise thinking first from the
> biological or ecological viewpoint, formulating a hypothesis there, and
> then fitting that with the stats rather than throwing lots of variables
> into an analysis to see what pops out (which seems to be what a lot of
> palaeoecologists do!)
>
> As regards envfit(), it isn't symmetric in the variables, at least as far
> as I see it; it fits a model of
>
> varZ = \beta_1 axis_1 + \beta_2 axis_2 + \varepsilon
>
> in other words, it uses the 2-d "axis" scores (PC1 and PC2, or nMDS1 and
> nMDS2 coordinates) to predict the values of the environmental variable,
> which is the response, using a linear model. As each environmental
> variable is modelled separately, one is not favouring a set of variables.
> Perhaps this is not what you meant, but it is worth pointing out.
>

Yes, you are right, that is not what I meant, and you've said it
better than I did (or knew how to).
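To make the envfit() model above concrete, here is a minimal least-squares sketch in Python (standard library only). It assumes the variable and the two axis scores are centred, so the intercept drops out as in the equation. vegan's envfit() additionally reports r^2 with a permutation test and rescales the coefficients into arrow directions for plotting, none of which is reproduced here. The function names and data are hypothetical; the data are constructed so that z = 5 + 2*axis1 - axis2 exactly.

```python
def centre(xs):
    """Subtract the mean from a list of numbers."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def fit_vector(z, axis1, axis2):
    """Least-squares fit of z = b1*axis1 + b2*axis2 + e on centred data,
    solving the 2x2 normal equations directly. Returns (b1, b2, r2)."""
    z, a1, a2 = centre(z), centre(axis1), centre(axis2)
    s11 = sum(u * u for u in a1)
    s22 = sum(v * v for v in a2)
    s12 = sum(u * v for u, v in zip(a1, a2))
    s1z = sum(u * w for u, w in zip(a1, z))
    s2z = sum(v * w for v, w in zip(a2, z))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("axis scores are collinear")
    b1 = (s22 * s1z - s12 * s2z) / det
    b2 = (s11 * s2z - s12 * s1z) / det
    fitted = [b1 * u + b2 * v for u, v in zip(a1, a2)]
    rss = sum((w - f) ** 2 for w, f in zip(z, fitted))
    tss = sum(w * w for w in z)
    return b1, b2, 1 - rss / tss


# Hypothetical ordination scores for five samples and one environmental
# variable built exactly from them, so the fit should be perfect.
axis1 = [-1.2, -0.4, 0.1, 0.6, 0.9]
axis2 = [0.3, -0.8, 0.5, -0.2, 0.2]
z = [5 + 2 * u - v for u, v in zip(axis1, axis2)]
b1, b2, r2 = fit_vector(z, axis1, axis2)
```

Because each variable gets its own regression, adding or dropping other variables leaves a given variable's fit unchanged, which is the point about envfit() not favouring a set of variables.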


> Also, envfit presumes a linear relationship between the variables and
> the ordination coordinates. If that is too strong an assumption, see
> ordisurf(), which fits GAM-based surfaces rather than linear-regression
> surfaces.
>

Yes, there is evidence of nonlinearity in our data and we've done work
with ordisurf, too.


> A fairly standard way of looking at this sort of data might be to group
> variables that are related and then decompose the variance in the data
> into that which can be explained by each group of variables uniquely, that
> which can be explained jointly by two or more groups, and the unexplained
> variance. The vegan package has a function varpart() which can do this
> all for you if you are willing to use RDA to analyse the data (unbiased
> estimates of the variance explained are not available for CCA, and nMDS
> is not a constrained technique). Note that you can use principal
> coordinates analysis to embed your original dissimilarity matrix into a
> metric space and then take the PCoA axis scores as the input "data" for
> the RDA, so that the RDA is based on the dissimilarity of your choice
> rather than being linear in the original data.
>
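The bookkeeping behind a two-group partition like the one varpart() reports can be sketched in a few lines of Python. Three fits give R^2 for group X1 alone, group X2 alone and both together; each is adjusted for the number of explanatory variables (Ezekiel's formula), and the fractions follow by subtraction: [a] unique to X1, [b] shared, [c] unique to X2 and [d] unexplained. The function names, and the R^2 values, sample size and group sizes in the example, are hypothetical; in practice the R^2 values would come from RDA fits.

```python
def adj_r2(r2, n, m):
    """Ezekiel's adjusted R^2 for n samples and m explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)


def partition_two(r2_x1, r2_x2, r2_x12, n, m1, m2):
    """Fractions [a], [b], [c], [d] for two groups of explanatory
    variables, computed from adjusted R^2 of three separate fits."""
    a_b = adj_r2(r2_x1, n, m1)        # X1 alone: [a] + [b]
    b_c = adj_r2(r2_x2, n, m2)        # X2 alone: [b] + [c]
    abc = adj_r2(r2_x12, n, m1 + m2)  # X1 and X2 together: [a] + [b] + [c]
    return {
        "a": abc - b_c,        # unique to X1
        "b": a_b + b_c - abc,  # shared by X1 and X2 (may be negative)
        "c": abc - a_b,        # unique to X2
        "d": 1 - abc,          # unexplained
    }


# Hypothetical unadjusted R^2 values from three fits on n = 50 samples,
# with 2 variables in group X1 and 3 in group X2.
fr = partition_two(0.40, 0.30, 0.55, n=50, m1=2, m2=3)
```

The shared fraction [b] is obtained only by subtraction, so it cannot be tested on its own and can come out negative.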
> Steve Juggins has adapted the hierarchical partitioning approach from
> package hier.part to the multivariate multiple regression setting of RDA
> (possibly CCA too?), which is related to, but somewhat different from, the
> variance partitioning described above. I don't believe Steve has
> released this code yet, so if you are interested I'd email him for it; he
> is the author of the rioja package, so contact details can be found on CRAN.
>
> Neither variance partitioning nor hierarchical partitioning does
> exactly what you ask, i.e. model the directed dependence or pathways of
> effects. They are, however, far simpler methods that would be familiar
> to the applied community that will see these results/papers.
>
> In writing this I have wondered whether you and/or the ecologists are
> making it too complex. As you have all the variables of interest, I
> might model the variables that physiologically affect the diatoms and
> their effect on diatom composition (using constrained ordination or the
> additive model time series approach I used). Then I would model the
> relationship between the higher-level factor (land use, etc.) and the
> lower-level, physiologically relevant variable whose changes I
> hypothesise it to be driving.
>

That's interesting - I'll ask him about that.

> Whilst path analysis might be the ideal tool here, I suspect you'll have
> plenty of complications to address, not least the compositional aspects
> (though Sarah points you to some ideas there), but also the temporal
> autocorrelation prevalent in the palaeo data, which would need to be
> accounted for to yield appropriate p-values. If you do address these
> issues and use path analysis, I for one would be very interested to hear
> how you got on, as this would be a very useful contribution to the
> palaeolimnological literature!
>
> Wow, this got long! Anyway, hopefully some of the above will be of use
> and interest, and should you wish to chat about this off-list some more
> (I'm certainly very interested if you use path analysis for this), then
> please do so. (Note: I am still sending via my old UCL address, but I have
> moved to Canada and U Regina; contact details are in my email signature.)
>
> All the best,
>
> Gavin
>
> On Wed, 2013-03-06 at 22:12 -0500, Jay Kerns wrote:


This reply is spectacular; thank you very much.  There's a lot to
think about, and I'll be chewing on it for a while.  Thank you for the
references and thoughtful comments.

Regards,
Jay

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
