Re: [R-sig-eco] quantifying directed dependence of environmental factors
On Thu, 2013-03-07 at 11:13 -0800, Rich Shepard wrote:

On Thu, 7 Mar 2013, Philippi, Tom wrote:

I would look at packages bio.infer, paltran, fossil, and analogue, and search to see if anyone has pushed them in the direction you want to go.

To this list I would add Steve Juggins' excellent rioja package. In addition to several WA methods it also includes maximum likelihood regression and calibration in the flavour of bio.infer.

bio.infer is based on the EPA's EMAP-West (Environmental Monitoring and Assessment Program for the western states) and uses benthic macroinvertebrates and fish with selected water chemistry parameters. It uses the ITIS (Integrated Taxonomic Information System) to provide consistency in naming taxa to the lowest reasonable level.

As far as I can tell, bio.infer contains all you say but as higher-level utility functions. However, IIRC at the heart of bio.infer is what we call maximum likelihood regression and calibration: fit a Gaussian logistic regression to each species to characterise species-env relationships, then invert this set of models to find the value of the environmental variable that maximises the likelihood of observing a sample of new counts over the set of species. Invariably, the inversion involves numerical optimisation to search for the value of the environmental variable that makes the new counts most likely. You just need to give mlsolve() the relevant data objects, which seem to be fairly easy to create by hand if you don't need to look up harmonised or correct taxon names. You really don't need all the nice ITIS hand-holding, though I'm sure it is very handy for those working on relevant species groups.

G

Conceptually, one could assemble equivalent dataframes for diatom taxa and environmental conditions, but I don't know if ITIS has plants/algae in the system; probably does.
However, the biota-environment relationships would be based on current conditions and whether this would be valid for sediment core data would need to be judged by a limnologist, not a stream ecologist like me.

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
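The maximum likelihood regression and calibration idea described in this message can be sketched in base R. This is only an illustration under stated assumptions, not the bio.infer or rioja implementation: the data are simulated, presence/absence is used instead of counts, and every object name (`occ`, `fits`, `loglik`, `env_hat`) is made up for the example.

```r
# Sketch of maximum likelihood regression and calibration (base R only).
# All data are simulated and all names are hypothetical.
set.seed(42)

n_sites <- 200
n_spp   <- 8
env     <- runif(n_sites, 4, 9)                 # training gradient, e.g. pH
optima  <- seq(4.5, 8.5, length.out = n_spp)    # assumed species optima
width   <- 1.2                                  # assumed common tolerance

# Simulate presence/absence from unimodal (Gaussian logistic) responses
p_true <- sapply(optima, function(u) plogis(2 - (env - u)^2 / (2 * width^2)))
occ    <- matrix(rbinom(length(p_true), 1, p_true), nrow = n_sites)

# Regression step: one quadratic logistic GLM per species
fits <- lapply(seq_len(n_spp), function(j) {
  d <- data.frame(y = occ[, j], env = env)
  glm(y ~ env + I(env^2), family = binomial, data = d)
})

# Log-likelihood of a new sample at a candidate value x of the gradient
loglik <- function(x, y_new) {
  p <- vapply(fits, function(f) {
    predict(f, newdata = data.frame(env = x), type = "response")
  }, numeric(1))
  p <- pmin(pmax(p, 1e-9), 1 - 1e-9)            # guard against log(0)
  sum(dbinom(y_new, 1, p, log = TRUE))
}

# Calibration step: a new sample generated at a known value of the gradient
env_true <- 6.5
y_new <- rbinom(n_spp, 1, plogis(2 - (env_true - optima)^2 / (2 * width^2)))

# Numerical search for the value that makes the new sample most likely
est     <- optimize(loglik, interval = range(env), y_new = y_new,
                    maximum = TRUE)
env_hat <- est$maximum
```

With only eight taxa the likelihood surface is coarse, so `env_hat` is a rough estimate; real applications use many more taxa and abundance data.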
Re: [R-sig-eco] quantifying directed dependence of environmental factors
Hi Jay,

What you describe is similar to research conducted at UCL by myself and colleagues and also as part of an EU project that finished a few years ago now. Since then, my thoughts on this have expanded a little and Sarah's point about path analysis was where I would have gone next if I was continuing this line of investigation.

In the work I did at UCL, I went with a time series approach. I decomposed the species data into a set of ordination axes (I used PCA, but on Hellinger-transformed data to account for non-linear responses in the species, which PCA doesn't handle well). Then I fitted an additive model with, say, PCA axis 1 (PC1) as the response variable and one or more covariates entering as smooth functions as the predictors. The nice thing about the additive model is that the terms come together additively and the non-linear effects allow the effect of a variable to change over its range (e.g. only cold temperatures induce an effect on the response). As these were time series data I controlled for autocorrelation by using a continuous-time AR(1) process for the residuals.

Here are some refs on these approaches:

http://www.aslo.org/lo/toc/vol_54/issue_6_part_2/2529.html
http://dx.doi.org/10./j.1365-2427.2012.02860.x
http://dx.doi.org/10./j.1365-2427.2011.02651.x
http://dx.doi.org/10./j.1365-2427.2011.02670.x

and there is more in the Special Issue section of FWBiol:

http://onlinelibrary.wiley.com/doi/10./fwb.2012.57.issue-10/issuetoc

Now this doesn't look at cascades of effects; one key aspect of all of the above was the construction of appropriate time series for the environmental data. Ideally, I'd take the variable that is most closely related to diatom physiology as my predictor. However those variables do not always exist or are not available.
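The time series approach described above (Hellinger transformation, PCA, then an additive model with continuous-time AR(1) residuals) might look roughly like the following. It is a sketch on simulated data, not the code behind the cited papers; the covariate `temp`, the sample ages `year`, the latent series `z`, and the basis size `k = 6` are all assumptions made for illustration. It uses gamm() from mgcv and corCAR1() from nlme.

```r
library(mgcv)   # gamm()
library(nlme)   # corCAR1()

set.seed(1)
n    <- 80
year <- sort(runif(n, 1900, 2010))              # irregularly spaced sample ages
temp <- 10 + 0.02 * (year - 1900) + rnorm(n, sd = 0.5)   # hypothetical covariate

# Unmeasured, autocorrelated driver so that the residuals carry serial correlation
z    <- as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.3))
grad <- temp + z

# Simulated counts for 10 taxa with unimodal responses along the gradient
optima <- seq(9, 14, length.out = 10)
lambda <- sapply(optima, function(opt) 50 * exp(-(grad - opt)^2 / 2) + 1)
counts <- matrix(rpois(length(lambda), lambda), nrow = n)

# Hellinger transformation: square root of relative abundances
hel <- sqrt(counts / rowSums(counts))

# PCA of the transformed data; the first axis is the response
pc1 <- prcomp(hel, center = TRUE, scale. = FALSE)$x[, 1]
dat <- data.frame(pc1 = pc1, temp = temp, year = year)

# Additive model with a continuous-time AR(1) process for the residuals
mod <- gamm(pc1 ~ s(temp, k = 6), data = dat,
            correlation = corCAR1(form = ~ year), method = "REML")

summary(mod$gam)                                # smooth term for the covariate
phi <- coef(mod$lme$modelStruct$corStruct, unconstrained = FALSE)
phi                                             # estimated CAR(1) parameter
```

corCAR1() is used rather than corAR1() because core samples are rarely evenly spaced in time; it needs unique time values, which sediment ages normally are.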
Instead, surrogates can be used; the amount of agricultural land-use in catchments or historical fertilizer-use records will be highly correlated with nutrient loading from a catchment to the lake/reservoir, so these could stand in for the missing variables. Of course, you do need to be wary of spurious correlations with time series data.

As these models are using smooth functions, you are only going to be able to include one or a few covariates unless you have *lots* of samples; and anyway, I would always advise thinking first from the biological or ecological viewpoint, formulating a hypothesis there, and then fitting that with the stats rather than throwing lots of variables into an analysis to see what pops out (which seems to be what a lot of palaeoecologists do!)

As regards envfit(), it isn't symmetric in the variables, at least as far as I see it; it fits a model of

varZ = \beta_1 axis_1 + \beta_2 axis_2 + \varepsilon

in other words it uses the 2d axis scores (PC1 and PC2, or nMDS1 and nMDS2 coordinates) to predict the values of the response using a linear model. As each environmental variable is modelled separately (individually), one is not favouring a set of variables etc. Perhaps this is not what you meant but worth pointing out.

Also, envfit presumes a linear relationship between the variables and the ordination coordinates. If that is too strong an assumption, see ordisurf() which fits GAM-based surfaces rather than linear-regression surfaces.

A fairly standard way of looking at this sort of data might be to group variables that are related and then decompose the variance in the data into that which can be explained by each group of variables uniquely, that which can be explained by two or more groups, and the unexplained variance.
The vegan package has a function varpart() which can do this all for you if you are willing to use RDA to analyse the data (unbiased estimates of the variance explained are not available for CCA, and nMDS is not a constrained technique). Note that you can use principal coordinates analysis to embed your original dissimilarity matrix into a metric space and then take the PCoA axis scores as the input data for the RDA, so that the RDA is in the dissimilarity of your choice and not linear in the original data.

Steve Juggins has adapted the hierarchical partitioning approach from package hier.part to the multivariate multiple regression setting of RDA (possibly CCA too?), which is related to but somewhat different from the variance partitioning described above. I don't believe Steve has released this code yet, so if interested I'd email him for it; he is the author of the rioja package so contact details can be found on CRAN.

Neither variance partitioning nor hierarchical partitioning does exactly what you ask, which is to model the directed dependence or pathways of effects. They are however far simpler methods which would have familiarity within the applied community that will see these results/papers etc.

In writing this I have pondered on whether you and/or the ecologists are making it too complex? As you have all the variables of interest, I might model the variables that physiologically affect the diatoms and their effect on
Re: [R-sig-eco] quantifying directed dependence of environmental factors
Dear Gavin,

On Mon, Mar 11, 2013 at 1:58 PM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:

Hi Jay,

What you describe is similar to research conducted at UCL by myself and colleagues and also as part of an EU project that finished a few years ago now. Since then, my thoughts on this have expanded a little and Sarah's point about path analysis was where I would have gone next if I was continuing this line of investigation.

In the work I did at UCL, I went with a time series approach. I decomposed the species data into a set of ordination axes (I used PCA, but on Hellinger-transformed data to account for non-linear responses in the species, which PCA doesn't handle well). Then I fitted an additive model with, say, PCA axis 1 (PC1) as the response variable and one or more covariates entering as smooth functions as the predictors. The nice thing about the additive model is that the terms come together additively and the non-linear effects allow the effect of a variable to change over its range (e.g. only cold temperatures induce an effect on the response). As these were time series data I controlled for autocorrelation by using a continuous-time AR(1) process for the residuals.

Here are some refs on these approaches:

http://www.aslo.org/lo/toc/vol_54/issue_6_part_2/2529.html
http://dx.doi.org/10./j.1365-2427.2012.02860.x
http://dx.doi.org/10./j.1365-2427.2011.02651.x
http://dx.doi.org/10./j.1365-2427.2011.02670.x

and there is more in the Special Issue section of FWBiol:

http://onlinelibrary.wiley.com/doi/10./fwb.2012.57.issue-10/issuetoc

Now this doesn't look at cascades of effects; one key aspect of all of the above was the construction of appropriate time series for the environmental data. Ideally, I'd take the variable that is most closely related to diatom physiology as my predictor. However those variables do not always exist or are not available.
Instead, surrogates can be used; the amount of agricultural land-use in catchments or historical fertilizer-use records will be highly correlated with nutrient loading from a catchment to the lake/reservoir, so these could stand in for the missing variables. Of course, you do need to be wary of spurious correlations with time series data.

As these models are using smooth functions, you are only going to be able to include one or a few covariates unless you have *lots* of samples; and anyway, I would always advise thinking first from the biological or ecological viewpoint, formulating a hypothesis there, and then fitting that with the stats rather than throwing lots of variables into an analysis to see what pops out (which seems to be what a lot of palaeoecologists do!)

As regards envfit(), it isn't symmetric in the variables, at least as far as I see it; it fits a model of

varZ = \beta_1 axis_1 + \beta_2 axis_2 + \varepsilon

in other words it uses the 2d axis scores (PC1 and PC2, or nMDS1 and nMDS2 coordinates) to predict the values of the response using a linear model. As each environmental variable is modelled separately (individually), one is not favouring a set of variables etc. Perhaps this is not what you meant but worth pointing out.

Yes, you are right, that is not what I meant, and you've said it better than I did (or knew how to).

Also, envfit presumes a linear relationship between the variables and the ordination coordinates. If that is too strong an assumption, see ordisurf() which fits GAM-based surfaces rather than linear-regression surfaces.

Yes, there is evidence of nonlinearity in our data and we've done work with ordisurf, too.

A fairly standard way of looking at this sort of data might be to group variables that are related and then decompose the variance in the data into that which can be explained by each group of variables uniquely, that which can be explained by two or more groups, and the unexplained variance.
The vegan package has a function varpart() which can do this all for you if you are willing to use RDA to analyse the data (unbiased estimates of the variance explained are not available for CCA, and nMDS is not a constrained technique). Note that you can use principal coordinates analysis to embed your original dissimilarity matrix into a metric space and then take the PCoA axis scores as the input data for the RDA, so that the RDA is in the dissimilarity of your choice and not linear in the original data.

Steve Juggins has adapted the hierarchical partitioning approach from package hier.part to the multivariate multiple regression setting of RDA (possibly CCA too?), which is related to but somewhat different from the variance partitioning described above. I don't believe Steve has released this code yet, so if interested I'd email him for it; he is the author of the rioja package so contact details can be found on CRAN.

Neither variance partitioning nor hierarchical partitioning does exactly what you ask, which is to model the directed dependence or pathways of effects.
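To make the variance partitioning arithmetic concrete, here is a base-R sketch for two groups of predictors, with PCoA axis scores as the response as suggested above. In practice vegan's varpart() would do this; the version below avoids the dependency and only shows the bookkeeping with adjusted R-squared. The data are simulated, the variable names (`pop`, `farm`, `tp`, `ph`) are placeholders, and `adj_r2()` is a hypothetical helper written for this example.

```r
set.seed(7)
n <- 60

# Two hypothetical groups of explanatory variables
land <- data.frame(pop = rnorm(n), farm = rnorm(n))
chem <- data.frame(tp = 0.6 * land$farm + rnorm(n, sd = 0.8), ph = rnorm(n))

# Simulated community of 12 taxa responding to a mix of both groups
grad   <- 0.8 * chem$tp + 0.5 * land$pop
counts <- sapply(seq(-2, 2, length.out = 12),
                 function(opt) rpois(n, 30 * exp(-(grad - opt)^2 / 2) + 1))

# Embed a dissimilarity matrix with PCoA. Hellinger distance is used here;
# any other dissimilarity could be passed to cmdscale() instead.
d <- dist(sqrt(counts / rowSums(counts)))
Y <- cmdscale(d, k = 5)

# Adjusted R-squared of a multivariate linear regression (the RDA analogue)
adj_r2 <- function(Y, X) {
  Y   <- scale(Y, center = TRUE, scale = FALSE)
  fit <- lm(Y ~ ., data = X)
  r2  <- sum(apply(fitted(fit), 2, var)) / sum(apply(Y, 2, var))
  1 - (1 - r2) * (nrow(Y) - 1) / (nrow(Y) - ncol(X) - 1)
}

r_land <- adj_r2(Y, land)
r_chem <- adj_r2(Y, chem)
r_both <- adj_r2(Y, cbind(land, chem))

# Unique, shared and unexplained fractions
parts <- c(unique_land = r_both - r_chem,
           unique_chem = r_both - r_land,
           shared      = r_land + r_chem - r_both,
           unexplained = 1 - r_both)
round(parts, 3)
```

The four fractions sum to one by construction. The shared fraction can come out negative, which is a known feature of partitions based on adjusted R-squared rather than a bug in the arithmetic.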
Re: [R-sig-eco] quantifying directed dependence of environmental factors
On Thu, Mar 7, 2013 at 9:20 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

...

There's a fair bit of literature on Mantel-based path analysis, and other similar dissimilarity-based approaches. SEM can be used with composition as well, although not (I think) with the intermediate step of calculating dissimilarities.

Besides journal articles employing those techniques, I like both of these:

J. B. Grace, Structural Equation Modeling and Natural Systems, Cambridge University Press, Cambridge, UK, 2006.

B. Shipley, Cause and Correlation in Biology: A User’s Guide to Path Analysis, Structural Equations and Causal Inference, Cambridge University Press, Cambridge, UK, 2000.

I recently stumbled on a great book on path modelling using PLS (with R) that is freely downloadable at http://is.gd/BxqIEL

Ivailo
--
UBUNTU: a person is a person through other persons.
Re: [R-sig-eco] quantifying directed dependence of environmental factors
On Thu, Mar 7, 2013 at 2:13 PM, Rich Shepard rshep...@appl-ecosys.com wrote:

On Thu, 7 Mar 2013, Philippi, Tom wrote:

I would look at packages bio.infer, paltran, fossil, and analogue, and search to see if anyone has pushed them in the direction you want to go.

bio.infer is based on the EPA's EMAP-West (Environmental Monitoring and Assessment Program for the western states) and uses benthic macroinvertebrates and fish with selected water chemistry parameters. It uses the ITIS (Integrated Taxonomic Information System) to provide consistency in naming taxa to the lowest reasonable level.

Conceptually, one could assemble equivalent dataframes for diatom taxa and environmental conditions, but I don't know if ITIS has plants/algae in the system; probably does. However, the biota-environment relationships would be based on current conditions and whether this would be valid for sediment core data would need to be judged by a limnologist, not a stream ecologist like me.

Rich

Rich: thank you. I am not sure about the ITIS, but maybe my colleague knows. I will ask him. Thanks again.

Jay
Re: [R-sig-eco] quantifying directed dependence of environmental factors
I recently stumbled on a great book on path modelling using PLS (with R) that is freely downloadable at http://is.gd/BxqIEL

Ivailo
--

Ivailo: very cool. Thank you.

Jay
Re: [R-sig-eco] quantifying directed dependence of environmental factors
That sounds like a job for path analysis or for structural equation modeling, depending on the level of sophistication desired and the hypotheses to be tested. There are plenty of good resources for both, in and out of R.

Sarah

On Wednesday, March 6, 2013, Jay Kerns wrote:

Hello,

I'm posting to this list because I believe it's the best place to go. My question is R related only inasmuch as all the work I've done so far has been with R and I expect any answers I get from here will lead me to more R work.

I'm consulting with an ecologist and an engineer on a project related to a reservoir nearby. They've collected data on diatoms in the reservoir via core samples; they have sections of data over the past 100 yrs. They are looking at the community structure plus other environmental factors over the same time period.

We've done a ton of work already and there's no point trying to hash all of that out here. Short story: we did an NMDS, it fits OK (stress 0.17), there are obvious clusters in the ordination which correspond to a-priori clusters from ecological considerations (and which match an independent cluster analysis), we're really quite pleased overall. We checked for relationships with =envfit=; most environmental variables are *highly* significant, yet there are a couple which aren't significant at all.

Here comes my question: The ecologist pointed out to me that our environmental variables don't have equal status (ecologically speaking); some variables lead to others. For instance, there are so-called ultimate factors (population, percentage farmland) which contribute to intermediate factors (suspended solids, total phosphorus) which in turn contribute to direct factors (AREA, pH,...) which then in turn contribute to diatom structure. We have measured data on all the above and several more.
The model we are fitting with =envfit= is symmetric in those n environmental variables, but the ecology of the situation isn't symmetric; it's a directed top-down kind of relationship. He asked me: How can we quantify that? How can we demonstrate that? Can we quantify/demonstrate that? I don't know.

There are ecologists on this list: what am I looking for, here? What methods do ecologists use to answer this (or related) question(s)? Feel free to direct me to papers, literature, textbooks, whatever. I'm trying to help answer this question and (this not being my subject specialty) I'm at a bit of a loss. If there are relevant R packages/vignettes/manuals you can point me to, that'd be cool too.

Thanks for reading all the way down to here.

Jay

P.S. If it hadn't been for the archives of this list containing lengthy and poignant answers to *several* questions I've had already then I couldn't even have made it this far. Thank you!

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org
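The directed chain in the question (ultimate factors, then intermediate factors, then direct factors, then diatom structure) is the kind of structure path analysis encodes. The sketch below is a bare-bones illustration with one variable per level and a single ordination axis standing in for community structure. It uses simulated data and plain lm() fits; the variable names and effect sizes are invented, and dedicated SEM software would add proper fit statistics that this does not attempt.

```r
set.seed(11)
n <- 100

# Simulated causal chain: farmland -> tp -> ph -> axis1 (all hypothetical)
farmland <- rnorm(n)                                # ultimate factor
tp       <- 0.7 * farmland + rnorm(n, sd = 0.7)     # intermediate factor
ph       <- 0.6 * tp       + rnorm(n, sd = 0.8)     # direct factor
axis1    <- 0.8 * ph       + rnorm(n, sd = 0.6)     # community response

# Standardise so that regression slopes are path coefficients
d <- as.data.frame(scale(data.frame(farmland, tp, ph, axis1)))

# One regression per arrow in the hypothesised path diagram
m_tp   <- lm(tp ~ farmland, data = d)
m_ph   <- lm(ph ~ tp, data = d)
m_axis <- lm(axis1 ~ ph, data = d)

paths <- c(farmland_tp = unname(coef(m_tp)["farmland"]),
           tp_ph       = unname(coef(m_ph)["tp"]),
           ph_axis1    = unname(coef(m_axis)["ph"]))

# Indirect effect of the ultimate factor on the response along the chain
indirect <- prod(paths)

# Check an independence claim implied by the chain: once ph is in the
# model, farmland should add little to the prediction of axis1
m_check  <- lm(axis1 ~ ph + farmland, data = d)
p_direct <- summary(m_check)$coefficients["farmland", "Pr(>|t|)"]

round(paths, 2)
round(indirect, 2)
p_direct
```

Testing the independence claims implied by the diagram, rather than only estimating the arrows, is what separates a directed model from a symmetric one: a small `p_direct` would be evidence against the simple chain.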
Re: [R-sig-eco] quantifying directed dependence of environmental factors
Jay--

I'm not sure how one would combine SEM / graphical models with compositional dissimilarity as a response. You might be able to fit a series of models in adonis() or capscale(), comparing just direct factors to direct + intermediate, etc. I don't have any good ideas on how you might test more complex causal structures.

Given that you are dealing with diatoms across space (with environmental measurements) and down time (in cores, often without environmental measures), there may be an alternate approach possible based on calibration approaches to inferred environments (e.g., WACAL) or modern analogs. I would look at packages bio.infer, paltran, fossil, and analogue, and search to see if anyone has pushed them in the direction you want to go.

Tom

On Thu, Mar 7, 2013 at 6:50 AM, Jay Kerns gjkerns...@gmail.com wrote:

Dear Sarah,

On Thu, Mar 7, 2013 at 9:32 AM, Sarah Goslee sarah.gos...@gmail.com wrote:

That sounds like a job for path analysis or for structural equation modeling, depending on the level of sophistication desired and the hypotheses to be tested.

*Yes!* I said almost the exact same thing (I didn't say anything about Path Analysis because I don't know much about it), but I had it in my mind that SEM was targeted more to sociological things and didn't know if/that it was common in ecological contexts. Anyway, it's nice to hear that word coming from somebody else.

There are plenty of good resources for both, in and out of R.

Indeed. I have some work to do. Thank you.

--
Jay

Sarah

On Wednesday, March 6, 2013, Jay Kerns wrote:

Hello,

--
---
Tom Philippi, Ph.D.
Quantitative Ecologist Data Therapist
Inventory and Monitoring Program
National Park Service
(619) 523-4576
tom_phili...@nps.gov
http://science.nature.nps.gov/im/monitor
Re: [R-sig-eco] quantifying directed dependence of environmental factors
Hi,

I'm not sure how one would combine SEM / graphical models with compositional dissimilarity as a response. You might be able to fit a series of models in adonis() or capscale(), comparing just direct factors to direct + intermediate, etc. I don't have any good ideas on how you might test more complex causal structures.

There's a fair bit of literature on Mantel-based path analysis, and other similar dissimilarity-based approaches. SEM can be used with composition as well, although not (I think) with the intermediate step of calculating dissimilarities.

Besides journal articles employing those techniques, I like both of these:

J. B. Grace, Structural Equation Modeling and Natural Systems, Cambridge University Press, Cambridge, UK, 2006.

B. Shipley, Cause and Correlation in Biology: A User’s Guide to Path Analysis, Structural Equations and Causal Inference, Cambridge University Press, Cambridge, UK, 2000.

Sarah

Given that you are dealing with diatoms across space (with environmental measurements) and down time (in cores, often without environmental measures), there may be an alternate approach possible based on calibration approaches to inferred environments (e.g., WACAL) or modern analogs. I would look at packages bio.infer, paltran, fossil, and analogue, and search to see if anyone has pushed them in the direction you want to go.

Tom

On Thu, Mar 7, 2013 at 6:50 AM, Jay Kerns gjkerns...@gmail.com wrote:

Dear Sarah,

On Thu, Mar 7, 2013 at 9:32 AM, Sarah Goslee sarah.gos...@gmail.com wrote:

That sounds like a job for path analysis or for structural equation modeling, depending on the level of sophistication desired and the hypotheses to be tested.

*Yes!* I said almost the exact same thing (I didn't say anything about Path Analysis because I don't know much about it), but I had it in my mind that SEM was targeted more to sociological things and didn't know if/that it was common in ecological contexts. Anyway, it's nice to hear that word coming from somebody else.
There are plenty of good resources for both, in and out of R.

Indeed. I have some work to do. Thank you.

--
Jay

--
Sarah Goslee
http://www.functionaldiversity.org
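Tom's suggestion of fitting a series of models, direct factors only versus direct plus intermediate, can be illustrated without vegan by regressing PCoA axis scores on the two nested sets of predictors and comparing the fits. This is a base-R analogue of the capscale() route, not a call to capscale() or adonis(); the data are simulated and the variable names (`tss`, `tp`, `ph`, `area`) are placeholders.

```r
set.seed(3)
n <- 70

# Hypothetical intermediate factors, and direct factors that depend on them
env      <- data.frame(tss = rnorm(n), tp = rnorm(n))
env$ph   <- 0.6 * env$tp  + rnorm(n, sd = 0.8)
env$area <- 0.5 * env$tss + rnorm(n, sd = 0.9)

# Simulated community of 10 taxa driven by the direct factors only
grad   <- env$ph + 0.5 * env$area
counts <- sapply(seq(-2, 2, length.out = 10),
                 function(opt) rpois(n, 40 * exp(-(grad - opt)^2 / 2) + 1))

# PCoA of a dissimilarity matrix (Hellinger distance here), first 3 axes
Y <- cmdscale(dist(sqrt(counts / rowSums(counts))), k = 3)

# Nested multivariate regressions on the axis scores
m_direct <- lm(Y ~ ph + area, data = env)
m_full   <- lm(Y ~ ph + area + tss + tp, data = env)

# Multivariate test of whether the intermediate factors add anything
# beyond the direct ones (Pillai trace is the default statistic)
cmp <- anova(m_direct, m_full)
print(cmp)
```

If the directed hypothesis holds, the intermediate factors should explain little once the direct factors are in the model. With real data a permutation test, as used by vegan's anova methods, is usually preferred to the parametric test shown here.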