Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-11 Thread Gavin Simpson
On Thu, 2013-03-07 at 11:13 -0800, Rich Shepard wrote:
 On Thu, 7 Mar 2013, Philippi, Tom wrote:
 
  I would look at packages bio.infer, paltran, fossil, and analogue, and
  search to see if anyone has pushed them in the direction you want to go.

To this list I would add Steve Juggins' excellent rioja package. In
addition to several WA methods it also includes maximum likelihood
regression and calibration in the flavour of bio.infer.

bio.infer is based on the EPA's EMAP-West (Environmental Monitoring and
 Assessment Program for the western states) and uses benthic macroinvertebrates
 and fish with selected water chemistry parameters. It uses the ITIS
 (Integrated Taxonomic Information System) to provide consistency in
 naming taxa to the lowest reasonable level.

As far as I can tell, bio.infer contains all you say but as higher-level
utility functions.

However, IIRC at the heart of bio.infer is what we call maximum
likelihood regression and calibration; fit a Gaussian logistic
regression to each species to characterise species-env relationships,
then invert this set of models to find the value of the environmental
variable that maximises the likelihood of observing a sample of new
counts over the set of species. Invariably, the inversion involves
numerical optimisation to search for the value of the environmental
variable that makes the new counts most likely.

You just need to give mlsolve() the relevant data objects, which seem
fairly easy to create by hand if you don't need to look up harmonised or
corrected taxon names. You really don't need all the nice ITIS
hand-holding, though I'm sure it is very handy for those working on the
relevant species groups.
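For anyone who wants to see the mechanics, here is a minimal sketch of maximum likelihood calibration in Python (standard library only). It is not bio.infer's code. It assumes presence/absence data and that each taxon's Gaussian logistic curve has already been fitted; the coefficients (height `a`, optimum `u`, tolerance `t`) are hypothetical, and a simple grid search stands in for a proper numerical optimiser.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def prob_present(x, a, u, t):
    """Gaussian logistic response: logit p = a - (x - u)^2 / (2 t^2)."""
    return sigmoid(a - (x - u) ** 2 / (2.0 * t ** 2))


def log_likelihood(x, species, sample):
    """Bernoulli log-likelihood of a presence/absence sample at environmental value x."""
    ll = 0.0
    for (a, u, t), present in zip(species, sample):
        p = min(max(prob_present(x, a, u, t), 1e-12), 1.0 - 1e-12)
        ll += math.log(p) if present else math.log(1.0 - p)
    return ll


def ml_calibrate(species, sample, lo, hi, steps=1000):
    """Invert the species models: grid-search the x that maximises the likelihood."""
    best_x, best_ll = lo, -math.inf
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        ll = log_likelihood(x, species, sample)
        if ll > best_ll:
            best_x, best_ll = x, ll
    return best_x


# Hypothetical taxa with optima at 2, 5 and 8 on some environmental gradient.
species = [(2.0, 2.0, 1.0), (2.0, 5.0, 1.0), (2.0, 8.0, 1.0)]
# A new sample in which only the middle taxon is present.
print(ml_calibrate(species, [0, 1, 0], lo=0.0, hi=10.0))  # the middle taxon's optimum, 5.0
```

With real count data the Bernoulli terms would be replaced by whatever error model the species regressions used.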

G

Conceptually, one could assemble equivalent dataframes for diatom taxa and
 environmental conditions, but I don't know if ITIS has plants/algae in the
 system; probably does. However, the biota-environment relationships would
 be based on current conditions and whether this would be valid for sediment
 core data would need to be judged by a limnologist, not a stream ecologist
 like me.
 
 Rich
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-11 Thread Gavin Simpson
Hi Jay,

What you describe is similar to research conducted at UCL by myself and
colleagues and also as part of an EU project that finished a few years
ago now.

Since then, my thoughts on this have expanded a little, and Sarah's point
about path analysis is where I would have gone next if I were continuing
this line of investigation.

In the work I did at UCL, I went with a time series approach. I
decomposed the species data into a set of ordination axes (I used PCA,
but on Hellinger-transformed data to account for non-linear responses in
the species, which PCA doesn't handle well). Then I fitted an additive
model with, say, PCA axis 1 (PC1) as the response variable and one or
more covariates entering as smooth functions as the predictors. The nice
thing about the additive model is that the terms combine additively,
while the non-linear effects allow the effect of a variable to change
along its range (e.g. only cold temperatures induce an effect on the
response). As these were time series data, I controlled for
autocorrelation by using a continuous-time AR(1) process for the
residuals.
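Two ingredients of that approach are easy to show in isolation. The Python sketch below (standard library only; not the code used in that work) gives the Hellinger transformation, which is what vegan's decostand(x, method = "hellinger") computes in R, and the residual correlation matrix implied by a continuous-time AR(1) process, in which residuals a time lag s apart have correlation phi^s (the form nlme's corCAR1() uses). The counts, dates and phi are made up for illustration.

```python
import math


def hellinger(counts):
    """Hellinger transform: square root of each count divided by its sample (row) total.

    Assumes every sample has a non-zero total.
    """
    out = []
    for row in counts:
        total = sum(row)
        out.append([math.sqrt(c / total) for c in row])
    return out


def car1_correlation(times, phi):
    """Residual correlation matrix for a continuous-time AR(1) process: phi ** |t_i - t_j|."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    return [[phi ** abs(ti - tj) for tj in times] for ti in times]


# Two samples, three taxa (hypothetical counts).
print(hellinger([[1, 3, 0], [4, 0, 0]]))

# Irregularly spaced core dates (hypothetical): correlation decays with the
# time gap between samples, not with their position in the sequence.
for row in car1_correlation([0.0, 1.0, 3.0], phi=0.5):
    print(row)
```

Using the actual sample ages rather than the sample index is what makes the continuous-time form suitable for unevenly spaced core slices.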

Here are some refs on these approaches:

http://www.aslo.org/lo/toc/vol_54/issue_6_part_2/2529.html
http://dx.doi.org/10./j.1365-2427.2012.02860.x
http://dx.doi.org/10./j.1365-2427.2011.02651.x
http://dx.doi.org/10./j.1365-2427.2011.02670.x

and there is more in the Special Issue section of FWBiol:

http://onlinelibrary.wiley.com/doi/10./fwb.2012.57.issue-10/issuetoc

Now this doesn't look at cascades of effects; one key aspect of all of
the above was the construction of appropriate time series for the
environmental data. Ideally, I'd take the variable that is most closely
related to diatom physiology as my predictor. However, those variables do
not always exist or are not available. Instead, surrogates can be used:
the amount of agricultural land use in catchments or historical records
of fertilizer use will be highly correlated with nutrient loading from
a catchment to the lake/reservoir, so these could be used instead. Of
course, you do need to be wary of spurious correlations with time series
data.

As these models use smooth functions, you are only going to be able to
include one or a few covariates unless you have *lots* of samples. In
any case, I would always advise thinking first from the biological or
ecological viewpoint, formulating a hypothesis there, and then fitting
that with the stats, rather than throwing lots of variables into an
analysis to see what pops out (which seems to be what a lot of
palaeoecologists do!)

As regards envfit(), it isn't symmetric in the variables, at least as far
as I see it; it fits a model of

varZ = \beta_1 axis_1 + \beta_2 axis_2 + \varepsilon

in other words, it uses the 2-d axis scores (PC1 and PC2, or the nMDS1
and nMDS2 coordinates) to predict the values of the response using a
linear model. As each environmental variable is modelled separately,
one is not favouring any set of variables. Perhaps this is not what you
meant, but it is worth pointing out.
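To make the direction of that model concrete, here is a minimal Python sketch (standard library only) of such a vector fit: one environmental variable regressed on two sets of axis scores by ordinary least squares, returning the r² and the unit-length direction of the fitted arrow. It illustrates the model, not vegan's implementation; envfit() also assesses significance by permutation, which is omitted here, and the scores and variable values are made up.

```python
import math


def fit_vector(axis1, axis2, z):
    """Regress z on two ordination axes; return (unit direction, r squared)."""
    n = len(z)
    # Centre everything so the intercept drops out of the normal equations.
    m1, m2, mz = sum(axis1) / n, sum(axis2) / n, sum(z) / n
    a = [v - m1 for v in axis1]
    b = [v - m2 for v in axis2]
    c = [v - mz for v in z]
    s11 = sum(x * x for x in a)
    s22 = sum(x * x for x in b)
    s12 = sum(x * y for x, y in zip(a, b))
    s1z = sum(x * y for x, y in zip(a, c))
    s2z = sum(x * y for x, y in zip(b, c))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("axis scores are collinear")
    # Solve the 2x2 normal equations for beta_1 and beta_2.
    beta1 = (s22 * s1z - s12 * s2z) / det
    beta2 = (s11 * s2z - s12 * s1z) / det
    ss_tot = sum(y * y for y in c)
    ss_res = sum((y - beta1 * x1 - beta2 * x2) ** 2 for x1, x2, y in zip(a, b, c))
    r2 = 1.0 - ss_res / ss_tot
    # The arrow points along the coefficient vector, rescaled to unit length.
    length = math.hypot(beta1, beta2)
    return (beta1 / length, beta2 / length), r2


# Hypothetical site scores on two axes and one environmental variable.
pc1 = [1.0, 2.0, 3.0, 4.0, 5.0]
pc2 = [2.0, 1.0, 4.0, 3.0, 5.0]
ph = [6.1, 6.0, 7.2, 7.0, 8.1]
print(fit_vector(pc1, pc2, ph))
```

Note that the environmental variable is the response and the axes are the predictors, which is the asymmetry described above.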

Also, envfit presumes a linear relationship between the variables and
the ordination coordinates. If that is too strong an assumption, see
ordisurf() which fits GAM-based surfaces rather than linear-regression
surfaces.

A fairly standard way of looking at this sort of data might be to group
variables that are related and then decompose the variance in the data
into that which can be explained uniquely by each group of variables,
that which can be explained jointly by two or more groups, and the
unexplained variance. The vegan package has a function varpart() which
can do all of this for you if you are willing to use RDA to analyse the
data (unbiased estimates of the variance explained are not available for
CCA, and nMDS is not a constrained technique). Note that you can use
principal coordinates analysis (PCoA) to embed your original
dissimilarity matrix in a metric space and then take the PCoA axis
scores as the input data for the RDA, so that the RDA is linear in the
dissimilarities of your choice rather than in the original data.
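The bookkeeping behind a two-group partition is simple once the three adjusted R² values are in hand (from fitting group X1 alone, X2 alone, and both together). The Python sketch below shows only that arithmetic, using Ezekiel's adjustment and the fraction labels [a], [b], [c], [d] that varpart() reports; the R² values and group sizes are made up, and in practice they would come from the RDA fits.

```python
def adjusted_r2(r2, n, m):
    """Ezekiel's adjustment for n observations and m explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)


def partition_two_groups(r2a_x1, r2a_x2, r2a_both):
    """Split variation among two groups of explanatory variables.

    [a] unique to X1, [b] shared by X1 and X2, [c] unique to X2, [d] unexplained.
    [b] is obtained by subtraction and can come out negative.
    """
    a = r2a_both - r2a_x2
    c = r2a_both - r2a_x1
    b = r2a_x1 + r2a_x2 - r2a_both
    d = 1.0 - r2a_both
    return {"a": a, "b": b, "c": c, "d": d}


# Hypothetical example: 30 core samples, 2 "direct" and 3 "intermediate" variables.
x1 = adjusted_r2(0.42, n=30, m=2)
x2 = adjusted_r2(0.35, n=30, m=3)
both = adjusted_r2(0.55, n=30, m=5)
print(partition_two_groups(x1, x2, both))
```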

Steve Juggins has adapted the hierarchical partitioning approach from
package hier.part to the multivariate multiple regression setting of RDA
(possibly CCA too?), which is related to, but somewhat different from,
the variance partitioning described above. I don't believe Steve has
released this code yet, so if interested I'd email him for it; he is
the author of the rioja package, so his contact details can be found on CRAN.

Neither variance partitioning nor hierarchical partitioning directly
does exactly what you ask, namely model the directed dependence or
pathways of effects. They are, however, far simpler methods that will be
familiar to the applied community that will see these results/papers etc.

In writing this, I have wondered whether you and/or the ecologists are
making it too complex. As you have all the variables of interest, I
might model the variables that physiologically affect the diatoms and
their effect on 

Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-11 Thread Jay Kerns
Dear Gavin,

On Mon, Mar 11, 2013 at 1:58 PM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:
 As regards envfit(), it isn't symmetric in the variables at least as far
 as I see it; it fits a model of

 varZ = \beta_1 axis_1 + \beta_2 axis_2 + \varepsilon

 in other words it uses the 2d axis scores (PC1 and PC2, or nMDS1 +
 nMDS2 coordinates) to predict the values of the response using a linear
 model. As each environmental variable is modelled separately
 (individually), one is not favouring a set of variables etc. Perhaps
 this is not what you meant but worth pointing out.


Yes, you are right, that is not what I meant, and you've said it
better than I did (or knew how to).


 Also, envfit presumes a linear relationship between the variables and
 the ordination coordinates. If that is too strong an assumption, see
 ordisurf() which fits GAM-based surfaces rather than linear-regression
 surfaces.


Yes, there is evidence of nonlinearity in our data and we've done work
with ordisurf, too.



Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-08 Thread Ivailo
On Thu, Mar 7, 2013 at 9:20 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
...
 There's a fair bit of literature on Mantel-based path analysis, and
 other similar dissimilarity-based approaches. SEM can be used with
 composition as well, although not (I think) with the intermediate step
 of calculating dissimilarities.

 Besides journal articles employing those techniques, I like both of these:

 J. B. Grace, Structural Equation Modeling and Natural Systems,
 Cambridge University Press, Cambridge, UK, 2006.

 B. Shipley, Cause and Correlation in Biology: A User’s Guide to Path
 Analysis, Structural Equations and Causal Inference, Cambridge
 University Press, Cambridge, UK, 2000.

I recently stumbled on a great book on path modelling using PLS (with
R) that is freely downloadable at http://is.gd/BxqIEL

Ivailo
--
UBUNTU: a person is a person through other persons.



Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-08 Thread Jay Kerns
On Thu, Mar 7, 2013 at 2:13 PM, Rich Shepard rshep...@appl-ecosys.com wrote:
 On Thu, 7 Mar 2013, Philippi, Tom wrote:

 I would look at packages bio.infer, paltran, fossil, and analogue, and
 search to see if anyone has pushed them in the direction you want to go.


   bio.infer is based on the EPA's EMAP-West (Environmental Monitoring and
 Assessment Program for the western states) and uses benthic macroinvertebrates
 and fish with selected water chemistry parameters. It uses the ITIS
 (Integrated Taxonomic Information System) to provide consistency in
 naming taxa to the lowest reasonable level.

   Conceptually, one could assemble equivalent dataframes for diatom taxa and
 environmental conditions, but I don't know if ITIS has plants/algae in the
 system; probably does. However, the biota-environment relationships would
 be based on current conditions and whether this would be valid for sediment
 core data would need to be judged by a limnologist, not a stream ecologist
 like me.

 Rich


Rich: thank you.  I am not sure about the ITIS, but maybe my colleague
does.  I will ask him.

Thanks again.
Jay



Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-08 Thread Jay Kerns
 I recently stumbled on a great book on path modelling using PLS (with
 R) that is freely downloadable at http://is.gd/BxqIEL

 Ivailo
 --

Ivailo: very cool.  Thank you.

Jay



Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-07 Thread Sarah Goslee
That sounds like a job for path analysis or for structural equation
modeling, depending on the level of sophistication desired and the
hypotheses to be tested.

There are plenty of good resources for both, in and out of R.

Sarah

On Wednesday, March 6, 2013, Jay Kerns wrote:

 Hello,

 I'm posting to this list because I believe it's the best place to
 go.  My question is R related only inasmuch as all the work I've
 done so far has been with R and I expect any answers I get from
 here will lead me to more R work.

 I'm consulting with an ecologist and an engineer on a project
 related to a reservoir nearby.  They've collected data on diatoms
 in the reservoir via core samples; they have sections of data over
 the past 100 years.  They are looking at the community structure
 plus other environmental factors over the same time period.

 We've done a ton of work already and there's no point trying to
 hash all of that out here.  Short story: we did an NMDS, it fits
 OK (stress 0.17), there are obvious clusters in the ordination
 which correspond to a priori clusters from ecological
 considerations (and which match an independent cluster analysis),
 we're really quite pleased overall.  We checked for relationships
 with envfit(): most environmental variables are *highly*
 significant, yet there are a couple which aren't significant at
 all.  Here comes my question:

 The ecologist pointed out to me that our environmental variables
 don't have equal status (ecologically speaking); some variables
 lead to others.  For instance, there are so-called ultimate
 factors (population, percentage farmland) which contribute to
 intermediate factors (suspended solids, total phosphorus) which
 in turn contribute to direct factors (AREA, pH,...) which then in
 turn contribute to diatom structure.

 We have measured data on all the above and several more.  The
 model we are fitting with envfit() is symmetric in those n
 environmental variables, but the ecology of the situation isn't
 symmetric; it's a directed, top-down kind of relationship.  He
 asked me, "How can we quantify that?  How can we demonstrate
 that?  Can we quantify/demonstrate that?"  I don't know.

 There are ecologists on this list: what am I looking for, here?
 What methods do ecologists use to answer this (or related)
 question(s)?  Feel free to direct me to papers, literature,
 textbooks, whatever.  I'm trying to help answer this question
 and (this not being my subject specialty) I'm at a bit of a loss.

 If there are relevant R packages/vignettes/manuals you can point
 me to, that'd be cool too.

 Thanks for reading all the way down to here.

 Jay

 P.S. If it hadn't been for the archives of this list containing
 lengthy and poignant answers to *several* questions I've had
 already then I couldn't even have made it this far.  Thank you!





-- 
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org




Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-07 Thread Philippi, Tom
Jay--

I'm not sure how one would combine SEM / graphical models with
compositional dissimilarity as a response.  You might be able to fit a
series of models in adonis() or capscale(), comparing just direct factors
to direct + intermediate, etc.  I don't have any good ideas on how you
might test more complex causal structures.
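The nested comparison suggested here boils down to asking how much extra variation the intermediate factors explain once the direct factors are already in the model. A minimal sketch of the partial F-ratio for two nested linear models, computed from residual sums of squares, follows (Python, standard library). adonis() and capscale() work with a multivariate pseudo-F in the same spirit but judge it by permutation rather than against the F distribution, which this sketch does not attempt. All numbers are hypothetical.

```python
def partial_f(rss_reduced, rss_full, k_reduced, k_full, n):
    """F-ratio for adding (k_full - k_reduced) predictors to a model with an intercept."""
    if k_full <= k_reduced:
        raise ValueError("the full model must have more predictors")
    df_extra = k_full - k_reduced
    df_resid = n - k_full - 1
    return ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid)


# Hypothetical: 25 samples; 2 direct factors vs. 2 direct + 2 intermediate factors.
print(partial_f(rss_reduced=100.0, rss_full=60.0, k_reduced=2, k_full=4, n=25))
```

A large ratio says the intermediate factors add explanatory power beyond the direct ones; on its own it says nothing about the direction of causation.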

Given that you are dealing with diatoms across space (with environmental
measurements) and down time (in cores, often without environmental
measures), there may be an alternative approach based on calibration
approaches to inferred environments (e.g., WACAL) or modern analogs.  I
would look at packages bio.infer, paltran, fossil, and analogue, and search
to see if anyone has pushed them in the direction you want to go.

Tom



On Thu, Mar 7, 2013 at 6:50 AM, Jay Kerns gjkerns...@gmail.com wrote:

 Dear Sarah,

 On Thu, Mar 7, 2013 at 9:32 AM, Sarah Goslee sarah.gos...@gmail.com
 wrote:
  That sounds like a job for path analysis or for structural equation
  modeling, depending on the level of sophistication desired and the
  hypotheses to be tested.
 

 *Yes!*  I said almost the exact same thing (I didn't say anything
 about Path Analysis because I don't know much about it), but I had it
 in my mind that SEM was targeted more to sociological things and
 didn't know if/that it was common in ecological contexts.  Anyway,
 it's nice to hear that word coming from somebody else.

  There are plenty of good resources for both, in and out of R.

 Indeed.  I have some work to do.  Thank you.

 --
 Jay







-- 
---
Tom Philippi, Ph.D.
Quantitative Ecologist  Data Therapist
Inventory and Monitoring Program
National Park Service
(619) 523-4576
tom_phili...@nps.gov
http://science.nature.nps.gov/im/monitor




Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-07 Thread Sarah Goslee
Hi,

 I'm not sure how one would combine SEM / graphical models with compositional
 dissimilarity as a response.  You might be able to fit a series of models in
 adonis() or capscale(), comparing just direct factors to direct +
 intermediate, etc.  I don't have any good ideas on how you might test more
 complex causal structures.

There's a fair bit of literature on Mantel-based path analysis, and
other similar dissimilarity-based approaches. SEM can be used with
composition as well, although not (I think) with the intermediate step
of calculating dissimilarities.

Besides journal articles employing those techniques, I like both of these:

J. B. Grace, Structural Equation Modeling and Natural Systems,
Cambridge University Press, Cambridge, UK, 2006.

B. Shipley, Cause and Correlation in Biology: A User’s Guide to Path
Analysis, Structural Equations and Causal Inference, Cambridge
University Press, Cambridge, UK, 2000.

Sarah




--
Sarah Goslee
http://www.functionaldiversity.org
