Re: [R-sig-eco] How to get the total variance explained from an envfit object?

Jari Oksanen Fri, 08 Nov 2019 06:33:57 -0800

Dear Claire Elizabeth Couch,

First let me explain how envfit() works: For continuous environmental variables 
it actually uses ordination to predict the environmental variable. Under the 
hood (bonnet), it fits a first degree trend surface (plane in 2D) for 
environmental variable over the ordination scores, and the R2 is the proportion 
that surface explains of the variable. The arrow shown is the gradient of this 
fitted trend surface and shows the direction to which the variable changes most 
rapidly in a first degree linear model.


Clearly you cannot add these R2 values, because your environmental variables 
can be (and normally are) inter-correlated.

It seems that you want to work into another direction than envfit: Predict 
ordination scores by a set of environmental variables.

There are many ways of doing this in R, although we do not provide canned tool 
for this. You can actually do this even with multiple linear model with 
function lm() of R::stats. Here is how to do this with vegan::rda() function:

library(vegan)
data(varespec, varechem)
ord <- metaMDS(varespec, trace = FALSE)
fit <- rda(scores(ord) ~ ., data = varechem)

The basic output of this will show you R2:

Call: rda(formula = scores(ord) ~ N + P + K + Ca + Mg + S + Al + Fe +
Mn + Zn + Mo + Baresoil + Humdepth + pH, data = varechem)

              Inertia Proportion Rank
Total         0.13905    1.00000
Constrained   0.11357    0.81681    2
Unconstrained 0.02547    0.18319    2
Inertia is variance

Here the “Proportion” for the Constrained component is the overall R2 = 
0.81681. If you want to see the adjusted R2, this is found with

 RsquareAdj(fit)
$r.squared
[1] 0.8168084

$adj.r.squared
[1] 0.5318436

However, you only get the overall R2, but not partial R2 values for single 
variables. You can use anova(fit, by = “margin”) to find the (lacking) marginal 
significances of unique effects of the variables, though.

The regression coefficients can be found with command coef() — and probably you 
want them normalized:

> coef(fit, norm=TRUE)
                 RDA1        RDA2
N        -0.154511701  0.48579513
P         0.002463991 -0.13179802
...
pH        0.701009027 -0.66724274

You can also simplify this model in the usual way, for instance with the 
function ordistep() that uses permutation test to drop variables one by one (or 
you can build up this model adding variables one by one if you start with an 
empty model with ordistep() or ordiR2step()):

ordistep(fit)            # drop variables
m0 <- update(fit, . ~ 1) # m0 is an empty model
ordistep(m0, scope=fit)  # add variables to an empty model

(After these anova(…, by = “margin”) results also give significant effects.)

All this sounds a bit weird to me (or more than “a bit”), but it can be done. I 
guess there are some readers who get hiccups for using RDA on NMDS, but this 
can be done as the NMDS space is metric (the *transfer* function is non-metric 
from community dissimilarities to metric ordination space). It also sounds 
really odd to have an ordination scores as dependent data in RDA, but this 
exactly answers the problem you presented: predict ordination scores by a set 
of external variables.  After all, RDA is nothing but a linear regression for 
multivariate response data, and there is no need to think it as an ordination.

Cheers, Jari Oksanen

On 7 Nov 2019, at 22:02, Couch, Claire Elizabeth 
<cou...@oregonstate.edu<mailto:cou...@oregonstate.edu>> wrote:

I am analyzing some microbiome data by using unconstrained ordination (PCA
or NMDS) followed by environmental vector fitting with the envfit function
in the vegan package. The output of envfit includes an r2 value for each
vector or factor included in the envfit model, but I am interested in the
total amount of variation explained by all the vectors/factors, rather than
just stand-alone variables. I presume I cannot simply add up the R2 values
assigned to each environmental variable, because there may be overlap in
the microbiome variation that is "explained" by each environmental
variable. However, there does not seem to be any way of accessing the total
r2 value for the model.

Using an example dataset, this is what I have tried so far:

library(vegan)
library(MASS)

data(varespec, varechem)
library(MASS)
ord <- metaMDS(varespec)
fit <- envfit(ord, varechem, perm = 999)
fit

This shows r2 for each environmental variable, but how do I extract the r2
value for the entire model?

I have tried running fit$r, attributes(fit)$r, and Rsquare.Adj(fit), but
these all return NULL.

I would greatly appreciate any suggestions!

[[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org<mailto:R-sig-ecology@r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] How to get the total variance explained from an envfit object?

Reply via email to