[R-sig-eco] Comparison and interpretation of results from envfit, ordisurf and bioenv in the vegan package

Jeff Mon, 05 Jan 2015 23:27:07 -0800

Dear forum,

My first time posting to this forum -- afraid I am going to start with a somewhat conceptual/complicated question.

My question concerns the analysis of environmental factors as predictors of community composition (using vegan). Specifically, I am trying to understand why envfit and ordisurf appear to give me a qualitatively different picture of environmental correlations with my community matrix when compared with the bioenv function/approach.


Briefly:

I am using NMDS to reduce the dimensionality of a large dataset of soil microbes across a broad geographic scale (there are over 250 sites and 60 species in the matrix). I got metaMDS in vegan to converge on a solution despite the high number of site pairs that shared no species (~30%) by using the noshare=0.1 modifier.


Call:

metaMDS(comm = decostand(comm.matrix, "pa"), distance = "jaccard", k = 3, trymax = 150, engine = "monoMDS", noshare = 0.1, weakties = TRUE, stress = 1, maxit = 500, scaling = TRUE, pc = TRUE, smin = 1e-04, sfgrmin = 1e-07, sratmax = 0.99999, zerodist = "add")


global Multidimensional Scaling using monoMDS

Data: pa(comm.matrix)
Distance: jaccard shortest

Dimensions: 3
Stress: 0.1598281
Stress type 1, weak ties
Two convergent solutions found after 34 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on ‘pa(comm.matrix)’

Next I ran envfit on a corresponding environmental matrix which contained both continuous and categorical data.

ef1 <- envfit(nmds_output, envir.matrix, permutations = 999, na.rm = TRUE) # (very long) output not shown

This, as I understand it, assumes a linear relationship between the continuous variables and the metaMDS axes. This didn't seem quite right, so I used gam via ordisurf to get a better estimate and visualization of the relationships for all of the variables individually. There was reasonable correspondence between estimates of the strengths of the relationships using envfit and ordisurf.


ordi <- list()

for (i in 1:ncol(envir.matrix)) { ordi[[i]] <- ordisurf(nmds_final, envir.matrix[, i], add = FALSE) } # output not shown
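For a single continuous variable, the comparison between the linear envfit fit and the smooth ordisurf fit can be sketched roughly as follows. This is only an illustration under assumptions: it uses vegan's built-in dune and dune.env example data and the variable A1 in place of my real comm.matrix and envir.matrix, and k = 2 rather than 3.

```r
library(vegan)

# Example data shipped with vegan (stand-ins for the real matrices)
data(dune)
data(dune.env)

set.seed(1)
# NMDS on presence/absence data with Jaccard dissimilarity
ord <- metaMDS(decostand(dune, "pa"), distance = "jaccard", k = 2, trace = FALSE)

# Linear vector fit: r is the squared correlation with the ordination scores
ef <- envfit(ord ~ A1, data = dune.env, permutations = 999)
r2_linear <- ef$vectors$r

# Smooth (GAM) surface fit over the same two axes
surf <- ordisurf(ord ~ A1, data = dune.env, plot = FALSE)
dev_expl <- summary(surf)$dev.expl

print(c(envfit_r2 = unname(r2_linear), ordisurf_dev_expl = dev_expl))
```

Both numbers describe how well the variable is predicted from positions in the ordination, so some agreement between them is to be expected.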

Of course many of the environmental variables considered are highly correlated and/or may have additive effects, so ideally I wanted to do some sort of model comparison that would choose the most parsimonious model that best describes the community using the optimal number of predictors. A good option seemed to be bioenv. Having read the vegan help and various tutorials as well as Clarke, K. R. & Ainsworth, M. (1993), I understand that bioenv uses a totally different approach, which as I understand it is based on the rank correlations between distance matrices calculated from the (biotic) community data vs. subsets of potential predictors from an environmental matrix. (Hope I have that right.)
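To make sure I have the idea right, here is a minimal base-R sketch of the quantity I believe bioenv evaluates for each candidate subset. The data are simulated and all names (temp, noise, rank_cor) are hypothetical. It uses Euclidean distance on scaled variables for the environmental side, not the Gower metric used in my actual analysis, and dist(method = "binary") as the Jaccard dissimilarity for 0/1 data.

```r
set.seed(42)
n_sites <- 30

# Two hypothetical predictors: 'temp' drives the community, 'noise' does not
env <- data.frame(temp = runif(n_sites), noise = runif(n_sites))

# Deterministic presence/absence matrix: each species occurs at sites
# whose 'temp' lies within 0.2 of the species optimum
optima <- seq(0, 1, length.out = 12)
comm <- outer(env$temp, optima, function(t, o) as.integer(abs(t - o) < 0.2))

# Community dissimilarities (binary distance = Jaccard for 0/1 data)
comm_d <- dist(comm, method = "binary")

# Spearman rank correlation between community dissimilarities and the
# environmental distances computed from a given subset of variables
rank_cor <- function(vars) {
  env_d <- dist(scale(env[, vars, drop = FALSE]))
  cor(as.vector(comm_d), as.vector(env_d), method = "spearman")
}

rho_temp  <- rank_cor("temp")
rho_noise <- rank_cor("noise")
rho_both  <- rank_cor(c("temp", "noise"))
print(c(temp = rho_temp, noise = rho_noise, both = rho_both))
```

As far as I understand, bioenv repeats this kind of calculation over all subsets up to the size given by upto and reports the subset with the highest correlation.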

#bioenv.output <- bioenv(comm = vegdist(decostand(comm.matrix, "pa"), "jaccard"), env = envir.matrix, metric = "gower", upto = 7) # yes, this took a long time


> bioenv.output

Call:

bioenv(comm = vegdist(decostand(comm.matrix, "pa"), "jaccard"), env = envir.matrix, upto = 7, metric = "gower")


Subset of environmental variables with best correlation to community data.

Correlations: spearman
Dissimilarities: jaccard
Metric: gower

Best model has 4 parameters (max. 7 allowed):
bio_1 bio_10 bio_19 Latitude
with correlation 0.3217859

Where I'm coming unstuck is that envfit and ordisurf give me an entirely different picture of which variables are important when compared with the bioenv function. For example, a variable called "Bioregion" is highly significant with respect to the prediction of axes 1 and 2 of the NMDS using envfit, and based on an R-squared of 0.32 is the strongest predictor of the lot. In contrast, "Bioregion" never makes the top models at all when I use the bioenv function. One could argue that maybe "Bioregion" is correlated with other variables like latitude and that these other variables are simply better, but actually Bioregion alone is a terrible predictor based on bioenv. I know this since I can force bioenv to choose Bioregion (by giving it only a subset of the environmental matrix that I know to contain really poor predictors), and when I do this I get an r value of 0.05. This is only an example -- several of the top performers according to envfit do not feature at all using the bioenv model selection approach.

So my question really is 'how do I interpret/deal with these discrepancies?' I know that the r values from the two approaches mean totally different things and can't be directly compared, but one approach leads me to think that a certain subset of the environmental variables might be important, and the other approach appears to be telling me something different. I must admit this is making my brain hurt, as I would expect at least moderate correspondence or agreement among methods. I like the bioenv function as it allows competing models and the evaluation of combinations of variables, though I must admit that the envfit/ordisurf approach is a bit more intuitive to me.

Has anyone else worked through similar issues? Am I missing something obvious, and/or are there suggestions for a way forward?


Thanks in advance for any help!

Jeff

_______________________________________________
R-sig-ecology mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
