Dear forum,

My first time posting to this forum -- afraid I am going to start with a somewhat conceptual/complicated question.

My question concerns the analysis of environmental factors as predictors of community (using vegan). Specifically I am trying to understand why envit and ordisurf appear to be giving me a qualitatively different picture of environmental correlations with my community matrix when compared with the bioenv function/approach.

Briefly:

I am using NMDS to reduce dimensionality of a large dataset that of soil microbes across broad geographic scale (there are over 250 sites and 60 species in the matrix). I got the metaMDS in vegan to converge on a solution despite the high number of site pairs that shared no species (~30%) by using the noshare=0.1 modifier.

Call:
metaMDS(comm = decostand(comm.matrix, "pa"), distance = "jaccard", k = 3, trymax = 150, engine = "monoMDS", noshare = 0.1, weakties = TRUE, stress = 1, maxit = 500, scaling = TRUE, pc = TRUE, smin = 1e-04, sfgrmin = 1e-07, sratmax = 0.99999, zerodist = "add")

global Multidimensional Scaling using monoMDS

Data: pa(comm.matrix)
Distance: jaccard shortest

Dimensions: 3
Stress: 0.1598281
Stress type 1, weak ties
Two convergent solutions found after 34 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on ‘pa(comm.matrix)’

Next I ran I envfit on a corresponding environmental matrix which contained both continuous and categorical data.

ef1 <- envfit(nmds_output, envir.matrix, permu = 999, na.rm=TRUE) #(very long) output not shown

This as I understand it assumes a linear relationship between the continuous variables and the metaMDS axes. This didn't seem quite right so I used gam via ordisurf to get a better estimate and visualization of the relationships for all of the variables individually. There was reasonable correspondence between estimates of the strengths of the relationships using envfit and ordisurf.

ordi<-list()
for (i in 1:ncol(envir.matrix)){ ordi[[i]]<-ordisurf(nmds_final, envir.matrix[,i], add=FALSE) } #output not shown


Of course many of the environmental variables considered are highly correlated and/or may have additive effects, so ideally I wanted to do some sort of model comparison that would choose the most parsimonious model that best describes the community using the optimal number of predictors. A good option seemed to be bioenv. Having read the vegan help and various tutorial as well as Clarke, K. R & Ainsworth, M. 1993 I understand that bioenv uses a totally different approach, which as I understand is based on the rank correlations between distance matrices calculated from the (biotic) community data v. subsets of potential predictors from an environmental matrix. (Hope I have that right).

#bioenv.output<-bioenv(comm=vegdist(decostand(comm.matrix, "pa"), "jaccard"),env=envir.matrix, metric="gower", upto=7) #yes, this took a long time

> bioenv.output

Call:
bioenv(comm = vegdist(decostand(comm.matrix, "pa"), "jaccard"), env = envir.matrix, upto = 7, metric = "gower")

Subset of environmental variables with best correlation to community data.

Correlations: spearman
Dissimilarities: jaccard
Metric: gower

Best model has 4 parameters (max. 7 allowed):
bio_1 bio_10 bio_19 Latitude
with correlation 0.3217859

Where I'm coming unstuck is that envfit and ordisurf give me an entirely different picture of which variables are important when compared with the bioenv function. For example, a variable called "Bioregion" is highly significant with respect the to the prediction of axes 1 and 2 of the NMDS using envfit, and based on an R-squared of 0.32 is the strongest predictor of the lot. Alternatively, "Bioregion" never makes the top models at all when I use the bioenv function. One could argue that maybe "Bioregion" is correlated with other variables like latitude and that these other variables are simply better, but actually Bioregion alone is a terrible predictor based on bioenv. I know this since I can force bioenv to choose Bioregion (by giving it only a subset of the environmental matrix that I know to contain really poor predictors), and when do this I get an r value of 0.05. This is only an example -- several of the top performers according to envfit do not feature at all using the bioenv model selection approach.

So my question really is 'how do I interpret/deal with these discrepancies?' I know that the r values from the two approaches mean totally different things and can't be directly compared, but one approach leads me to think that a certain subset of the environmental variables might be important, and the other approach appears to be telling me something different. I must admit this is making my brain hurt as I would expect at least moderate correspondence or agreement among methods. I like the bioenv function as it allows competing models and the evaluation of combinations of variables, though I must admit that envfit/ordisurf approach is a bit more intuitive to me.

Has anyone else worked through similar issues? Am I missing something obvious, and/or are there suggestions for a way forward?

Thanks in advance for any help!

Jeff

_______________________________________________
R-sig-ecology mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Reply via email to