Tim pointed put that he has only 132 samples out of 2800 with a species present and I am curious what people think about how well we can model that with logistic regression. -Chris
On Aug 25, 2011, at 10:36 AM, Pedro Lima Pequeno <pacol...@gmail.com> wrote: > Hi Tim, > > there are several ways of dealing with spatial autocorrelation in > ecological models (see e.g. Dormann 2007: Methods to account for > spatial autocorrelation in the analysis of species distributional > data: a review; and Beale et al. 2010: Regression analysis of spatial > data). As always, this is an area of active research, so the right or > wrong thing to do is not as clear as it may seem. Some have even > concluded that "changes in coefficients between spatial and > non-spatial methods depend on the method used and are largely > idiosyncratic", so that "researchers may have little choice but to be > more explicit about the uncertainty of models and more cautious in > their interpretation" (Bini et al. 2009: Coefficient shifts in > geographical ecology: an empirical evaluation of spatial and > non-spatial regression). Thus, new methods are emerging at a faster > rate than people studying and comparing their properties. Nonetheless, > I think some observations are useful: > 1) there are methods explicitly designed to detect spatial > autocorrelation such as Moran's autocorrelograms or variograms > (available in several R packages). As already pointed out, the > autocorrelation function is "well behaved" with linear, equally spaced > series in time or space; > 2) minimal adequate model selection with the AIC is sensitive to > residual autocorrelation; it tends to generate unstable and overfitted > models. Thus, when applying any model selection procedure, you should > account for the uncertainty in the process by averaging the model set > (or model predictions) with respect to the relative support of each > model (e.g. Akaike weights). Since you have a large sample, you could > account for residual spatial autocorrelation using eigenvector > filtering, which produces synthetic variables that capture spatial > patterns and can be included in linear models as explanatory variables > - a more intuitive approach if you don't want to mess with random > factors (see e.g. Diniz-Filho et al. 2008: Model selection and > information theory in geographical ecology). This can be implemented > with packages spacemakeR and vegan; > 3) autocorrelation in model residuals is not the only - nor most > important - problem in biological modeling; model misspecification is > the major issue. Residual autocorrelation often arises due to not > including relevant explanatory variables, interaction terms, assuming > an inappropriate response shape and/or an inadequate variance > structure, or any combination of these. All these things need to be > checked for proper model validation, for instance by partial > regression plots (or added-varialbe plots), which help you see the > shape of the response to each explanatory variable after account for > the variation in the remaining explanatory set. > At the same time, you could plot the residuals againts both model > predictions and explanatory variables. Since residuals are the > stochastic component of the model ("noise"), its relation with the > systematic components should be random; clear patterns in these plots > are indications of misspecification. > Finally, all this "model tinkering" is based on two fundamental > premises: you want to model a mean tendecy of response and the pattern > of variation around it. These are strictly statistical properties of > data - they have nothing to do with biology. If you don't really > believe the biological process you are studying implies a mean > response, but rather e.g. a maximum one (such as in population or > abundance limitation), than all these methods will actually induce you > to misspecify the model, but there are alternatives (see e.g. Cade et > al. 2005 - Quantile regression reveals hidden bias and uncertainty in > habitat models). > > 2011/8/25, Tim Seipel <t.sei...@env.ethz.ch>: >> >> Dear List, >> I am trying to determine the best environmental predictors of the >> presence of a species along an elevational gradient. >> Elevation ranges from 400 to 2050 m a.s.l. and the ratio of presences to >> absences is low (132 presences out 2800 samples) >> >> So to start I fit the full model of with the variable of interest. >> >> sc.m<-glm(PA~sp.max+su.mmin+su.max+fa.mmin+fa.max+Slope+Haupt4+Pop_density+Dist_G+Growi_sea+,data=sc.pa,'binomial') >> >> First, I performed univariate and backward selection using Akaike >> Information Criteria, and the fit was good and realistic given my >> knowledge of the environment though the D^2 was low 0.08. My final model >> was: >> --------------------------------- >> glm(formula = PA ~ Slope + sp.mmin + su.max + fa.mmin + Haupt4, >> family = "binomial", data = sc.pa) >> >> Deviance Residuals: >> Min 1Q Median 3Q Max >> -0.5415 -0.3506 -0.2608 -0.1762 3.0768 >> >> Coefficients: >> Estimate Std. Error z value Pr(>|z|) >> (Intercept) -73.45212 23.13842 -3.174 0.00150 ** >> Slope -0.03834 0.01174 -3.265 0.00109 ** >> sp.mmin -15.34594 5.30360 -2.893 0.00381 ** >> su.max 5.09712 1.70332 2.992 0.00277 ** >> fa.mmin 13.52262 4.64021 2.914 0.00357 ** >> Haupt42 -0.72237 0.27710 -2.607 0.00914 ** >> Haupt43 -0.95730 0.37762 -2.535 0.01124 * >> Haupt44 -0.25357 0.24330 -1.042 0.29731 >> --- >> Null deviance: 958.21 on 2784 degrees of freedom >> Residual deviance: 896.10 on 2777 degrees of freedom >> AIC: 912.1 >> >> ---------------------- >> >> I then realized that my residuals were all highly correlated (0.8-0.6) >> when I plotted them using acf() function. >> >> So to account for this I used glmmPQL to fit the full model: >> >> model.sc.c <- glmmPQL(PA ~ >> sp.mmin+su.mmin+su.max+fa.mmin+Slope+Haupt4+Pop_density+Dist_G+Growi_sea, >> random= >> ~1|group.sc, data=sc.dat, family=binomial, correlation=corAR1()) >> >> However, the algorithm failed to converge and all the p-vaules were >> either 0 or 1 and coefficient estimates approached infinity. >> Additionally the grouping factor of the random effect is slightly >> arbitrary and accounts a tiny amount of variation. >> >> --- >> So know I feel stuck between a rock and a hard place, on the one hand I >> know I have a lot of autocorrelation and on the other hand I don't have >> a clear way to include it in the model. >> >> I would appreciate any advice on the matter. >> >> Sincerely, >> >> Tim >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> R-sig-ecology mailing list >> R-sig-ecology@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology >> > > > -- > Pedro A. C. Lima Pequeno > Programa de Pós-graduação em Ecologia > Instituto Nacional de Pesquisas da Amazônia > Manaus, AM, Brasil > > _______________________________________________ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology _______________________________________________ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology