On Thu, 2012-05-24 at 15:00 -0700, J Straka wrote:
> Hello,
>
> I'm planning on using a regression model to describe seed set of plants (my
> response) using some sort of predictor based on temperature. I have a
> number of temperature variables calculated from the same set of data
> (hourly temperatures for the growing season, converted to variables such as
> average temperature, maximum temperature, minimum temperature, degree-days
> above zero Celsius, degree days above ten Celsius, etc...), and I want to
> decide which one should be included in my model. I know that I would
> ideally select one based on "prior knowledge" of the system (e.g. so-called
> "planned comparisons" or choosing a temperature threshold that is known to
> be important for the development of seeds), but not much is known about
> this system.

What is the model for? Is it for understanding, so that you want to
interpret the coefficients directly as something meaningful, or for
prediction? If the latter, I would say it doesn't really matter; choose
the model that gives the best out-of-sample predictions (lowest error
etc.), or average predictions over a set of best/good models.

Simply choosing the best model via some sort of selection procedure may
result in a model with high variance (change the data a bit and
different variables would be selected). If so, consider a regression
method that applies shrinkage to the coefficients, such as the lasso or
the elastic net; this will lead to a small bit of bias in the estimates
of the coefficients but should reduce the variance of the final model
because you are considering the selection of variables as part of the
model itself.

If you want to interpret the model coefficients as something real then
you have to be very careful doing any form of selection; the stepwise
procedures and best subsets can all potentially lead to strong bias in
the model coefficients. By removing a variable from the model, you are
in effect saying that the sample estimate of the effect of that variable
on the response is 0, not some small (statistically insignificant)
value. This is a very tricky thing to get right and I'm not sure I know
the right answer (or even if there is one!?).

> I've been warned against testing the significance of multiple predictors
> using p-values, unless I use Bonferroni correction (or some equivalent).
> Unfortunately, using Bonferroni correction would result in something like p
> = 0.05/7 (for seven different temperature variables); a rather small value
> for detecting anything! I was wondering whether it would be appropriate to
> instead use likelihood-based techniques (direct comparisons of
> log-likelihoods or AIC scores) to compare a series of models using each of
> the alternative predictors in turn, and choose the most relevant
> temperature variable (i.e.
> predictor) based on that.

Choosing models by AIC or BIC is just the same as doing it using
p-values; the selection procedure has all the problems I mention above.
LRTs require a significance test of the ratio of the two likelihoods, so
you are still doing a series of sequential tests whose overall error
rate you might want to control.

There are other corrections for multiple testing. For example, see the
p.adjust() function in R for some options.

HTH

G

> Thoughts on the validity of this approach? Would any adjustments have to be
> made for multiple comparisons if I used this strategy?
>
> Jason Straka
> University of Victoria
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
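
A footnote on the multiple-testing corrections mentioned above. The
sketch below is plain Python rather than R, and only illustrates the
arithmetic behind two of the standard corrections (Bonferroni and Holm,
both of which p.adjust() offers as methods). The function names and the
seven p-values are hypothetical, made up for illustration; they are not
results from the seed-set data.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down: the k-th smallest p-value (k = 0, 1, ...) is
    multiplied by (m - k); a running maximum keeps the adjusted values
    monotone in the order of the raw p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values, one per temperature variable (seven in total).
pvals = [0.004, 0.020, 0.030, 0.045, 0.200, 0.500, 0.900]

bonf = bonferroni(pvals)
hol = holm(pvals)
for p, b, h in zip(pvals, bonf, hol):
    print(f"raw={p:.3f}  bonferroni={b:.3f}  holm={h:.3f}")
```

Comparing an adjusted p-value with 0.05 is equivalent to comparing the
raw p-value with 0.05/7 in the Bonferroni case. Holm controls the same
family-wise error rate and its adjusted values are never larger than the
Bonferroni ones, so it is never less powerful.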