Galina:  The AIC, delta AIC, and AIC weights all refer to an entire model
and provide no information on how you should interpret the individual
parameters within a model.  If you believe, based on your AIC weights, that
the model with ZONE + LABRADOR TEA + YEAR + ZONE x LABRADOR TEA is a
reasonable candidate, then your decision about interpreting the interaction
should be based only on that term relative to the other terms within this
model.  And, yes, if you were going to bother including an interaction in a
model, you had better be willing to interpret it.  And don't waste your
time AIC model averaging the parameter estimates across these multiple
models.  There is no useful point in doing so other than deluding yourself
into thinking you have somehow addressed model uncertainty.  I'm including
a previous post of mine to the r-sig-ecology list regarding the issues with
AIC model averaging of individual regression parameter estimates, its
ridiculous use for determining relative importance of predictors, etc.
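Below is a minimal sketch of what interpreting that interaction involves. It assumes ordinary least squares on simulated data in Python/numpy rather than Galina's mixed-effects model. Year and the random effects are left out, and the names `zone`, `tea`, and `diam`, the 0/1 zone coding, and all effect sizes are hypothetical. Once Zone × Labrador tea is in the model, the Labrador tea coefficient is the slope in the reference zone only. The slope in the other zone is that coefficient plus the interaction coefficient.

```python
import numpy as np

# Simulated (hypothetical) data: zone 0 = wet, 1 = dry; tea = Labrador tea cover (%)
rng = np.random.default_rng(11)
n = 300
zone = rng.integers(0, 2, size=n)
tea = rng.uniform(0, 100, size=n)
diam = 10 + 2 * zone + 0.05 * tea - 0.03 * zone * tea + rng.normal(size=n)

# Design matrix for diam ~ zone + tea + zone:tea (intercept included)
X = np.column_stack([np.ones(n), zone, tea, zone * tea])
b, *_ = np.linalg.lstsq(X, diam, rcond=None)

# The "main effect" of tea is only the slope in the reference (wet) zone;
# the slope in the dry zone is that coefficient plus the interaction.
slope_wet = b[2]
slope_dry = b[2] + b[3]
print(f"Labrador tea slope, wet zone: {slope_wet:.3f}")
print(f"Labrador tea slope, dry zone: {slope_dry:.3f}")
```

With treatment (0/1) coding, the same bookkeeping applies to the fixed effects of a mixed model. Report a Labrador tea slope for each zone rather than a single "main effect".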

Brian

r-sig-ecology post:

Joana and any others:  You cannot obtain a valid or useful measure of
relative importance of predictor variables across multiple models by
applying relative AIC weights or using model-averaged coefficients unless
all your models included a single predictor (which, of course, is not what
is usually done).  And this applies to hurdle or any other models.  AIC
(and relative weights) apply to the log likelihood maximized for estimating
a model that may be composed of 1 to many predictors.  Neither the log
likelihood nor its associated AIC for a model has any ability to
distinguish among the contributions of the individual predictors used in
maximizing the log likelihood, and most useful definitions of relative
importance of predictors within a model require some ability to make that
distinction.  The best that AIC and relative AIC weights applied to
individual predictor coefficients can tell you is the relative importance
of the models in which the predictors occurred.  And that is not the same
as relative importance of predictors for most statisticians.  It is quite
possible for a predictor to have little relative importance within a model
that itself has high relative importance, and, of course, the opposite is
true too.  This AIC weights approach ignores that fact.  Burnham and
Anderson (2002, 2004) have done ecologists a great disservice by
suggesting that AIC model weights can be used to address relative
importance of individual predictors in models that included multiple
predictors.  AIC model weights can be used to assess the relative
importance of models (that are combinations of one to many predictors) but
are insufficient to address the relative importance of individual
predictor variables because they don’t recognize the differential
contribution of individual predictors to the likelihood (or equivalently
deviance, or sums of squares).  Indeed, the use of AIC model weights, as
employed by most people, acts as if, for a given model, all predictors
contributed equally to the likelihood and thus get the same weight for
being in that model.  That is a totally unreasonable assumption and never
likely in practice.  AIC weights are based on AIC values that are computed
from the log likelihood maximized by all predictors simultaneously.  There
is nothing in the theory behind AIC that suggests you can attribute the
log likelihood equally to all predictor variables in the model.  I’m not
sure why Burnham and Anderson (2002) propagated such a notion, as it
totally conflicts with and ignores a large body of statistical literature
on methods for assigning relative importance of predictors within a given
statistical model.  Examples from some accessible statistical and
ecological literature include:

Bring, J.  1994.  How to standardize regression coefficients.  The American
Statistician 48: 209-213.

Chevan, A., and M. Sutherland.  1991.  Hierarchical partitioning.  The
American Statistician 45: 90-96.

Christensen, R.  1992.  Comments on Chevan and Sutherland.  The American
Statistician 46: 74.

Grömping, U.  2007.  Estimators of relative importance in linear regression
based on variance decomposition.  The American Statistician 61: 139-147.

Kruskal, W., and R. Majors.  1989.  Concepts of relative importance in
recent scientific literature.  The American Statistician 43: 2-6.

MacNally, R.  2000.  Regression model-building in conservation biology,
biogeography and ecology:  The distinction between – and reconciliation of
– ‘predictive’ and ‘explanatory’ models.  Biodiversity and Conservation 9:
655-671.

Murray, K., and M. M. Conner.  2009.  Methods to quantify variable
importance:  implications for the analysis of noisy ecological data.
Ecology 90: 348-355.

The paper by Murray and Conner (2009) is a simulation study that confirms
and states what was obvious to most statisticians: AIC was not designed to
differentiate among contributions of individual predictors within a model
and, thus, is not appropriate for evaluating relative importance of
individual predictors.  Stick to using AIC weights to assess relative
importance of models (note, however, that if all your models had single
predictors, then ranking models would be the same as ranking predictors).
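To make the criticized calculation concrete, here is a small Python sketch. It converts the AICc values from Galina's table (quoted further below) into Akaike weights, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2). It then forms the "sum of weights" importance that this post argues against. The term labels are shorthand introduced only for this sketch ("LT" for Labrador tea).

The weights reproduce the table's AICc Wt column to two decimals. Every term that appears in all three top models receives essentially the same summed weight of 1.0, regardless of how much each contributes within those models.

```python
import math

def akaike_weights(aic_values):
    """w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), with delta_i = AIC_i - min AIC."""
    best = min(aic_values)
    raw = [math.exp(-(a - best) / 2) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]

# Terms and AICc values from Galina's table ("LT" = Labrador tea)
models = [
    ({"Zone", "LT", "Year"}, -17.75),
    ({"Zone", "LT", "Year", "Zone:LT"}, -14.69),
    ({"Zone", "LT", "Year", "Year:LT"}, -11.21),
    ({"Zone", "LT"}, 71.14),
    ({"Zone", "LT", "Zone:LT"}, 73.85),
]
weights = akaike_weights([aic for _, aic in models])

# The "sum of weights" importance criticized above: every term in a model
# is credited with that model's full weight, however little it contributes.
terms = sorted(set().union(*(t for t, _ in models)))
summed = {term: sum(w for (t, _), w in zip(models, weights) if term in t)
          for term in terms}

print([round(w, 2) for w in weights])                     # [0.8, 0.17, 0.03, 0.0, 0.0]
print({term: round(v, 2) for term, v in summed.items()})
```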

Relative importance is a slippery concept with many interpretations.  But
useful ones for predictor variables within a given regression model
typically relate to the contribution to reducing the objective function
used in statistical estimation (minimizing negative log likelihood, sums of
squares, deviance, etc.) and to the expected change in the response
variable given a unit change in the predictor (these are well discussed in
Bring 1994).  In essence, the relative importance of individual predictors
would have to involve the relative importance within a given model, i.e.,
how much each contributes to the likelihood that is maximized, or
equivalently to the minimization of residual sums of squares or of
deviance (−2 × log likelihood), relative to the other predictors in that
model.  A couple of possibilities for determining relative importance of
predictors within a model exist.  The ratio of *t*-statistics (parameter
estimate divided by its standard error) corresponds directly to the ratio
of coefficients for appropriately standardized variables (Bring 1994).
Note that this standardization of variables (by their partial standard
deviations) is different from what is often done by default in most
statistical software.  The hierarchical partitioning approach of Chevan and
Sutherland (1991) is based on similar ideas but considers more contrasts of
models.  It is available in the hier.part package in R.  Murray and
Conner's (2009) simulation study found that it worked quite well.
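The following is a minimal Python/numpy sketch of the Bring (1994) relationship, using simulated data. All variable names and effect sizes are hypothetical. The partial standard deviation of x_j is computed here as s_j × sqrt(1 − R_j²), where R_j² comes from regressing x_j on the other predictors. Bring's own definition may include an additional factor common to all predictors; any such factor cancels in the ratios shown. The sketch checks that ratios of *t*-statistics equal ratios of coefficients multiplied by these partial standard deviations.

```python
import numpy as np

def ols_t(X, y):
    """OLS with an intercept; returns the slopes and their t-statistics."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return beta[1:], beta[1:] / se[1:]

def partial_sd(X):
    """s_j * sqrt(1 - R_j^2), with R_j^2 from regressing x_j on the other predictors."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        one_minus_r2 = (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = X[:, j].std(ddof=1) * np.sqrt(one_minus_r2)
    return out

# Simulated data: x1 and x2 correlated, x3 on a larger scale
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)
x3 = rng.normal(scale=3.0, size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 0.5 * x1 + 0.8 * x2 + 0.1 * x3 + rng.normal(size=n)

b, t = ols_t(X, y)
b_std = b * partial_sd(X)          # coefficients standardized by partial SDs
print("t ratios:         ", np.round(t / t[0], 3))
print("partial-SD ratios:", np.round(b_std / b_std[0], 3))
```

For OLS with an intercept, t_j = b_j × s_j × sqrt((n − 1)(1 − R_j²)) / σ̂. The identity is therefore exact, not an approximation.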

So I think what needs to be done is to conduct some reasonable assessment
of relative importance of the predictors within each of the top candidate
models.  Then you can assess those results across all those top models.  A
formal process would compute the relative importance of a predictor (say
variable X5) within each of the models in which it occurs, and the relative
importance of each of the models in which that predictor occurs (the latter
could be done with AIC weights), and then combine that information somehow
(probably with some product function).
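One way the formal process just described might be sketched in Python rests on several assumptions not taken from the post:

* The data are simulated Gaussian, and there are three hypothetical candidate models.
* Within-model importance is measured by averaging each predictor's R² gain over all orderings of that model's predictors. This is the averaging idea behind hierarchical partitioning; for R² it should coincide with the LMG measure discussed by Grömping (2007), though hier.part's own options may differ.
* The "product function" is taken to be model weight × within-model share, summed over models.

```python
import itertools
import math
import numpy as np

def design(X, cols):
    """Intercept plus the predictor columns listed in `cols`."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, c] for c in cols])

def rss(X, y, cols):
    Xd = design(X, cols)
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return float(resid @ resid)

def r_squared(X, y, cols):
    return 1.0 - rss(X, y, cols) / float(np.sum((y - y.mean()) ** 2))

def partition_r2(X, y, cols):
    """Average R^2 gain from adding each predictor, over all orderings of `cols`."""
    gains = {c: 0.0 for c in cols}
    orders = list(itertools.permutations(cols))
    for order in orders:
        included = []
        for c in order:
            before = r_squared(X, y, included)
            included.append(c)
            gains[c] += r_squared(X, y, included) - before
    return {c: g / len(orders) for c, g in gains.items()}

def gaussian_aic(X, y, cols):
    """n*log(RSS/n) + 2k up to a constant; k counts slopes, intercept, and error variance."""
    n = len(y)
    k = len(cols) + 2
    return n * math.log(rss(X, y, cols) / n) + 2 * k

# Simulated data; models are tuples of column indices (all hypothetical)
rng = np.random.default_rng(7)
n = 150
X = rng.normal(size=(n, 3))
X[:, 1] += 0.6 * X[:, 0]                 # make predictors 0 and 1 correlated
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(size=n)
models = {"m1": (0, 1, 2), "m2": (0, 1), "m3": (0, 2)}

# Step 1: relative importance of each model (Akaike weights)
aic = {m: gaussian_aic(X, y, cols) for m, cols in models.items()}
best = min(aic.values())
raw = {m: math.exp(-(a - best) / 2) for m, a in aic.items()}
w = {m: r / sum(raw.values()) for m, r in raw.items()}

# Step 2: each predictor's share of R^2 within each model
# Step 3: combine as weight x share, summed over models (an assumed rule)
importance = {c: 0.0 for c in range(X.shape[1])}
for m, cols in models.items():
    shares = partition_r2(X, y, cols)
    total = sum(shares.values())
    for c, s in shares.items():
        importance[c] += w[m] * s / total

print({m: round(v, 3) for m, v in w.items()})
print({c: round(v, 3) for c, v in importance.items()})
```

Unlike summed AIC weights, a predictor here earns credit in proportion to its contribution within each model. A predictor absent from a model earns nothing from that model.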

All these problems with AIC weights extend to problems with calculating
model-averaged regression coefficients.  While it certainly may make sense
to compute model-averaged predictions (*y*-hat in linear regression), there
is little reason to think that assigning weights based on the entire model
to estimates for individual predictors and then averaging them provides an
enlightened way to interpret regression parameter estimates (the
coefficients).  While it is true that algebraically one can obtain the
model-averaged predictions either directly or by combining the
model-averaged coefficients, interpreting the coefficients implies that
the units associated with these rates of change are interpretable.  And
since the rates of change implied by the estimate for a predictor are
always conditional on what other predictors are in the model (they are
partial effects), it is strange indeed to pretend that you can interpret
averages of them across multiple models, each of which included many
different predictors, in a comprehensible fashion (Candolo, Davison, and
Demetrio.  2003.  A note on model uncertainty in linear regression.  The
Statistician 52 (part 2): 165-177).  Blums et al. (2005.  Individual
quality, survival variation and patterns of phenotypic selection on body
condition and timing of nesting in birds.  Oecologia 143: 365-376) provide
another good example of why not to model-average individual parameter
estimates.  In essence, this approach due to Burnham and Anderson (2002) is
trying to provide an interpretation of an individual predictor as if it
weren’t really conditional (they use the terminology “unconditional”) on
the other predictors in the model, something that is unlikely ever to be
true except in a controlled experimental design.  Strange indeed!  It is
very much like trying to interpret main effects in the presence of an
interaction in an ANOVA:  while it occasionally may not be too misleading,
in general it can’t be a good thing to do.
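The algebraic point can be checked with a short Python sketch on simulated data. The model weights are hypothetical rather than computed from AIC. Excluded terms enter the average as zero coefficients, which is what makes the identity hold. Model-averaged fitted values equal the fitted values from model-averaged coefficients. Yet the x1 coefficient being averaged estimates a different partial effect in each model.

```python
import numpy as np

# Simulated data with correlated predictors x1 and x2
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
y = 2.0 + 1.0 * x1 + 1.5 * x2 + rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, x2])   # intercept, x1, x2

def fit(cols):
    """OLS on the chosen columns of X_full; excluded coefficients are set to zero."""
    beta = np.zeros(X_full.shape[1])
    est, *_ = np.linalg.lstsq(X_full[:, cols], y, rcond=None)
    beta[cols] = est
    return beta

betas = {"x1 only": fit([0, 1]), "x1 + x2": fit([0, 1, 2])}
weights = {"x1 only": 0.4, "x1 + x2": 0.6}       # hypothetical, not computed from AIC

avg_pred = sum(w * (X_full @ betas[m]) for m, w in weights.items())
avg_beta = sum(w * betas[m] for m, w in weights.items())

# Same fitted values either way...
print(np.allclose(avg_pred, X_full @ avg_beta))  # True
# ...but the x1 coefficient is a different partial effect in each model
print({m: round(b[1], 2) for m, b in betas.items()}, "averaged:", round(avg_beta[1], 2))
```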

Averaging the parameter estimates across multiple models also assumes a
continuity of the parameter space that may or may not be true in any given
application.  It is certainly possible, depending on the collinearity among
the predictors, for a given predictor variable to have a strong positive
rate of change when estimated with one set of other predictors and a strong
negative rate of change when estimated with another set of predictors,
i.e., a multimodal parameter space.  The average of those rates could then
be close to zero and not representative of any effect that was really
estimated.  For example, consider a weighted average estimate of −0.021
for variable A with a standard error of 0.018 (almost as large as the
estimate), indicating an approximate 95% confidence interval (estimate ± 2
SE) of [−0.057, 0.015].  But we don’t know whether this means there was a
fairly dispersed unimodal distribution of parameter estimates near −0.021
and overlapping zero, or whether there was a bimodal distribution of
parameter estimates with one mode >0 and the other mode <0.  If the
latter, which is to be expected often when considering models that include
different combinations of predictors that are correlated with each other
differently, then the averaged value is of low information content at best
or completely misleading at worst.  If you really want to understand the
relationships due to effects of the individual predictor variables, then
model-averaged parameter estimates are a very shaky foundation for those
interpretations.
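A small Python simulation shows how collinearity can produce the sign reversal described above. The data are hypothetical, and the equal model weights are assumed rather than computed from AIC. x1's slope is near +1 when x2 is omitted and near −1 when x2 is included. Their weighted average is therefore near 0 and describes neither model. The last lines check that the quoted interval is the estimate ± 2 standard errors.

```python
import numpy as np

def ols_slopes(X, y):
    """OLS with an intercept; returns the slope estimates only."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta[1:]

# Simulated data: x2 is strongly collinear with x1
rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)
y = -1.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

b_alone = ols_slopes(x1[:, None], y)[0]                  # near +1: x1 absorbs x2's effect
b_with_x2 = ols_slopes(np.column_stack([x1, x2]), y)[0]  # near -1: partial effect of x1
w = np.array([0.5, 0.5])                                 # hypothetical model weights
b_avg = float(w @ np.array([b_alone, b_with_x2]))        # near 0: describes neither model
print(round(b_alone, 2), round(b_with_x2, 2), round(b_avg, 2))

# The interval quoted above is estimate +/- 2 SE
est, se = -0.021, 0.018
print(round(est - 2 * se, 3), round(est + 2 * se, 3))    # -0.057 0.015
```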

I realize it may come as a shock to many ecologists to learn that few
statisticians actually support the idea of computing model averaged
coefficients based on AIC weights.  But such is the case.  There is no
statistical theory that I’m aware of that suggests AIC weighted average
parameter estimates are a reasonable way to obtain shrinkage estimates
(what Burnham and Anderson claim their weighted coefficient estimates are
like) to assess relationships across multiple models.  To reiterate, it
may make some sense to use AIC-weighted average predictions for
predictive purposes, but not for interpreting individual model
coefficients.  There are other procedures, like the lasso (least absolute
shrinkage and selection operator) that are more reasonable statistical
approaches to obtain shrinkage estimates of parameters and eliminate
needless parameters.
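For comparison, here is a minimal numpy sketch of the lasso fitted by cyclic coordinate descent on simulated data. It assumes standardized predictors and a centered response, so no intercept is fitted. The penalty λ = 0.3 is an arbitrary choice for illustration. In practice λ would be chosen by cross-validation, and in R an established package such as glmnet would normally be used rather than hand-written code.

```python
import numpy as np

def soft_threshold(z, g):
    """Shrink z toward zero by g, returning exactly 0 when |z| <= g."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                       # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            step = b[j] - old
            if step != 0.0:
                r -= X[:, j] * step
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

# Simulated data: only predictors 0 and 3 have real effects
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_b = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ true_b + rng.normal(size=n)
y = y - y.mean()

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_lasso = lasso_cd(X, y, lam=0.3)
print("OLS:  ", np.round(b_ols, 3))
print("lasso:", np.round(b_lasso, 3))
```

The soft-thresholding step sets weak coefficients exactly to zero and shrinks the remaining ones toward zero, all within a single fitted model.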


Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  [email protected] <[email protected]>
tel:  970 226-9326



On Thu, May 23, 2013 at 4:21 PM, Galina Kamorina <
[email protected]> wrote:

> Hi,
> I would be very grateful if someone could help me figure out my problem.
>
> I used mixed-effects models to analyse my data and an AIC approach for
> model selection. I am studying the effect of Labrador tea on the basal
> diameter of spruce in 2 different habitats (wet and dry zones) over 3 years.
> This is one example of my AIC table:
>
>   Candidate models                                   K     AICc  Δ AICc  AICc Wt
>   Zone + Labrador tea + Year                         9   -17.75    0.00     0.80
>   Zone + Labrador tea + Year + Zone × Labrador tea  10   -14.69    3.06     0.17
>   Zone + Labrador tea + Year + Year × Labrador tea  12   -11.21    6.53     0.03
>   Zone + Labrador tea                                6    71.14   88.88     0.00
>   Zone + Labrador tea + Zone × Labrador tea          7    73.85   91.59     0.00
>
> I interpreted the main effects of Zone, Labrador tea, and Year. My question
> is: should I also interpret the interaction term Zone × Labrador tea?
> Normally I interpret the effects of variables that appear in the models
> with Δ AICc < 4.
> One professor said I should not interpret the interaction term if the main
> effect is stronger. But at the same time I have seen articles where the
> authors interpreted the interaction term when its Akaike weight was still high.
>
> Thank you in advance.
> Galina
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
