Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Carl Boettiger
Hi list,

I agree with Ben's definition of AIC, "expected (Kullback-Leibler)
distance between a hypothetical 'true model' and any specified model";
I just feel that doesn't give any intuition to a frequentist, which is
what I thought the question was asking.

Ben, for references to this derivation I like Cavanaugh 1997.  The
paper is actually about AIC vs. AICc, but it provides a nice, clean
derivation showing that the AIC penalty gives an asymptotically unbiased
estimate, whereas the maximized log-likelihood alone is biased by that
amount.  (Consider that the maximum likelihood estimate (MLE) of the
order parameter k for a polynomial is n-1; it's clear the MLE is
biased.)  I may have missed something in my interpretation here, so I'm
happy to stand corrected.
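The bias described above can be seen in a toy simulation.  This is my own
sketch (not taken from Cavanaugh 1997), using the simplest nested setting, a
normal mean with known variance: the in-sample log-likelihood at the MLE
overstates the model's log-likelihood on fresh data, and on the 2 log L scale
the average overstatement is close to 2d, the amount the AIC penalty removes.

```python
# My own toy sketch (not from Cavanaugh 1997): fit M1: x ~ N(mu, 1) with
# mu free (d = 1 fitted parameter) on a training sample, then evaluate the
# fitted model on the training sample and on a fresh test sample from the
# same true model (mu = 0).  On the 2*logL scale the in-sample value is
# optimistic by about 2*d on average -- the amount the AIC penalty removes.
import math
import random

random.seed(1)

def loglik(data, mu):
    """Log-likelihood of an N(mu, 1) model for the given data."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (xi - mu) ** 2
               for xi in data)

n, reps = 50, 4000
optimism = 0.0
for _ in range(reps):
    train = [random.gauss(0.0, 1.0) for _ in range(n)]
    test = [random.gauss(0.0, 1.0) for _ in range(n)]
    muhat = sum(train) / n  # MLE of mu from the training sample
    optimism += 2 * (loglik(train, muhat) - loglik(test, muhat))
print(optimism / reps)  # averages near 2*d = 2
```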

In my reply I was only hoping to show that there is a natural connection
between AIC and familiar frequentist concepts; perhaps the information
terminology makes AIC sound more foreign.  As Joe points out, if I'm
willing to make some parametric assumptions, I can get a distribution of
the AIC statistic like any other statistic.  Whether that's consistent with
one's philosophical beliefs is a separate issue.  I personally find the
"true model" to be a bit of a straw man; models are approximations,
and "best" depends on why you're modeling.

-Carl



On Sun, Jun 17, 2012 at 12:33 PM, Joe Felsenstein wrote:

>
> Ben Bolker --
>
> >> Now we can apply Fisher's Likelihood Ratio Test as well
> >> as the AIC.  The LRT tells us that the expectation
> >
> >  ... under the null hypothesis where M0 (the simpler model) is true?
>
> Yup.  I'm considering that case to see how the AIC fits in with the LRT.
> Of course the AIC is proposed for a much wider range of cases.
>
> >> of
> >> 2 log(L1) - 2 log(L0)   is   d1 - d0   (because it is distributed as a
> >> Chi-Square with that number of degrees of freedom).
> >>
> >> But the AIC tells us that the expectation is  2(d1 - d0).
> >
> >  Maybe I'm missing something, but I don't see how the AIC tells us
> > something about the expectation of 2 log(L1) - 2 log(L0) ?  It gives us
> > the expectation of the Kullback-Leibler distance, which is something
> > like sum(p(i) log(p(i)/q(i))) where p(i) is the true distribution of
> > outcomes and q(i) is the predicted distribution of outcomes ... so it's
> > something more like a marginal log-likelihood difference rather than a
> > maximum log-likelihood difference ...
>
> Well, the AIC ends up comparing  -2 log(L) + 2d  for the two
> hypotheses.  The difference of these for models  M1 and M0
> is just (the negative of)  2 log(L1/L0) - 2(d1-d0).  Or have I
> missed something here?  So the expectation of the difference
> in log likelihood  *is*  described by the AIC, right?  And isn't it
> (in view of Fisher's distribution) wrong too?  That is what
> disturbs me and makes me feel there is something I don't
> understand about the AIC argument.
>
> Joe
> 
> Joe Felsenstein j...@gs.washington.edu
> Department of Genome Sciences and Department of Biology,
> University of Washington, Box 355065, Seattle, WA 98195-5065 USA
>
>
>
>
>



-- 
Carl Boettiger
UC Davis
http://www.carlboettiger.info/


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Joe Felsenstein

Ben Bolker --

>> Now we can apply Fisher's Likelihood Ratio Test as well
>> as the AIC.  The LRT tells us that the expectation 
> 
>  ... under the null hypothesis where M0 (the simpler model) is true?

Yup.  I'm considering that case to see how the AIC fits in with the LRT.
Of course the AIC is proposed for a much wider range of cases.

>> of   
>> 2 log(L1) - 2 log(L0)   is   d1 - d0   (because it is distributed as a
>> Chi-Square with that number of degrees of freedom).
>> 
>> But the AIC tells us that the expectation is  2(d1 - d0).
> 
>  Maybe I'm missing something, but I don't see how the AIC tells us
> something about the expectation of 2 log(L1) - 2 log(L0) ?  It gives us
> the expectation of the Kullback-Leibler distance, which is something
> like sum(p(i) log(p(i)/q(i))) where p(i) is the true distribution of
> outcomes and q(i) is the predicted distribution of outcomes ... so it's
> something more like a marginal log-likelihood difference rather than a
> maximum log-likelihood difference ...

Well, the AIC ends up comparing  -2 log(L) + 2d  for the two
hypotheses.  The difference of these for models  M1 and M0
is just (the negative of)  2 log(L1/L0) - 2(d1-d0).  Or have I
missed something here?  So the expectation of the difference
in log likelihood  *is*  described by the AIC, right?  And isn't it
(in view of Fisher's distribution) wrong too?  That is what
disturbs me and makes me feel there is something I don't
understand about the AIC argument.
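Both expectations being compared here can be checked numerically.  A minimal
sketch (my own hypothetical example, nested normal-mean models with known
variance): under a true M0, 2 log(L1/L0) averages d1 - d0, as the LRT says,
so AIC1 - AIC0 = 2(d1 - d0) - 2 log(L1/L0) averages +(d1 - d0), i.e. AIC
still favors the simpler true model on average.

```python
# My own hypothetical example: M0 is x ~ N(0, 1); M1 is x ~ N(mu, 1) with
# mu free, so d1 - d0 = 1.  With sigma known, 2 log(L1/L0) reduces to
# n * xbar^2, which is chi-square(1) under a true M0.
import random

random.seed(2)
n, reps = 50, 4000
lr_sum, aic_diff_sum = 0.0, 0.0
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(x) / n  # MLE of mu under M1
    two_loglr = n * xbar * xbar  # 2 log(L1/L0)
    lr_sum += two_loglr
    aic_diff_sum += 2 * 1 - two_loglr  # AIC1 - AIC0; penalty 2(d1-d0) = 2
print(lr_sum / reps)        # near d1 - d0 = 1 (the LRT expectation)
print(aic_diff_sum / reps)  # near +1: AIC favors the true M0 on average
```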

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA







Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Ben Bolker
On 12-06-17 07:35 AM, Joe Felsenstein wrote:
> 
> Ben Bolker wrote:
> 
>>  I'd like to chime in here ... again ready to be corrected by others.
>> The description above doesn't match my understanding of AIC at all
>> (although again I will be perfectly happy, and in fact really
>> interested, if you can point me to a reference that lays out this
>> justification for the AIC).  AIC is an estimate of the expected
>> (Kullback-Leibler) distance between a hypothetical 'true model' and
>> any specified model. 
> 
> Thanks, that is very clear.  It is the correct description.  But that
> leaves me unsatisfied.  Consider, for example, the case where
> we have two models, M0 and M1.   Suppose that they are in
> fact nested, with degrees of freedom  d0  and  d1.
> 
> Now we can apply Fisher's Likelihood Ratio Test as well
> as the AIC.  The LRT tells us that the expectation 

  ... under the null hypothesis where M0 (the simpler model) is true?

> of   
> 2 log(L1) - 2 log(L0)   is   d1 - d0   (because it is distributed as a
> Chi-Square with that number of degrees of freedom).
> 
> But the AIC tells us that the expectation is  2(d1 - d0).

  Maybe I'm missing something, but I don't see how the AIC tells us
something about the expectation of 2 log(L1) - 2 log(L0) ?  It gives us
the expectation of the Kullback-Leibler distance, which is something
like sum(p(i) log(p(i)/q(i))) where p(i) is the true distribution of
outcomes and q(i) is the predicted distribution of outcomes ... so it's
something more like a marginal log-likelihood difference rather than a
maximum log-likelihood difference ...
> 
> This is, I gather, not a conflict because the assumption of the
> AIC is that neither M0 nor M1 is correct, but instead a model
> M' which has an infinite number of degrees of freedom is
> correct.   So both assertions are (formally) correct. 
> 
> But what if  M0  was actually correct?  Are we supposed
> to use AIC?   

  I would say that if you want to find the _correct_ model, and you
think that the correct model might be in the set of your candidate
models, you ought to be using BIC/SIC. Along with Burnham and Anderson,
I think that in ecological and evolutionary analysis it is very unlikely
that the true model is *ever* in your set of candidate models (note that
I do disagree with them on a lot of things!) ...
> 
> I also understand that AIC does not give us a distribution
> of the test statistic, LRT does.   For example, in the case of
> phylogenies that all have the same number of degrees of
> freedom, all AIC does is tell us to prefer the one with
> highest likelihood.

  Yes.  People have come up with various (often misused) rules of thumb
about how big an AIC difference is "large" (please don't say
"significant"), but it really boils down to understanding the
log-likelihood scale in a basic (non-probabilistic) way, as promoted e.g.
by some of the pure "likelihoodists" (Edwards, Royall) -- a 2-point
difference on the log-likelihood scale corresponds to roughly an
eightfold (e^2 ~ 7.4) difference in likelihood, so it would be a
reasonable choice for a cutoff between "small" and "large" likelihood
differences.  People sometimes reason that adding a single, useless
parameter adds +2 to the AIC, so a difference of <2 is equivalent to
less than one effective parameter's worth of difference (hence "small").
  (I think lots of people here, including you, already know this ...)

  If you want distributions of test statistics, I claim it makes the
most sense to work in a likelihood-ratio-test-based framework.  (I can
imagine that it would be possible to derive asymptotic distributions for
AICs, but I've never seen it done ...)
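For nested models under the null, such a distribution does follow directly
from the chi-square result.  A small sketch (my own illustration, assuming
toy nested normal-mean models M0: x ~ N(0, 1) vs. M1: x ~ N(mu, 1)):
AIC1 - AIC0 = 2k - 2 log(L1/L0) with k = d1 - d0, so e.g. the chance that
AIC prefers the needlessly complex model is P(chi2_1 > 2) = erfc(1) ~ 0.157.

```python
# My own illustration: M0 is x ~ N(0, 1); M1 adds a free mean mu, so
# k = d1 - d0 = 1 and AIC1 - AIC0 = 2k - 2 log(L1/L0), where 2 log(L1/L0)
# is chi-square(k) under a true M0.  AIC prefers M1 exactly when
# 2 log(L1/L0) > 2, an event of probability P(chi2_1 > 2) = erfc(1).
import math
import random

random.seed(3)
n, reps = 50, 4000
prefer_m1 = 0
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(x) / n
    if n * xbar * xbar > 2.0:  # 2 log(L1/L0) > 2(d1 - d0): AIC picks M1
        prefer_m1 += 1
print(prefer_m1 / reps, math.erfc(1.0))  # both near 0.157
```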

> Anyway, thanks for the clarification, which makes clear
> that the rationale of using the LRT to justify the AIC
> is incorrect.
> 
> Joe
> 
> Joe Felsenstein j...@gs.washington.edu 
> Department of Genome Sciences and Department of Biology,
> University of Washington, Box 355065, Seattle, WA 98195-5065 USA
> 
> 
>
