Re: [R-sig-phylo] Model-averaging corHMM models?

James Boyko Fri, 15 Nov 2024 10:15:38 -0800

Ya I agree with Liam’s assessment. This is what we’ve done in our past
empirical applications with hidden states: combine hidden state
probabilities so that we only have the observed states, and then average
across models across those. As Brian says, it’s not really possible to
average hidden state information because of the label switching problem.
Imagine fitting an identical HMM twice to the same data. There is no
difference in the likelihood if you were to switch rate class labels. Eg if
in fit 1: rate class A is fast transition rates and rate class B was slow
rates, now swap. There is no difference in the likelihood if now rate class
A is slow and rate class B is fast. The labels are arbitrary and it means
that we can’t identify which rate classes should be grouped together during
model averaging. Hence why we summarize to observed states before
averaging.


I recently added a “model_list” class to corhmm so it’ll be easy to make
this an exported function. (Can also add Brian’s redundancy detection).
I’ll try to get to it this weekend, but shoot me an email if you need help
coding this sooner.

Best,
James

On Fri, Nov 15, 2024 at 5:50 AM Liam J. Revell <[email protected]> wrote:

> Dear list.
>
> I'm going to make a general pitch for model averaging ancestral state
> reconstruction -- without solving Rafa's problem that this is not yet
> implemented in *corHMM*.
>
> Setting aside, for a moment, hidden-rate models -- imagine a set of
> standard M*k* model for a binary trait. There are (reasonably) four such
> models: an ER model, an ARD model, a 0 → 1 model, and a 1 → 0 model.
>
> Now imagine further an unlikely, but possible, scenario in which all of
> our weight of evidence fell on the 0 → 1 and a 1 → 0 models, but 51%
> landed on the former and 49% on the latter. In the former model our
> marginal reconstruction at the root will be unambiguously 0 (assuming both
> 0 & 1 are observed at the tips). Clearly, however, the chances that the
> root node is 0 or 1 are closer to 50:50 than 100:0 (since under our nearly
> equally well-supported 1 → 0 model the same node is unambiguously 1). The
> same dilemma holds for more realistic scenarios -- indeed, I've seen it in
> empirical data!
>
> Lastly, here's a suggestion on how to do model averaging with hidden
> states: first "merge" hidden levels, and then model-average using Akaike
> weights (or whatever your preferred approach is to model averaging). That
> is, if a node has a marginal scaled likelihood of "0A" of 0.4 and "0B" of
> 0.3 (for observed states 0/1 and hidden levels A/B), then the total
> probability that that node was in observed state "0" is just 0.4 + 0.3 =
> 0.7. This is implemented natively in *phytools::ancr*; however, I don't
> recall if I built it to support hidden states. (For ancestral state
> reconstruction of a* "fitHRM"* object in *phytools*, however, it is
> possible to "hide" hidden states after reconstruction, then
> model-averaging, including across hidden & "non-hidden" state models
> becomes quite easy.)
>
> Hope this is helpful.
>
> Best wishes. Liam
> Liam J. Revell
> Professor of Biology, University of Massachusetts Boston
> Web: http://faculty.umb.edu/liam.revell/
> Book: Phylogenetic Comparative Methods in R
> <https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r>
> (*Princeton University Press*, 2022)
>
>
> On 11/14/2024 1:19 PM, O'Meara, Brian C via R-sig-phylo wrote:
>
> CAUTION: EXTERNAL SENDER
>
> Hi, Rafa.
>
> One thing to look at is whether the models are effectively identical; if the 
> 3 state is 2 AICc worse than the 2 state it could be that it’s just adding a 
> single extra parameter with no improvement (i.e, the likelihoods might be the 
> same). In some of our other software we now have code to detect functionally 
> redundant models and remove this redundancy, but I don’t remember whether 
> it’s in corHMM yet.
>
> Assuming the models are somewhat different, much as I like model averaging 
> I’d use the one with lower AICc for ancestral state reconstruction (and yep, 
> glad you brought up all the uncertainties with anc recon). With hisse, with 
> model averaging one can basically model the turnover, net diversification, 
> and other diversification-flavored rates: this edge is 0A in this model, and 
> 0C in that model, but one can still just take the turnover rate from each. 
> It’s partly a benefit from hisse’s relative inattention to the variation in 
> transition rates. But corHMM is all about the transition rates: in one model, 
> we have rates 0A->1A, 0B->1B, while in the other we have 0A->1A, 0B->1B, and 
> 0C->1C: it’s not that the overall rate of 0->1 is the unweighted mean of the 
> 0*->1* rates in each model, as state B might be really rare in one model and 
> common in another. One can think about what happens at equilibrium with the 
> estimated rate matrix, or use the simmap functionality in corHMM to look at 
> empirical frequencies of going from 0->1, but it’s an area where I think we 
> need to think a bit more about how to summarize and present results (for one 
> thing, I think summarizing by wait times (reciprocal of rates) rather than 
> rates can be a better way to summarize, so an ephemeral state with a high 
> outgoing rate that a species is expected to occupy for only a short time does 
> not have a huge impact on the average parameter).
>
> I hope this helps; I’m CCing Jeremy Beaulieu and James Boyko in case they’re 
> not seeing R-sig-phylo posts.
>
> Best,
> Brian
>
> _________________________________________
>
> Brian O’Meara
> He/Him
> Professor, Dept. of Ecology & Evolutionary Biology
> University of Tennessee, Knoxville
>
>
> From: R-sig-phylo <[email protected]> 
> <[email protected]> on behalf of Rafael S Marcondes 
> <[email protected]> <[email protected]>
>
> Date: Thursday, November 14, 2024 at 12:11 PM
> To: r-sig-phylo <[email protected]> <[email protected]>
>
> Subject: [R-sig-phylo] Model-averaging corHMM models?
> Hi all,
>
> I have two corHMM models within 2 AICc units of each other. They differ in
> the number of hidden states (2 versus 3). I'm trying to decide how to
> proceed from here and wondering if model-averaging would be the way to go.
> I'm interested in drawing biological conclusions based on
> the parameter estimates, and in doing ancestral character estimation
> (strictly for visualization purposes only--I'm well aware of the caveats of
> over-interpreting ACE states).
>
> corHMM doesn't seem to have an in-built model averaging utility. But by
> analogy with the one in hisse (modelAveRates), it seems as if it's just a
> matter of averaging the parameter estimates based on model AIC weights. Is
> it really that simple? How would I deal with the param present in one model
> and not the other (related to the additional hidden state)? And then would
> it be appropriate to run ACE using the averaged rate parameters? Or would
> it be more appropriate to perform ACE separately for each model and then
> average the node likelihoods instead?
>
> Thanks,
>
> -Rafa
>
> *--*
> *Rafael S. Marcondes, Ph.D. (he/him)*
> (The R in Rafael is pronounced like the h in "hat")
> *https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364613426%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=sVqB7KbRcrZl1BVDOD7SV%2BhfGZunxII4uaKNqK%2FTE68%3D&reserved=0
>  <https://www.rafaelmarcondes.com/> 
> <https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364653476%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vHeIemksgxArMdqW9je6Mzd%2FaF7m4yd9m8BQMkR45Aw%3D&reserved=0>
>  <https://www.rafaelmarcondes.com/>*
> Faculty Fellow in EEB
> Department of BioSciences
> Rice University
> Houston TX 77005
>
>
> *"Eu quase que nada não sei. Mas desconfio de muita coisa"*
> *"I almost don't know nothing. But I suspect many things"*
>   -João Guimarães Rosa, Brazilian novelist
> (Portuguese original and free English translation by me)
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-phylo mailing list - 
> [email protected]https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364665693%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=OvGBiOS0%2FRhAazqnbVvhfurbhAOOdcod7HjuqbdE7i4%3D&reserved=0
>  <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> Searchable archive at 
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364676685%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=%2B3dCyE08gfQq%2BqHhKlP05hG3zjiByokiHSc2KxQco9w%3D&reserved=0
>  <http://www.mail-archive.com/[email protected]/>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-phylo mailing list - 
> [email protected]https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364689964%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=8Yk9KwToyBxnfHhzFl2k6MHNvjWY1%2FC3Ppo8GDaENXU%3D&reserved=0
>  <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> Searchable archive at 
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364700712%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vkk4mRlMFPgDOZw7yPD%2FiAVxpEtUoB83DtRzh1l4t6Q%3D&reserved=0
>  <http://www.mail-archive.com/[email protected]/>
>
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/

Re: [R-sig-phylo] Model-averaging corHMM models?

Reply via email to