Re: [R-sig-phylo] Model-averaging corHMM models?

Rafael S Marcondes Fri, 15 Nov 2024 09:23:14 -0800

Thanks, everyone! This is all very helpful!

I think I understand Liam's suggestion and I think that'll be the way to go
for me. I kind of wanted to be able to make a figure depicting the hidden
regimes across the phylogeny, because I think that's interesting at least
as a heuristic to help us think about what might be going on biologically.
But I also understand why that might not actually make sense.


Thanks,

-Rafa

*--*
*Rafael S. Marcondes, Ph.D. (he/him)*
(The R in Rafael is pronounced like the h in "hat")
*https://www.rafaelmarcondes.com/ <https://www.rafaelmarcondes.com/>*
Faculty Fellow in EEB
Department of BioSciences
Rice University
Houston TX 77005


*"Eu quase que nada não sei. Mas desconfio de muita coisa"*
*"I almost don't know nothing. But I suspect many things"*
  -João Guimarães Rosa, Brazilian novelist
(Portuguese original and free English translation by me)


On Fri, Nov 15, 2024 at 10:01 AM O'Meara, Brian C <bome...@utk.edu> wrote:

> 2024 has been weird – but in one of the happier examples of that, I find
> myself agreeing with Liam AND James. 😉
>
>
>
> I’d been implicitly thinking about model averaging the rates first, *then*
> using this for ancestral state recon: what’s the average rate for 0A->0B,
> given (most generally) some models have no hidden state (so no 0A->0B
> possible), some have just A and B, some have A, B, C, and D (with no
> guarantee that 0B in one model corresponds to 0B in another). But doing the
> recon in each model *first*, then collapsing the hidden states so we just
> look at likelihood of 0 or 1 at a node and weight these across the model
> weights, does indeed make a lot of sense and is a much better way (which I
> should have remembered, as this is how it works in hisse plotting already).
> I think it still leaves difficulties if one wants to estimate probability
> of hidden state “B” at a node across models since that state isn’t the same
> “thing” across models of different complexity unlike the observed state,
> but fortunately few should have this use case.
>
>
>
> Best,
>
> Brian
>
>
>
> _________________________________________
>
>
>
> Brian O’Meara
>
> He/Him
>
> Professor, Dept. of Ecology & Evolutionary Biology
>
> University of Tennessee, Knoxville
>
>
>
>
>
> *From: *James Boyko <jbo...@umich.edu>
> *Date: *Friday, November 15, 2024 at 7:37 AM
> *To: *liam.rev...@umb.edu <liam.rev...@umb.edu>
> *Cc: *O'Meara, Brian C <bome...@utk.edu>, Rafael S Marcondes <
> raf.marcon...@gmail.com>, r-sig-phylo <r-sig-phylo@r-project.org>, Jeremy
> Michael Beaulieu <jmbea...@uark.edu>
> *Subject: *Re: [R-sig-phylo] Model-averaging corHMM models?
>
> Ya I agree with Liam’s assessment. This is what we’ve done in our past
> empirical applications with hidden states: combine hidden state
> probabilities so that we only have the observed states, and then average
> across models across those. As Brian says, it’s not really possible to
> average hidden state information because of the label switching problem.
> Imagine fitting an identical HMM twice to the same data. There is no
> difference in the likelihood if you were to switch rate class labels. Eg if
> in fit 1: rate class A is fast transition rates and rate class B was slow
> rates, now swap. There is no difference in the likelihood if now rate class
> A is slow and rate class B is fast. The labels are arbitrary and it means
> that we can’t identify which rate classes should be grouped together during
> model averaging. Hence why we summarize to observed states before
> averaging.
>
>
>
> I recently added a “model_list” class to corhmm so it’ll be easy to make
> this an exported function. (Can also add Brian’s redundancy detection).
> I’ll try to get to it this weekend, but shoot me an email if you need help
> coding this sooner.
>
>
>
> Best,
>
> James
>
>
>
> On Fri, Nov 15, 2024 at 5:50 AM Liam J. Revell <liam.rev...@umb.edu>
> wrote:
>
> Dear list.
>
> I'm going to make a general pitch for model averaging ancestral state
> reconstruction -- without solving Rafa's problem that this is not yet
> implemented in *corHMM*.
>
> Setting aside, for a moment, hidden-rate models -- imagine a set of
> standard M*k* model for a binary trait. There are (reasonably) four such
> models: an ER model, an ARD model, a 0 → 1 model, and a 1 → 0 model.
>
> Now imagine further an unlikely, but possible, scenario in which all of
> our weight of evidence fell on the 0 → 1 and a 1 → 0 models, but 51%
> landed on the former and 49% on the latter. In the former model our
> marginal reconstruction at the root will be unambiguously 0 (assuming both
> 0 & 1 are observed at the tips). Clearly, however, the chances that the
> root node is 0 or 1 are closer to 50:50 than 100:0 (since under our nearly
> equally well-supported 1 → 0 model the same node is unambiguously 1). The
> same dilemma holds for more realistic scenarios -- indeed, I've seen it in
> empirical data!
>
> Lastly, here's a suggestion on how to do model averaging with hidden
> states: first "merge" hidden levels, and then model-average using Akaike
> weights (or whatever your preferred approach is to model averaging). That
> is, if a node has a marginal scaled likelihood of "0A" of 0.4 and "0B" of
> 0.3 (for observed states 0/1 and hidden levels A/B), then the total
> probability that that node was in observed state "0" is just 0.4 + 0.3 =
> 0.7. This is implemented natively in *phytools::ancr*; however, I don't
> recall if I built it to support hidden states. (For ancestral state
> reconstruction of a* "fitHRM"* object in *phytools*, however, it is
> possible to "hide" hidden states after reconstruction, then
> model-averaging, including across hidden & "non-hidden" state models
> becomes quite easy.)
>
> Hope this is helpful.
>
> Best wishes. Liam
>
> Liam J. Revell
> Professor of Biology, University of Massachusetts Boston
> Web: http://faculty.umb.edu/liam.revell/
> Book: Phylogenetic Comparative Methods in R
> <https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r>
> (*Princeton University Press*, 2022)
>
> On 11/14/2024 1:19 PM, O'Meara, Brian C via R-sig-phylo wrote:
>
> CAUTION: EXTERNAL SENDER
>
>
>
> Hi, Rafa.
>
>
>
> One thing to look at is whether the models are effectively identical; if the 
> 3 state is 2 AICc worse than the 2 state it could be that it’s just adding a 
> single extra parameter with no improvement (i.e, the likelihoods might be the 
> same). In some of our other software we now have code to detect functionally 
> redundant models and remove this redundancy, but I don’t remember whether 
> it’s in corHMM yet.
>
>
>
> Assuming the models are somewhat different, much as I like model averaging 
> I’d use the one with lower AICc for ancestral state reconstruction (and yep, 
> glad you brought up all the uncertainties with anc recon). With hisse, with 
> model averaging one can basically model the turnover, net diversification, 
> and other diversification-flavored rates: this edge is 0A in this model, and 
> 0C in that model, but one can still just take the turnover rate from each. 
> It’s partly a benefit from hisse’s relative inattention to the variation in 
> transition rates. But corHMM is all about the transition rates: in one model, 
> we have rates 0A->1A, 0B->1B, while in the other we have 0A->1A, 0B->1B, and 
> 0C->1C: it’s not that the overall rate of 0->1 is the unweighted mean of the 
> 0*->1* rates in each model, as state B might be really rare in one model and 
> common in another. One can think about what happens at equilibrium with the 
> estimated rate matrix, or use the simmap functionality in corHMM to look at 
> empirical frequencies of going from 0->1, but it’s an area where I think we 
> need to think a bit more about how to summarize and present results (for one 
> thing, I think summarizing by wait times (reciprocal of rates) rather than 
> rates can be a better way to summarize, so an ephemeral state with a high 
> outgoing rate that a species is expected to occupy for only a short time does 
> not have a huge impact on the average parameter).
>
>
>
> I hope this helps; I’m CCing Jeremy Beaulieu and James Boyko in case they’re 
> not seeing R-sig-phylo posts.
>
>
>
> Best,
>
> Brian
>
>
>
> _________________________________________
>
>
>
> Brian O’Meara
>
> He/Him
>
> Professor, Dept. of Ecology & Evolutionary Biology
>
> University of Tennessee, Knoxville
>
>
>
>
>
> From: R-sig-phylo <r-sig-phylo-boun...@r-project.org> 
> <r-sig-phylo-boun...@r-project.org> on behalf of Rafael S Marcondes 
> <raf.marcon...@gmail.com> <raf.marcon...@gmail.com>
>
> Date: Thursday, November 14, 2024 at 12:11 PM
>
> To: r-sig-phylo <r-sig-phylo@r-project.org> <r-sig-phylo@r-project.org>
>
> Subject: [R-sig-phylo] Model-averaging corHMM models?
>
> Hi all,
>
>
>
> I have two corHMM models within 2 AICc units of each other. They differ in
>
> the number of hidden states (2 versus 3). I'm trying to decide how to
>
> proceed from here and wondering if model-averaging would be the way to go.
>
> I'm interested in drawing biological conclusions based on
>
> the parameter estimates, and in doing ancestral character estimation
>
> (strictly for visualization purposes only--I'm well aware of the caveats of
>
> over-interpreting ACE states).
>
>
>
> corHMM doesn't seem to have an in-built model averaging utility. But by
>
> analogy with the one in hisse (modelAveRates), it seems as if it's just a
>
> matter of averaging the parameter estimates based on model AIC weights. Is
>
> it really that simple? How would I deal with the param present in one model
>
> and not the other (related to the additional hidden state)? And then would
>
> it be appropriate to run ACE using the averaged rate parameters? Or would
>
> it be more appropriate to perform ACE separately for each model and then
>
> average the node likelihoods instead?
>
>
>
> Thanks,
>
>
>
> -Rafa
>
>
>
> *--*
>
> *Rafael S. Marcondes, Ph.D. (he/him)*
>
> (The R in Rafael is pronounced like the h in "hat")
>
> *https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364613426%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=sVqB7KbRcrZl1BVDOD7SV%2BhfGZunxII4uaKNqK%2FTE68%3D&reserved=0
>  <https://www.rafaelmarcondes.com/> 
> <https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364653476%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vHeIemksgxArMdqW9je6Mzd%2FaF7m4yd9m8BQMkR45Aw%3D&reserved=0>
>  <https://www.rafaelmarcondes.com/>*
>
> Faculty Fellow in EEB
>
> Department of BioSciences
>
> Rice University
>
> Houston TX 77005
>
>
>
>
>
> *"Eu quase que nada não sei. Mas desconfio de muita coisa"*
>
> *"I almost don't know nothing. But I suspect many things"*
>
>   -João Guimarães Rosa, Brazilian novelist
>
> (Portuguese original and free English translation by me)
>
>
>
>         [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
>
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364665693%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=OvGBiOS0%2FRhAazqnbVvhfurbhAOOdcod7HjuqbdE7i4%3D&reserved=0
>  <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>
> Searchable archive at 
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364676685%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=%2B3dCyE08gfQq%2BqHhKlP05hG3zjiByokiHSc2KxQco9w%3D&reserved=0
>  <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
>
>
>
>         [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
>
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364689964%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=8Yk9KwToyBxnfHhzFl2k6MHNvjWY1%2FC3Ppo8GDaENXU%3D&reserved=0
>  <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>
> Searchable archive at 
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364700712%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vkk4mRlMFPgDOZw7yPD%2FiAVxpEtUoB83DtRzh1l4t6Q%3D&reserved=0
>  <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
>
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Model-averaging corHMM models?

Reply via email to