Thanks, everyone! This is all very helpful! I think I understand Liam's suggestion and I think that'll be the way to go for me. I kind of wanted to be able to make a figure depicting the hidden regimes across the phylogeny, because I think that's interesting at least as a heuristic to help us think about what might be going on biologically. But I also understand why that might not actually make sense.
Thanks, -Rafa *--* *Rafael S. Marcondes, Ph.D. (he/him)* (The R in Rafael is pronounced like the h in "hat") *https://www.rafaelmarcondes.com/ <https://www.rafaelmarcondes.com/>* Faculty Fellow in EEB Department of BioSciences Rice University Houston TX 77005 *"Eu quase que nada não sei. Mas desconfio de muita coisa"* *"I almost don't know nothing. But I suspect many things"* -João Guimarães Rosa, Brazilian novelist (Portuguese original and free English translation by me) On Fri, Nov 15, 2024 at 10:01 AM O'Meara, Brian C <bome...@utk.edu> wrote: > 2024 has been weird – but in one of the happier examples of that, I find > myself agreeing with Liam AND James. 😉 > > > > I’d been implicitly thinking about model averaging the rates first, *then* > using this for ancestral state recon: what’s the average rate for 0A->0B, > given (most generally) some models have no hidden state (so no 0A->0B > possible), some have just A and B, some have A, B, C, and D (with no > guarantee that 0B in one model corresponds to 0B in another). But doing the > recon in each model *first*, then collapsing the hidden states so we just > look at likelihood of 0 or 1 at a node and weight these across the model > weights, does indeed make a lot of sense and is a much better way (which I > should have remembered, as this is how it works in hisse plotting already). > I think it still leaves difficulties if one wants to estimate probability > of hidden state “B” at a node across models since that state isn’t the same > “thing” across models of different complexity unlike the observed state, > but fortunately few should have this use case. > > > > Best, > > Brian > > > > _________________________________________ > > > > Brian O’Meara > > He/Him > > Professor, Dept. of Ecology & Evolutionary Biology > > University of Tennessee, Knoxville > > > > > > *From: *James Boyko <jbo...@umich.edu> > *Date: *Friday, November 15, 2024 at 7:37 AM > *To: *liam.rev...@umb.edu <liam.rev...@umb.edu> > *Cc: *O'Meara, Brian C <bome...@utk.edu>, Rafael S Marcondes < > raf.marcon...@gmail.com>, r-sig-phylo <r-sig-phylo@r-project.org>, Jeremy > Michael Beaulieu <jmbea...@uark.edu> > *Subject: *Re: [R-sig-phylo] Model-averaging corHMM models? > > Ya I agree with Liam’s assessment. This is what we’ve done in our past > empirical applications with hidden states: combine hidden state > probabilities so that we only have the observed states, and then average > across models across those. As Brian says, it’s not really possible to > average hidden state information because of the label switching problem. > Imagine fitting an identical HMM twice to the same data. There is no > difference in the likelihood if you were to switch rate class labels. Eg if > in fit 1: rate class A is fast transition rates and rate class B was slow > rates, now swap. There is no difference in the likelihood if now rate class > A is slow and rate class B is fast. The labels are arbitrary and it means > that we can’t identify which rate classes should be grouped together during > model averaging. Hence why we summarize to observed states before > averaging. > > > > I recently added a “model_list” class to corhmm so it’ll be easy to make > this an exported function. (Can also add Brian’s redundancy detection). > I’ll try to get to it this weekend, but shoot me an email if you need help > coding this sooner. > > > > Best, > > James > > > > On Fri, Nov 15, 2024 at 5:50 AM Liam J. Revell <liam.rev...@umb.edu> > wrote: > > Dear list. > > I'm going to make a general pitch for model averaging ancestral state > reconstruction -- without solving Rafa's problem that this is not yet > implemented in *corHMM*. > > Setting aside, for a moment, hidden-rate models -- imagine a set of > standard M*k* model for a binary trait. There are (reasonably) four such > models: an ER model, an ARD model, a 0 → 1 model, and a 1 → 0 model. > > Now imagine further an unlikely, but possible, scenario in which all of > our weight of evidence fell on the 0 → 1 and a 1 → 0 models, but 51% > landed on the former and 49% on the latter. In the former model our > marginal reconstruction at the root will be unambiguously 0 (assuming both > 0 & 1 are observed at the tips). Clearly, however, the chances that the > root node is 0 or 1 are closer to 50:50 than 100:0 (since under our nearly > equally well-supported 1 → 0 model the same node is unambiguously 1). The > same dilemma holds for more realistic scenarios -- indeed, I've seen it in > empirical data! > > Lastly, here's a suggestion on how to do model averaging with hidden > states: first "merge" hidden levels, and then model-average using Akaike > weights (or whatever your preferred approach is to model averaging). That > is, if a node has a marginal scaled likelihood of "0A" of 0.4 and "0B" of > 0.3 (for observed states 0/1 and hidden levels A/B), then the total > probability that that node was in observed state "0" is just 0.4 + 0.3 = > 0.7. This is implemented natively in *phytools::ancr*; however, I don't > recall if I built it to support hidden states. (For ancestral state > reconstruction of a* "fitHRM"* object in *phytools*, however, it is > possible to "hide" hidden states after reconstruction, then > model-averaging, including across hidden & "non-hidden" state models > becomes quite easy.) > > Hope this is helpful. > > Best wishes. Liam > > Liam J. Revell > Professor of Biology, University of Massachusetts Boston > Web: http://faculty.umb.edu/liam.revell/ > Book: Phylogenetic Comparative Methods in R > <https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r> > (*Princeton University Press*, 2022) > > On 11/14/2024 1:19 PM, O'Meara, Brian C via R-sig-phylo wrote: > > CAUTION: EXTERNAL SENDER > > > > Hi, Rafa. > > > > One thing to look at is whether the models are effectively identical; if the > 3 state is 2 AICc worse than the 2 state it could be that it’s just adding a > single extra parameter with no improvement (i.e, the likelihoods might be the > same). In some of our other software we now have code to detect functionally > redundant models and remove this redundancy, but I don’t remember whether > it’s in corHMM yet. > > > > Assuming the models are somewhat different, much as I like model averaging > I’d use the one with lower AICc for ancestral state reconstruction (and yep, > glad you brought up all the uncertainties with anc recon). With hisse, with > model averaging one can basically model the turnover, net diversification, > and other diversification-flavored rates: this edge is 0A in this model, and > 0C in that model, but one can still just take the turnover rate from each. > It’s partly a benefit from hisse’s relative inattention to the variation in > transition rates. But corHMM is all about the transition rates: in one model, > we have rates 0A->1A, 0B->1B, while in the other we have 0A->1A, 0B->1B, and > 0C->1C: it’s not that the overall rate of 0->1 is the unweighted mean of the > 0*->1* rates in each model, as state B might be really rare in one model and > common in another. One can think about what happens at equilibrium with the > estimated rate matrix, or use the simmap functionality in corHMM to look at > empirical frequencies of going from 0->1, but it’s an area where I think we > need to think a bit more about how to summarize and present results (for one > thing, I think summarizing by wait times (reciprocal of rates) rather than > rates can be a better way to summarize, so an ephemeral state with a high > outgoing rate that a species is expected to occupy for only a short time does > not have a huge impact on the average parameter). > > > > I hope this helps; I’m CCing Jeremy Beaulieu and James Boyko in case they’re > not seeing R-sig-phylo posts. > > > > Best, > > Brian > > > > _________________________________________ > > > > Brian O’Meara > > He/Him > > Professor, Dept. of Ecology & Evolutionary Biology > > University of Tennessee, Knoxville > > > > > > From: R-sig-phylo <r-sig-phylo-boun...@r-project.org> > <r-sig-phylo-boun...@r-project.org> on behalf of Rafael S Marcondes > <raf.marcon...@gmail.com> <raf.marcon...@gmail.com> > > Date: Thursday, November 14, 2024 at 12:11 PM > > To: r-sig-phylo <r-sig-phylo@r-project.org> <r-sig-phylo@r-project.org> > > Subject: [R-sig-phylo] Model-averaging corHMM models? > > Hi all, > > > > I have two corHMM models within 2 AICc units of each other. They differ in > > the number of hidden states (2 versus 3). I'm trying to decide how to > > proceed from here and wondering if model-averaging would be the way to go. > > I'm interested in drawing biological conclusions based on > > the parameter estimates, and in doing ancestral character estimation > > (strictly for visualization purposes only--I'm well aware of the caveats of > > over-interpreting ACE states). > > > > corHMM doesn't seem to have an in-built model averaging utility. But by > > analogy with the one in hisse (modelAveRates), it seems as if it's just a > > matter of averaging the parameter estimates based on model AIC weights. Is > > it really that simple? How would I deal with the param present in one model > > and not the other (related to the additional hidden state)? And then would > > it be appropriate to run ACE using the averaged rate parameters? Or would > > it be more appropriate to perform ACE separately for each model and then > > average the node likelihoods instead? > > > > Thanks, > > > > -Rafa > > > > *--* > > *Rafael S. Marcondes, Ph.D. (he/him)* > > (The R in Rafael is pronounced like the h in "hat") > > *https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364613426%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=sVqB7KbRcrZl1BVDOD7SV%2BhfGZunxII4uaKNqK%2FTE68%3D&reserved=0 > <https://www.rafaelmarcondes.com/> > <https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364653476%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vHeIemksgxArMdqW9je6Mzd%2FaF7m4yd9m8BQMkR45Aw%3D&reserved=0> > <https://www.rafaelmarcondes.com/>* > > Faculty Fellow in EEB > > Department of BioSciences > > Rice University > > Houston TX 77005 > > > > > > *"Eu quase que nada não sei. Mas desconfio de muita coisa"* > > *"I almost don't know nothing. But I suspect many things"* > > -João Guimarães Rosa, Brazilian novelist > > (Portuguese original and free English translation by me) > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > R-sig-phylo mailing list - R-sig-phylo@r-project.org > > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364665693%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=OvGBiOS0%2FRhAazqnbVvhfurbhAOOdcod7HjuqbdE7i4%3D&reserved=0 > <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > > Searchable archive at > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364676685%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=%2B3dCyE08gfQq%2BqHhKlP05hG3zjiByokiHSc2KxQco9w%3D&reserved=0 > <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > R-sig-phylo mailing list - R-sig-phylo@r-project.org > > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364689964%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=8Yk9KwToyBxnfHhzFl2k6MHNvjWY1%2FC3Ppo8GDaENXU%3D&reserved=0 > <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > > Searchable archive at > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364700712%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vkk4mRlMFPgDOZw7yPD%2FiAVxpEtUoB83DtRzh1l4t6Q%3D&reserved=0 > <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > > [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/