Re: [R-sig-phylo] asymmetric transitions

Mark Holder Fri, 17 Aug 2012 06:13:41 -0700

Hi all,

I brought up the non-identifiability of the rich forms of the covarion model 
only because that is the source for my intuition that it will be really hard to 
distinguish the 1-binary-character threshold from the covarion. I agree with 
Dan, that the non-identifiability is not causing high type I error rates here.


I think that Mk, ARD, Mk+covarion, and ARD+covarion are all distinguishable 
from each other.

I think that Mk, ARD, 1-binary-character threshold, and a 
"Asymmetric-threshold" (if such a model has ever been described) are all 
distinguishable from each other.

My understanding is that Jarrod and Dan have shown that if you:
        1. simulate under a binary character under the threshold (which has no 
bias 0->1 or 1->0 bias), and then
        2. test Mk vs ARD,
you often strongly prefer ARD (despite the fact that the true model has no 
asymmetry)

I think that this result is correct (not the result of computer glitches) and 
worrying for folks interested in detecting asymmetric transition rates.  

I think testing Mk+covarion vs ARD+covarion in step 2 might lower your the type 
I error rate (but that is a conjecture).

In many ways the message here is similar to Wayne Maddison's warning that if 
you want to know about evolutionary asymmetries in character change, you have 
to consider whether the character states affect diversification rates (his 
observation that eventually led him to work on BiSSE).  Here the message is "if 
you want to know about evolutionary asymmetries in character change, you also 
have to consider patterns of rate heterogeneity across the tree that could 
confound your model testing."


all the best,
Mark






PS: below is my long-winded, loosy-goosey explanation for my intuition that 
going to Mk+covarion vs ARD+covarion would help:


The simulated data will often have large clades that are fixed for a single 
state, and the observed state frequencies can be far from 50:50 because of 
these large expanses of the tree with the same state.

ARD vs Mk
=========

The Mk model has can only explain a strong deviation from 50:50 in the observed 
state frequencies by saying that:
        A. the character evolves slowly so it has not had time to 
"equilibrate", OR
        B. there have been lots of changes, but by coincidence we always end up 
in the same state in that clade.

On trees with small total length, explanation A is very reasonable and you can 
get very weak support for ARD.

If you have trees with lots of leaves and lots of changes overall, explanation 
A becomes untenable, and Mk has to rely on the coincidence explanation 
(explanation B). Which results in a very low likelihood.

The ARD can explain the a strong deviation from 50:50 in the observed state 
frequencies by asymmetric in transition rates. So it's likelihood does not tank 
as quickly as you add large, monomorphic clades to the tree.

Mk+covarion vs ARD
=================
You should be able to prefer "Mk+covarion" on these data sets over "ARD" (or at 
least not be able to avoid a strong preference for ARD; you'd have to use AIC 
or BIC for this "test"). 

The Mk+covarion has access to an explanation that the Mk does not:
        C. the character was stuck in the off state, so you can explain a large 
paraphyletic assemblage with the same state by a single transition to tha state 
and a subsequent cessation of the substitution process.

Mk+covarion predicts no significant deviations from 50:50 in the types of 
transitions (after you use the covarion process to help you get a better 
understanding of the relative opportunity for each type of transition -- the 
proportion of time spent in the "on" hidden state for state 0 and 1 on the 
tree). 

ARD expects to see the same asymmetry in changes the parts of the tree that 
happen to have high rates of change and the parts of the tree that have low 
rates of change (since the ARD assumes that there is a constant rate of change, 
and all apparent difference is rate are sampling error).

The simulated data should show close the unbiased transitions in the fast 
changing parts of the tree, so Mk+covarion should do a better job than ARD.


Mk+covarion vs ARD+covarion
==========================
Both models should do a good job of not getting distracted by large expanses of 
the tree that are fixed (they won't attribute this as strong evidence in favor 
of that character).  If there does appear to be a strong bias in the parts of 
the tree with lots of changes, then ARD+covarion will win. Mk+covarion has 
fewer parameters. So it should win unless there is strong evidence for 
asymmetry in the fast parts of the tree.





On Aug 17, 2012, at 6:29 AM, Dan Rabosky wrote:

> 
> Hi folks-
> 
> I still think there is a difference between (i) parameter identifiability, 
> which may or may not be a problem here, and (ii) strong support for the wrong 
> model, which clearly appears to be occurring here (e.g., Type I error rates > 
> 0.75). I don't think non-identifiability of a parameter implies that you'll 
> get massively inflated Type I error rates for models that include the 
> parameter. 
> 
> Also, the root state (and assumptions regarding it) don't seem to drive the 
> pattern - you can set the root to 0 under the threshold model (e.g., both 
> states equiprobable at root) and you recover the same strong bias - Jarrod's 
> threshold example set the root to -1, you can verify that the problem holds 
> with equiprobable root states). At this point I guess I'd be more concerned 
> about model misspecification and its diagnosis (when modeling discrete 
> characters) than about identifiability per se, though perhaps I am not 
> thinking about this correctly. Distinguishing MK/ARD from threshold may be 
> difficult, but there is clearly signal of these processes in the data: when 
> you fit MK/ARD to data simulated under a threshold model, you get strong 
> support for ARD - but not when you fit those same models to data generated 
> under MK. So - there is clearly something in the data that is sufficiently 
> informative about the processes such that we observe high error rates. 
> 
> Mark, thanks for pointing out the relationship between the threshold model 
> and the single-site covarion model. 
> 
> Cheers,
> ~Dan
> 
> 
> 
> 
> 
> 
> On Aug 17, 2012, at 6:31 AM, Jarrod Hadfield wrote:
> 
>> Hi,
>> 
>> Thanks for the Allman & Rhodes paper, it is very nice. For me at least it 
>> confirms my suspicions, but made me realise that claims of asymmetric 
>> transition rates are only suspicious if you are unprepared to make some 
>> (strong?) assumptions. If anyone disagrees with what I have written below, 
>> then please tell me and I will try again to understand this stuff:
>> 
>> Identifiability is achieved because the pdf for the root state is the 
>> stationary distribution (denoted by sigma in Allman & Rhodes: see example 
>> 1).  This is, I believe, the default in newer versions of Mesquite, although 
>> in older versions the distribution is 0.5/0.5 as in ace.
>> 
>> If the pdf of the root state is defined by an additional parameter, this 
>> leaves a single parameter to describe the rate of transitions, and 
>> asymmetrical transition rates are non-identifiable.  It seems to me there is 
>> a choice to be made between a) assuming the same processes after the root 
>> held before the root and talk about asymmetric transition rates or b) do not 
>> make this assumption and then admit that the rates of transition from 0->1 
>> and 1->0 are not separable. I don't think the data can be used to 
>> distinguish between these view points, and so its a matter of personal 
>> choice which interpretation/model is used.
>> 
>> Cheers,
>> 
>> Jarrod
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Quoting Mark Holder <mthol...@ku.edu> on Thu, 16 Aug 2012 23:41:45 -0500:
>> 
>>> Hi,
>>> 
>>> I agree that model testing between ARD vs MK models is going to be 
>>> misleading when the process is really described by a threshold model (and 
>>> sorry for ignoring that set of simulations by Jarrod previously; somehow I 
>>> misfiled that email and didn't see it).
>>> 
>>> The threshold model has nice ways of dealing with correlations among 
>>> characters.  However, when it is applied as the underlying model for a 
>>> single binary character (as in Jarrod's sims), the threshold model is 
>>> similar to the single-site version of the covarion model (Tuffley and 
>>> Steel's version).
>>> 
>>> I don't think the models are identical, but they are quite similar. I 
>>> suspect that if you generated a data set under one of the models, it would 
>>> be quite hard to determine which was the generating model. Instead of just 
>>> having a an "on" and "off" state (as in the covarion model), the threshold 
>>> model has a continuum (the further the underlying continuous trait is from 
>>> boundary, the more "off" the observable binary trait is).  Allman and 
>>> Rhodes (2009, ref below) proved some results on the identifiability of 
>>> generalizations of covarion processes. They considered models with more 
>>> hidden rate categories (not just rate of zero and an rate of evolution when 
>>> in the "on" state). I believe that their results were that the number of 
>>> hidden rate categories that you can identify cannot exceed the number of 
>>> observable states. So it may be hard to get much richer than the 
>>> Tuffley+Steel covarion when you have a binary character.
>>> 
>>> Which is a long way of saying that, it might be worth looking at the 
>>> covarion model variants for the types of data that Jarrod is interested in. 
>>>  Implementations of the covarion model for two states is quite fast and 
>>> tractable. Testing Mk+covarion vs ARD+covarion may indeed be a more robust 
>>> way of
>>> detecting asymmetry in rates of character transitions compared to Mk vs ARD.
>>> 
>>> 
>>> Thanks for pointing out the Boettiger et al paper, Matt.
>>> 
>>> 
>>> all the best,
>>> Mark
>>> 
>>> 
>>> [1] E. S. Allman and J. A. Rhodes, The Identifiability of Covarion Models 
>>> in Phylogenetics, IEEE/ACM Transactions on Computational Biology and 
>>> Bioinformatics, vol. 6, no. 1, pp. 7688, Jan. 2009.
>>> 
>>> 
>>> 
>>> 
>>> On Aug 16, 2012, at 10:07 PM, Matt Pennell wrote:
>>> 
>>>> correction: the last sentence should have read
>>>> 
>>>> I wonder how that would work in this case. I think these are important 
>>>> questions going forward.
>>>> 
>>>> On Thu, Aug 16, 2012 at 11:00 PM, Matt Pennell <mwpenn...@gmail.com> wrote:
>>>> Hey all,
>>>> 
>>>> This has been a really fantastic discussion. Mark, you make some really 
>>>> excellent points in response to my earlier comments. I think you are 
>>>> correct in this.
>>>> 
>>>> The question that arises out of Jarrod and Dan's simulations (which I have 
>>>> just run) is whether a model selection criteria would be able to 
>>>> distinguish MK from the threshold model that Felsenstein (and Wright 
>>>> before him) put forth? And how do we best assess model adequacy? Carl 
>>>> Boettiger and company (2012: Evolution) suggested a Phylogenetic Monte 
>>>> Carlo approach for continuous characters. I wonder how that would before  
>>>> I think these are important questions going forward.
>>>> 
>>>> thanks again,
>>>> matt
>>>> 
>>>> 
>>>> 
>>>> On Thu, Aug 16, 2012 at 10:43 PM, Dan Rabosky <drabo...@umich.edu> wrote:
>>>> 
>>>> Hi all-
>>>> 
>>>> A couple of points. I am actually less concerned about the Type I error 
>>>> rates I gave in that previous message for the equal rates markov process, 
>>>> even though I think they are real (e.g., I can corroborate them using 
>>>> Diversitree). I don't think it is an issue of ascertainment bias, but I 
>>>> think Mark may be right about the LRT being inappropriate with few events 
>>>> on the tree and this may well explain the matter. This is probably worth 
>>>> exploring further.
>>>> 
>>>> However, I am much more concerned about Jarrod's second model (with 
>>>> underlying continuous latent variable). This seems to be a serious 
>>>> problem, and if you simulate under the latent model, I think he is right 
>>>> that Type I error rates are really, really high. The model is reasonable: 
>>>> there is a continuous trait that influences the probability of observing a 
>>>> particular tip state. In a practical sense, this probably means the 
>>>> following: (i) some clades in the tree will essentially be fixed for the 
>>>> character in question. (ii) Other clades will appear to have high lability 
>>>> of the character. The clades that are fixed will be those clades where the 
>>>> underlying threshold character (e.g., mean clade value) drifts towards 
>>>> -Inf (or +Inf). Regardless of whether we think about this latent model, 
>>>> this at least leads to an interesting - and probably quite relevant - form 
>>>> of model misspecification. The model essentially induces some extra 
>>>> heterogeneity in rates, such that some clades will appear to be switching 
>>>> quickly and others slowly. However, it is still a symmetric model of sorts.
>>>> 
>>>> You can simulate data easily under this model and verify (using whatever 
>>>> software) that it is a problem. I'm attaching code that does this. You can 
>>>> play around with 3 parameters: (i) the number of taxa in the analysis (set 
>>>> to 100); (ii) the expected variance of the continuous latency factor (from 
>>>> roots to tip); and (iii) the root state. These parameters are NTAXA, 
>>>> tipvar, and root in the code below.
>>>> 
>>>> I'm keen to see what others think, but it looks to me like you can 
>>>> simulate very reasonable-looking datasets and obtain extremely strong 
>>>> support for an asymmetric model - even though the model is 
>>>> quasi-symmetric. So, if these hold, then I think this is a serious issue - 
>>>> nothing we routinely do in the analysis of discrete characters is designed 
>>>> to detect this sort of model misspecification.
>>>> 
>>>> ## A single simulation
>>>> 
>>>> library(diversitree);
>>>> library(geiger);
>>>> library(mvtnorm);
>>>> 
>>>> NTAXA <- 101;
>>>> 
>>>> # Generate the tree:
>>>> x <- birthdeath.tree(b=1, d=0, taxa.stop=NTAXA);
>>>> x <- drop.tip(x, x$tip.label[x$edge[,2][x$edge.length==0][1]]);
>>>> 
>>>> 
>>>> vv <- vcv.phylo(x); # get phylogenetic vcv matrix
>>>> 
>>>> # Now we set the expected variance at the tips:
>>>> #       e.g., the value we want for the diagonal of the vcv matrix
>>>> #       If this is = 1, you'll have a "phylogenetic" standard normal 
>>>> distribution
>>>> #
>>>> 
>>>> tipvar <- 2;
>>>> sf <- tipvar/max(vv); #get scale factor for vcv matrix
>>>> 
>>>> vmat <- vv*sf; # scale matrix
>>>> root <- 0; #root state: this assumes the root is equally likely to give 
>>>> either state
>>>> 
>>>> mu <- rep(root, nrow(vmat));  #vector of means
>>>> 
>>>> 
>>>> # Simulate continuous, and then discrete, chars from
>>>> #               the corresponding mvn and binomial distributions
>>>> chars <- rmvnorm(1, mean=mu, sigma=vmat);
>>>> states <- rbinom(length(chars), 1, prob=plogis(chars));
>>>> names(states) <- x$tip.label;
>>>> 
>>>> # Look at the data...
>>>> plot(x, show.tip.label=F);
>>>> tiplabels(pch=21, bg = c('black', 'white')[(states+1)], col='black', 
>>>> cex=1);
>>>> 
>>>> #### Using Diverstree for model fitting:
>>>> lfx <- make.mk2(x, states); # The asymmetric likelihood function
>>>> lfxcon <- constrain(lfx, formulae = list(q01 ~ q10)); #constraining q01 ~ 
>>>> q10
>>>> 
>>>> # Estimation...
>>>> l2 <- find.mle(lfx, runif(2, 0, 5))$lnLik;
>>>> l1 <- find.mle(lfxcon, runif(1, 0, 5))$lnLik;
>>>> 
>>>> #likelihood ratio test
>>>> lrt <- -2*l1 + 2*l2;
>>>> 1 - pchisq(lrt, df=1);
>>>> 
>>>> 
>>>> ### End sim
>>>> 
>>>> Cheers,
>>>> ~Dan
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Aug 16, 2012, at 9:22 PM, Mark Holder wrote:
>>>> 
>>>> > Hi all,
>>>> >       <apologies for the long email>
>>>> >
>>>> > I'm a bit more concerned with Dan's elevated Type-1 error rates than 
>>>> > Jarrod's example.
>>>> >
>>>> > With respect to Jarrod's simulations, I have a few thoughts:
>>>> >       1. I don't understand the claim (in the original email) that "its 
>>>> > fairly straightforward to prove that asymmetric transition rates cannot 
>>>> > be identified using data collected on the tips of a phylogeny" It seems 
>>>> > like this is something that is routinely done in phylogenetics, and 
>>>> > proofs of identifiability of GTR exist (demonstrating that  this indeed 
>>>> > feasible and not some computational artifact).
>>>> >
>>>> >       2. I think that Dan's original response is correct. We should not 
>>>> > expect to reject the null only 5% of the time if we simulate a bias in 
>>>> > the states of the tips. Simulating data without respect to a phylogeny 
>>>> > and then analyzing under Mk or "ARD" should be like a simulation in 
>>>> > which the rate of character evolution is essentially infinite.  The Mk 
>>>> > model predicts the states to be equally frequent, and so it is not 
>>>> > problematic for a deviation from 50:50 to favor the ARD model over the 
>>>> > Mk model. In fact, the simulation model is equivalent the ARD model with 
>>>> > a rate of evolution approaching infinity, so we should prefer the ARD 
>>>> > model.
>>>> >
>>>> >       3. (in reference to Matt's post) In the ARD model, it is possible 
>>>> > for the MLE of the equilibrium state frequency of the less frequent 
>>>> > state to be >0.5. Presumably, this is a rare occurrence, but I don't 
>>>> > agree with the characterization of the ratio of parameters converging to 
>>>> > the ratio of states.
>>>> >
>>>> >       Consider a clade with lots of tips in state 1 but tiny branch 
>>>> > lengths. If this clade is found in the context of a tree with a few 
>>>> > other branches that are long and lead to tips with state 0, then you can 
>>>> > get an MLE of the state frequency for state 0 being > 0.5. Most of the 
>>>> > tips will have state 1, but because they are easily explained by one 
>>>> > transition to 1 you can still infer that the less frequent state has a 
>>>> > higher equilibrium frequency.
>>>> >
>>>> >       Perhaps, I'm mis-reading what Matt is referring to when he 
>>>> > discusses an analysis of a tree with "a lot of tips (ie. approaching the 
>>>> > limit)." I do agree that if you simulate a very large tree under ARD 
>>>> > (with the frequencies not equal to 0.5), then the frequency of the 
>>>> > states at the tips will converge to the equilibrium state frequencies.
>>>> >
>>>> >
>>>> >
>>>> > With respect to Dan's results:
>>>> >
>>>> > The Type I error rate of 0.12 troubles me.  Have you tried exporting the 
>>>> > data and seeing if other software agrees with the likelihood ratios 
>>>> > returned by ace() ?  I looked at the code for ace and nothing looked 
>>>> > amiss to me (though my R skills are virtually non-existent).
>>>> >
>>>> > If the result is corroborated by other software, then my best guesses 
>>>> > would be:
>>>> >       1. ascertainment bias (the simulation model clearly excludes 
>>>> > constant patterns, but I don't believe that the inference model in ace 
>>>> > does any correction for this), and/or
>>>> >       2. the assumption that you can use the chi-square as the null 
>>>> > distribution for the LRT probably breaks down when you have very few 
>>>> > events on the tree. In some sense the number of events is our measure of 
>>>> > the amount of data, and when we have very few events on the tree the 
>>>> > asymptotic behavior of the LRT under the null is probably not going to 
>>>> > help us.
>>>> >
>>>> > In the limiting case, when rates of character change are so low that you 
>>>> > never see homoplasy, then I think the LRT of the the two models should 
>>>> > get close to 1 on virtually any realization (conditional on starting in 
>>>> > state 1 and having exactly 1 change on the tree, both model make the 
>>>> > same predictions about the data; so in this realm the data should not 
>>>> > prefer one model over the other).  So, I'm not sure how the small data 
>>>> > explanation would explain your observation of an excess of large LRT 
>>>> > statistics.
>>>> >
>>>> >
>>>> > those are my 2 cents.
>>>> >
>>>> > all the best,
>>>> > Mark
>>>> >
>>>> >
>>>> 
>>>> _____________________
>>>> Dan Rabosky
>>>> Assistant Professor
>>>> Dept of Ecology and Evolutionary Biology
>>>> & Museum of Zoology
>>>> University of Michigan
>>>> drabo...@umich.edu
>>>> 
>>>> 
>>>> 
>>> 
>>> Mark Holder
>>> 
>>> mthol...@ku.edu
>>> http://phylo.bio.ku.edu/mark-holder
>>> 
>>> ==============================================
>>> Department of Ecology and Evolutionary Biology
>>> University of Kansas
>>> 6031 Haworth Hall
>>> 1200 Sunnyside Avenue
>>> Lawrence, Kansas 66045
>>> 
>>> lab phone:  785.864.5789
>>> 
>>> fax (shared): 785.864.5860
>>> ==============================================
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>> 
>> 
>> 
>> 
> 
> _____________________
> Dan Rabosky
> Assistant Professor
> Dept of Ecology and Evolutionary Biology
> & Museum of Zoology
> University of Michigan
> drabo...@umich.edu
> 

Mark Holder

mthol...@ku.edu
http://phylo.bio.ku.edu/mark-holder

==============================================
Department of Ecology and Evolutionary Biology
University of Kansas
6031 Haworth Hall
1200 Sunnyside Avenue
Lawrence, Kansas 66045

lab phone:  785.864.5789

fax (shared): 785.864.5860
==============================================









        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] asymmetric transitions

Reply via email to