Dear Felipe,

My suggestion would be to code the polymorphic condition as a third state -- so (if your trait is binary) you have three different character states: 0, 0+1, and 1.

Then you would enter your data as follows:

states: [0, 0+1, 1]
definitely in state zero: [1.0, 0.0, 0.0]
definitely in state one: [0.0, 0.0, 1.0]
definitely polymorphic: [0.0, 1.0, 0.0]
totally unknown: [0.33, 0.33, 0.33] (that is, 1/3 each)
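
As a minimal sketch, the four cases above could be entered in R as a matrix of prior probabilities with one row per taxon and one column per state. The taxon names here are hypothetical placeholders; in a real analysis the row names would need to match the tip labels of your tree.

```r
# States of the recoded trait: monomorphic 0, polymorphic 0+1, monomorphic 1
states <- c("0", "0+1", "1")

# One row per taxon (hypothetical names), one column per state
X <- rbind(
  Taxon_A = c(1, 0, 0),    # definitely in state zero
  Taxon_B = c(0, 0, 1),    # definitely in state one
  Taxon_C = c(0, 1, 0),    # definitely polymorphic
  Taxon_D = rep(1 / 3, 3)  # totally unknown
)
colnames(X) <- states

# Each row is a probability distribution, so it should sum to 1
stopifnot(all(abs(rowSums(X) - 1) < 1e-8))
X
```

Using 1/3 rather than 0.33 simply keeps each row summing exactly to one.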

You can likewise encode other types of uncertainty about the condition of the trait. For instance, if you have observed state 0 in a taxon but have relatively little information, then you might want to say that the taxon is *either* in state 0 or 0+1. (The converse would obviously be true for state 1.) This could be encoded as follows.

either state 0 or 0+1: [0.5, 0.5, 0.0]
either state 0+1 or 1: [0.0, 0.5, 0.5]

There's some grey area here too. For instance, how many observations are required to conclude that a taxon is "definitely in state zero" or "definitely in state one"? Perhaps you've observed only a few individuals for a species and all are in state 1. Is that taxon definitely monomorphic for the condition, or might it be polymorphic? make.simmap can handle this kind of nuance. For instance, you might decide the following for a particular taxon.

probably state 1, could be polymorphic: [0.0, 0.25, 0.75]
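
One way to keep this bookkeeping manageable is a small lookup table from shorthand codes to rows of prior probabilities. The sketch below is only an illustration: the shorthand codes ("0|0+1", "1?", and so on) and the species names are made up for the example, and the weights are the ones used above.

```r
states <- c("0", "0+1", "1")

# Hypothetical shorthand codes mapped to priors over (0, 0+1, 1)
code_to_prior <- list(
  "0"     = c(1, 0, 0),        # definitely in state zero
  "1"     = c(0, 0, 1),        # definitely in state one
  "0+1"   = c(0, 1, 0),        # definitely polymorphic
  "?"     = rep(1 / 3, 3),     # totally unknown
  "0|0+1" = c(0.5, 0.5, 0),    # either state 0 or 0+1
  "0+1|1" = c(0, 0.5, 0.5),    # either state 0+1 or 1
  "1?"    = c(0, 0.25, 0.75)   # probably state 1, could be polymorphic
)

# Hypothetical codings for four species
codes <- c(sp1 = "0", sp2 = "0|0+1", sp3 = "1?", sp4 = "?")
stopifnot(all(codes %in% names(code_to_prior)))

# Stack the matching rows into a taxon-by-state matrix
X <- do.call(rbind, code_to_prior[codes])
dimnames(X) <- list(names(codes), states)

# Every row should still be a valid probability distribution
stopifnot(all(abs(rowSums(X) - 1) < 1e-8))
X
```

How much weight to put on each state for a sparsely sampled taxon remains a judgment call; the table only makes that choice explicit and repeatable.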

After you have coded your trait this way, you need to set up the model to fit to your data.

To do this you will have to create a design matrix -- the same as is used in ape::ace or geiger::fitDiscrete. This design matrix is passed to make.simmap as the argument model.

The key attribute of your model design matrix is that transitions from 0 to 1 occur through the polymorphic condition 0+1.

For instance, for the trait with levels 0, 0+1, 1, your design matrix might have the form:

        0       0+1     1
0       0       1       0
0+1     2       0       3
1       0       4       0

This is a model in which transitions are allowed to occur 0 <-> 0+1 and 0+1 <-> 1, and all transitions can have different rates.
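
In R, this design matrix could be built and handed to make.simmap roughly as follows. This is a sketch under stated assumptions: the tree is simulated with phytools::pbtree and the tip states are assigned at random purely so the example runs, and the phytools part is skipped if the package is not installed. Substitute your own tree and your own matrix of prior probabilities.

```r
states <- c("0", "0+1", "1")

# Design matrix: zeros are disallowed transitions, and each distinct
# positive integer indexes a separately estimated rate
model <- matrix(c(0, 1, 0,
                  2, 0, 3,
                  0, 4, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(states, states))
model

if (requireNamespace("phytools", quietly = TRUE)) {
  library(phytools)
  set.seed(1)

  # Placeholder data: a simulated 20-taxon tree with randomly assigned tip states
  tree <- pbtree(n = 20)
  X <- matrix(0, nrow = 20, ncol = 3,
              dimnames = list(tree$tip.label, states))
  X[cbind(1:20, sample(1:3, 20, replace = TRUE))] <- 1

  # Stochastic mapping under the ordered model 0 <-> 0+1 <-> 1
  maps <- make.simmap(tree, X, model = model, nsim = 10)
}
```

Giving two cells the same integer would constrain those two transitions to share a rate, if a simpler model is preferred.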

Finally, after running your stochastic mapping analysis, you can also merge states back together to compute statistics of interest -- such as the total posterior probability of a state at any node. For instance, you could merge states 0 and 0+1 to get the probability that 0 is present (either alone or as part of the polymorphism) at a node. This can be done using phytools::mergeMappedStates and then running the "simmap" summary methods as normal.
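
A rough sketch of that merging step is given below. As before, the tree and tip data are simulated placeholders, the label "0 present" is just a name chosen for the merged state, and the code is skipped if phytools is not installed.

```r
states <- c("0", "0+1", "1")
old_states <- c("0", "0+1")   # states to be merged
new_state <- "0 present"      # hypothetical label for the merged state

if (requireNamespace("phytools", quietly = TRUE)) {
  library(phytools)
  set.seed(1)

  # Placeholder tree, tip priors, and ordered model (see the design matrix above)
  tree <- pbtree(n = 20)
  X <- matrix(0, nrow = 20, ncol = 3,
              dimnames = list(tree$tip.label, states))
  X[cbind(1:20, sample(1:3, 20, replace = TRUE))] <- 1
  model <- matrix(c(0, 1, 0, 2, 0, 3, 0, 4, 0), nrow = 3, byrow = TRUE,
                  dimnames = list(states, states))
  maps <- make.simmap(tree, X, model = model, nsim = 10)

  # Merge 0 and 0+1 on every mapped tree, then summarize as usual
  merged <- mergeMappedStates(maps, old_states, new_state)
  obj <- summary(merged)

  # Posterior probabilities per state, now for "0 present" versus "1"
  head(obj$ace)
}
```

Merging 0+1 with 1 instead would give the corresponding probability that state 1 is present.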

I hope this is of some help in getting you started, Felipe.

All the best, Liam

Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, http://blog.phytools.org

Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 9/4/2021 7:36 PM, Felipe Rossetto wrote:
Hi everyone!
I am setting up a presence/absence matrix for stochastic mapping with the make.simmap function of phytools, but the matrix contains both taxa with a polymorphic character state and taxa with unknown data. I know it is possible to represent missing data as probabilities instead of presence and absence, but I do not know how to distinguish missing data from polymorphic character codes in the analysis.

So, would the following coding of missing data and polymorphic states be correct?

*Taxon_A*,1,0,0 (character absent)
*Taxon_B*,0,1,0 (character present)
*Taxon_C*,0.5,0.5,0.0 (polymorphic character)
*Taxon_D*,0.33,0.33,0.33 (missing data)

Thank you very much in advance!

Felipe Rossetto


_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
