Dear Felipe,
My suggestion would be to code the polymorphic condition as a third
state -- so (if your trait is binary) you have three different character
states: 0, 0+1, and 1.
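Here is a minimal sketch of that recoding. It is written in Python purely for illustration, it is not phytools code, and the helper name code_taxon is hypothetical. Given the 0/1 states observed across a taxon's sampled individuals, it returns one of the three coded states. Note that it calls a taxon monomorphic whenever only one state has been seen, which is exactly the grey area discussed further on.

```python
# Illustrative sketch only (not part of phytools): collapse the 0/1 states
# observed in a taxon's individuals into one of three coded states.
STATES = ["0", "0+1", "1"]  # state order assumed throughout

def code_taxon(observed):
    """Return '0', '0+1' or '1' for a non-empty collection of 0/1 observations."""
    seen = set(observed)
    if not seen or not seen <= {0, 1}:
        raise ValueError("expected a non-empty collection of 0/1 observations")
    if seen == {0, 1}:
        return "0+1"  # both states observed: polymorphic
    return "0" if seen == {0} else "1"

print(code_taxon([0, 0, 0]))  # 0
print(code_taxon([1, 0, 1]))  # 0+1
print(code_taxon([1, 1]))     # 1
```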
Then you would enter your data as follows:
states: [0, 0+1, 1]
definitely in state zero: [1.0, 0.0, 0.0]
definitely in state one: [0.0, 0.0, 1.0]
definitely polymorphic: [0.0, 1.0, 0.0]
totally unknown: [0.33, 0.33, 0.33]
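The same table can be assembled programmatically. The Python sketch below uses hypothetical helper names and is not a phytools function. It builds one row of prior probabilities per taxon over the ordered states [0, 0+1, 1], using exactly 1/3 for a totally unknown taxon, on the assumption that 0.33 above is shorthand for 1/3 so that each row sums to one. In R, make.simmap accepts such a matrix as its x argument, with tip labels as row names and the states as column names.

```python
# Illustrative sketch only: one prior-probability row per taxon.
STATES = ["0", "0+1", "1"]  # column order of the prior matrix

def tip_prior(coding):
    """Prior over STATES for one taxon; None means totally unknown."""
    if coding is None:
        return [1 / len(STATES)] * len(STATES)  # uniform: 1/3 each
    if coding not in STATES:
        raise ValueError(f"unknown state {coding!r}")
    return [1.0 if s == coding else 0.0 for s in STATES]

def tip_prior_matrix(codings):
    """Map taxon name -> coded state (or None) to taxon name -> prior row."""
    return {taxon: tip_prior(c) for taxon, c in codings.items()}

matrix = tip_prior_matrix(
    {"Taxon_A": "0", "Taxon_B": "1", "Taxon_C": "0+1", "Taxon_D": None}
)
for taxon, row in matrix.items():
    print(taxon, row)  # Taxon_D gets three equal entries of 1/3
```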
You can likewise encode other types of uncertainty about the condition
of the trait. For instance, if you have observed state 0 in a taxon but
have relatively little information, then you might want to say that the
taxon is *either* in state 0 or 0+1. (The converse would obviously be true
for state 1.) This could be encoded as follows:
either state 0 or 0+1: [0.5, 0.5, 0.0]
either state 0+1 or 1: [0.0, 0.5, 0.5]
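These "either/or" rows simply spread the probability evenly over the candidate states. A small hypothetical Python helper (not phytools code) makes that rule explicit. Passing all three states reproduces the totally unknown row.

```python
# Illustrative sketch only: uniform prior over a set of candidate states.
STATES = ["0", "0+1", "1"]

def either(*candidates):
    """Spread prior probability evenly over the candidate states, zero elsewhere."""
    if not candidates:
        raise ValueError("need at least one candidate state")
    if not set(candidates) <= set(STATES):
        raise ValueError(f"unknown state(s) in {candidates!r}")
    share = 1 / len(set(candidates))
    return [share if s in candidates else 0.0 for s in STATES]

print(either("0", "0+1"))  # [0.5, 0.5, 0.0]
print(either("0+1", "1"))  # [0.0, 0.5, 0.5]
```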
There's some grey area here too. For instance, how many observations are
required to conclude that a taxon is "definitely in state zero" or
"definitely in state one"? Perhaps you've observed only a few
individuals for a species and all are in state 1. Is that taxon
definitely monomorphic for the condition, or might it be polymorphic?
make.simmap can handle this kind of nuance. For instance, you might
decide the following for a particular taxon.
probably state 1, could be polymorphic: [0.0, 0.25, 0.75]
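Graded beliefs like this one can be written as relative weights and then normalized. Below, a hypothetical Python helper (again not phytools code) turns 3:1 odds in favour of state 1 over 0+1 into the row above. The weights themselves remain your own judgment call.

```python
# Illustrative sketch only: normalize relative weights into a prior row.
STATES = ["0", "0+1", "1"]

def weighted_prior(weights):
    """Turn relative weights per state, e.g. {'0+1': 1, '1': 3}, into probabilities."""
    if any(s not in STATES for s in weights):
        raise ValueError(f"unknown state(s) in {sorted(weights)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [weights.get(s, 0) / total for s in STATES]

print(weighted_prior({"0+1": 1, "1": 3}))  # [0.0, 0.25, 0.75]
```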
After you have coded your trait this way, you need to set up the model
to fit to your data.
To do this you will have to create a design matrix, of the same kind as
is used in ape::ace or geiger::fitDiscrete. This design matrix is passed to
make.simmap as the model argument.
The key attribute of your model design matrix is that transitions from 0
to 1 occur through the polymorphic condition 0+1.
For instance, for the trait with levels 0, 0+1, 1, your design matrix
might have the form:
       0  0+1   1
0      0    1   0
0+1    2    0   3
1      0    4   0
This is a model in which transitions are allowed to occur 0 <-> 0+1 and
0+1 <-> 1, and all transitions can have different rates.
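The rule behind this matrix (only neighbouring states connected, each direction numbered with its own rate index, 0 meaning not allowed) can be written out explicitly. The Python sketch below is illustrative only, and ordered_design_matrix is a hypothetical name. For three states it reproduces the matrix above, which in R could be written as matrix(c(0,1,0,2,0,3,0,4,0), 3, 3, byrow=TRUE, dimnames=list(c("0","0+1","1"), c("0","0+1","1"))).

```python
# Illustrative sketch only: design matrix for an ordered model in which
# each allowed (adjacent-state) transition gets its own rate index.
STATES = ["0", "0+1", "1"]

def ordered_design_matrix(n_states):
    """Rows are 'from' states, columns are 'to' states; 0 = not allowed.
    Allowed cells are numbered 1, 2, ... row by row."""
    design = [[0] * n_states for _ in range(n_states)]
    next_index = 1
    for i in range(n_states):
        for j in range(n_states):
            if abs(i - j) == 1:  # neighbours only: 0 <-> 0+1 <-> 1
                design[i][j] = next_index
                next_index += 1
    return design

design = ordered_design_matrix(len(STATES))
print("     " + "".join(f"{s:>5}" for s in STATES))
for name, row in zip(STATES, design):
    print(f"{name:>5}" + "".join(f"{v:>5}" for v in row))
```

Note that the corner cells (direct 0 <-> 1 transitions) stay at zero, which is what forces changes to pass through 0+1.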
Finally, after running your stochastic mapping analysis, you can also
merge states back together to compute statistics of interest -- such as
the total posterior probabilities of a state at any node. For instance,
you could merge states 0 and 0+1 to get the total probability that state
0 is present at a node. This can be done using phytools::mergeMappedStates
and then
running the "simmap" summary methods as normal.
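To see what merging does to the node summaries, here is a hypothetical Python sketch of the arithmetic only (mergeMappedStates itself operates on the mapped trees, not on probability tables): the merged probability at a node is the sum of the probabilities of the coded states it absorbs. The node probabilities below are made-up numbers for illustration, not results from any analysis.

```python
# Illustrative sketch only: sum node probabilities over merged states.
def merge_probs(node_probs, groups):
    """node_probs: coded state -> posterior probability at one node.
    groups: merged-state name -> list of coded states it absorbs."""
    return {name: sum(node_probs[s] for s in members)
            for name, members in groups.items()}

# Hypothetical posterior probabilities at a single node.
node = {"0": 0.5, "0+1": 0.25, "1": 0.25}
merged = merge_probs(node, {"0 present": ["0", "0+1"],
                            "1 present": ["0+1", "1"]})
print(merged)  # {'0 present': 0.75, '1 present': 0.5}
```

Because 0+1 contributes to both merged states, "0 present" and "1 present" are not mutually exclusive, so their totals can sum to more than one; each merge answers one question at a time.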
I hope this is of some help in getting you started, Felipe.
All the best, Liam
Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]
Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org,
http://blog.phytools.org
Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile
U.S. COVID-19 explorer web application:
https://covid19-explorer.org/
On 9/4/2021 7:36 PM, Felipe Rossetto wrote:
Hi everyone!
I am setting up a presence/absence matrix for running stochastic
mapping with the make.simmap function of phytools, but the matrix
contains both taxa with a polymorphic character state and taxa with
unknown data.
I know it is possible to represent missing data as probabilities
instead of presence and absence, but I do not know how to distinguish
missing data from polymorphic character codes for the analyses.
So, would the following coding of missing data and polymorphic
states be correct?
Taxon_A,1,0,0 (character absent)
Taxon_B,0,1,0 (character present)
Taxon_C,0.5,0.5,0.0 (polymorphic character)
Taxon_D,0.33,0.33,0.33 (missing data)
Thank you very much in advance!
Felipe Rossetto
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/