Luke Harmon wrote:

Yes, Joe is correct: there is more to this problem than meets the eye. My implementation assumes equal probability of each unknown state, which is quite different from modeling an actual polymorphic character. I'm sure that doing something different might matter in many cases.


Assuming equal probability of each possible state might be thought of as a model of ambiguity of state, not polymorphism. But even for that it is not a complete likelihood treatment. In likelihood machinery, one uses conditional likelihoods at the tips, which give a likelihood of 1 to each possible state. This is not as crazy as it sounds (see pages 255-256 of my book). It is simply that what we have in the conditional likelihoods is NOT the probability of the state, but the probability of the ambiguous observation given the state. So, for example, if we see a purine but don't know whether it is A or G (in a DNA sequence case), the probability of seeing purine, given that we can only see purineness or pyrimidineness and the state really is A, is 1, and similarly if it is really G. So the conditional likelihoods for the four nucleotides (in the order A, C, G, T) are (1,0,1,0). Sounds wrong but it isn't.
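
The purine example can be sketched in a few lines of Python. This is only an illustration: the names STATES, COMPATIBLE and tip_conditional_likelihoods are hypothetical and not taken from PHYLIP or any R package. R, Y and N are the usual IUPAC codes for purine, pyrimidine and any base.

```python
# Fixed state order for the conditional likelihood vector.
STATES = "ACGT"

# Observation code -> true states compatible with that observation.
# (Hypothetical lookup table for illustration; only a few codes are listed.)
COMPATIBLE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine: A or G
    "Y": "CT",    # pyrimidine: C or T
    "N": "ACGT",  # completely unknown
}

def tip_conditional_likelihoods(code):
    """Return P(observation | true state) for each state in STATES.

    Each entry is 1 if the true state would produce this observation
    and 0 otherwise, so the vector need not sum to 1.
    """
    compatible = COMPATIBLE[code]
    return [1.0 if state in compatible else 0.0 for state in STATES]

purine = tip_conditional_likelihoods("R")
print(purine)       # [1.0, 0.0, 1.0, 0.0]
print(sum(purine))  # 2.0, i.e. not a probability distribution over states
```

The entries sum to 2 rather than 1 because they are probabilities of the observation given each state, not probabilities of the states themselves.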

Polymorphism is totally different: you have actually seen both states.

For discrete 0/1 characters, one can use Sewall Wright's (1934) threshold model which I have discussed (briefly in the book and more extensively in a 2005 paper in the Philosophical Transactions of the Royal Society B). I have a paper under revision at a major journal about it and will release my program Threshml soon in a pre-PHYLIP version. Unlike Mark Pagel and Paul Lewis's Mk model, it predicts polymorphism in a natural way. The population has an underlying unobservable quantitative character, the "liability", that implies some frequency of both 0 and 1 states. I think Ted Garland and others also use a log-linear model that has somewhat similar properties but is not exactly the same.
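
To see how a liability can imply a frequency of both states, here is a minimal Python sketch. It is not the Threshml implementation. It assumes that liabilities within a population are normally distributed around the population mean with standard deviation 1, and that an individual shows state 1 when its liability exceeds a threshold of 0. Those specific values, and the function names, are assumptions made for illustration only.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def state1_frequency(mean_liability, threshold=0.0, sd=1.0):
    """Expected fraction of individuals showing state 1.

    Assumes (for illustration) normally distributed liabilities with the
    given mean and standard deviation; individuals whose liability lies
    above the threshold show state 1, the rest show state 0.
    """
    return 1.0 - normal_cdf((threshold - mean_liability) / sd)

# A population mean near the threshold is strongly polymorphic;
# a mean far from the threshold is nearly fixed for one state.
for mean in (-2.0, 0.0, 2.0):
    print(mean, state1_frequency(mean))
```

Under these assumptions a population whose mean liability sits at the threshold has both states at equal frequency, and the frequency of state 1 rises smoothly as the mean liability increases.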

To get these models to deal with multiple character states is possible but very, very nontrivial. If you see states 0, 1, 2, is 1 intermediate between 0 and 2, or is it off at right angles to both? There are possible threshold models that could do either -- telling the difference between them requires lots of data. With, say, 6 states it would be a nightmare.

Joe
----
Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
