Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-11 Thread Emmanuel Paradis

Charles,

Package diversitree has a way to include this kind of information. From 
the help page ?make.bisse:


Unresolved clade information:

 This must be a ‘data.frame’ with at least the four columns

• ‘tip.label’, giving the name of the tip to which the data
  applies

• ‘Nc’, giving the number of species in the clade

• ‘n0’, ‘n1’, giving the number of species known to be in state
  0 and 1, respectively.

In your case you might specify a model where you want to estimate only 
the transition rates but not the speciation/extinction parameters (I'm 
not sure how easy this is). I don't know if Richard FitzJohn is on the 
list: he may give you more insights.
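
As a rough illustration of the table layout quoted above, here is a minimal sketch in Python. It is not diversitree's API: in R this would be a data.frame passed to make.bisse. The class name, the helper check_clades, the family names and the counts are all made up for the example, and the rule that n0 + n1 may not exceed Nc is an assumption about what a consistent row looks like.

```python
from dataclasses import dataclass


@dataclass
class UnresolvedClade:
    tip_label: str  # name of the tip that stands for the whole clade
    Nc: int         # number of species in the clade
    n0: int         # species known to be in state 0
    n1: int         # species known to be in state 1

    def n_unknown(self) -> int:
        """Species in the clade whose state was not scored."""
        return self.Nc - self.n0 - self.n1


def check_clades(clades):
    """Return the rows unchanged, or raise ValueError on an inconsistent row."""
    for c in clades:
        if min(c.Nc, c.n0, c.n1) < 0:
            raise ValueError(f"{c.tip_label}: counts must be non-negative")
        if c.n0 + c.n1 > c.Nc:  # assumed consistency rule
            raise ValueError(f"{c.tip_label}: n0 + n1 exceeds Nc")
    return clades


# Hypothetical families, one row per tip of the family-level tree.
clades = check_clades([
    UnresolvedClade("FamilyA", Nc=12, n0=7, n1=5),   # both states observed
    UnresolvedClade("FamilyB", Nc=30, n0=10, n1=0),  # 20 species unscored
])
```

Note that this layout only covers a binary (0/1) trait, as in the help page excerpt.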


To answer your question in a previous mail: ace() does not consider such 
polymorphism (nor uncertainty).


Charles Willis wrote on 08/04/2011 20:11:
> What the polymorphism represents (ambiguity vs. an actual polymorphism) is 
> key.
>
> In my case, I am trying to code an actual polymorphism -- that is, a 
> given taxon exhibits multiple states of a given trait. However, my 
> taxonomic level is at the family, so these aren't polymorphisms in the 
> population sense, but rather different species with different traits.
>
> From a purely practical standpoint, does it seem reasonable to re-draw 
> the taxon (family) as a polytomy with short branch lengths and have each 
> new tip exhibit a different character state? That is, I would be 
> representing a given family as a polytomy with as many taxa as 
> polymorphic states.

I wouldn't do that.

> In species with significant population structure correlated with a given 
> polymorphism, would this 'practical' approach be applicable as well?

I don't think so.

HTH

Emmanuel


--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/

___
R-sig-phylo mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-08 Thread Charles Willis
What the polymorphism represents (ambiguity vs. an actual polymorphism) is
key.

In my case, I am trying to code an actual polymorphism -- that is, a given
taxon exhibits multiple states of a given trait. However, my taxonomic level
is at the family, so these aren't polymorphisms in the population sense, but
rather different species with different traits.

From a purely practical standpoint, does it seem reasonable to re-draw the
taxon (family) as a polytomy with short branch lengths and have each new tip
exhibit a different character state? That is, I would be representing a given
family as a polytomy with as many taxa as polymorphic states.

In species with significant population structure correlated with a given
polymorphism, would this 'practical' approach be applicable as well?

C





Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-08 Thread Joe Felsenstein


Luke Harmon wrote:

> Yes Joe is correct, there is more to this problem than meets the
> eye. My implementation assumes equal probability of each unknown
> state, which is quite different from modeling an actual polymorphic
> character. I'm sure that doing something different might matter in
> many cases.



Assuming equal probability of each possible state might be thought of  
as a model of ambiguity of state, not polymorphism.  But even for  
that it is not a complete likelihood treatment.  In likelihood  
machinery, one uses conditional likelihoods, which give a likelihood  
of 1 to each possible state.   This is not as crazy as it sounds (see  
pages 255-256 of my book).   It is simply that what we have in the  
conditional likelihoods is NOT the probability of the state, but the  
probability of the ambiguous observation given the state.  So, for  
example, if we see a purine but don't know whether it is A or G (in a  
DNA sequence case), the probability of seeing purine, given that we  
only can see purineness or pyrimidineness, and the state really is A,  
is 1, and similarly if it is really G.   So the conditional  
likelihoods for the four nucleotides are (1,0,1,0).  Sounds wrong but  
it isn't.
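
To make the (1,0,1,0) coding concrete, here is a minimal sketch in Python. The Jukes-Cantor substitution model, the two-tip tree, the branch lengths and the uniform root frequencies are assumptions chosen only to keep the example small; the function names are made up. The states are ordered A, C, G, T.

```python
import math

STATES = "ACGT"

# Tip vectors hold P(observation | true state), not P(state).
TIP_VECTORS = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
    "R": (1.0, 0.0, 1.0, 0.0),  # purine seen, A or G not distinguished
}


def jc_prob(i, j, t):
    """Jukes-Cantor probability of ending in state j after time t from state i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def branch_message(tip, t):
    """For each parent state, P(observation at the tip | parent state)."""
    v = TIP_VECTORS[tip]
    return [sum(jc_prob(i, j, t) * v[j] for j in range(4)) for i in range(4)]


def two_tip_likelihood(tip1, tip2, t1=0.1, t2=0.1):
    """Likelihood of two tips joined at a root with uniform state frequencies."""
    m1 = branch_message(tip1, t1)
    m2 = branch_message(tip2, t2)
    return sum(0.25 * a * b for a, b in zip(m1, m2))
```

With this coding, two_tip_likelihood("R", "A") equals the sum of the likelihoods obtained with "A" and with "G" at the first tip, which is what "the probability of the ambiguous observation given the state" amounts to.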


Polymorphism is totally different: you have actually seen both states.

For discrete 0/1 characters, one can use Sewall Wright's (1934)  
threshold model which I have discussed (briefly in the book and more  
extensively in a 2005 paper in the Philosophical Transactions of the  
Royal Society B).  I have a paper under revision at a major journal  
about it and will release my program Threshml soon in a pre-PHYLIP  
version.   Unlike Mark Pagel and Paul Lewis's Mk model, it predicts  
polymorphism in a natural way.   The population has an underlying  
unobservable quantitative character, the "liability", that implies  
some frequency of both 0 and 1 states.  I think Ted Garland and  
others also use a log-linear model that has somewhat similar  
properties but is not exactly the same.
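
A minimal sketch of how a liability implies a frequency of both states, in Python. It assumes that liabilities within a population are normally distributed around the population mean with standard deviation 1 and that the threshold sits at 0; these are illustrative choices, and this is not the Threshml implementation.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def state1_frequency(mean_liability, threshold=0.0, sd=1.0):
    """Expected fraction of individuals showing state 1.

    Individuals have liabilities ~ Normal(mean_liability, sd); those above
    the threshold show state 1, the rest show state 0.
    """
    return 1.0 - normal_cdf((threshold - mean_liability) / sd)
```

A population whose mean liability sits on the threshold is expected to be half 0 and half 1; moving the mean away from the threshold makes one state rarer without removing it entirely, which is the sense in which polymorphism is predicted.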


To get these models to deal with multiple character states is  
possible but very very nontrivial.  If you see states 0, 1, 2, is 1  
intermediate between 0 and 2, or is it off at right angles to both?   
There are possible threshold models that could do either -- telling  
the difference between them requires lots of data.  With, say, 6  
states it would be a nightmare.
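
For the case where state 1 is intermediate between 0 and 2, one possible arrangement is a single liability axis with two cut points. The sketch below, in Python, covers only that ordered case and again assumes normally distributed liabilities; the function name and the cut points are made up for illustration. The "right angles" case would need a second liability axis and is not shown.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_state_frequencies(mean_liability, cutpoints, sd=1.0):
    """Frequencies of states 0..k given k increasing cut points on one axis."""
    if list(cutpoints) != sorted(cutpoints):
        raise ValueError("cut points must be in increasing order")
    cdf = [normal_cdf((c - mean_liability) / sd) for c in cutpoints]
    edges = [0.0] + cdf + [1.0]
    # Each state takes the probability mass between two adjacent cut points.
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


# Hypothetical cut points at -1 and 1: states 0, 1, 2 from low to high liability.
freqs = ordered_state_frequencies(0.0, [-1.0, 1.0])
```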


Joe

Joe Felsenstein, [email protected]
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA



Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-07 Thread Luke Harmon
Yes Joe is correct, there is more to this problem than meets the eye. My 
implementation assumes equal probability of each unknown state, which is quite 
different from modeling an actual polymorphic character. I'm sure that doing 
something different might matter in many cases.

lh


Luke Harmon
Assistant Professor
Biological Sciences
University of Idaho
208-885-0346
[email protected]



Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-07 Thread Joe Felsenstein

Luke Harmon said --

> Yes Emmanuel is correct, fitDiscrete does not deal with polymorphic data. I 
> have a fix that I made for a specific project that I'm sending to Charles, 
> if anyone else is interested email me off-list. It's very clunky.

I suspect this is not just a technical programming issue or a matter of 
standardizing formats of files, but depends on what you want to assume about 
the mode of evolution of a polymorphic character.  Not a trivial matter at 
all, and not one where you just want to accept any old arbitrary rule.

For example there is a very old (1967) parsimony method called "polymorphism 
parsimony" but it makes specific assumptions -- namely that polymorphism is 
hard to retain along a lineage, easy to lose but hard to regain.

So do you want to assume that, or what?

Joe

Joe Felsenstein [email protected]
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA



Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-06 Thread Luke Harmon
Yes Emmanuel is correct, fitDiscrete does not deal with polymorphic data. I 
have a fix that I made for a specific project that I'm sending to Charles; if 
anyone else is interested, email me off-list. It's very clunky.

lh

Luke Harmon
Assistant Professor
Biological Sciences
University of Idaho
208-885-0346
[email protected]



Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-06 Thread Charles Willis
But does ace() recognize polymorphisms or ambiguities within a single taxon
for a multistate character? That is, a single taxon with two states:

Taxon Trait
A 1
B 1,2
C 2

In the APE manual it says you cannot import such data from a nexus file
where ambiguities are indicated with parentheses such as (12).

In Phangorn you can code ambiguities by designating a new character and then
specifying in a contrast matrix that this new character actually represents
2 original characters, similar to ambiguous DNA characters (e.g., R = G,A).
Phangorn has a limited set of functions though.
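
The contrast-matrix idea can be sketched as follows, in Python rather than R. This is not phangorn's interface: the symbol "P" for "1 or 2" and the helper names are made up, and the data are the three taxa from the small table above.

```python
STATES = ("1", "2")

# Rows: coded symbols; columns: original states, in the order of STATES.
# A 1 means the symbol is compatible with that state.
CONTRAST = {
    "1": (1, 0),
    "2": (0, 1),
    "P": (1, 1),  # hypothetical extra symbol standing for "1 or 2"
}

observed = {"A": "1", "B": "P", "C": "2"}


def tip_vectors(observed, contrast):
    """Map each taxon to the indicator row of its coded symbol."""
    return {taxon: contrast[symbol] for taxon, symbol in observed.items()}


def compatible_states(symbol, contrast=CONTRAST, states=STATES):
    """List the original states that a coded symbol may stand for."""
    return [s for s, flag in zip(states, contrast[symbol]) if flag]
```

Coding taxon B this way says its state is 1 or 2 (ambiguity); it does not by itself model a taxon in which both states have actually been observed.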

I wasn't sure if there is a way to code such ambiguities in APE for use in
ace(), as well as other APE functions.

Thanks!
Charlie





Re: [R-sig-phylo] Multistate Trait Polymorphism

2011-04-06 Thread Emmanuel Paradis

Hi,

Charles Willis wrote on 06/04/2011 01:24:

> Hi,
>
> I want to run some comparative methods in R (ace, fitDiscrete, etc) on a
> categorical trait that has 6 states.
>
> My problem is that certain taxa are polymorphic for this trait (they have
> multiple states). Is there a way to code polymorphisms in R? I know you can
> code polymorphisms in Mesquite (e.g., 1&2) and in Bayestraits (e.g., 12),
> but I cannot seem to find a description on how to code similar data in R.
> Importing polymorphic data from Mesquite (read.nexus) doesn't appear to be
> an option.
>
> I plan to run the analyses coding the trait as 6 independent binary traits,
> but I wanted to see if it was possible to run it as a polymorphic multistate
> trait as well.

ace() takes multistate characters into account. Apparently not 
fitDiscrete().


Best,

Emmanuel


> Thanks!
> Charlie
>
> Duke University
> Department of Biology
> 125 Science Drive
> Durham NC 27708
> CP (605) 553-1057
> [email protected]
> http://www.duke.edu/~cgw6/




--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/
