Re: [R-sig-phylo] question about measurement error in phylogenetic signal (Krzysztof Bartoszek)

2013-07-08 Thread Joe Felsenstein

In addition to the references to papers by Hansen and Bartoszek, and by Ives, 
Midford and Garland, I would biasedly suggest this paper:

Felsenstein, J. 2008. Comparative methods with sampling error and 
within-species variation: contrasts revisited and revised. American Naturalist 
171: 713-725.

The method estimates the within-species phenotypic variation (which, when you 
are analysing species means, is the relevant measurement error and also 
includes actual measurement error) and corrects for it.

The software announced there is not in R, but I believe that Liam Revell's 
phytools package can call our program.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] question about measurement error in phylogenetic signal

2013-07-08 Thread Hunt, Gene
Small follow-up to Liam's suggestion:  If you do use an arcsin transformation 
for proportional data, the variance of arcsin(sqrt(p)) is approximately 1/(4N), 
where p is the proportion and N is sample size.  The approximation is good 
unless the proportion is very close to 0 or 1.
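
As a minimal sketch of the approximation above (written in Python rather than R; the function names are hypothetical and not part of phytools or any other package), the transformed value and its approximate standard error could be computed like this. Note that the approximate SE depends only on the sample size N, not on the proportion itself.

```python
import math

def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

def arcsine_se(n):
    """Approximate SE of arcsin(sqrt(p)), i.e. sqrt(1 / (4 N)).

    Assumes p is not very close to 0 or 1, where the approximation
    is known to be poor.
    """
    return math.sqrt(1.0 / (4.0 * n))

# Hypothetical species: 3 diseased individuals out of 6 sampled.
p_hat = 3 / 6
print(arcsine_sqrt(p_hat))  # transformed prevalence
print(arcsine_se(6))        # approximate SE on the transformed scale
```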

Best,
Gene


--
Gene Hunt
Curator, Department of Paleobiology
National Museum of Natural History
Smithsonian Institution [NHB, MRC 121]
P.O. Box 37012
Washington DC 20013-7012
Phone: 202-633-1331  Fax: 202-786-2832
http://paleobiology.si.edu/staff/individuals/hunt.cfm


Re: [R-sig-phylo] question about measurement error in phylogenetic signal

2013-07-07 Thread Liam J. Revell

Hi Eliot & Xavier.

I think that Xavier's suggestion is not a particularly good idea in this 
case because random error will tend to depress phylogenetic signal. In 
other words - random data error does not introduce random error in 
phylogenetic signal, rather it biases phylogenetic signal towards 0.


A better approach is to incorporate error in the estimation of species 
means directly - following Ives et al. (2007). This is implemented in 
phylosig of the phytools package.


Your formula is indeed the correct standard error of a proportion given 
your data; however, it raises the question of whether the assumed model 
(Brownian motion, BM) is suitable for your data (or perhaps this is what 
you are trying to find out). For small samples (n < 30), some people have 
recommended an n+4 correction, in which 2 successes and 2 failures are 
added during calculation of the SE. If you are using an arcsine 
transformation, as is common for proportion data, you need to be aware 
that your standard errors are on the original scale! (I don't know the 
formula for standard errors on the transformed scale.)
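
One common reading of the n+4 correction mentioned above is the "plus four" adjustment, in which the proportion is recomputed as (successes + 2) / (n + 4) before the usual SE formula is applied. The sketch below assumes that reading; the function name is hypothetical and the code is in Python for illustration only.

```python
import math

def plus_four_se(successes, n):
    """SE of a proportion after adding 2 successes and 2 failures.

    Assumption: the adjusted proportion (successes + 2) / (n + 4) is
    plugged into sqrt(p * (1 - p) / (n + 4)).
    """
    p_adj = (successes + 2) / (n + 4)
    return math.sqrt(p_adj * (1 - p_adj) / (n + 4))

# Hypothetical species with no detections among 6 individuals:
# the unadjusted SE would be exactly 0, the adjusted one is not.
print(plus_four_se(0, 6))
print(plus_four_se(3, 6))
```

A practical consequence is that species with 0 or n detections no longer receive a standard error of exactly zero.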


- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org



Re: [R-sig-phylo] question about measurement error in phylogenetic signal

2013-07-04 Thread Xavier Prudent
Dear Eliot,

One way to cope with uncertainty in the inputs to an analysis is to vary
these inputs by some amount (like +- 1 standard deviation) and rerun your
analysis. The spread of the results then tells you how robust your analysis
is.
Note that the inputs may be varied independently only if they ARE
independent; if they are highly correlated, you may prefer to vary them
simultaneously.
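
The perturbation idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a recipe for phylogenetic signal: it draws random normal perturbations rather than applying a fixed +- 1 standard deviation shift, it varies every input independently, and both the function name and the placeholder statistic (a plain mean) are hypothetical.

```python
import random
import statistics

def perturbation_spread(values, sds, statistic, n_rep=1000, seed=1):
    """Rerun `statistic` on randomly perturbed inputs; return the SD of results.

    Each input is drawn independently from Normal(value, sd), which is
    only appropriate when the inputs really are independent.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_rep):
        perturbed = [rng.gauss(v, s) for v, s in zip(values, sds)]
        results.append(statistic(perturbed))
    return statistics.stdev(results)

# Hypothetical species means with their standard deviations.
means = [0.50, 0.20, 0.10, 0.35]
sds = [0.20, 0.15, 0.10, 0.18]
print(perturbation_spread(means, sds, statistics.mean))
```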

Hope that helps,

Regards,
Xavier


-- 
Xavier Prudent
Computational biology and evolutionary genomics

Guest scientist at the Max-Planck-Institut für Physik komplexer Systeme
(MPI-PKS)
Noethnitzer Str. 38
01187 Dresden

Max Planck-Institute for Molecular Cell Biology and Genetics
(MPI-CBG)
Pfotenhauerstraße 108
01307 Dresden

Phone: +49 351 210-2621
Mail: prudent [ at ] mpi-cbg.de


[R-sig-phylo] question about measurement error in phylogenetic signal

2013-07-03 Thread Eliot Miller
Hello all,

I have been trying to get something to work in a number of different
packages and with a number of different approaches today that I couldn't
get to run in a believable way. Before I spend another day on this, I was
wondering what people think about the idea in general.

I have a dataset of disease prevalence across ~100 species. There are ~2000
individuals total across the dataset, with >4 individuals per species.
Prevalence per individual is coded as 0 or 1. I am interested in the
phylogenetic signal of disease prevalence across the species. One approach
that works is to simply calculate prevalence as the species-specific mean,
i.e. if 3 individuals of 6 for a species had the disease, the prevalence
would be 3/6 = 0.5. Then one can use these values with e.g. phylosig() (I
arcsin sqrt transformed these proportions here). Like the few other
published tests of phylogenetic signal in disease prevalence, there is
little signal here. I could leave it at that, because in general there are
very low detections in this dataset and it's probably not ideally suited to
address this question anyhow.

That aside, however, because not all individuals of a given species always
have the disease, I wanted to incorporate measurement error. So, based on
the calculation of SE for binary data from the site:
http://www.researchgate.net/post/Can_standard_deviation_and_standard_error_be_calculated_for_a_binary_variable
I also calculated species-specific SEs as
sqrt(mean(prevalence)*((1-mean(prevalence))/individuals)).
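
For concreteness, the SE expression above can be written out as a small function. This is a sketch in Python rather than R, the function name is hypothetical, and it assumes each individual is coded 0 or 1, so that the species mean is successes / n.

```python
import math

def proportion_se(successes, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)

# The example from the text: 3 diseased individuals out of 6.
print(proportion_se(3, 6))
# A species with no detections gets an SE of exactly zero.
print(proportion_se(0, 6))
```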

What do people think about this? It's hardly measurement error in the sense
we normally mean it. On the other hand, I think it would be neat if there
were some way to account for variation among individuals in prevalence, and
the influence this has on phylogenetic signal.

Cheers,
Eliot
