Re: [R-sig-phylo] A question about phylogenetic signal significance with 1000 trees

2016-12-09 Thread Francois KECK

Hi,

You might also be interested in the function phyloSignalBS, available in 
the package phylosignal.


Best,

François


On 08/12/2016 at 17:26, Carturan, Bruno wrote:

Hello R-sig-phylo community,

I am trying to measure the phylogenetic signal of functional traits with 
Blomberg’s K and Pagel’s λ. My problem is that I don’t have one 
phylogenetic tree but 1000 (the trees were created by first combining different 
molecular, morphological and taxonomic trees, then conducting an MCMC analysis, 
and finally selecting the 1000 trees; this work was done by Huang and Roy 2015).

Is there a way to test the significance of the K and λ values for 
each trait across these 1000 trees?

One idea is to run the test and obtain a p-value for each of the 1000 
trees, which would give me a distribution of p-values, and then to consider 
the K or λ value significant if, say, 80% of the p-values are significant. 
This is computationally very demanding (I have 735 species), and I 
don’t know whether it would be considered a valid procedure.

I look forward to your answer.

Bruno Carturan
PhD Candidate
Complex Environmental Systems Lab
University of British Columbia
Okanagan Campus



From: Liam J. Revell [liam.rev...@umb.edu]
Sent: Wednesday, December 07, 2016 6:54 AM
To: Carturan, Bruno
Subject: Re: A question about phylogenetic signal significance with 1000 trees

Bruno.

This is an interesting question. Since this doesn't pertain specifically
to phytools, perhaps you should pose it to the R-sig-phylo email
list-serve. You may get a good answer there.

All the best,

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 12/6/2016 6:16 PM, Carturan, Bruno wrote:

Hello Dr. Revell,



I am a PhD student in the Complex Environmental Systems Lab at the
University of British Columbia, and I would like to ask you a question.

I am trying to measure the phylogenetic signal of functional traits with
Blomberg’s K and Pagel’s λ. My problem is that I don’t have
one phylogenetic tree but 1000 (the trees were created by first
combining different molecular, morphological and taxonomic trees, then
conducting an MCMC analysis, and finally selecting the 1000 trees;
this work was done by Huang and Roy 2015).

Is there a way to test the significance of the K and λ values
for each trait across these 1000 trees?

One idea is to run the test and obtain a p-value for each of the
1000 trees, which would give me a distribution of p-values, and then
to consider the K or λ value significant if, say, 80% of the p-values
are significant. This is computationally very demanding (I
have 735 species), and I don’t know whether it would be considered a
valid procedure.

I look forward to your answer.



Best regards,


Bruno Carturan
PhD Candidate
Complex Environmental Systems Lab
University of British Columbia
Okanagan Campus


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] A question about phylogenetic signal significance with 1000 trees

2016-12-08 Thread Theodore Garland
And don't forget that measurement error in the tip data will also cause
underestimation of phylogenetic signal, unless you use methods specifically
designed to allow for that!

Ives, A. R., P. E. Midford, and T. Garland. 2007. Within-species variation
and measurement error in phylogenetic comparative methods. Systematic
Biology 56:252–270.
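
For anyone working in phytools, phylosig() takes a named vector of standard errors through its se argument, which is one way to allow for sampling error in the tip means. A minimal sketch, assuming simulated stand-in data: the tree, the trait x, and the standard errors se below are all hypothetical, and the constant 0.1 is an arbitrary illustrative value.

```r
library(ape)       # rtree()
library(phytools)  # phylosig(), fastBM()

set.seed(42)
# Hypothetical stand-ins for a real tree and real tip data
tree <- rtree(30)
x    <- fastBM(tree)                             # simulated trait, named by tip label
se   <- setNames(rep(0.1, length(x)), names(x))  # assumed standard error per species

# Blomberg's K ignoring, then allowing for, sampling error in the tip means
K.naive <- phylosig(tree, x, method = "K")
K.se    <- phylosig(tree, x, method = "K", se = se)

print(K.naive)
print(K.se)
```

The names of se have to match the tip labels, just like the names of x.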

Sometimes I am amazed that we ever find statistically significant signal!

Cheers,
Ted

On Thu, Dec 8, 2016 at 12:27 PM, David Bapst wrote:

> A deep thought on this topic, from a mind run aground amidst grading
> papers.
>
> While I am a big fan of using multiple trees, we should keep in mind
> that phylogenetic uncertainty makes signal into a rather snaky
> concept. By this, I mean that there could have only been one true
> history (which may or may not be captured within our sample of
> potentially true-ish phylogenetic reconstructions). Thus, even if
> phylogenetic signal was perfect in a given trait, we would
> underestimate the extent of that signal on all trees except for the
> true tree. It should be very hard for a trait with less than perfect
> signal to appear to have higher amounts of signal than reality, as
> that would require a phylogeny that perfectly matches the distortions
> from the expectation under true signal. Thus, whatever distribution of
> signal estimates we obtain from a distribution of trees, we should
> probably expect the upper tail to be closer to reality than the mean
> expectation.
>
> (Something similar should also be true of rates, where inaccurate
> trees are probably more likely to upwardly bias rate estimates, rather
> than downward. Slow rates should be much harder to accurately measure.
> In both cases, based on my experience looking at how parameter
> estimates vary in simulations with different dating methods, I think
> the effects will be partly exacerbated by variation in the dating
> (i.e. branch lengths) and affected less by differences in topology
> that don't greatly impact the divergence times.)
>
> ...Or maybe I'm just crazy.
>
> Cheers,
> -Dave Bapst

Re: [R-sig-phylo] A question about phylogenetic signal significance with 1000 trees

2016-12-08 Thread David Bapst
A deep thought on this topic, from a mind run aground amidst grading papers.

While I am a big fan of using multiple trees, we should keep in mind
that phylogenetic uncertainty makes signal into a rather snaky
concept. By this, I mean that there could have only been one true
history (which may or may not be captured within our sample of
potentially true-ish phylogenetic reconstructions). Thus, even if
phylogenetic signal was perfect in a given trait, we would
underestimate the extent of that signal on all trees except for the
true tree. It should be very hard for a trait with less than perfect
signal to appear to have higher amounts of signal than reality, as
that would require a phylogeny that perfectly matches the distortions
from the expectation under true signal. Thus, whatever distribution of
signal estimates we obtain from a distribution of trees, we should
probably expect the upper tail to be closer to reality than the mean
expectation.
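
To make the contrast concrete, here is a small sketch that compares the mean with the upper tail of a set of per-tree signal estimates. The vector K.est is made up for illustration only; in practice it would hold one estimate per tree.

```r
# Hypothetical per-tree estimates of Blomberg's K (illustrative values only)
K.est <- c(0.42, 0.55, 0.61, 0.48, 0.70, 0.52, 0.66, 0.58, 0.45, 0.63)

# Central tendency versus upper tail of the distribution across trees
K.mean  <- mean(K.est)
K.upper <- quantile(K.est, probs = 0.95, names = FALSE)

cat("mean K:", round(K.mean, 3), "\n")
cat("95th percentile of K:", round(K.upper, 3), "\n")
```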

(Something similar should also be true of rates, where inaccurate
trees are probably more likely to upwardly bias rate estimates, rather
than downward. Slow rates should be much harder to accurately measure.
In both cases, based on my experience looking at how parameter
estimates vary in simulations with different dating methods, I think
the effects will be partly exacerbated by variation in the dating
(i.e. branch lengths) and affected less by differences in topology
that don't greatly impact the divergence times.)

...Or maybe I'm just crazy.

Cheers,
-Dave Bapst



On Thu, Dec 8, 2016 at 9:53 AM, Brian O'Meara wrote:
> Well, since the 1000 trees are a sample, 100 sampled from those is a somewhat
> more manageable sample.
>
> As to whether using, say, an 80% cutoff is a valid procedure, one question is
> what you're doing this for. I'd be tempted to just show or report the
> distribution of K, lambda, and p values and talk about what they mean in
> terms of your biological question: "We don't know the tree exactly, but
> it's clear that for most feasible trees, related things are similar" or
> something like that. If it's a question of scaling trees for later
> analyses, you can just (using a script, or built-in functions) choose the
> particular scaling to be appropriate for each tree, and so the distribution
> across them doesn't matter in that case.
>
> Best,
> Brian
Re: [R-sig-phylo] A question about phylogenetic signal significance with 1000 trees

2016-12-08 Thread Carlos H. Biagolini Junior
Hi Bruno, I had a similar problem. In my case, I had a set of 1000 trees
(from birdtree.org), and I dealt with it as follows:

# Load packages
library(ape)        # read.nexus()
library(phylobase)  # phylo4d()
library(adephylo)   # abouheif.moran()
library(phytools)   # phylosig()
library(geiger)     # name.check()

# Load data: one row per species, species names as row names
data.base <- read.table(file = "data.base.txt", header = TRUE, row.names = 1)

# Load trees (a multiPhylo object)
trees <- read.nexus("tree.tre")
n.trees <- length(trees)  # 1000 in my case

# Named trait vector
object_x <- data.base$x
names(object_x) <- rownames(data.base)

# Check for mismatches between data and trees
name.check(trees[[1]], object_x)

# 1) Pagel's lambda (estimate per tree)
lam.x <- numeric(n.trees)
for (i in seq_len(n.trees)) {
  lam.x[i] <- phylosig(trees[[i]], object_x, method = "lambda", test = TRUE,
                       nsim = 999)$lambda
  message(round(i / n.trees * 100, digits = 1), " %")  # progress
}
mean(lam.x)
min(lam.x)
max(lam.x)
sd(lam.x)

# 2) Abouheif's Cmean (note: this stores the p-value per tree, not the statistic)
abouheif.x <- numeric(n.trees)
for (i in seq_len(n.trees)) {
  abouheif.x[i] <- abouheif.moran(phylo4d(trees[[i]], object_x))$pvalue
  message(round(i / n.trees * 100, digits = 1), " %")
}
mean(abouheif.x)
max(abouheif.x)
min(abouheif.x)
sd(abouheif.x)

# 3) Blomberg's K (estimate per tree)
blo.x <- numeric(n.trees)
for (i in seq_len(n.trees)) {
  blo.x[i] <- phylosig(trees[[i]], object_x, method = "K", test = TRUE,
                       nsim = 999)$K
  message(round(i / n.trees * 100, digits = 1), " %")
}
mean(blo.x)
max(blo.x)
min(blo.x)
sd(blo.x)

# 4) Moran's I (p-value per tree), using method = "Abouheif"
mor.x <- numeric(n.trees)
for (i in seq_len(n.trees)) {
  mor.x[i] <- abouheif.moran(phylo4d(trees[[i]], object_x),
                             method = "Abouheif")$pvalue
  message(round(i / n.trees * 100, digits = 1), " %")
}
mean(mor.x)
min(mor.x)
max(mor.x)
sd(mor.x)



All the best,
- Carlos Biagolini-Jr.










-- 
Carlos Biagolini-Jr.
PhD candidate in Ecology
Laboratório de Comportamento Animal
Universidade de Brasília

Re: [R-sig-phylo] A question about phylogenetic signal significance with 1000 trees

2016-12-08 Thread Brian O'Meara
Well, since the 1000 trees are a sample, 100 sampled from those is a somewhat
more manageable sample.

As to whether using, say, an 80% cutoff is a valid procedure, one question is
what you're doing this for. I'd be tempted to just show or report the
distribution of K, lambda, and p values and talk about what they mean in
terms of your biological question: "We don't know the tree exactly, but
it's clear that for most feasible trees, related things are similar" or
something like that. If it's a question of scaling trees for later
analyses, you can just (using a script, or built-in functions) choose the
particular scaling to be appropriate for each tree, and so the distribution
across them doesn't matter in that case.
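
A rough sketch of the subsample-and-report idea, assuming phytools' phylosig() for Blomberg's K. The trees and trait here are simulated stand-ins (20 random trees, 30 tips, 5 sampled trees, nsim = 99) so the example runs quickly; with real data, trees would be the 1000-tree multiPhylo object and x the named trait vector.

```r
library(ape)       # rmtree()
library(phytools)  # phylosig()

set.seed(1)
# Hypothetical stand-ins: 20 random trees sharing the same 30 tips, one trait
trees <- rmtree(N = 20, n = 30)
x     <- setNames(rnorm(30), trees[[1]]$tip.label)

# Work on a random subsample of the trees (e.g. 100 of 1000 in the real case)
idx <- sample(length(trees), 5)

# Blomberg's K and its permutation p-value on each sampled tree
res <- t(sapply(idx, function(i) {
  fit <- phylosig(trees[[i]], x, method = "K", test = TRUE, nsim = 99)
  c(tree = i, K = fit$K, P = fit$P)
}))
res <- as.data.frame(res)

# Report the distributions instead of a single verdict
summary(res$K)
summary(res$P)
mean(res$P < 0.05)  # share of sampled trees with a significant test
```

The last line gives the proportion that an 80% cutoff would be applied to, but the two summaries are usually more informative to report.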

Best,
Brian

___
Brian O'Meara, http://www.brianomeara.info


Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
Associate Director for Postdoctoral Activities, National Institute for
Mathematical & Biological Synthesis  (NIMBioS)
Communication Director, Society of Systematic Biologists
