Re: [R-sig-phylo] Why does ace set a random number generator seed?

2017-10-20 Thread Emmanuel Paradis

Hi Dave,

The seed is created every time you generate random data (see details in 
?.Random.seed):


R> exists(".Random.seed")
[1] FALSE
R> sample(1)
[1] 1
R> exists(".Random.seed")
[1] TRUE

So in your code below the seed is created by calling rtree(). Indeed, 
with ape 4.1:


R> rm(".Random.seed")
R> data(bird.orders)
R> x <- 1:23
R> exists(".Random.seed")
[1] FALSE
R> o <- ace(x, bird.orders)
R> exists(".Random.seed")
[1] FALSE

However, with the new "soon-to-be-on-CRAN" version, a seed is created by 
the same code. The "culprit" is actually Rcpp: the new ape has C++ code 
linked to Rcpp. For a reason I ignore, it seems that a random seed is 
initialized when calling a function linked to Rcpp. Here is an example:


R> rm(".Random.seed")
R> library(RcppEigen)
R> exists(".Random.seed")
[1] FALSE
R> o <- fastLm(1, 1)
R> exists(".Random.seed")
[1] TRUE

The new C++ code in ape is called by reorder(phy, "postorder") which is 
used by many functions in ape (ace, pic, plot.phylo, vcv, ...).


Best,

Emmanuel

Le 19/10/2017 à 21:00, David Bapst a écrit :

Emmanuel, all-

I noticed today that a workspace I was working with had a random
number seed set in it, but didn't remember setting one. Finally, I
discovered the culprit was ace. Here's a reproducible example,
demonstrating that a seed exists after running ace:

library(ape)
tree<-rtree(10)
ace(1:10,tree)
.Random.seed

I am using ape 4.1, and it doesn't seem to be addressed in the
forthcoming version, given my reading of the changes log (but I might
have missed it). What's going? Why is a random number seed being set?

Cheers,
-Dave



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] Brownian Evolution

2017-10-20 Thread William Gelnaw
Hi all,
  I've been investigating the use of Pagel's lambda and run into a problem
comparing trees that have a low speciation rate and no extinction with
trees that have a high speciation rate and lots of extinction.   I'm
comparing phylogenetic signal in two characters.  In the first, I simulate
a character under Brownian evolution over a 200 taxon tree and then prune
it down to 100 taxa.  The second character was evolved over the pruned-down
tree.  I then estimated Pagel's lambda for both characters on the
pruned-down tree.  Doing this 500 times, I found that the error rate in the
estimate of lambda was not the same for the two methods of generating the
data.  The difference was small but statistically significant.  To my
understanding though, there shouldn't be a difference.  The rate of
evolution is the same in both cases and the tree topology and branch
lengths are identical.  If the characters are evolving in a Brownian way,
shouldn't there be, on average, no difference between the characters with
respect to phylogenetic signal?  Could this be an artifact of using fastBM
to simulate the characters?  Is there another function that I should be
using to simulate a character evolving under Brownian motion?

Here's the script I'm using:
library(phytools)
library(TreeSim)
library(adephylo)
age<-2
lambda <- 2.0
mu <- 0.5
frac <-0.6
sample_size<-100
smalltrees<-list()
Bigtrees<-sim.bd.taxa.age(200,500,lambda,mu,frac,age)
## using sim.bd.taxa.age to get an ultrametric tree with known age
subtrees<-list()
sig<-vector()
sig_sub<-vector()
for (i in 1:500){
var<-fastBM(Bigtrees[[i]],nsim=1)
tips_to_drop<-sample(1:200, 200-sample_size, replace=FALSE)
subtrees[[i]]<-drop.tip(Bigtrees[[i]], tips_to_drop)
var1<-fastBM(subtrees[[i]],nsim=1)
sig_sub[i]<-phylosig(subtrees[[i]],var,method="lambda")$lambda
sig[i]<-phylosig(subtrees[[i]],var1,method="lambda")$lambda}
lamb<-(2-sig)
## reflected the data around lambda=1 so that standard deviation has a
symmetric distribution
lamb<-c(lamb,sig)
lamb_sub<-(2-sig_sub)
lamb_sub<-c(lamb_sub,sig_sub)
var.test(lamb,lamb_sub)

Best regards,
  - Will Gelnaw

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Brownian Evolution

2017-10-20 Thread Liam J. Revell

Hi William.

Why would you suspect phytools? ;)

I didn't figure this out for you, but I did eliminate some possibilities.

It does not seem to be due to phytools::fastBM. You can replace fastBM 
with phytools:sim.corrs or geiger:sim.char as follows:


var<-sim.char(Bigtrees[[i]],par=1,root=0)[,,1]
## or
var<-sim.corrs(Bigtrees[[i]],matrix(1,1,1))
## and
var1<-sim.char(subtrees[[i]],par=1,root=0)[,,1]
## or
var1<-sim.corrs(subtrees[[i]],matrix(1,1,1))

and the result is the same. (Note that all three use different methods 
for simulation.)


Neither does it seem to be due to phytools::phylosig. You can replace 
phylosig with geiger::fitContinuous as follows:


maxlambda<-phytools:::maxLambda(subtrees[[i]])
sig_sub[i]<-fitContinuous(subtrees[[i]],var[subtrees[[i]]$tip.label],
model="lambda",bounds=list(lambda=c(0,maxlambda)))$opt$lambda
sig[i]<-fitContinuous(subtrees[[i]],var1,model="lambda",
bounds=list(lambda=c(0,maxlambda)))$opt$lambda

and the result is unchanged. (Note that we need to compute the maximum 
value of lambda for which the likelihood function is defined otherwise 
fitContinuous by default bounds optimization on [0,1].)


What I also observed is that the variances are unequal, as you report, 
but not in a particular direction. That is, sometimes sig_sub has a 
higher variance, and sometimes sig does.


If I had to guess, I would hazard that perhaps the distribution of 
lambda (even after you make it symmetric) is leptokurtic which is why 
var.test is significant due to idiosyncratic differences in the variance 
of lambda, even though the values were obtained by the same process. 
(Invariably lamb_sub and lamb fail a normality test, so this seems like 
a distinct possibility.)


All the best, Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 10/20/2017 1:32 AM, William Gelnaw wrote:

Hi all,
  I've been investigating the use of Pagel's lambda and run into a problem
comparing trees that have a low speciation rate and no extinction with
trees that have a high speciation rate and lots of extinction.   I'm
comparing phylogenetic signal in two characters.  In the first, I simulate
a character under Brownian evolution over a 200 taxon tree and then prune
it down to 100 taxa.  The second character was evolved over the pruned-down
tree.  I then estimated Pagel's lambda for both characters on the
pruned-down tree.  Doing this 500 times, I found that the error rate in the
estimate of lambda was not the same for the two methods of generating the
data.  The difference was small but statistically significant.  To my
understanding though, there shouldn't be a difference.  The rate of
evolution is the same in both cases and the tree topology and branch
lengths are identical.  If the characters are evolving in a Brownian way,
shouldn't there be, on average, no difference between the characters with
respect to phylogenetic signal?  Could this be an artifact of using fastBM
to simulate the characters?  Is there another function that I should be
using to simulate a character evolving under Brownian motion?

Here's the script I'm using:
library(phytools)
library(TreeSim)
library(adephylo)
age<-2
lambda <- 2.0
mu <- 0.5
frac <-0.6
sample_size<-100
smalltrees<-list()
Bigtrees<-sim.bd.taxa.age(200,500,lambda,mu,frac,age)
## using sim.bd.taxa.age to get an ultrametric tree with known age
subtrees<-list()
sig<-vector()
sig_sub<-vector()
for (i in 1:500){
var<-fastBM(Bigtrees[[i]],nsim=1)
tips_to_drop<-sample(1:200, 200-sample_size, replace=FALSE)
subtrees[[i]]<-drop.tip(Bigtrees[[i]], tips_to_drop)
var1<-fastBM(subtrees[[i]],nsim=1)
sig_sub[i]<-phylosig(subtrees[[i]],var,method="lambda")$lambda
sig[i]<-phylosig(subtrees[[i]],var1,method="lambda")$lambda}
lamb<-(2-sig)
## reflected the data around lambda=1 so that standard deviation has a
symmetric distribution
lamb<-c(lamb,sig)
lamb_sub<-(2-sig_sub)
lamb_sub<-c(lamb_sub,sig_sub)
var.test(lamb,lamb_sub)

Best regards,
  - Will Gelnaw

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Why does ace set a random number generator seed?

2017-10-20 Thread David Bapst
Thanks, Emmanuel. I was not familiar with .Random.seed objects until R
CHECK started complaining about such elements in saved workspaces
within .rdata files within packages (requiring rm to remove them), and
then was mainly familiar with them regarding set.seed() (and so I
thought ace must be setting a seed, not just initializing one). It
makes sense that any random number generation would incur a seed, and
apparently fresh R sessions do not immediately initialize a seed.

Thanks to everyone who helped me see my confusion! Just me missing a
basic comp sci concept.
-Dave

On Fri, Oct 20, 2017 at 4:26 AM, Emmanuel Paradis
 wrote:
> Hi Dave,
>
> The seed is created every time you generate random data (see details in
> ?.Random.seed):
>
> R> exists(".Random.seed")
> [1] FALSE
> R> sample(1)
> [1] 1
> R> exists(".Random.seed")
> [1] TRUE
>
> So in your code below the seed is created by calling rtree(). Indeed, with
> ape 4.1:
>
> R> rm(".Random.seed")
> R> data(bird.orders)
> R> x <- 1:23
> R> exists(".Random.seed")
> [1] FALSE
> R> o <- ace(x, bird.orders)
> R> exists(".Random.seed")
> [1] FALSE
>
> However, with the new "soon-to-be-on-CRAN" version, a seed is created by the
> same code. The "culprit" is actually Rcpp: the new ape has C++ code linked
> to Rcpp. For a reason I ignore, it seems that a random seed is initialized
> when calling a function linked to Rcpp. Here is an example:
>
> R> rm(".Random.seed")
> R> library(RcppEigen)
> R> exists(".Random.seed")
> [1] FALSE
> R> o <- fastLm(1, 1)
> R> exists(".Random.seed")
> [1] TRUE
>
> The new C++ code in ape is called by reorder(phy, "postorder") which is used
> by many functions in ape (ace, pic, plot.phylo, vcv, ...).
>
> Best,
>
> Emmanuel
>
>
> Le 19/10/2017 à 21:00, David Bapst a écrit :
>>
>> Emmanuel, all-
>>
>> I noticed today that a workspace I was working with had a random
>> number seed set in it, but didn't remember setting one. Finally, I
>> discovered the culprit was ace. Here's a reproducible example,
>> demonstrating that a seed exists after running ace:
>>
>> library(ape)
>> tree<-rtree(10)
>> ace(1:10,tree)
>> .Random.seed
>>
>> I am using ape 4.1, and it doesn't seem to be addressed in the
>> forthcoming version, given my reading of the changes log (but I might
>> have missed it). What's going? Why is a random number seed being set?
>>
>> Cheers,
>> -Dave
>>
>



-- 
David W. Bapst, PhD
Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville
Lecturer, Geology & Geophysics, Texas A & M University
https://github.com/dwbapst/paleotree
Google Calender: https://goo.gl/EpiM4J

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/