[R-sig-phylo] tip order in phylogeny - How does it work?

2014-10-08 Thread Juan Antonio Balbuena

  
  
Hello 
I am working on a simulation that requires adding a number of random
tips to multiple phylogenies:

treeP - read.nexus(file= "my file.tre") # 5K trees used to build
a consensus tree (treeP is thus a multiPhylo object) 
NT = 30 # number of taxa in phylogeny
N.dup = 3 # number of tips (duplications) I want to add to the trees
sites - sort(sample(1:NT, N.dup), decreasing=TRUE) # Random tips
to duplicate
dup.labels - treeP[[1]]$tip.label[sites]
treeP - .uncompressTipLabel(treeP)
#
# Add duplicates to branches following a Liam Revell recipe
published in his blog:
#
for (n in 1:length(treeP)) { 
  for (i in 1:N.dup ) {
    pos - runif(n=1)*
treeP[[n]]$edge.length[which(treeP[[n]]$edge[,2]==sites[i])]
    edl - runif(n=1)*
treeP[[n]]$edge.length[which(treeP[[n]]$edge[,2]==sites[i])]
    treeP[[n]] - bind.tip(treeP[[n]],
tip.label=paste(dup.labels[i], 1, sep="."),
   edge.length=edl, where=sites[i],
position=pos)
  }
}
#
treeP- .compressTipLabel(treeP)
# Check tip labels:
treeP$tip.label
$tip.label
 [1] "S24"   "S25"   "S9"    "S15"   "S3"    "S21"   "S12"   "S17"  
"S19"   "S30"   "S22"   "S4"    "S16"   "S27"   "S10"   "S23"  
"S1"    "S2"    "S14"  
[20] "S6"    "S8"    "S26"   "S29"   "S13"   "S20"   "S7"    "S28"  
"S18"   "S5"    "S11"   "S18.1" "S20.1" "S16.1"

When I plot any tree in treeP, S18.1, S20.1 and S16.1 appear
together with S18, S20 and S16, as intended. So why are they at the
end of the array above? This behaviour makes makes my life difficult
because it forces me to rearrange a number of matrices to make them
match this tip label order. I would appreciate if a kind soul can
explain why it is that way and whether is possible to reorder the
tip labels, placing each added tip together with its respective
counterpart? 

Thanks in advance for your help.

Juan Antonio
-- 
  
  

  Dr.
Juan
A. Balbuena 
Cavanilles Institute of Biodiversity and Evolutionary
Biology 
University of
Valencia           
           
           
        http://www.uv.es/~balbuena

P.O. Box
22085           
           
           
           
    http://www.uv.es/cophylpaco
46071 Valencia,
Spain

e-mail: j.a.balbu...@uv.es   
tel.
+34 963 543 658    fax +34 963 543 733 
 
NOTE! For shipments by EXPRESS COURIER use
  the following street
  address: 
  C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia),
  Spain. 
  
  

  


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] tip order in phylogeny - How does it work?

2014-10-08 Thread Brian O'Meara
Look at reorder.phylo() in ape and reorder() in phylo4: your desired tip
order may be there.

Best,
Brian

___
Brian O'Meara
Assistant Professor
Dept. of Ecology  Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Postdoc collaborators wanted: http://nimbios.org/postdocs/
Calendar: http://www.brianomeara.info/calendars/omeara

On Wed, Oct 8, 2014 at 4:45 AM, Juan Antonio Balbuena j.a.balbu...@uv.es
wrote:

  Hello
 I am working on a simulation that requires adding a number of random tips
 to multiple phylogenies:

 treeP - read.nexus(file= my file.tre) # 5K trees used to build a
 consensus tree (treeP is thus a multiPhylo object)
 NT = 30 # number of taxa in phylogeny
 N.dup = 3 # number of tips (duplications) I want to add to the trees
 sites - sort(sample(1:NT, N.dup), decreasing=TRUE) # Random tips to
 duplicate
 dup.labels - treeP[[1]]$tip.label[sites]
 treeP - .uncompressTipLabel(treeP)
 #
 # Add duplicates to branches following a Liam Revell recipe published in
 his blog:
 #
 for (n in 1:length(treeP)) {
   for (i in 1:N.dup ) {
 pos - runif(n=1)*
 treeP[[n]]$edge.length[which(treeP[[n]]$edge[,2]==sites[i])]
 edl - runif(n=1)*
 treeP[[n]]$edge.length[which(treeP[[n]]$edge[,2]==sites[i])]
 treeP[[n]] - bind.tip(treeP[[n]], tip.label=paste(dup.labels[i], 1,
 sep=.),
edge.length=edl, where=sites[i], position=pos)
   }
 }
 #
 treeP- .compressTipLabel(treeP)
 # Check tip labels:
 treeP$tip.label
 $tip.label
  [1] S24   S25   S9S15   S3S21   S12   S17
 S19   S30   S22   S4S16   S27   S10   S23   S1
 S2S14
 [20] S6S8S26   S29   S13   S20   S7S28
 S18   S5S11   S18.1 S20.1 S16.1

 When I plot any tree in treeP, S18.1, S20.1 and S16.1 appear together with
 S18, S20 and S16, as intended. So why are they at the end of the array
 above? This behaviour makes makes my life difficult because it forces me to
 rearrange a number of matrices to make them match this tip label order. I
 would appreciate if a kind soul can explain why it is that way and whether
 is possible to reorder the tip labels, placing each added tip together with
 its respective counterpart?

 Thanks in advance for your help.

 Juan Antonio
 --

 Dr. Juan A. Balbuena
 Cavanilles Institute of Biodiversity and Evolutionary Biology
 University of Valencia
 http://www.uv.es/~balbuena
 P.O. Box 22085
 http://www.uv.es/cophylpaco
 http://www.uv.es/cavanilles/zoomarin/index.htm
 46071 Valencia, Spain
 e-mail: j.a.balbu...@uv.estel. +34 963 543 658fax +34 963 543 733
 
 *NOTE!* For shipments by EXPRESS COURIER use the following street address:
 C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
 

 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)

2014-10-08 Thread David Bapst
Hello all,

Unable to get phyloch's read.mrbayes() to read my .con.tre files (for
reasons unknown), I've gone with Graham's suggestion to use the simple
output format from sumt. However, as I have a large number of analyses
which took considerable computational time (and I am stubborn to learn
how to run these on a cluster) I decided the more economical thing to
do was to write an R script that took the existing nexus files and
wrote a nexus file with MrBayes blocks that would output 'simple'
format consensus trees. I realize such string manipulation might be
easier in some other language, but I know R best.

Here it is, in case anyone else ever needs such a thing:

_

#change to dir for MB files
setwd(C:/dave/workspace/mrbayes)

#get files
files-list.files()
#get tree files
treeFiles-files[grep(x=files,pattern=run..t)]
#get the nexus files
inputs-sort(unique(sub(.run..t,,treeFiles)))
inputs-sort(unique(sub(.tree.,,inputs)))

#now start constructing sumt lines
bayesBlock-#NEXUS
for(i in 1:length(inputs)){
origNex-scan(inputs[i],what=character,sep=\n,blank.lines.skip=FALSE)
cropNex-origNex[2:grep(origNex,pattern=mcmcp)]
#remove log line
cropNex-cropNex[-grep(cropNex,pattern=log start)]
sumtLine-paste0(sumt filename=,inputs[i], outputname=,
paste0(inputs[i],.simple), conformat=simple;)
bayesBlock-c(bayesBlock,cropNex,sumtLine,,end;,)
}

write(bayesBlock,file=simple_sumt.nex)

___

Some caveats: this script assumes you already have mrbayes blocks in
your nexus files (all of which are stored with all output in the same
directory), including a line where you set MCMC parameters (mcmcp)
just prior to calling the MCMC itself and after any other
modifications to the dataset, model or prior settings.

If you execute the nexus file created, it should run sumt and output
new output files with 'simple' in the file name. Analyses where there
are multiple trees (such as partitioned analyses with unlinked branch
lengths) will have multiple consensus trees estimated for them, as
expected if we had originally done this immediately after running
mcmc. Also note I crop out any lines that create new log files,
because I didn't want to write over my old log files.

The resulting simple-format consensus tree files can be read into ape
with read.nexus just fine. For whatever reason, these nexus files
contain two trees, otherwise identical other than that one contains
the posterior probabilities as its node-labels.

However! There is a big however here.

If we re-root the tree in ape (and there are many reasons why we might
want to do so), the posterior probabilities, which are stored as node
labels, no longer make sense as those node labels are not properties
technically of the nodes but of the edges preceding those nodes.
Change the directionality of the graph by rerooting and those
posterior probabilities end up in the wrong place. (TreeFig apparently
figures this out on its own, automatically, if your reroot in
TreeFig.)

I haven't found any option in an R package that slides node-labels
around as if they were support values when the tree is rerooted (and
if there is one, please let me know!) so I had to write my own thing
for doing that to the newly created 'simple' consensus tree files...

So here's another script.

_

#change dir
setwd(C:/dave/workspace/mrbayes)

library(ape)

files-list.files()
#get consensus nexus files (ugh regexp)
conFiles-files[grep(x=files,pattern=.con.tre)]
#get only 'simple' files
conFiles-conFiles[grep(x=conFiles,pattern=simple)]
conName-sub(.nex.simple,,conFiles)
conName-sub(.con.tre,,conName)

trees-list()
for(i in 1:length(conFiles)){
treesRead-read.nexus(conFiles[i])
treeA-treesRead[[2]]
treeB-treesRead[[1]]
#now we're going to reroot and let's use fake species names hahah
treeA-root(treeA,resolve.root=TRUE,outgroup=c(OutOfTheGroup,OutThere,
outgroupy,groupout))
#okay basically need to figure out
#which taxa are included by a node
#and match between treeB and tree A
splitsB-lapply(prop.part(treeB),function(x)
if(any(x==1)){treeB$tip.label[x]
}else{treeB$tip.label[-x]})
splitsA-lapply(prop.part(treeA),function(x)
if(any(x==1)){treeA$tip.label[x]
}else{treeA$tip.label[-x]})
#plot(treeB,show.node.label=TRUE)
treeA$node.label-treeB$node.label[match(splitsA,splitsB)]
trees[[i]]-treeA
}
class(trees)-multiPhylo

#plot(trees)

pdf(consensus_trees_10-07-14.pdf,height=8,width=8)
for(i in 1:length(trees)){
plot(trees[[i]],main=conName[i],show.node.label=TRUE)
add.scale.bar()
}
dev.off()

_

The last bit just plots it in a pdf, which isn't important at all but
I included it for the heck of it. The important stuff is the innards
of the first for loop, which reads in 

Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)

2014-10-08 Thread Klaus Schliep
Hi all,

here is a small function which it is very similar to David's 2nd script,
but it is a bit more compact:
library(phangorn) # requires the newest pangorn version = 1.99-8
addConfidences.phylo - function(to, from){
conf = attr(addConfidences(as.splits(to), from), confidences)
nTips = length(to$tip.label)
to$node.label = conf[-c(1:nTips)]
to
}
treeA - addConfidences.phylo(treeA, treeB)

I will fix also my midpoint rooting function.

And some comments on possible problems / features of the 2nd script:
The function assumes that the ordering of the tip labels is identical with
the effect

t3 t4

is different to

t4 t3

A lapply(splitsA, sort) and lapply(splitsB, sort) should do the trick.
Now the feature:
let assume there a 5 taxa  t1 t2 t3 t4 t5
A clade containing t1 t2
will be different from t3 t4 t5.
If you want just to reroot a tree, you probably do not want to distinguish
clades but only splits.

Regards,
Klaus


On Wed, Oct 8, 2014 at 10:51 AM, David Bapst dwba...@gmail.com wrote:

 Hello all,

 Unable to get phyloch's read.mrbayes() to read my .con.tre files (for
 reasons unknown), I've gone with Graham's suggestion to use the simple
 output format from sumt. However, as I have a large number of analyses
 which took considerable computational time (and I am stubborn to learn
 how to run these on a cluster) I decided the more economical thing to
 do was to write an R script that took the existing nexus files and
 wrote a nexus file with MrBayes blocks that would output 'simple'
 format consensus trees. I realize such string manipulation might be
 easier in some other language, but I know R best.

 Here it is, in case anyone else ever needs such a thing:

 _

 #change to dir for MB files
 setwd(C:/dave/workspace/mrbayes)

 #get files
 files-list.files()
 #get tree files
 treeFiles-files[grep(x=files,pattern=run..t)]
 #get the nexus files
 inputs-sort(unique(sub(.run..t,,treeFiles)))
 inputs-sort(unique(sub(.tree.,,inputs)))

 #now start constructing sumt lines
 bayesBlock-#NEXUS
 for(i in 1:length(inputs)){

 origNex-scan(inputs[i],what=character,sep=\n,blank.lines.skip=FALSE)
 cropNex-origNex[2:grep(origNex,pattern=mcmcp)]
 #remove log line
 cropNex-cropNex[-grep(cropNex,pattern=log start)]
 sumtLine-paste0(sumt filename=,inputs[i], outputname=,
 paste0(inputs[i],.simple), conformat=simple;)
 bayesBlock-c(bayesBlock,cropNex,sumtLine,,end;,)
 }

 write(bayesBlock,file=simple_sumt.nex)

 ___

 Some caveats: this script assumes you already have mrbayes blocks in
 your nexus files (all of which are stored with all output in the same
 directory), including a line where you set MCMC parameters (mcmcp)
 just prior to calling the MCMC itself and after any other
 modifications to the dataset, model or prior settings.

 If you execute the nexus file created, it should run sumt and output
 new output files with 'simple' in the file name. Analyses where there
 are multiple trees (such as partitioned analyses with unlinked branch
 lengths) will have multiple consensus trees estimated for them, as
 expected if we had originally done this immediately after running
 mcmc. Also note I crop out any lines that create new log files,
 because I didn't want to write over my old log files.

 The resulting simple-format consensus tree files can be read into ape
 with read.nexus just fine. For whatever reason, these nexus files
 contain two trees, otherwise identical other than that one contains
 the posterior probabilities as its node-labels.

 However! There is a big however here.

 If we re-root the tree in ape (and there are many reasons why we might
 want to do so), the posterior probabilities, which are stored as node
 labels, no longer make sense as those node labels are not properties
 technically of the nodes but of the edges preceding those nodes.
 Change the directionality of the graph by rerooting and those
 posterior probabilities end up in the wrong place. (TreeFig apparently
 figures this out on its own, automatically, if your reroot in
 TreeFig.)

 I haven't found any option in an R package that slides node-labels
 around as if they were support values when the tree is rerooted (and
 if there is one, please let me know!) so I had to write my own thing
 for doing that to the newly created 'simple' consensus tree files...

 So here's another script.

 _

 #change dir
 setwd(C:/dave/workspace/mrbayes)

 library(ape)

 files-list.files()
 #get consensus nexus files (ugh regexp)
 conFiles-files[grep(x=files,pattern=.con.tre)]
 #get only 'simple' files
 conFiles-conFiles[grep(x=conFiles,pattern=simple)]
 conName-sub(.nex.simple,,conFiles)
 conName-sub(.con.tre,,conName)

 trees-list()
 for(i in 1:length(conFiles)){
 treesRead-read.nexus(conFiles[i])
 treeA-treesRead[[2]]
 treeB-treesRead[[1]]
 #now we're going to