Dear R-sig-phylo,

Over the weekend, I asked Liam Revell whether matchNodes could be used for a 
particular problem I’m trying to solve: finding all phylogenetically 
equivalent nodes when comparing trees that have uneven taxon samples and 
different topologies. Liam was kind enough to take some time to write a blog 
post about this, which got me started with some code:

http://blog.phytools.org/2021/02/on-matching-nodes-between-trees-using.html

On its face this seems like a simple problem, but I’m running into some issues 
and thought I would reach out to the broader group. The code linked above seems 
to work, but only for comparing trees that start out as topologically 
identical. For my purposes, I’m trying to match nodes from a given reference 
to nodes across several hundred gene trees that differ in topology and 
taxon sample relative to the reference.
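
A toy illustration of the setup, using two invented trees and only functions from ape (read.tree, keep.tip, is.monophyletic). The taxon names and topologies are made up for the example; the point is just that, once both trees are reduced to their shared taxa, some reference clades have a counterpart in the gene tree and others do not:

```r
library(ape)  # read.tree(), keep.tip(), is.monophyletic()

# Invented trees: E is sampled only in the reference, F only in the gene tree.
ref  <- read.tree(text = "(((A,B),C),(D,E));")
gene <- read.tree(text = "(((A,C),B),(D,F));")

# Taxa present in both trees, and the gene tree pruned to those taxa.
shared <- intersect(ref$tip.label, gene$tip.label)
gene_p <- keep.tip(gene, shared)

# The reference clade (A,B,C) also exists in the gene tree ...
abc_present <- is.monophyletic(gene_p, c("A", "B", "C"))
# ... but the reference clade (A,B) conflicts with (A,C) in the gene tree.
ab_present <- is.monophyletic(gene_p, c("A", "B"))

print(c(ABC = abc_present, AB = ab_present))
```

Any node-matching function for this problem has to return a node for the first kind of clade and something like NA for the second.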

Here is a function definition based on Liam’s example:

library(phytools) # matchNodes(); attaches ape, which provides drop.tip()

#function to match nodes from consensus 
#to individual gene trees with uneven sampling
#derived from Liam Revell's example-- need to test
match_phylo_nodes<-function(t1, t2){
  ## step one: drop tips that are not shared by both trees
  t1p<-drop.tip(t1,setdiff(t1$tip.label, t2$tip.label))
  t2p<-drop.tip(t2,setdiff(t2$tip.label, t1$tip.label))
  
  ## step two: match nodes between the pruned trees by "descendants"
  M<-matchNodes(t1p,t2p)
  
  ## step three: match each full tree to its pruned version by "distances"
  M1<-matchNodes(t1,t1p,"distances")
  M2<-matchNodes(t2,t2p,"distances")
  
  ## final step, reconcile
  MM<-matrix(NA,t1$Nnode,2,dimnames=list(NULL,c("left","right")))
  
  for(i in 1:nrow(MM)){
    MM[i,1]<-M1[i,1]
    nn<-M[which(M[,1]==M1[i,2]),2]
    if(length(nn)>0){
      ## this assignment fails when no row of M2 matches nn
      MM[i,2]<-M2[which(M2[,2]==nn),1]
    }
  }
  return(MM)
}


When t1 and t2 are trees that have topological conflicts, this function returns 
an error: 

Error in MM[i, 2] <- M2[which(M2[, 2] == nn), 1] : 
  replacement has length zero

I think(?) this happens because a particular node doesn’t exist in one tree or 
the other, so the lookup on that line returns integer(0), but I’m not sure I 
really understand what is going on here.
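
The mechanics of the error can be reproduced in base R without any trees. This is only a sketch: it assumes that matchNodes reports a node with no counterpart as NA (my reading of its behavior, not verified here), and the matrix M2 below is a made-up stand-in for the real one:

```r
# Made-up stand-in for M2: column 1 = node in the full tree,
# column 2 = matching node in the pruned tree.
M2 <- cbind(tr1 = c(6, 7, 8), tr2 = c(4, 5, 6))
MM <- matrix(NA, 1, 2, dimnames = list(NULL, c("left", "right")))

nn <- NA                      # assumed: "no match" comes back as NA
idx <- which(M2[, 2] == nn)   # comparing with NA gives NA; which() drops NAs
length(idx)                   # so idx is empty

# Assigning an empty vector into a single matrix cell is what raises the error.
res <- tryCatch({
  MM[1, 2] <- M2[idx, 1]
  "ok"
}, error = function(e) conditionMessage(e))
print(res)
```

Note that length(nn) is 1 when nn is NA, so the check if(length(nn)>0) does not catch this case; testing is.na(nn) as well would.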


I modified Liam’s code slightly to get it to run without error in the described 
case, by making the assignment on that line conditional.


Modified version:

#function to match nodes from consensus 
#to individual gene trees with uneven sampling
#derived from Liam Revell's example-- need to test
match_phylo_nodes<-function(t1, t2){
  ## step one: drop tips that are not shared by both trees
  t1p<-drop.tip(t1,setdiff(t1$tip.label, t2$tip.label))
  t2p<-drop.tip(t2,setdiff(t2$tip.label, t1$tip.label))

  ## step two: match nodes between the pruned trees by "descendants"
  M<-matchNodes(t1p,t2p)

  ## step three: match each full tree to its pruned version by "distances"
  M1<-matchNodes(t1,t1p,"distances")
  M2<-matchNodes(t2,t2p,"distances")

  ## final step, reconcile
  MM<-matrix(NA,t1$Nnode,2,dimnames=list(NULL,c("left","right")))

  for(i in seq_len(nrow(MM))){
    MM[i,1]<-M1[i,1]
    nn<-M[which(M[,1]==M1[i,2]),2]
    if(length(nn)>0){
      ## assign only when some row of M2 matches nn; otherwise leave NA
      hit<-which(M2[,2]==nn)
      if(length(hit)>0){
        MM[i,2]<-M2[hit,1]
      }
    }
  }
  return(MM)
}


I’ve been experimenting with this and some downstream code for the last few 
days, but I’ve run into some weird inconsistent results (not easily summarized) 
that make me think that this function is not working as intended.

I was wondering — have any of you dealt with a similar problem? In principle 
this seems like it should be similar to concordance analysis, but I care less 
about identifying the proportion of nodes that exist in gene trees given a 
reference, and instead I need the actual node numbers in a given gene tree that 
are phylogenetically equivalent to particular nodes in a reference. Happy to 
try to hack away at something… 
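
One possible direction, sketched here under stated assumptions rather than offered as a tested solution: skip the pruned-tree bookkeeping and, for each internal node of the reference, ask for the MRCA of its shared descendant taxa in the gene tree, accepting it only if that MRCA subtends exactly the same shared taxa. The sketch uses only ape (getMRCA, Ntip, read.tree); tips_below and match_by_shared_tips are hypothetical helper names, the toy trees are invented, "equivalent" is assumed to mean "same set of shared descendant taxa", and both trees are assumed to be rooted comparably.

```r
library(ape)  # getMRCA(), Ntip(), read.tree()

# Hypothetical helper: labels of all tips descending from `node` in `tree`.
tips_below <- function(tree, node) {
  kids <- tree$edge[tree$edge[, 1] == node, 2]
  unlist(lapply(kids, function(k) {
    if (k <= Ntip(tree)) tree$tip.label[k] else tips_below(tree, k)
  }))
}

# Hypothetical alternative to match_phylo_nodes(): for each internal node of
# `ref`, the node of `gene` subtending the same set of shared taxa, or NA
# when that set does not form a clade in `gene`.
match_by_shared_tips <- function(ref, gene) {
  shared <- intersect(ref$tip.label, gene$tip.label)
  ref_nodes <- Ntip(ref) + seq_len(ref$Nnode)
  gene_nodes <- vapply(ref_nodes, function(nd) {
    tips <- intersect(tips_below(ref, nd), shared)
    if (length(tips) < 2) return(NA_integer_)   # fewer than two shared taxa: no MRCA
    m <- getMRCA(gene, tips)
    below <- intersect(tips_below(gene, m), shared)
    if (setequal(below, tips)) as.integer(m) else NA_integer_
  }, integer(1))
  cbind(ref = ref_nodes, gene = gene_nodes)
}

# Invented trees: E is sampled only in the reference, F only in the gene tree.
ref  <- read.tree(text = "(((A,B),C),(D,E));")
gene <- read.tree(text = "(((A,C),B),(D,F));")
MM <- match_by_shared_tips(ref, gene)
print(MM)
```

The node numbers returned are those of the unpruned gene tree. Several reference nodes can map to the same gene-tree node when unshared taxa sit along a stem, and the recursive helper is written for clarity rather than speed, so it would probably need caching before running over several hundred gene trees.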


Best,
Jake Berv






