Re: [R-sig-phylo] Newick tree merging at the wrong level?

2016-11-02 Thread Liam J. Revell
Excellent. I have added those functions to phytools on GitHub, so they 
can be installed using devtools as follows:


library(devtools)
install_github("liamrevell/phytools")

Glad it was of help. - Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org


Re: [R-sig-phylo] Newick tree merging at the wrong level?

2016-11-02 Thread Carlos Porto Filho
Thank you very much! That actually solved my problem.

I couldn't express myself clearly, but that was what I wanted: nodes
with the same maximum path length to the tips should have the same height.
From a layman's perspective, that way of computing branch lengths is actually
more logical for a tree representation format like Newick (think of the
number of outer parentheses).

Thank you again!


Re: [R-sig-phylo] Newick tree merging at the wrong level?

2016-11-02 Thread Liam J. Revell

Hi Carlos.

After thinking about this, I realized that what you probably want is not 
a node depth based on the number of nodes above that node, but one 
based on the maximum number of nodes separating that node from any leaf.


I just posted a second solution for this on my blog here: 
http://blog.phytools.org/2016/11/modified-version-of-grafens-branch.html.
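
A minimal sketch of that idea, for readers who want to see it inline. This is not the code from the blog post: the helper name max_tip_depth is made up for illustration, only ape is assumed, and "nodes separating that node from any leaf" is read here as the largest number of edges between a node and any tip below it. Each edge length is then the parent's height minus the child's.

```r
library(ape)

# Hypothetical helper (not from the blog post): height of every node is the
# largest number of edges between that node and any tip below it.
max_tip_depth <- function(tree) {
  po <- reorder(tree, "postorder")        # every child edge comes before its parent edge
  h <- numeric(Ntip(po) + po$Nnode)       # tips start, and stay, at height 0
  for (i in seq_len(nrow(po$edge))) {
    parent <- po$edge[i, 1]
    child  <- po$edge[i, 2]
    h[parent] <- max(h[parent], h[child] + 1)
  }
  h                                       # indexed by node number, same as in `tree`
}

tree <- read.tree(text = "((((A,B),C,D),E),(((F,G),H),I));")
h <- max_tip_depth(tree)

# Edge length = height of the parent node minus height of the child node.
tree$edge.length <- h[tree$edge[, 1]] - h[tree$edge[, 2]]
plot(tree)
```

With this example tree, the node for ((A,B),C,D) and the node for ((F,G),H) both end up at height 2, so they merge at the same level regardless of how many tips each contains.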


Let me know if this is helpful.

All the best, Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Newick tree merging at the wrong level?

2016-11-01 Thread Liam J. Revell

Hi Carlos.

The edge lengths used for plotting (and thus the 'merge depths') are 
completely arbitrary when none are supplied. The algorithm used to 
compute the edge lengths is in the documentation for compute.brlen as 
follows:


"Grafen's (1989) computation of branch lengths: each node is given a 
‘height’, namely the number of leaves of the subtree minus one, 0 for 
leaves. Each height is scaled so that root height is 1, and then raised 
at power 'rho' (> 0). Branch lengths are then computed as the difference 
between height of lower node and height of upper node."
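
To make the quoted rule concrete, here is a small sketch that counts the leaves below each node and derives the Grafen heights for the tree from the question. It is only an illustration under stated assumptions: the helper name tips_below is made up, only ape is used, and the power rho is taken to be 1. It does not reproduce the internals of compute.brlen.

```r
library(ape)

# Hypothetical helper: number of tips (leaves) below every node, found by
# walking the edges in postorder so each child is counted before its parent.
tips_below <- function(tree) {
  po <- reorder(tree, "postorder")
  n <- c(rep(1, Ntip(po)), rep(0, po$Nnode))   # each tip counts as one leaf
  for (i in seq_len(nrow(po$edge))) {
    n[po$edge[i, 1]] <- n[po$edge[i, 1]] + n[po$edge[i, 2]]
  }
  n
}

tree <- read.tree(text = "((((A,B),C,D),E),(((F,G),H),I));")
grafen_raw <- tips_below(tree) - 1             # leaves in subtree minus one, 0 for tips
grafen_h   <- grafen_raw / max(grafen_raw)     # scaled so that the root height is 1

left  <- getMRCA(tree, c("A", "D"))            # node for ((A,B),C,D)
right <- getMRCA(tree, c("F", "H"))            # node for ((F,G),H)
grafen_raw[c(left, right)]
```

The clade ((A,B),C,D) has four leaves and ((F,G),H) has three, so their unscaled heights are 3 and 2. That difference is why adding D moves the first merge up one level.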


If you want to set it so that it is the number of descendant nodes, 
rather than leaves, that determines the height of each internal node, 
you could try something like the following:


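library(ape)
library(phangorn) ## provides Descendants()
tree<-newickTree
## number of internal nodes in the subtree rooted at node x (x included)
ndn<-function(tree,x){
dd<-c(Descendants(tree,x,"all"),x)
sum(dd>Ntip(tree))
}
nn<-1:(Ntip(tree)+tree$Nnode)
h<-sapply(nn,ndn,tree=tree)
## edge length = height of the parent node minus height of the child node
edge.length<-vector()
for(i in 1:nrow(tree$edge)) edge.length[i]<-diff(h[tree$edge[i,2:1]])
tree$edge.length<-edge.length
plot(tree)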

All the best, Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 11/1/2016 10:05 PM, Carlos Porto Filho wrote:

Dear list,

TL;DR: When I plot the Newick tree ((((A,B),C,D),E),(((F,G),H),I));
shouldn't ((A,B),C,D) merge at the same level as ((F,G),H)?
Or does it not matter in a phylogenetic tree?

library(ape)
newickTree <- read.tree(text="((((A,B),C),E),(((F,G),H),I));")
plot(newickTree)
goes just fine, but then, when I add D:
newickTree <- read.tree(text="((((A,B),C,D),E),(((F,G),H),I));")
plot(newickTree)
the merge between (A,B),C,D goes one level up (and consequently E too).

Very long version: I'm developing my own hierarchical clustering algorithm
that outputs a matrix like the one below:

   | Level 1 | Level 2 | Level 3    | Level 4        | Level 5          | Level 6
---+---------+---------+------------+----------------+------------------+--------
 1 | 1       | c(5,8)  | c(2, 5, 8) | c(2, 5, 7, 8)  | c(2, 5, 6, 7, 8) | 1:10
 2 | 2       | c(5,8)  | c(1, 5, 8) | c(2, 5, 7, 8)  | c(2, 5, 6, 7, 8) | 1:10
 3 | 3       | c(4,9)  | c(3, 4, 9) | c(2, 5, 7, 8)  | c(1, 2, 5, 7, 8) | 1:10
 4 | 4       | c(4,9)  | c(3, 4, 9) | c(3, 4, 9, 10) | c(2, 5, 6, 7, 8) | 1:10
 5 | 5       | c(5,8)  | c(1, 5, 8) | c(2, 5, 7, 8)  | c(2, 5, 6, 7, 8) | 1:10
 6 | 6       | c(5,8)  | c(2, 5, 8) | c(2, 5, 7, 8)  | c(1, 2, 5, 7, 8) | 1:10
 7 | 7       | c(5,8)  | c(2, 5, 8) | c(2, 5, 7, 8)  | c(1, 2, 5, 7, 8) | 1:10
 8 | 8       | c(5,8)  | c(2, 5, 8) | c(2, 5, 7, 8)  | c(2, 5, 6, 7, 8) | 1:10
 9 | 9       | c(4,9)  | c(3, 4, 9) | c(2, 5, 7, 8)  | c(2, 5, 6, 7, 8) | 1:10
10 | 10      | c(4,9)  | c(3, 4, 9) | c(3, 4, 9, 10) | c(2, 5, 6, 7, 8) | 1:10

I want to plot a dendrogram based on this, and the best idea I had was to
convert that matrix to Newick format and use ape to read it and plot it
as a dendrogram.
So the corresponding Newick tree would be
newickTree <- read.tree(text="(((((5,8),1,2),7),6),(((4,9),3),10));")
plot(newickTree)
But the levels are wrong. I was expecting this:
newickTree <- read.tree(text="(((((5:1,8:1):1,1:2,2:2):1,7:3):1,6:4):1,(((4:1,9:1):1,3:2):1,10:3):2);")
plot(newickTree)

I know that I can just specify the heights and solve my problem, but that
would make my code more complex. One idea I had after reading the last thread
(Making ultrametric trees) was to make every branch length = 1 and read it with
chronos()
newickTree <- read.tree(text="((((A:1,B:1):1,C:1,D:1):1,E:1):1,(((F:1,G:1):1,H:1):1,I:1):1);")
dendr <- chronos(newickTree)
plot(dendr)
but it doesn't look right.

Sorry for the newbie questions. I just wanted a way to make my algorithm
output a dendrogram.

Thanks in advance.


