Re: [R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot

2023-10-29 Thread Russell Engelman
My apologies, I just noticed that the cumulative plot was already on the
page, I had misinterpreted what that plot meant.

Russell

On Sat, Oct 28, 2023 at 10:56 PM Russell Engelman 
wrote:

> Dear Liam,
>
> This is great, almost exactly what I needed. I will have to get our tree
> out and see if I can get it to work in R.
>
> One question, is it possible to do a cumulative tree like this (
> https://proopnarine.files.wordpress.com/2018/11/evolfaunas.png)? This is
> a good example of the graph I am trying to produce. The goal here is to
> show not only the diversity of each clade (which the current example on
> your blog does well) but also show that after a certain point all surviving
> lineages belong to these clades, and "basal" uncolored lineages are gone.
>
> Sincerely,
> Russell
>
> On Sat, Oct 28, 2023 at 9:09 PM Liam J. Revell 
> wrote:
>
>> I like Will's suggestion. Unfortunately, *phytools::ltt.simmap* method
>> in *phytools* on CRAN does not support ultrametric trees.
>>
>> I have updated it on GitHub & also posted a workflow to my blog that may
>> capture what Russell is going for (if I understand it correctly):
>> http://blog.phytools.org/2023/10/visualizing-lineage-accumulation-on.html
>> .
>>
>> All the best, Liam
>> Liam J. Revell
>> Professor of Biology, University of Massachusetts Boston
>> Web: http://faculty.umb.edu/liam.revell/
>> Book: Phylogenetic Comparative Methods in R
>> <https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r>
>> (*Princeton University Press*, 2022)
>>
>>
>> On 10/28/2023 6:56 PM, William Gearty wrote:
>>
>> CAUTION: EXTERNAL SENDER
>>
>> The output from Liam's function should be pretty useable for ggplot, too. I
>> think the function has a plot argument that you can just set to False.
>>
>> On Sat, Oct 28, 2023, 6:37 PM Russell Engelman  
>> 
>> wrote:
>>
>>
>> It looks like what Liam wrote (specifically, the final figure in the post)
>> is pretty close to what I am looking for.
>>
>> Emmanuel's suggestion of using ltt.plots.coords might also work, and it
>> might be easier to input that data into ggplot if I could find some way to
>> merge the various matrices.
>>
>> Russell
>>
>> On Fri, Oct 27, 2023 at 10:04 AM William Gearty  
>> 
>> wrote:
>>
>>
>> You could also map your states (in this case, taxonomic groupings) onto
>> your tree, then make an ltt plot with phytools::ltt(). Liam has a great
>> blog post about it 
>> here:https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.phytools.org%2F2022%2F08%2Flineage-through-time-plots-for.html=05%7C01%7Cliam.revell%40umb.edu%7C9cff979c168d4a52a6b208dbd8092b8a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C638341306150821485%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=cCbfHtMBYBYoFjct9FzpZ78ZpUkRwrZLPH8gs68KnCM%3D=0
>>  <http://blog.phytools.org/2022/08/lineage-through-time-plots-for.html>.
>>
>> Best,
>> Will
>>
>> --
>> *William Gearty*
>> *Lerner-Gray Postdoctoral Research Fellow*
>> Division of Paleontology
>> American Museum of Natural Historywilliamgearty.com
>>
>>
>>
>> On Thu, Oct 26, 2023 at 9:55 PM Emmanuel Paradis  
>> 
>> wrote:
>>
>>
>> Hi Russell,
>>
>> There are several implementations of LTT plots among several packages,
>> so the details certainly differ depending on which one(s) you use.
>>
>> Maybe you can use the ape function ltt.plot.coords() which returns a
>> matrix with 2 columns giving the number of lineages for each node time of
>> the tree. I guess you'll need to do a bit of programming to combine the
>> different outputs from different trees.
>>
>> Best,
>>
>> Emmanuel
>>
>> - Le 27 Oct 23, à 8:05, Russell Engelman  
>> 
>> a écrit :
>>
>>
>> Dear R-Sig-Phylo,
>> I was wondering if there was any way to color-code a
>>
>> lineage-through-time plot,
>>
>> to highlight the proportion of taxa at specific intervals that belong
>>
>> to a
>>
>> particular clade. I.e., an LTT plot of tetrapod diversity through
>>
>> time, and I
>>
>> want to highlight the number of lineages at any one point in time that
>>
>> are
>>
>> chondrichthyans, sarcopterygians, lissamphibians, etc. (see attached
>>
>> fig. as an
>>
>> example). I figured I can do this by ask

Re: [R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot

2023-10-28 Thread Russell Engelman
Dear Liam,

This is great, almost exactly what I needed. I will have to get our tree
out and see if I can get it to work in R.

One question, is it possible to do a cumulative tree like this (
https://proopnarine.files.wordpress.com/2018/11/evolfaunas.png)? This is a
good example of the graph I am trying to produce. The goal here is to show
not only the diversity of each clade (which the current example on your
blog does well) but also show that after a certain point all surviving
lineages belong to these clades, and "basal" uncolored lineages are gone.

Sincerely,
Russell

On Sat, Oct 28, 2023 at 9:09 PM Liam J. Revell  wrote:

> I like Will's suggestion. Unfortunately, *phytools::ltt.simmap* method in
> *phytools* on CRAN does not support ultrametric trees.
>
> I have updated it on GitHub & also posted a workflow to my blog that may
> capture what Russell is going for (if I understand it correctly):
> http://blog.phytools.org/2023/10/visualizing-lineage-accumulation-on.html.
>
> All the best, Liam
> Liam J. Revell
> Professor of Biology, University of Massachusetts Boston
> Web: http://faculty.umb.edu/liam.revell/
> Book: Phylogenetic Comparative Methods in R
> <https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r>
> (*Princeton University Press*, 2022)
>
>
> On 10/28/2023 6:56 PM, William Gearty wrote:
>
> CAUTION: EXTERNAL SENDER
>
> The output from Liam's function should be pretty useable for ggplot, too. I
> think the function has a plot argument that you can just set to False.
>
> On Sat, Oct 28, 2023, 6:37 PM Russell Engelman  
> 
> wrote:
>
>
> It looks like what Liam wrote (specifically, the final figure in the post)
> is pretty close to what I am looking for.
>
> Emmanuel's suggestion of using ltt.plots.coords might also work, and it
> might be easier to input that data into ggplot if I could find some way to
> merge the various matrices.
>
> Russell
>
> On Fri, Oct 27, 2023 at 10:04 AM William Gearty  
> 
> wrote:
>
>
> You could also map your states (in this case, taxonomic groupings) onto
> your tree, then make an ltt plot with phytools::ltt(). Liam has a great
> blog post about it 
> here:https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.phytools.org%2F2022%2F08%2Flineage-through-time-plots-for.html=05%7C01%7Cliam.revell%40umb.edu%7C9cff979c168d4a52a6b208dbd8092b8a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C638341306150821485%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=cCbfHtMBYBYoFjct9FzpZ78ZpUkRwrZLPH8gs68KnCM%3D=0
>  <http://blog.phytools.org/2022/08/lineage-through-time-plots-for.html>.
>
> Best,
> Will
>
> --
> *William Gearty*
> *Lerner-Gray Postdoctoral Research Fellow*
> Division of Paleontology
> American Museum of Natural Historywilliamgearty.com
>
>
>
> On Thu, Oct 26, 2023 at 9:55 PM Emmanuel Paradis  
> 
> wrote:
>
>
> Hi Russell,
>
> There are several implementations of LTT plots among several packages,
> so the details certainly differ depending on which one(s) you use.
>
> Maybe you can use the ape function ltt.plot.coords() which returns a
> matrix with 2 columns giving the number of lineages for each node time of
> the tree. I guess you'll need to do a bit of programming to combine the
> different outputs from different trees.
>
> Best,
>
> Emmanuel
>
> - Le 27 Oct 23, à 8:05, Russell Engelman  
> 
> a écrit :
>
>
> Dear R-Sig-Phylo,
> I was wondering if there was any way to color-code a
>
> lineage-through-time plot,
>
> to highlight the proportion of taxa at specific intervals that belong
>
> to a
>
> particular clade. I.e., an LTT plot of tetrapod diversity through
>
> time, and I
>
> want to highlight the number of lineages at any one point in time that
>
> are
>
> chondrichthyans, sarcopterygians, lissamphibians, etc. (see attached
>
> fig. as an
>
> example). I figured I can do this by asking R to return all lineages
>
> that are
>
> descendants of a specific node, but am not sure what functions I can
>
> use to
>
> convert the dated tree into an object that can be read into ggplot.
>
> Sincerely,
> Russell
>
> ___
> R-sig-phylo mailing list - 
> R-sig-phylo@r-project.orghttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo=05%7C01%7Cliam.revell%40umb.edu%7C9cff979c168d4a52a6b208dbd8092b8a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C638341306150821485%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI

Re: [R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot

2023-10-28 Thread Russell Engelman
It looks like what Liam wrote (specifically, the final figure in the post)
is pretty close to what I am looking for.

Emmanuel's suggestion of using ltt.plots.coords might also work, and it
might be easier to input that data into ggplot if I could find some way to
merge the various matrices.

Russell

On Fri, Oct 27, 2023 at 10:04 AM William Gearty 
wrote:

> You could also map your states (in this case, taxonomic groupings) onto
> your tree, then make an ltt plot with phytools::ltt(). Liam has a great
> blog post about it here:
> http://blog.phytools.org/2022/08/lineage-through-time-plots-for.html.
>
> Best,
> Will
>
> --
> *William Gearty*
> *Lerner-Gray Postdoctoral Research Fellow*
> Division of Paleontology
> American Museum of Natural History
> williamgearty.com
>
>
>
> On Thu, Oct 26, 2023 at 9:55 PM Emmanuel Paradis 
> wrote:
>
>> Hi Russell,
>>
>> There are several implementations of LTT plots among several packages, so
>> the details certainly differ depending on which one(s) you use.
>>
>> Maybe you can use the ape function ltt.plot.coords() which returns a
>> matrix with 2 columns giving the number of lineages for each node time of
>> the tree. I guess you'll need to do a bit of programming to combine the
>> different outputs from different trees.
>>
>> Best,
>>
>> Emmanuel
>>
>> - Le 27 Oct 23, à 8:05, Russell Engelman 
>> a écrit :
>>
>> > Dear R-Sig-Phylo,
>> > I was wondering if there was any way to color-code a
>> lineage-through-time plot,
>> > to highlight the proportion of taxa at specific intervals that belong
>> to a
>> > particular clade. I.e., an LTT plot of tetrapod diversity through time,
>> and I
>> > want to highlight the number of lineages at any one point in time that
>> are
>> > chondrichthyans, sarcopterygians, lissamphibians, etc. (see attached
>> fig. as an
>> > example). I figured I can do this by asking R to return all lineages
>> that are
>> > descendants of a specific node, but am not sure what functions I can
>> use to
>> > convert the dated tree into an object that can be read into ggplot.
>>
>> > Sincerely,
>> > Russell
>>
>> > ___
>> > R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> > Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>
>> [[alternative HTML version deleted]]
>>
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>
>

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot

2023-10-26 Thread Russell Engelman
Dear R-Sig-Phylo,

I was wondering if there was any way to color-code a lineage-through-time
plot, to highlight the proportion of taxa at specific intervals that belong
to a particular clade. I.e., an LTT plot of tetrapod diversity through
time, and I want to highlight the number of lineages at any one point in
time that are chondrichthyans, sarcopterygians, lissamphibians, etc. (see
attached fig. as an example). I figured I can do this by asking R to return
all lineages that are descendants of a specific node, but am not sure what
functions I can use to convert the dated tree into an object that can be
read into ggplot.

Sincerely,
Russell
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Model Selection and PGLS

2021-07-07 Thread Russell Engelman
Dear All,

Just sending a message to see if my last email went through. The code for
the PGLS still doesn't seem to work: it's giving results contradictory to
what would be expected based on the data (e.g., mu should be extremely
negative for Monotremata, but it's very variable among the three species
and many other species exhibit higher mus in that direction, adding the
correction factor actually makes the prediction worse despite high Pagel's
lambda). I attached a reprex version of my code with the data that
reproduced the result, but I was concerned it didn't go through because
when I've tried to attach files to the r-sig-phylo list previously they are
often unable to go through. I checked the r-sig-phylo archive to see if it
successfully went through, but it isn't listed there, which is one reason I
am concerned my message did not get sent properly.

Just emailing to ask, sorry if this is a bother.

Sincerely,
Russell

On Fri, Jul 2, 2021 at 8:52 PM Simone Blomberg 
wrote:

> I knew there would be a mistake somewhere. Here is the correction:
>
> ## C <- 1+(lambda-1)*(1-mat) ## Lambda correction WRONG!!
> C <- (lambda*mat)+(1-lambda)*diag(diag(mat)) ## correct, I think..
>
> I tested it on a toy example of a tree with 5 species and it works fine..
>
> I hope there are no other mistakes! See other comments below.
>
> Cheers,
>
> Simone.
> On 3/7/21 8:40 am, Russell Engelman wrote:
>
> Dear All,
>
> Okay, so I tried the code that Dr. Blomberg provided, and it doesn't seem
> to work. I rewrote the code slightly in order for it to try and predict the
> body mass of the taxa used to calculate the line, rather than some new
> taxon, because I need to do that in order to calculate prediction error and
> be able to compare it to previous studies. I got a lambda of about 0.9,
> which is approximately the same as what I got previously. So on the surface
> it seems like the method should work.
>
> The problem comes when I try to apply this to the data itself to calculate
> mu. This is the modified code I used if that helps, though I don't think I
> can send the data or tree to make it replicable because the message
> wouldn't go through last time.
>
> tree.trimmed <- drop.tip(trees[[1]], which(is.na(b$bm)))
> b.trimmed <- b[complete.cases(b$bm),]
> fit <- gls(log(bm)~I(log(ocw)^(2/3)), corPagel(0.5, phy=tree.trimmed,
> form=~species),
>data=b.trimmed)
> (lambda <- as.numeric(fit$modelStruct))
> model.matFULL <- model.matrix(~I(log(ocw)^(2/3)), data=b)
> preds <- predict(fit,newdata=b)
> mat <- vcv.phylo(trees[[1]])
> C <- 1+(lambda-1)*(1-mat) ## Lambda correction
> XnoMean <- scale(model.matFULL, scale=FALSE)
> mu <- C %*% solve(C) %*%  XnoMean ## Replaced CCompletedata and cihmat
> with C, since both seem to be subsets of that data. This is the only place
> I could think things might go wrong.
>
> C is the full variance-covariance matrix, made from the tree that includes
> both the known and unknown species. CCompletedata is the
> variance-covariance matrix WITHOUT the unknown species. This corresponds to
> C in Ted and Tony's paper.  cihmat is the matrix of relationships of the
> unknown species to the known species, one row for each unknown species. So
> you cannot replace cihmat with C.
>
> mu2<-(mu[match(b$species,row.names(mu)),])[,2] ## Have to re-sort mu based
> on the data sheet. Mu is reported in phylogenetic order, whereas
> predict.gls will always report the data in alphabetical order of the row
> names even if a phylogeny is present.
>
> predict.gls reports the predicted values in the order that is present in
> the newdata argument. You shouldn't have to re-order anything in my example
> code. YMMV however if you adapt my code.
>
> exp(preds+mu2) ## "correct" prediction accounting for bias due to
> correlations with other species. Exp() is because data were log-transformed.
>
> So when I check the results of this equation the predicted values are
> actually *less *accurate than the fitted values reported from gls without
> correction. Normally the accuracy is about +/- 64% of the actual value, but
> adding the correction here results in the data being +/- 84% of the actual
> value. What's more, I took a look at the values of mu and they don't seem
> to make sense when compared on the phylogeny. Namely, if mu were being
> calculated right, one would expect monotremes to have a very negative value
> compared to all other mammals given the deep divergence in trait states
> between monotremes and all other taxa in the figure I previously attached.
> But when I check the mu values, I get a value of 0.35 for *Zaglossus*,
> -0.58 for *Tachyglossus*, and 0.59 for *Ornithorhynchus*. Which is not
> what would be expected if m

Re: [R-sig-phylo] Model Selection and PGLS

2021-07-03 Thread Russell Engelman
dicted.orig-data.check$actual)/data.check$predicted.orig)
#%PE with uncorrected PGLS
mean(abs(data.check$predicted.corr-data.check$actual)/data.check$predicted.corr)
#%PE with corrected PGLS

Sincerely,
Russell


On Fri, Jul 2, 2021 at 8:52 PM Simone Blomberg 
wrote:

> I knew there would be a mistake somewhere. Here is the correction:
>
> ## C <- 1+(lambda-1)*(1-mat) ## Lambda correction WRONG!!
> C <- (lambda*mat)+(1-lambda)*diag(diag(mat)) ## correct, I think..
>
> I tested it on a toy example of a tree with 5 species and it works fine..
>
> I hope there are no other mistakes! See other comments below.
>
> Cheers,
>
> Simone.
> On 3/7/21 8:40 am, Russell Engelman wrote:
>
> Dear All,
>
> Okay, so I tried the code that Dr. Blomberg provided, and it doesn't seem
> to work. I rewrote the code slightly in order for it to try and predict the
> body mass of the taxa used to calculate the line, rather than some new
> taxon, because I need to do that in order to calculate prediction error and
> be able to compare it to previous studies. I got a lambda of about 0.9,
> which is approximately the same as what I got previously. So on the surface
> it seems like the method should work.
>
> The problem comes when I try to apply this to the data itself to calculate
> mu. This is the modified code I used if that helps, though I don't think I
> can send the data or tree to make it replicable because the message
> wouldn't go through last time.
>
> tree.trimmed <- drop.tip(trees[[1]], which(is.na(b$bm)))
> b.trimmed <- b[complete.cases(b$bm),]
> fit <- gls(log(bm)~I(log(ocw)^(2/3)), corPagel(0.5, phy=tree.trimmed,
> form=~species),
>data=b.trimmed)
> (lambda <- as.numeric(fit$modelStruct))
> model.matFULL <- model.matrix(~I(log(ocw)^(2/3)), data=b)
> preds <- predict(fit,newdata=b)
> mat <- vcv.phylo(trees[[1]])
> C <- 1+(lambda-1)*(1-mat) ## Lambda correction
> XnoMean <- scale(model.matFULL, scale=FALSE)
> mu <- C %*% solve(C) %*%  XnoMean ## Replaced CCompletedata and cihmat
> with C, since both seem to be subsets of that data. This is the only place
> I could think things might go wrong.
>
> C is the full variance-covariance matrix, made from the tree that includes
> both the known and unknown species. CCompletedata is the
> variance-covariance matrix WITHOUT the unknown species. This corresponds to
> C in Ted and Tony's paper.  cihmat is the matrix of relationships of the
> unknown species to the known species, one row for each unknown species. So
> you cannot replace cihmat with C.
>
> mu2<-(mu[match(b$species,row.names(mu)),])[,2] ## Have to re-sort mu based
> on the data sheet. Mu is reported in phylogenetic order, whereas
> predict.gls will always report the data in alphabetical order of the row
> names even if a phylogeny is present.
>
> predict.gls reports the predicted values in the order that is present in
> the newdata argument. You shouldn't have to re-order anything in my example
> code. YMMV however if you adapt my code.
>
> exp(preds+mu2) ## "correct" prediction accounting for bias due to
> correlations with other species. Exp() is because data were log-transformed.
>
> So when I check the results of this equation the predicted values are
> actually *less *accurate than the fitted values reported from gls without
> correction. Normally the accuracy is about +/- 64% of the actual value, but
> adding the correction here results in the data being +/- 84% of the actual
> value. What's more, I took a look at the values of mu and they don't seem
> to make sense when compared on the phylogeny. Namely, if mu were being
> calculated right, one would expect monotremes to have a very negative value
> compared to all other mammals given the deep divergence in trait states
> between monotremes and all other taxa in the figure I previously attached.
> But when I check the mu values, I get a value of 0.35 for *Zaglossus*,
> -0.58 for *Tachyglossus*, and 0.59 for *Ornithorhynchus*. Which is not
> what would be expected if mu were modelling a phylogenetic correction
> factor (if nothing else, the range of mu should probably not span zero). So
> it suggests to me I'm not doing something right. I know that PGLS doesn't
> always result in better fits than OLS, but I'm surprised adding signal back
> in made the results worse when the fit was so poor when it was fitted
> through the ancestral node.
>
> Sincerely,
> Russell
>
> --
> Simone Blomberg, BSc (Hons), PhD, MAppStat (she/her)
> Senior Lecturer and Consultant Statistician
> School of Biological Sciences
> The University of Queensland
> St. Lucia Queensland 4072
> Australia
> T: +61 7 3365 2506
> email: S.Blomberg1_at_uq.edu.au
> Twitter:

Re: [R-sig-phylo] Model Selection and PGLS

2021-07-02 Thread Russell Engelman
Dear All,

Okay, so I tried the code that Dr. Blomberg provided, and it doesn't seem
to work. I rewrote the code slightly in order for it to try and predict the
body mass of the taxa used to calculate the line, rather than some new
taxon, because I need to do that in order to calculate prediction error and
be able to compare it to previous studies. I got a lambda of about 0.9,
which is approximately the same as what I got previously. So on the surface
it seems like the method should work.

The problem comes when I try to apply this to the data itself to calculate
mu. This is the modified code I used if that helps, though I don't think I
can send the data or tree to make it replicable because the message
wouldn't go through last time.

tree.trimmed <- drop.tip(trees[[1]], which(is.na(b$bm)))
b.trimmed <- b[complete.cases(b$bm),]
fit <- gls(log(bm)~I(log(ocw)^(2/3)), corPagel(0.5, phy=tree.trimmed,
form=~species),
   data=b.trimmed)
(lambda <- as.numeric(fit$modelStruct))
model.matFULL <- model.matrix(~I(log(ocw)^(2/3)), data=b)
preds <- predict(fit,newdata=b)
mat <- vcv.phylo(trees[[1]])
C <- 1+(lambda-1)*(1-mat) ## Lambda correction
XnoMean <- scale(model.matFULL, scale=FALSE)
mu <- C %*% solve(C) %*%  XnoMean ## Replaced CCompletedata and cihmat with
C, since both seem to be subsets of that data. This is the only place I
could think things might go wrong.
mu2<-(mu[match(b$species,row.names(mu)),])[,2] ## Have to re-sort mu based
on the data sheet. Mu is reported in phylogenetic order, whereas
predict.gls will always report the data in alphabetical order of the row
names even if a phylogeny is present.
exp(preds+mu2) ## "correct" prediction accounting for bias due to
correlations with other species. Exp() is because data were log-transformed.

So when I check the results of this equation the predicted values are
actually *less *accurate than the fitted values reported from gls without
correction. Normally the accuracy is about +/- 64% of the actual value, but
adding the correction here results in the data being +/- 84% of the actual
value. What's more, I took a look at the values of mu and they don't seem
to make sense when compared on the phylogeny. Namely, if mu were being
calculated right, one would expect monotremes to have a very negative value
compared to all other mammals given the deep divergence in trait states
between monotremes and all other taxa in the figure I previously attached.
But when I check the mu values, I get a value of 0.35 for *Zaglossus*,
-0.58 for *Tachyglossus*, and 0.59 for *Ornithorhynchus*. Which is not what
would be expected if mu were modelling a phylogenetic correction factor (if
nothing else, the range of mu should probably not span zero). So it
suggests to me I'm not doing something right. I know that PGLS doesn't
always result in better fits than OLS, but I'm surprised adding signal back
in made the results worse when the fit was so poor when it was fitted
through the ancestral node.

Sincerely,
Russell

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Model Selection and PGLS

2021-07-01 Thread Russell Engelman
Dear All,

Okay, I've read all of the papers that people have linked, and I have an
understanding of the theory. However, where I am struggling now is trying
to figure out how to actually perform a PGLS that incorporates signal when
predicting new data in R. I tested a bunch of gls functions in different R
packages, including *pgls* in *caper, phylolm* in *phylolm*, and *gls* in
*nlme*. I tried mvgls in *mvMORPH *but I got the error message saying to
try *gls *in *nlme* because *mvMORPH *requires multivariate data. In all of
these cases it doesn't appear as though the functions incorporate
phylogenetic information back into the PGLS model to make predictions. For
body mass estimation equations model accuracy is usually evaluated by
calculating the percent error between the fitted values for the data and
the actual data values, since prediction is the goal and log-transformation
can make other parameters like r2 more difficult to use (though I know r2
isn't usually a thing when it comes to PGLS).

However, when I calculated percent error, I noticed that the fitted values
of the function seem to be predicted through the ancestral node, *not *by
adjusting the prediction using phylogenetic information. This appears to be
the case regardless of whether fitted values were called directly from the
function (e.g., "fit.gls$fitted.values") or by predicting values of the
original data using *predict *(i.e., "predict(fit.gls,data)"). The results
of the two methods were identical when I used *phylolm *and *gls*, but
differed slightly between using predict or fitted values in *caper *(but
not enough to conclude that one method did incorporate phylogenetic signal,
just enough that it's clear they aren't identical). Furthermore, I also
tried manually estimating point values using the line fit through the
ancestral node and the values produced were identical to the fitted values
reported in these other methods. Whatever I'm doing, it's  not
incorporating phylogenetic signal to produce estimates.

I haven't tried the Bayesian methodology in Organ et al. (2007) yet, mostly
because I'm not familiar with BayesTraits and am trying to keep all my
analyses in R if possible as I have an RMarkdown document that I am
documenting my method as supplementary information. I may have to try
BayesTraits if there aren't any other options. I'm also not sure if PIC
would be easier than trying to mess with Bayesian models, though I know PIC
and PGLS produce more or less identical results.

I am surprised that researchers don’t know about these methods: they are
> basically just like ancestral state reconstruction, but for new tips in the
> tree. Everyone knows about “ancestral state reconstruction”. But if few
> people recognize that the same methodology can be applied to new taxa, then
> it’s a problem indeed.


I had numerous debates with colleagues and other researchers in my
department about this exact topic, and it's nice to know the arguments I
made of "if you already have a variance- covariance matrix with known taxa,
why can't you just adjust estimated based on phylogenetic data if you have
a resolved phylogeny" weren't completely nuts. I think a lot of it is there
don't seem to be a lot of studies where point estimates are the intended
goal (more looking at broader regression models), and at least in body mass
the only study I've seen that does include phylogenetic signal back into
the data uses PVR.

Sincerely,
Russell

On Thu, Jul 1, 2021 at 10:12 AM Cecile Ane  wrote:

>
>
> On Jun 30, 2021, at 11:20 PM, neovenatori...@gmail.com wrote:
>
> This is something that seems really, really concerning because if there is
> a method of using phylogenetic covariance to adjust the position of new
> data points it seems like a lot of workers don’t know these methods exist,
> to the point that even published papers overlook it.
>
>
> I am surprised that researchers don’t know about these methods: they are
> basically just like ancestral state reconstruction, but for new tips in the
> tree. Everyone knows about “ancestral state reconstruction”. But if few
> people recognize that the same methodology can be applied to new taxa, then
> it’s a problem indeed.
>
> Ancestral state “reconstruction” could be better called ancestral state
> “estimation” or “prediction”. In a likelihood framework, we can also give
> prediction intervals, for internal (ancestral) nodes in the tree as well
> as new tips. Not all software handle new taxa, because it requires
> extending the tree, or accepting a tree in which not all taxa have data.
> Taxa with missing data can then be predicted, like any other node.
>
> Cecile.
>
>

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Model Selection and PGLS

2021-06-30 Thread Russell Engelman
Dear All,

What you see is the large uncertainty in “ancestral” states, which is part
> of the intercept here. The linear relationship that you overlaid on top of
> your data is the relationship predicted at the root of the tree (as if such
> a thing existed!). There is a lot of uncertainty about the intercept, but
> much less uncertainty in the slope. It looks like the slope is not affected
> by the inclusion or exclusion of monotremes. (for one possible reference on
> the greater precision in the slope versus the intercept, there’s this:
> http://dx.doi.org/10.1214/13-AOS1105 for the BM).


Yes, that sounds right from the other data I have. The line approximates
what would be expected for the root of Mammalia, and the signal in the PGLS
is more due to shifts in the y-intercept than shifts in slope, which in
turn is supported by the anatomy of the proxy.

My second cent is that the phylogenetic predictions should be stable. The
> uncertainty in the intercept —and the large effect of including monotremes
> on the intercept— should not affect predictions, so long as you know for
> which species you want to make a prediction. If you want to make prediction
> for a species in a small clade “far” from monotremes, say, then the
> prediction is probably quite stable, even if you include monotremes: this
> is because the phylogenetic prediction should use the phylogenetic
> relationships for the species to be predicted. A prediction that uses the
> linear relationship at the root and ignores the placement of the species
> would be the worst-case scenario: for a mammal species with a completely
> unknown placement within mammals.


This is what I'm a bit confused about. I was always told (and it seemingly
implies this in some of the PGLS literature I read like Rohlf 2011 and
Smaers and Rohlf 2016) that it isn't possible to include phylogenetic data
from the new data points into the prediction in order to improve
predictions. I'm a little confused as to whether it's possible or not (see
below).

There’s probably a number of software that do phylogenetic prediction. I
> know of Rphylopars and PhyloNetworks.


I will take a look into those.

I think that Cécile' and Theodore' point is important and too often
> overlooked. Using GLS models, the BLUP (Best Linear Unbiased Prediction) is
> not simply obtained from the fitted line but should incorporates
> information from the (evolutionary here) model.


 There’s a way to impute phylogenetic signal back into a PGLS model? I
am super surprised at that. I’ve talked to at least three different
colleagues who use PGLS about this issue, and all of them had told me that
there is no way to input phylogenetic signal back into the model for new
data points and I should just go with the single regression line the model
gives me (i.e., the regression line for the ancestral node).

 I tried looking around to see what previous researchers used when
using PCM on body mass (Esteban-Trivigno and Köhler 2011, Campione and
Evans 2012, Yapuncich 2017 thesis) and it looks like all of them just went
with the best fit line with the ancestral node, i.e., looking at their
reported results they give a simple trait~predictor equation that does not
include phylogeny when calculating new data. Campion and Evans 2012 used
PIC versus PGLS, which I know are technically equivalent but it doesn't
seem like they included phylogenetic information when they predicted new
data: they used their equations on dinosaurs but there are no dinosaurs in
the tree they used. I know that it’s possible to incorporate phylogenetic
signal into the new data using PVR but PVR has been criticized for other
reasons.

This is something that seems really, really concerning because if there
is a method of using phylogenetic covariance to adjust the position of new
data points it seems like a lot of workers don’t know these methods exist,
to the point that even published papers overlook it. This was something I
was hoping to highlight in a later paper on the data, but it sounds like
people might have discussed it already. I remember talking with my
colleagues a lot about "isn't there some way to incorporate phylogenetic
information back into the model to improve accuracy of the prediction if we
know where the taxon is positioned?" and they just thought there wasn't a
way.

Regarding the model comparison, I would simply avoid it (or limit it) by
> fitting models flexible enough to accommodate between your BM and OLS case
> and summarize the results obtained across all the trees…


I am not entirely sure what is meant here. Do you mean fitting both an OLS
and BM model and comparing both models? I am reporting both, but my concern
is about which model I report is the best one to use going forward, since
the BM model is seemingly less accurate (though I am just taking the fitted
values from the PGLS model, which I don't think include phylogenetic
information). The two models I use produce dramatically different 

Re: [R-sig-phylo] Model Selection and PGLS

2021-06-28 Thread Russell Engelman
Dear Dr. Upham (and All),

Please don't take my initial message the wrong way, this is not meant to be
a dig at your 2019 study. I don’t think this is due to the birth-death tree
specifically but would be present in any study where there are multiple
phylogenetic trees to choose from or some measure of uncertainty in the tip
dates. I definitely agree with you that there is almost certainly going to
be variation in model support values if there is any difference in the
underlying phylogeny, however, I was surprised that AIC would vary this
much in a dataset where the trait data, number of tips, and branching
topology used to compute the model are more or less constant between trees.

My question is more along the lines of "given that it is logical to expect
AIC to vary based on differences between trees, how would one go about
determining which regression model is the "optimal" one to use for further
analysis"? You mentioned taking the 95% confidence intervals of the models
and seeing if they don't overlap, would this be just taking the singular
AIC from the OLS model and comparing it to the PGLS one, since OLS
seemingly doesn't produce a confidence interval of AIC values? And if the
confidence intervals do overlap, is the OLS  or PGLS considered the null
hypothesis? In my case the AIC for OLS is within the 95% confidence
intervals for the PGLS, but is much lower than the mean value (it's close
to the lower first standard deviation of the AIC values).

Sincerely,
Russell

On Mon, Jun 28, 2021 at 2:46 PM Nathan Upham  wrote:

> Hi Russell and all:
>
> I’ll respond here since the answer is related to the intended purpose of
> the VertLife mammal trees — i.e, capturing full uncertainty in node ages
> and phylogenetic relationships was one of the motivators for building the
> mammal trees in the way we did.  This approach contrasts to wanting to
> obtain the single “best tree”, since methods of phylogenetic reconstruction
> will always just be approximations of the “true tree” anyway rather than
> ever being equal to that tree.  To only use a single consensus tree in
> comparative phylogenetic analyses assumes that we know the true tree, which
> again, we don’t ever in an empirical context (only for simulations).  Those
> points were summarized well by Huelsenbeck et al. (2000:
> http://science.sciencemag.org/content/288/5475/2349), but nevertheless
> are still not standard practice in PCMs.
>
> To the point of AIC varying across the 100 trees, this is to be expected.
> Any 1 tree of 100 trees from the credible set is not very meaningful; the
> entire 100 trees need to be analyzed and then the estimate +/- SE from each
> tree can be summarized as a distribution of values.  If the 95% CI on the
> distribution of values excludes your hypothesis, then you’ve learned
> something; if not, you accept the null hypothesis.  See the animated gifs
> here (http://vertlife.org/data/mammals/) for a better conception of why
> this phylogenetic uncertainty is important to consider when doing model
> fitting or other PCMs.
>
> That said, if a single ‘best tree’ is the target, then the DNA-only MCC
> tree of 4098 species is a reasonable thing to analyze, more analogous to
> how mainstream phylogenetics has presented trees for re-use (
> https://github.com/n8upham/MamPhy_v1/blob/master/_DATA/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_target.tre).
> But again, while the MCC tree is appropriate, 1 of 100 trees from the
> credible set is not.
>
> Hope that helps.  All the best,
> —nate
>
>
>
>
> ==
> Nathan S. Upham, Ph.D. (he/him)
> Assistant Research Professor & Associate Curator of Mammals
> Arizona State University, School of Life Sciences, Biodiversity Knowledge
> Integration Center (BioKIC <https://biokic.asu.edu/>)
>  ~> Check out the new Mammal Tree of Life
> <http://vertlife.org/data/mammals/> and the Mammal Diversity Database
> <https://mammaldiversity.org/>
>
> Research Associate, Yale University (Ecology and Evolutionary Biology)
> Research Associate, Field Museum of Natural History (Negaunee Integrative
> Research Center)
> Chair, Biodiversity Committee, American Society of Mammalogists
> Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group
>
> personal web: n8u.org | Google Scholar
> <https://scholar.google.com/citations?hl=en=zIn4NoUJ_op=list_works=AJsN-F6ybkfthmTdjTpow6sgMhWKn1EKcfNtmIF_wzZcev7yeHuEu5_aolFS85rWiVRHpiQgbwg43i6eS6kArrabLdFL4bntzUSRmlRP2CW4lbZqeEcColw>
>  | ASU profile <https://isearch.asu.edu/profile/3682356>
> e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham
> <https://twitter.com/n8_upham>
>
> =

[R-sig-phylo] Model Selection and PGLS

2021-06-28 Thread Russell Engelman
Dear R-Sig-Phylo Mailing List,

I ran into a rather unusual problem. I was doing an analysis using the
mammal trees from Upham et al. (2019) downloaded off of the VertLife site.
The model statistics for my data initially suggested that the OLS model was
better supported than a PGLS model based on Akaike Information Criterion
(AIC). The reviewers for the paper wanted me to add more taxa, so I
re-downloaded a set of trees from VertLife and reran the analysis, but when
I did I found that suddenly the AIC values for the PGLS equation were
dramatically different, to the point that it favored a Brownian PGLS model
over all other models. This was despite the fact that previously I found
that an OLS model and an OU model had a better model fit than a Brownian
model, and the other accuracy statistics of interest (like percent error,
this being a model intended for use in predicting new data) also found OLS
and OU models to fit better than a Brownian PGLS model. The regression line
for a Brownian model doesn't even fit the data at all due to being biased
by a basal clade. The model also has a high amount of phylogenetic inertia
which again would seemingly make an OU model a better option.

I used drop.tip to remove the additional taxa to see if I could replicate
my previous results, but it turns out I still couldn't replicate the
results. That's when I realized what was causing the change in AIC values
wasn't the taxon selection, but the tree I was using. If I used the old
VertLife tree I could replicate the results, but the new VertLife tree
produced radically different results despite using the same tips. So what I
decided to do is rerun the analysis for all 100 trees I had available, and
it turned out there was a massive amount of variation in AIC depending on
what tree was chosen. I tried including an html data printout to show the
precise results and how I got them, but I couldn't attach them because the
mailer daemon kept saying they were too large. The AIC values between trees
vary by almost 200 points after excluding extreme outliers, when model
differences of 2 or more are often considered to represent statistically
detectable differences. The unusually low AIC I got when I first ran the
analysis happened to be because the first tree in the 100 trees merely
happened to produce a lower-than-average AIC than the whole sample. The
average AIC out of the 100 trees was higher than for the OLS model, which
again makes sense given the distribution of the data.

However, and this is where my problem comes in, how do I make appropriate
model selections for PGLS if there is such a massive amount of variation in
AIC? Especially given that between the trees in the sample there is enough
variation that it can cause one model to be favored over another? Just
picking one tree and going with that seems counterintuitive, because it's
not very objective and theoretically someone could pick a specific tree to
get the results they want, or accidentally pick a tree that might support
the wrong model as seen here. On top of that the tree topologies are more
or less identical: the same 404 taxa are present in all trees and the trees
have nearly identical topologies, the only real differences between trees
are branch lengths. But given this, how can I justify which AIC value I
report, which in turn means which model is best supported?

I did try looking at the phylo_lm function in the sensiphy package, but
that function doesn't seem to provide any method of performing model
selection between different regression models. It does seemingly report
AIC, but the AIC the function reported was dramatically different from the
aic I got using the gls function in ape and nlme.

Sincerely,
Russell

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-27 Thread Russell Engelman
Dear Everyone,

Sorry for the late reply. I had a crunch with several manuscripts that had
a looming due date, and am coming back to trying to fix the issue I am
having with the phylogeny I emailed about before. I've been trying
several things but I've still been unable to resolve the issue with the
error message

"Error in bind.tree(tree, tip, where = where, position = pp) :
  'position' is larger than the branch length"

despite the fact that the length of the terminal sister taxon to its
nearest node is much greater than the new branch length (i.e., there should
be plenty of length to attach a taxon). In many of the cases where I am
getting an error I am attaching a sister taxon to a terminal taxon, so it
doesn't seem like a lack of monophyly across the trees is the problem.

The only thing I could think of is trying to manually adjust the next node
up to give it enough room to attach the new terminal taxon, but I am unsure
what the easiest way to do that in R would be. I've never tried to mess
with branch lengths in a tree before.

With regards to what Dr. Miller said, I agree that birth-death trees seem
to suggest much younger divergence times than other studies, however I am
more struggling with the practical side of things. What I am trying to do
with this dataset is include enough taxa to perform a PGLS analysis, and
all I need is a phylogenetic tree to use as a backbone. The Upham et al.
2019 trees are about the only easily accessible trees I have been able to
find for Mammalia, I know there was an older one that Dr. Upham suggested
that was published in 2014 but I don't recall if that one was easily
available.

Sincerely,
Russell


On Fri, Jun 4, 2021 at 3:27 PM Eliot Miller  wrote:

> Russell,
>
> If you're trying to do this across a lot of species, you're going to bump
> into a lot of complexities like this. I got the impression you were trying
> to do this for many species. For example, there's no reason that the taxon
> you're adding is, even if we assume monophyly of the clade (genus in this
> example) in question, nested somewhere in the crown clade. It might be
> sister to the clade. So you should theoretically be considering that node
> as a possible place to add the taxon. Adding the taxon in a polytomy at the
> MRCA and then randomly resolving that should do that (I assume, though do
> those operations also consider moving stemwards?). You can also randomly
> choose a random valid node that doesn't render the clade polyphyletic,
> split on that node and use a birth-death model, or any number of other
> rules (midpoint, polytomy, uniform distribution) to decide how to divvy up
> branch lengths and keep the tree ultrametric. This is what's happening with
> addTaxa. Also, if you hit a situation where you're trying to add a taxon to
> a set of taxa that are not actually monophyletic, you'll get weird results
> with your current approach. If that's not your situation, no problems
> there, but you ought to check (or just use addTaxa which will make sure it
> doesn't make the clade any "more" polyphyletic, whatever that means). FWIW,
> I'm leery of using birth-death models to add lots of missing species
> because I think it biases the resulting tree towards increasing
> diversification in the present. At least it does with addTaxa. This might
> be a programming error on my part, but I actually suspect it's a systematic
> bias introduced by problems described in recent papers by Stilianos Louca
> and Matthew Pennell, e.g.
> https://www.sciencedirect.com/science/article/abs/pii/S0960982221006138?dgcid=coauthor
> Figuring out if incorporating new birth-death estimators (that don't cap
> extinction at 0) could resolve some of that bias was on my to-do list for
> addTaxa, and is why I've never gotten around to more formally publishing
> tests of it. Also, I'm fairly sure Jonathan Chang et al. wrote a relevant R
> package here which also automates most of this, and used it for a big fish
> phylogeny. That'd be worth a look if you're trying to do this across
> hundreds or thousands of species.
>
> Eliot
>
> On Fri, Jun 4, 2021 at 2:59 PM Russell Engelman 
> wrote:
>
>> Dear Everyone,
>>
>> So this is really strange. I tried what Dr. Revell suggested, and for the
>> two examples I gave it worked for *Cercopithecus albogularis*, but when
>> I applied it to *Cricetomys ansorgei *I got the message...
>>
>> "Error in bind.tree(tree, tip, where = where, position = pp) :
>>   'position' is larger than the branch length"
>>
>> I know that usually this means the new node position is usually further
>> back in time than the point where the branch diverges from its nearest
>> neighbor. However when I went to check how long the terminal branch leading
>> to *C. ansorgei*'s 

[R-sig-phylo] Degrees of Freedom in PGLS

2021-06-11 Thread Russell Engelman
Dear R-Sig-Phylo Group,

I had a question regarding the calculation of regression statistics using
PGLS. When I obtain the degrees of freedom from a PGLS object using ape in
R, I notice that it returns the same number of degrees of freedom as under
OLS, and the AIC of a bivariate function is calculated under the assumption
that k=2 (i.e., that the same number of degrees of freedom are used as in
the OLS). Does anyone know why this is? I would have thought that a PGLS
would have eaten up a greater number of degrees of freedom than OLS, if for
no other reason than it constrains the motion of the system by adding
phylogenetic covariation in as an additional variable, and thus would be
expected to eat up at least one additional degree of freedom in the same
way that adding additional categorical or quantitative variables would?

Sincerely,
Russell

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Russell Engelman
mb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=10u8o7U8dqC1pyK6GteNycoSACTOXAO2VaHNJmTeddk%3Dreserved=0
> >> Hope that helps,
> >> —nate
> >> 
> >> Nathan S. Upham, Ph.D. (he/him)
> >> Assistant Research Professor & Associate Curator of Mammals
> >> Arizona State University, School of Life Sciences
> >>  ~> Check out the new Mammal Tree of Life <
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvertlife.org%2Fdata%2Fmammals%2Fdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DbfdSFyjoCtDqUWuVoRs%2BC8Toe3v2BuFHgIpBTim9M0%3Dreserved=0>
> and the Mammal Diversity Database <
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmammaldiversity.org%2Fdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=sYqcCTEhWozvCAIC1vdxbBf5qeJq%2FBsV54zDRuOKABE%3Dreserved=0
> >
> >> Research Associate, Yale University (Ecology and Evolutionary Biology)
> >> Research Associate, Field Museum of Natural History (Negaunee
> Integrative Research Center)
> >> Chair, Biodiversity Committee, American Society of Mammalogists
> >> Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group
> >> personal web: n8u.org | Google Scholar <
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fscholar.google.com%2Fcitations%3Fhl%3Den%26user%3DzIn4NoUJ%26view_op%3Dlist_works%26gmla%3DAJsN-F6ybkfthmTdjTpow6sgMhWKn1EKcfNtmIF_wzZcev7yeHuEu5_aolFS85rWiVRHpiQgbwg43i6eS6kArrabLdFL4bntzUSRmlRP2CW4lbZqeEcColwdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=sa9VBUNzHzzPUecVRmknFoa36gEQA7hySS7uFuxXh9o%3Dreserved=0>
> | ASU profile <
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fisearch.asu.edu%2Fprofile%2F3682356data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=D9p3TXN7AdorpSqBqnydAvKNoPXOGZ0wR7ESw8bB4H0%3Dreserved=0
> >
> >> e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham <
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fn8_uphamdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=OM3Q8KIzJ%2BpAvHy6Z5B74%2FD%2Bsy7LkDGVCSAotBNwjnU%3Dreserved=0
> >
> >> 
> >>> On Jun 2, 2021, at 4:19 PM, Eliot Miller 
> wrote:
> >>>
> >>> Hi Russell,
> >>>
> >>> A package I wrote a while back should be able to do that fairly easily.
> >>>
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2Feliotmiller%2FaddTaxa__%3B!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQrzxl2aIw%24data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=kgDYJXKFXRFIM8mv1x8fViY6QDiEjTQ8qP5mQ0SUq3k%3Dreserved=0
> The only paper it's described in
> >>> remains
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fbsapubs.onlinelibrary.wiley.com%2Fdoi%2Ffull%2F10.3732%2Fajb.1500195__%3B!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQp7PnRRHg%24data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DmHEuwIvxP0uiRhdBPZ%2BuUWUoVpIQ5PW8Xfrjyv9Nyg%3Dreserved=0
> >>> It's a wrapper for bind.tip

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Russell Engelman
mb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=10u8o7U8dqC1pyK6GteNycoSACTOXAO2VaHNJmTeddk%3Dreserved=0
> >> Hope that helps,
> >> —nate
> >> 
> >> Nathan S. Upham, Ph.D. (he/him)
> >> Assistant Research Professor & Associate Curator of Mammals
> >> Arizona State University, School of Life Sciences
> >>  ~> Check out the new Mammal Tree of Life <
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvertlife.org%2Fdata%2Fmammals%2Fdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DbfdSFyjoCtDqUWuVoRs%2BC8Toe3v2BuFHgIpBTim9M0%3Dreserved=0>
> and the Mammal Diversity Database <
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmammaldiversity.org%2Fdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=sYqcCTEhWozvCAIC1vdxbBf5qeJq%2FBsV54zDRuOKABE%3Dreserved=0
> >
> >> Research Associate, Yale University (Ecology and Evolutionary Biology)
> >> Research Associate, Field Museum of Natural History (Negaunee
> Integrative Research Center)
> >> Chair, Biodiversity Committee, American Society of Mammalogists
> >> Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group
> >> personal web: n8u.org | Google Scholar <
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fscholar.google.com%2Fcitations%3Fhl%3Den%26user%3DzIn4NoUJ%26view_op%3Dlist_works%26gmla%3DAJsN-F6ybkfthmTdjTpow6sgMhWKn1EKcfNtmIF_wzZcev7yeHuEu5_aolFS85rWiVRHpiQgbwg43i6eS6kArrabLdFL4bntzUSRmlRP2CW4lbZqeEcColwdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=sa9VBUNzHzzPUecVRmknFoa36gEQA7hySS7uFuxXh9o%3Dreserved=0>
> | ASU profile <
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fisearch.asu.edu%2Fprofile%2F3682356data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=D9p3TXN7AdorpSqBqnydAvKNoPXOGZ0wR7ESw8bB4H0%3Dreserved=0
> >
> >> e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham <
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fn8_uphamdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=OM3Q8KIzJ%2BpAvHy6Z5B74%2FD%2Bsy7LkDGVCSAotBNwjnU%3Dreserved=0
> >
> >> 
> >>> On Jun 2, 2021, at 4:19 PM, Eliot Miller 
> wrote:
> >>>
> >>> Hi Russell,
> >>>
> >>> A package I wrote a while back should be able to do that fairly easily.
> >>>
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2Feliotmiller%2FaddTaxa__%3B!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQrzxl2aIw%24data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=kgDYJXKFXRFIM8mv1x8fViY6QDiEjTQ8qP5mQ0SUq3k%3Dreserved=0
> The only paper it's described in
> >>> remains
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fbsapubs.onlinelibrary.wiley.com%2Fdoi%2Ffull%2F10.3732%2Fajb.1500195__%3B!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQp7PnRRHg%24data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DmHEuwIvxP0uiRhdBPZ%2BuUWUoVpIQ5PW8Xfrjyv9Nyg%3Dreserved=0
> >>> It's a wrapper for bind.tip

[R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-02 Thread Russell Engelman
Dear R-sig-phylo,

I have been working with a mammalian phylogeny I recently downloaded from
VertLife (http://vertlife.org/phylosubsets/). Unfortunately, the phylogeny
is missing a large number of species, so I am trying to manually add these
taxa to the phylogeny. I have a series of 100 trees that I am using to do
things such as test for phylogenetic signal. I know how to use bind.tip to
add new taxa to a single tree, but I am having more trouble with a
multiPhylo object. I am primarily adding these taxa by placing them as
sister to their nearest included relative (since most of them are elevated
former subspecies), but the issue here is that in the 100 trees in the
multiPhylo object the node representing the taxon to bind these taxa to is
not the same across all trees due to shifting topologies.

This is an example of the code I have been using, in which "tree" is the
tree object. This works for a single 'phylo' tree but not 'multiphylo'.

```
newtree<-lapply(tree,bind.tip,tip.label="Cercopithecus_albogularis",
position=0.59,edge.length = 0.59,

where=mrca(tree)["Cercopithecus_mitis","Cercopithecus_mitis"])
```

Now, this code will not work, but I know exactly why: 'tree' is a
multiPhylo object and so the 'where' argument cannot find the node for the
terminal taxon. However, the issue is how can I tell R to repeat this
'where' argument for each of the 100 trees, since the node in question is
not identical across these trees? Is there an easier way to do this than
using the 'mrca' call for each terminal taxon? I've noticed adding a 'mrca'
argument also increases computation time and if I am reinventing the wheel
it would be nice to know if I am overthinking things.

Sincerely,
Russell

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Changing the color of text and nodes in arc.cladelabels in phytools

2021-04-08 Thread Russell Engelman
Dear Dr. Revell,

I tried par(fg="green") and it kind of worked. It changed the curved
labels and their associated text to a color, but it only changed the
outline of the dots denoting marked nodes. Additionally, if I tried to
input a vector of colors equal to my number of groups (e.g.,
par(fg=c("red","green","blue","cyan","orange")), I get the error ""fg1" is
not a graphical parameter".

The same results are produced if I try par(col="green"). The command
par(col.lab="green") just turns the entire background of the figure green.
I also tried part(fill="green"), but got the warning message "In par(fill =
"green") : "fill" is not a graphical parameter". The command
par(bg="green') does nothing.

Sincerely,
Russell

On Mon, Apr 5, 2021 at 11:11 AM Liam J. Revell  wrote:

> Dear Russell.
>
> I'm sorry that arc.cladelabels does not have the functionality that you
> want.
>
> Have you tried setting par(fg)? E.g.:
>
> par(fg="the color you want")
> arc.cladelabels(...)
> par(fg="black")
>
> You could also try par(col) or par(col.lab).
>
> Let me know if that works, if not I can see about adding the option you
> need to the function.
>
> All the best, Liam
>
> Liam J. Revell
> University of Massachusetts Boston [Assoc. Prof.]
> Universidad Católica de la Ssma Concepción [Adj. Res.]
>
> Web & phytools:
> http://faculty.umb.edu/liam.revell/, http://www.phytools.org,
> http://blog.phytools.org
>
> Academic Director UMass Boston Chile Abroad:
> https://www.umb.edu/academics/caps/international/biology_chile
>
> U.S. COVID-19 explorer web application:
> https://covid19-explorer.org/
>
> On 4/4/2021 10:36 PM, Russell Engelman wrote:
> > EXTERNAL SENDER
> >
> > Dear R-sig-phylo,
> >
> > I am a researcher who has been trying to plot a continuous trait map
> onto a
> > phylogeny using the phytools and ape packages in R. I have been trying to
> > denote specific clades using the arc.cladelabels function in phytools,
> but
> > I have also been trying to color-code the text, curve, and node for each
> of
> > the groups to make it easier to see which branches pertain to which
> clade.
> > However, I have been having a lot of trouble adjusting the properties of
> > the text and nodes created by arc.cladetools.
> >
> > Here is an example using the *Anolis *svl data that shows what I mean:
> >
> > ```
> > library(ape)
> > library(phytools)
> > anole.tree<-read.tree("
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.phytools.org%2Feqg2015%2Fdata%2Fanole.tredata=04%7C01%7Cliam.revell%40umb.edu%7C017b93bf1ab74239149608d8f84157ed%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637532306822786314%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=NN6WMsldnLYZtuXWpndNA6ORhejdZbJdqDuNeI6PsY8%3Dreserved=0
> ")
> > svl <- read.csv("
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.phytools.org%2Feqg2015%2Fdata%2Fsvl.csvdata=04%7C01%7Cliam.revell%40umb.edu%7C017b93bf1ab74239149608d8f84157ed%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637532306822796308%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=EA2P63ohsozCGtfU9hpg9k319zu168X%2BfmCZlOJ4cX4%3Dreserved=0
> ",
> >  row.names=1)
> > svl <- as.matrix(svl)[,1]
> > fit <- fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE)
> > obj<-contMap(anole.tree,svl,plot=FALSE)
> > nodes<-c(mrca(anole.tree)["equestris","noblei"],
> >   mrca(anole.tree)["barahonae","barbatus"])
> > labels<-c("clade1","clade2")
> > plot(obj,type="fan",cex=5,fsize=0.5,lwd=5)
> > for(i in 1:length(nodes))
> >arc.cladelabels(text = labels[i],node = nodes[i],mark.node = TRUE,
> > ln.offset =
> >  1.39, col = c("green","red")[i], lab.offset =
> >  1.46, fsize = .5, orientation = "curved")
> > ```
> >
> > None of the calls I have tried for the arc.cladelabels function seem to
> be
> > able to adjust the color of the text labels the function creates (e.g.,
> > text.col, col.text, col.lab, etc), nor the color or size of the nodes
> when
> > mark.node = TRUE. The "col" call only affects the line drawn. Examining
> the
> > base code for arc.cladelabels finds that the color of the nodes is always
> > set to be red, and I have been unable to

[R-sig-phylo] Changing the color of text and nodes in arc.cladelabels in phytools

2021-04-05 Thread Russell Engelman
Dear R-sig-phylo,

I am a researcher who has been trying to plot a continuous trait map onto a
phylogeny using the phytools and ape packages in R. I have been trying to
denote specific clades using the arc.cladelabels function in phytools, but
I have also been trying to color-code the text, curve, and node for each of
the groups to make it easier to see which branches pertain to which clade.
However, I have been having a lot of trouble adjusting the properties of
the text and nodes created by arc.cladetools.

Here is an example using the *Anolis *svl data that shows what I mean:

```
library(ape)
library(phytools)
anole.tree<-read.tree("http://www.phytools.org/eqg2015/data/anole.tre;)
svl <- read.csv("http://www.phytools.org/eqg2015/data/svl.csv;,
row.names=1)
svl <- as.matrix(svl)[,1]
fit <- fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE)
obj<-contMap(anole.tree,svl,plot=FALSE)
nodes<-c(mrca(anole.tree)["equestris","noblei"],
 mrca(anole.tree)["barahonae","barbatus"])
labels<-c("clade1","clade2")
plot(obj,type="fan",cex=5,fsize=0.5,lwd=5)
for(i in 1:length(nodes))
  arc.cladelabels(text = labels[i],node = nodes[i],mark.node = TRUE,
ln.offset =
1.39, col = c("green","red")[i], lab.offset =
1.46, fsize = .5, orientation = "curved")
```

None of the calls I have tried for the arc.cladelabels function seem to be
able to adjust the color of the text labels the function creates (e.g.,
text.col, col.text, col.lab, etc), nor the color or size of the nodes when
mark.node = TRUE. The "col" call only affects the line drawn. Examining the
base code for arc.cladelabels finds that the color of the nodes is always
set to be red, and I have been unable to figure out how to alter it. I
could always write a custom function that copies arc.cladelabels but alters
the color function, but my concern is that this is part of an RMarkdown
document and as a result any alteration I make to the arc.cladelabels
function will not show up when another person knits the data (as it will
call on their copy of phytools).

Alternatively I am wondering if it is possible to export the tree to ggtree
and add the nodes there, but I have noticed that ggtree does not allow for
arced clade labels and for some reason it also greatly extends the margins
of the plot beyond the margins of the graph when its equivalent of
arc.cladelabels (geom_cladelabel) is called.

I was wondering if anyone else here knew any potential ways I could adjust
the color and size of the text and nodes for this function.

Sincerely,
Russell

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/