Thank you, Eugen.

Yes, I just read the MEE paper on phyloGenerator. This is exactly the kind of 
tool I want: one that can download DNA sequences, align them, and finally 
construct the phylogenetic tree under the constraint of a Phylomatic tree. That 
is really great!

Cheers,
Yuxin



Yuxin Chen
PhD Candidate
School of Life Sciences
Sun Yat-sen University
Guangzhou, P. R. China, 510006
[email protected] or [email protected] 
 
From: Eugen
Date: 2014-07-15 14:37
To: r-sig-phylo
Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence of a 
list of species from GenBank
Hi Yuxin,
 
You can also try the phyloGenerator tool to download specific
sequences from GenBank
(http://willpearse.github.io/phyloGenerator/index.html). You only need a
species list to do so; for more details, read the manual.
 
Best,
 
Eugen
 
On 14.07.2014 at 12:00, [email protected] wrote:
> Today's Topics:
> 
>    1. use R to download the DNA barcode sequence of a list of
>       species from GenBank ([email protected])
>    2. Re: use R to download the DNA barcode sequence of a list of
>       species from GenBank (Karolis Ramanauskas)
>    3. Fw: Re: use R to download the DNA barcode sequence of a list
>       of species from GenBank ([email protected])
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sun, 13 Jul 2014 23:39:13 +0800
> From: "[email protected]" <[email protected]>
> To: r-sig-phylo <[email protected]>
> Subject: [R-sig-phylo] use R to download the DNA barcode sequence of a
> list of species from GenBank
> Message-ID: <[email protected]>
> Content-Type: text/plain
> 
> Dear All,
> 
> I have a list of about 2,000 plant species and want to construct a 
> phylogenetic tree for them. I'd like to use the DNA barcode data available in 
> GenBank, so I first need to download these DNA sequences. I know that 
> read.GenBank in the package "ape" can do this if I have the GenBank accession 
> numbers, but all I have now are the species names. Does anybody know of an R 
> function that can batch-download sequences from GenBank given only species 
> names?
> 
> Many thanks in advance.
> Yuxin 
> 
> 
> 
> Yuxin Chen
> PhD Candidate
> School of Life Sciences
> Sun Yat-sen University
> Guangzhou, P. R. China, 510006
> [email protected] or [email protected] 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Sun, 13 Jul 2014 12:31:03 -0500
> From: Karolis Ramanauskas <[email protected]>
> To: [email protected], [email protected]
> Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence
> of a list of species from GenBank
> Message-ID:
> <CACT_pJHyfaTqX=Ft3g+g=MZ85jxf=_xf_ame4ynkwcyws+j...@mail.gmail.com>
> Content-Type: text/plain
> 
> Good day,
> 
> I understand you have done some work already, but you may want to try my
> PhyloMill pipeline. It will do exactly what you need. It is written in
> Python, not R. You will need to give it the names of ingroup and outgroup
> taxa and which loci you want to use. If the loci you want to use are not
> predefined in PhyloMill, I can create the definitions, just let me know
> which loci you want to use.
> 
> PhyloMill will actually do a lot more than just download and align
> sequences: it will filter mislabeled sequences, reverse-complement if
> needed, etc. It will also create a consensus sequence when multiple GI
> accessions are available for a given taxon and locus.
> 
> https://github.com/karolisr/krpy
> 
> Peace,
> Karolis Ramanauskas
> Department of Biological Sciences
> University of Illinois at Chicago
> 840 W. Taylor St. SEL 4093 M/C 067
> Chicago, IL 60607
> E-Mail: [email protected]
> 
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Mon, 14 Jul 2014 14:19:29 +0800
> From: "[email protected]" <[email protected]>
> To: kraman2 <[email protected]>
> Cc: R-sig-phylo <[email protected]>
> Subject: [R-sig-phylo] Fw: Re: use R to download the DNA barcode
> sequence of a list of species from GenBank
> Message-ID: <[email protected]>
> Content-Type: text/plain
> 
> Hi Karolis,
> 
> Thank you for the suggestion. 
> 
> David's "rentrez" R package works quite well for my problem (I have copied 
> his reply below), and I am not familiar with Python. But thank you all the 
> same.
> 
> Cheers,
> Yuxin
> 
> 
> 
> Yuxin Chen
> PhD Candidate
> School of Life Sciences
> Sun Yat-sen University
> Guangzhou, P. R. China, 510006
> [email protected] or [email protected] 
>  
> From: [email protected]
> Date: 2014-07-14 14:10
> To: David Winter
> Subject: Re: Re: [R-sig-phylo] use R to download the DNA barcode sequence of 
> a list of species from GenBank
> Hi David, 
> 
> Thanks for your reply.
> 
> The package "rentrez" is really wonderful. I have already tried searching my 
> species list with your function fetch_gene, and it worked. 
> But I have one small question: what is the "BOLD[all]" for? Sorry, I am 
> just a beginner in phylogenetics and am not familiar with this term yet. When 
> I included it in my search, the result was empty, but when I excluded this 
> term and kept the others the same, it worked.
> 
> Thank you again,
> Yuxin
> 
> 
> 
> Yuxin Chen
> PhD Candidate
> School of Life Sciences
> Sun Yat-sen University
> Guangzhou, P. R. China, 510006
> [email protected] or [email protected] 
>  
> From: David Winter
> Date: 2014-07-14 02:52
> To: [email protected]
> Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence of a 
> list of species from GenBank
> Hi Yuxin,
>  
> If you want specifically to get at the Barcode of Life records in
> GenBank then you can try using the NCBI's Entrez tools
> (http://www.ncbi.nlm.nih.gov/books/NBK25500/). If you want to do it in
> R you can use rentrez, a library that I maintain as part of rOpenSci
> (https://github.com/ropensci/rentrez).
>  
> Taking a quick look at some of the BOLD records, it seems they are not
> consistently tagged in a way that makes them easy to search for.
> Here's what I came up with as a solution:
>  
> library(rentrez)
> nuc_search <- entrez_search(db="nuccore",
>     term="Solanum[Organism] rbcl[gene] BOLD[all]", retmax=40)
> head(nuc_search$ids)
> ##[1] "409977017" "409977015" "379134037" "379134035" "379133963" "326394567"
>  
> Those ids can then be passed to read.GenBank or entrez_fetch to
> retrieve records.
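
As a rough sketch of that retrieval step (not David's own code): the ids can be passed to entrez_fetch with rettype="fasta", and the returned text split into named sequences. The helper names fetch_fasta_by_ids and parse_fasta_text are hypothetical; only entrez_fetch comes from rentrez. The parsing uses base R, so it can be tried on any FASTA string without a network connection.

```r
# Hypothetical helper: fetch FASTA text for a vector of NCBI ids.
# Needs the rentrez package and a network connection.
fetch_fasta_by_ids <- function(ids) {
  if (length(ids) == 0) return("")   # nothing to fetch
  rentrez::entrez_fetch(db = "nuccore", id = ids, rettype = "fasta")
}

# Hypothetical helper: split one FASTA-formatted string into a named
# character vector (names = header lines without ">", values = sequences).
parse_fasta_text <- function(txt) {
  lines <- unlist(strsplit(txt, "\r?\n"))
  lines <- lines[nzchar(lines)]      # drop empty lines
  is_header <- startsWith(lines, ">")
  if (!any(is_header)) return(character(0))
  rec_id <- cumsum(is_header)        # record index for every line
  body <- !is_header & rec_id > 0    # sequence lines after the first header
  # A factor with explicit levels keeps records that have no sequence lines
  grp <- factor(rec_id[body], levels = seq_len(sum(is_header)))
  seqs <- vapply(split(lines[body], grp), paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}
```

With a network connection, something like parse_fasta_text(fetch_fasta_by_ids(nuc_search$ids)) would give one sequence string per record, named by its FASTA header.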
>  
> If you want to do this for a bunch of genes you might want to wrap the
> whole process up in a function:
>  
> # Search nuccore for BOLD-tagged records of one gene in one taxon and
> # return them as a single string in the requested format.
> fetch_gene <- function(organism_name, gene_name, file_format="fasta",
>                        max_recs=50){
>     sterm <- sprintf("%s[organism] %s[gene] BOLD[all]",
>                      organism_name, gene_name)
>     nuc_search <- entrez_search(db="nuccore", term=sterm, retmax=max_recs)
>     return(entrez_fetch(db="nuccore", id=nuc_search$ids, rettype=file_format))
> }
> genera <- c("Solanum", "Terminalia")
> recs <- lapply(genera, fetch_gene, gene_name="rbcl")
>  
> This will give you a list of character strings, each holding the contents
> of a FASTA file. You can check that you have the right number of records,
> etc., if you want:
>  
> library(stringr)
> sapply(recs, str_count, pattern=">")
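
For a list of species names rather than genera, and for searches where the BOLD[all] keyword returns nothing, one possible variation is to build the search term separately and make the keyword optional. This is only a sketch: build_term, count_fasta_records and fetch_for_species are hypothetical names, the half-second pause is an assumed courtesy delay between requests, and the network part is defined but not run here.

```r
# Build an Entrez search term; `keyword` (e.g. "BOLD") is optional.
build_term <- function(organism_name, gene_name, keyword = NULL) {
  term <- sprintf("%s[organism] %s[gene]", organism_name, gene_name)
  if (!is.null(keyword) && nzchar(keyword)) {
    term <- paste0(term, " ", keyword, "[all]")
  }
  term
}

# Count FASTA records in a string by counting header lines (base R only).
count_fasta_records <- function(fasta_text) {
  lines <- unlist(strsplit(fasta_text, "\r?\n"))
  sum(startsWith(lines, ">"))
}

# Search and fetch for each species in turn.
# Needs the rentrez package and a network connection.
fetch_for_species <- function(species, gene_name, keyword = NULL,
                              max_recs = 50, pause = 0.5) {
  out <- lapply(species, function(sp) {
    hits <- rentrez::entrez_search(db = "nuccore",
                                   term = build_term(sp, gene_name, keyword),
                                   retmax = max_recs)
    Sys.sleep(pause)                       # assumed courtesy delay
    if (length(hits$ids) == 0) return("")  # no records for this species
    rentrez::entrez_fetch(db = "nuccore", id = hits$ids, rettype = "fasta")
  })
  names(out) <- species
  out
}
```

A call such as fetch_for_species(c("Solanum tuberosum", "Terminalia catappa"), "rbcl") (example species only) would return one FASTA string per species, and sapply over the result with count_fasta_records gives the record counts. Unlike str_count with pattern=">", this counts only lines that start with ">", so a ">" inside a header is not counted twice.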
>  
> Almost every language you might otherwise use in bioinformatics has a
> wrapper for the Entrez API, so you could easily adapt this to Your
> Favourite Language if you wanted to.
>  
> Hope that's some help to you
>  
> David
>  
 
_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/
