Thank you, Eugen. Yes, I have just read the MEE paper on phyloGenerator. This is exactly the kind of tool I want: one that can download DNA sequences, align them and finally build the phylogenetic tree under the constraint of a Phylomatic tree. That is really great!
Cheers,
Yuxin

Yuxin Chen
PhD Candidate
School of Life Sciences
Sun Yat-sen University
Guangzhou, P. R. China, 510006
[email protected] or [email protected]

From: Eugen
Date: 2014-07-15 14:37
To: r-sig-phylo
Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence of a list of species from GenBank

Hi Yuxin,

You can also try out the phyloGenerator tool to download specific sequences from GenBank (http://willpearse.github.io/phyloGenerator/index.html). You only need a species list to do so; for more details, read the manual.

Best,
Eugen

On 14.07.2014 12:00, [email protected] wrote:
> Today's Topics:
>
>    1. use R to download the DNA barcode sequence of a list of
>       species from GenBank ([email protected])
>    2. Re: use R to download the DNA barcode sequence of a list of
>       species from GenBank (Karolis Ramanauskas)
>    3. Fw: Re: use R to download the DNA barcode sequence of a list
>       of species from GenBank ([email protected])
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 13 Jul 2014 23:39:13 +0800
> From: "[email protected]" <[email protected]>
> To: r-sig-phylo <[email protected]>
> Subject: [R-sig-phylo] use R to download the DNA barcode sequence of a
>     list of species from GenBank
>
> Dear All,
>
> I have a list of about 2,000 plant species and want to construct a
> phylogenetic tree for them. I'd like to use the DNA barcode data
> available in GenBank.
> Then I will first need to download these DNA sequences from the
> Internet. I know that read.GenBank in package "ape" can do this if I
> have the GenBank accession numbers, but all I have now are the species
> names. Does anybody know which R function can batch-download the
> sequences from GenBank using only species names?
>
> Many thanks in advance.
> Yuxin
>
> Yuxin Chen
> PhD Candidate
> School of Life Sciences
> Sun Yat-sen University
> Guangzhou, P. R. China, 510006
> [email protected] or [email protected]
>
> ------------------------------
>
> Message: 2
> Date: Sun, 13 Jul 2014 12:31:03 -0500
> From: Karolis Ramanauskas <[email protected]>
> To: [email protected], [email protected]
> Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence
>     of a list of species from GenBank
>
> Good day,
>
> I understand you have done some work already, but you may want to try my
> PhyloMill pipeline. It will do exactly what you need. It is written in
> Python, not R. You will need to give it the names of the ingroup and
> outgroup taxa and the loci you want to use. If those loci are not
> predefined in PhyloMill, I can create the definitions; just let me know
> which loci you want to use.
>
> PhyloMill will actually do a lot more than just download and align
> sequences: it will filter mislabeled sequences, reverse-complement them
> if needed, and so on. It will also create a consensus sequence when
> multiple GI accessions are available for a given taxon and locus.
>
> https://github.com/karolisr/krpy
>
> Peace,
> Karolis Ramanauskas
> Department of Biological Sciences
> University of Illinois at Chicago
> 840 W. Taylor St.
> SEL 4093 M/C 067
> Chicago, IL 60607
> E-Mail: [email protected]
>
> ------------------------------
>
> Message: 3
> Date: Mon, 14 Jul 2014 14:19:29 +0800
> From: "[email protected]" <[email protected]>
> To: kraman2 <[email protected]>
> Cc: R-sig-phylo <[email protected]>
> Subject: [R-sig-phylo] Fw: Re: use R to download the DNA barcode
>     sequence of a list of species from GenBank
>
> Hi Karolis,
>
> Thank you for providing the guides.
>
> David's "rentrez" R package works quite well for my problem (I have
> copied his reply below), and I am not familiar with Python. But thank
> you all the same.
> > Cheers, > Yuxin > > > > Yuxin Chen > Phd Candidate > School of Life Sciences > Sun Yat-sen University > Guangzhou, P. R. China, 510006 > [email protected] or [email protected] > > From: [email protected] > Date: 2014-07-14 14:10 > To: David Winter > Subject: Re: Re: [R-sig-phylo] use R to download the DNA barcode sequence of > a list of species from GenBank > Hi David, > > Thanks for your reply. > > The package "rentrez" is really wonderful. I have already tried to search my > species list with your function fetch_gene and it worked. > But there is one small question. What is the "BOLD[all]" for? Sorry, I am > just a beginner on phylogeny and am not familar with this term yet. When I > included it in my case, the search result is empty, but when I exluded this > term but keeping others the same it worked. > > Thank you again, > Yuxin > > > > Yuxin Chen > Phd Candidate > School of Life Sciences > Sun Yat-sen University > Guangzhou, P. R. China, 510006 > [email protected] or [email protected] > > From: David Winter > Date: 2014-07-14 02:52 > To: [email protected] > Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence of a > list of species from GenBank > Hi Yuxin, > > If you want specifically to get at the Barcode of Life records in > genbank then you can try using the NCBI's Entrez tools > (http://www.ncbi.nlm.nih.gov/books/NBK25500/). If you want to do it in > R you can use rentrez, a library that I maintain as part of rOpenSci > (https://github.com/ropensci/rentrez) > > Taking a quick look at some of the BOLD records, it seems they are not > consistently tagged in a way that makes them easy to search for. 
> Here's what I came up with for a solution > > library(rentrez) > nuc_search <- entrez_search(db="nuccore", term="Solanum[Organism] > rbcl[gene] BOLD[all]", retmax=40) > head(nuc_search$ids) > ##[1] "409977017" "409977015" "379134037" "379134035" "379133963" "326394567" > > Those ids can then be passed to read.Genbank or entrez_fetch to > retrieve records. > > If you want to do this for a bunch of genes you might want to wrap the > whole process up in a function: > > fetch_gene <- function(organism_name, gene_name, file_format="fasta", > max_recs=50){ > sterm <- sprintf("%s[organism] %s[gene] BOLD[all]", organism_name, > gene_name) > nuc_search <- entrez_search(db="nuccore", term=sterm, retmax=max_recs) > return(entrez_fetch(db="nuccore", id=nuc_search$ids, rettype=file_format)) > } > genera <- c("Solanum", "Terminalia") > recs <- lapply(genera, fetch_gene, gene_name="rbcl") > > Which will give you a list of characters, each representing a fasta > file. You can check have the right number of records etc if you want: > > library(stringr) > sapply(recs, str_count, pattern=">") > > Almost every language you might otherwise use in bioinformatics has a > wrapper for the Entrez API, so you easily adapt this to You Favourite > Language if you wanted to. > > Hope that's some help to you > > David > > On Sun, Jul 13, 2014 at 8:39 AM, [email protected] > <[email protected]> wrote: >> Dear All, >> >> I have a list of about 2,000 plant species, and want to construct a >> phylogenetic tree for them. I'd like to use the DNA barcode data availabe in >> GenBank. Then I will first need to download these DNA sequences from the >> Internet. I know that read.GenBank in package "ape" is capable to do it if I >> have the GenBank accession numbers. But what I only have now is their >> species names. Does anybody know which R function can batch-process it with >> only species names from GenBank? >> >> Many thanks in advance. 
>> Yuxin >> >> >> >> Yuxin Chen >> Phd Candidate >> School of Life Sciences >> Sun Yat-sen University >> Guangzhou, P. R. China, 510006 >> [email protected] or [email protected] >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> R-sig-phylo mailing list - [email protected] >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> Searchable archive at http://www.mail-archive.com/[email protected]/ > > > > _______________________________________________ R-sig-phylo mailing list - [email protected] https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/[email protected]/ [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - [email protected] https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/[email protected]/
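For a list of species names like Yuxin's, David's approach can be extended to loop over species. The following is a minimal sketch, not code from the thread: it assumes the rentrez package for the real searches, and the helper names build_query, search_ids, fetch_species and count_fasta are hypothetical. In Entrez query syntax, the [all] tag means "All Fields", so BOLD[all] keeps only records that mention BOLD somewhere; since, as David notes, BOLD records are not consistently tagged, that filter can remove everything, which matches what Yuxin saw. The sketch therefore searches with BOLD[all] first and, only if nothing comes back, retries without it. It also counts FASTA records with base R instead of stringr.

```r
# Sketch only: all helper names are hypothetical; rentrez is assumed to be
# installed for real (online) searches.

# Build an Entrez query for one species and gene. Binomials are quoted so
# that "Solanum lycopersicum"[Organism] is read as one organism name.
build_query <- function(species, gene, bold = TRUE) {
  term <- sprintf('"%s"[Organism] AND %s[Gene]', species, gene)
  if (bold) {
    # [All] = All Fields: keep only records that mention BOLD somewhere
    term <- paste(term, "AND BOLD[All]")
  }
  term
}

# Search nuccore with BOLD[All] first; if nothing is found, retry without it.
# search_fun defaults to rentrez::entrez_search and can be replaced by a
# stub for offline testing.
search_ids <- function(species, gene, max_recs = 50,
                       search_fun = rentrez::entrez_search) {
  for (bold in c(TRUE, FALSE)) {
    res <- search_fun(db = "nuccore",
                      term = build_query(species, gene, bold),
                      retmax = max_recs)
    if (length(res$ids) > 0) {
      return(res$ids)
    }
  }
  character(0)  # no records with or without the BOLD filter
}

# Collect record ids for every species name, using the names as list labels.
fetch_species <- function(species_list, gene, ...) {
  ids <- lapply(species_list, search_ids, gene = gene, ...)
  names(ids) <- species_list
  ids
}

# Count FASTA records by counting ">" characters (one per header line),
# using base R only.
count_fasta <- function(fasta_text) {
  lengths(regmatches(fasta_text, gregexpr(">", fasta_text, fixed = TRUE)))
}

# Example (needs network access and rentrez):
# ids <- fetch_species(c("Solanum lycopersicum", "Terminalia catappa"), "rbcL")
# fasta <- rentrez::entrez_fetch(db = "nuccore", id = ids[[1]],
#                                rettype = "fasta")
# count_fasta(fasta)
```

The fallback rule and the quoting of binomials are assumptions of this sketch, not something the thread confirms. NCBI's E-utilities usage guidelines limit how often a client may send requests, so a loop over 2,000 species may need to pause between calls (for example with Sys.sleep()).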
