Re: [R-sig-phylo] use R to download the DNA barcode sequence of a list, of species from GenBank

Mario Suárez Tue, 15 Jul 2014 17:39:28 -0700

I'm using R (and the package CCA) to perform regularized canonical 
correlation analysis on two variable sets, species abundances and food 
abundances, stored as the matrices Y and X, respectively. The number of 
units (N = 15) is smaller than the number of variables in the matrices, 
which is >400 (most of them potential "explanatory" variables, with only 
12-13 "response" variables). Gonzalez et al. (2008, 
http://www.jstatsoft.org/v23/i12/paper) note that the package "includes a 
regularized version of CCA to deal with data sets with more variables than 
units", which is certainly my situation with only 15 units. I have been 
following the procedure Gonzalez et al. (2008) go through in their paper to 
look at the relationships between my variable sets. However, I get the 
error message "Error in chol.default(Bmat) : the leading minor of order 12 
is not positive definite", and I do not know what it means or what to do 
about it. Here is the code; any ideas or knowledge on the subject would be 
appreciated.

library(CCA)
# Correlations within and between X and Y, displayed as an image
correl <- matcor(X, Y)
img.matcor(correl, type = 2)
# Cross-validated search over grids of the ridge parameters
# (grid1 for X, grid2 for Y)
res.regul <- estim.regul(X, Y, plt = TRUE,
    grid1 = seq(0.0001, 0.2, length.out = 51),
    grid2 = seq(0, 0.2, length.out = 51))


Error in chol.default(Bmat) : the leading minor of order 12 is not positive 
definite
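A note for readers who hit the same error: chol() raises it when the matrix it is asked to factorize is not positive definite, and "order 12" means the factorization broke down at the 12th row/column, i.e. the leading 12 x 12 block is not positive definite. A covariance matrix fails that test when it is singular, for example when one variable has zero variance across all units, when some variables are exact linear combinations of others, or when there are more variables than units and no ridge term is added. Regularized CCA, as described by Gonzalez et al. (2008), adds lambda * I (lambda > 0) to each within-set covariance matrix, which makes it positive definite. One thing worth checking in the code above is that grid2 starts at 0, i.e. no ridge term on the Y side at that end of the grid. The base-R sketch below reproduces the failure and the fix on simulated data; the data (a hypothetical 12th species absent from every unit) and the value of lambda2 are assumptions, and it does not reproduce the CCA package's internals.

```r
# Simulated stand-in for a species-abundance matrix: 15 units x 12 species.
set.seed(42)
n_units <- 15
Y <- matrix(rpois(n_units * 12, lambda = 5), nrow = n_units, ncol = 12)
# Hypothetical problem column: species 12 is absent from every unit, so its
# variance and all covariances involving it are exactly zero.
Y[, 12] <- 0

Cyy <- var(Y)

# Without a ridge term the Cholesky factorization stops at the 12th
# leading minor; capture the error message instead of aborting.
chol_error <- tryCatch({
  chol(Cyy)
  NA_character_
}, error = function(e) conditionMessage(e))
print(chol_error)

# Ridge regularization: add lambda2 * I before factorizing. Any
# lambda2 > 0 makes this positive semi-definite matrix positive definite.
lambda2 <- 1e-4
Cyy_reg <- Cyy + diag(lambda2, ncol(Cyy))
R_factor <- chol(Cyy_reg)

# Sanity check: t(R) %*% R reproduces the regularized matrix.
print(all.equal(crossprod(R_factor), Cyy_reg))
```

Whether something like this happens in real data is worth checking directly, e.g. apply(Y, 2, var) shows any zero-variance columns.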
Dr. Mario E. Suárez Mota

Whoever does not kill me makes me stronger...



> Date: Tue, 15 Jul 2014 08:37:22 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence of a 
> list, of species from GenBank
> 
> Hi Yuxin,
> 
> you can also try out the phyloGenerator tool to download specific
> sequences from GenBank
> (http://willpearse.github.io/phyloGenerator/index.html). You only need a
> species list to do so; for more details, read the manual.
> 
> Best,
> 
> Eugen
> 
> On 14.07.2014 12:00, [email protected] wrote:
> > Today's Topics:
> > 
> >    1. use R to download the DNA barcode sequence of a list  of
> >       species from GenBank ([email protected])
> >    2. Re: use R to download the DNA barcode sequence of a list of
> >       species from GenBank (Karolis Ramanauskas)
> >    3. Fw: Re: use R to download the DNA barcode sequence of a list
> >       of species from GenBank ([email protected])
> > 
> > 
> > ----------------------------------------------------------------------
> > 
> > Message: 1
> > Date: Sun, 13 Jul 2014 23:39:13 +0800
> > From: "[email protected]" <[email protected]>
> > To: r-sig-phylo <[email protected]>
> > Subject: [R-sig-phylo] use R to download the DNA barcode sequence of a
> >     list    of species from GenBank
> > Message-ID: <[email protected]>
> > Content-Type: text/plain
> > 
> > Dear All,
> > 
> > I have a list of about 2,000 plant species and want to construct a 
> > phylogenetic tree for them. I'd like to use the DNA barcode data available 
> > in GenBank, so I first need to download these DNA sequences from the 
> > Internet. I know that read.GenBank in package "ape" can do this if I 
> > have the GenBank accession numbers, but all I have now are the species 
> > names. Does anybody know which R function can batch-download sequences 
> > from GenBank using only species names?
> > 
> > Many thanks in advance.
> > Yuxin 
> > 
> > 
> > 
> > Yuxin Chen
> > Phd Candidate
> > School of Life Sciences
> > Sun Yat-sen University
> > Guangzhou, P. R. China, 510006
> > [email protected] or [email protected] 
> > 
> > ------------------------------
> > 
> > Message: 2
> > Date: Sun, 13 Jul 2014 12:31:03 -0500
> > From: Karolis Ramanauskas <[email protected]>
> > To: [email protected], [email protected]
> > Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence
> >     of a list of species from GenBank
> > Message-ID:
> >     <CACT_pJHyfaTqX=Ft3g+g=MZ85jxf=_xf_ame4ynkwcyws+j...@mail.gmail.com>
> > Content-Type: text/plain
> > 
> > Good day,
> > 
> > I understand you have done some work already, but you may want to try my
> > PhyloMill pipeline. It will do exactly what you need. It is written in
> > Python, not R. You will need to give it the names of ingroup and outgroup
> > taxa and which loci you want to use. If the loci you want to use are not
> > predefined in PhyloMill, I can create the definitions, just let me know
> > which loci you want to use.
> > 
> > PhyloMill actually does a lot more than download and align sequences:
> > it filters mislabeled sequences, reverse-complements them if needed,
> > etc. It also creates a consensus sequence when multiple GI accessions
> > are available for the same taxon and locus.
> > 
> > https://github.com/karolisr/krpy
> > 
> > Peace,
> > Karolis Ramanauskas
> > Department of Biological Sciences
> > University of Illinois at Chicago
> > 840 W. Taylor St. SEL 4093 M/C 067
> > Chicago, IL 60607
> > E-Mail: [email protected]
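PhyloMill itself is Python code (see the repository above). For R users curious what two of the steps mentioned amount to, here is a minimal base-R sketch of reverse-complementing a DNA string and building a majority-rule consensus from already-aligned sequences. It is not PhyloMill's implementation; the function names rev_comp and consensus are hypothetical, and in practice an existing function such as Bioconductor's Biostrings::reverseComplement() would be the usual choice for the first step.

```r
# Reverse complement of a DNA sequence stored as a single string.
# N (and gap characters) pass through unchanged; other IUPAC ambiguity
# codes would need their own complements, which this sketch omits.
rev_comp <- function(seq) {
  comp <- chartr("ACGTacgt", "TGCAtgca", seq)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Majority-rule consensus of aligned sequences of equal length; ties go
# to the character that sorts first (table() sorts its names).
consensus <- function(aligned) {
  stopifnot(length(unique(nchar(aligned))) == 1)
  chars <- do.call(rbind, strsplit(aligned, ""))
  paste(apply(chars, 2, function(col) names(which.max(table(col)))),
        collapse = "")
}

print(rev_comp("ATGCGT"))                     # "ACGCAT"
print(consensus(c("ACGT", "ACGA", "ACTT")))   # "ACGT"
```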
> > 
> > 
> > 
> > ------------------------------
> > 
> > Message: 3
> > Date: Mon, 14 Jul 2014 14:19:29 +0800
> > From: "[email protected]" <[email protected]>
> > To: kraman2 <[email protected]>
> > Cc: R-sig-phylo <[email protected]>
> > Subject: [R-sig-phylo] Fw: Re: use R to download the DNA barcode
> >     sequence of     a list of species from GenBank
> > Message-ID: <[email protected]>
> > Content-Type: text/plain
> > 
> > Hi Karolis,
> > 
> > Thank you for providing the guides. 
> > 
> > David's "rentrez" R package works quite well for my problem (I have copied 
> > his reply below), and I am not familiar with Python. But thank you all the 
> > same.
> > 
> > Cheers,
> > Yuxin
> > 
> >  
> > From: [email protected]
> > Date: 2014-07-14 14:10
> > To: David Winter
> > Subject: Re: Re: [R-sig-phylo] use R to download the DNA barcode sequence 
> > of a list of species from GenBank
> > Hi David, 
> > 
> > Thanks for your reply.
> > 
> > The package "rentrez" is really wonderful. I have already tried to search 
> > my species list with your function fetch_gene and it worked. 
> > But I have one small question: what is "BOLD[all]" for? Sorry, I am 
> > just a beginner in phylogenetics and am not familiar with this term yet. 
> > When I included it in my search, the result was empty, but when I 
> > excluded this term and kept the others the same, it worked.
> > 
> > Thank you again,
> > Yuxin
> > 
> >  
> > From: David Winter
> > Date: 2014-07-14 02:52
> > To: [email protected]
> > Subject: Re: [R-sig-phylo] use R to download the DNA barcode sequence of a 
> > list of species from GenBank
> > Hi Yuxin,
> >  
> > If you want specifically to get at the Barcode of Life records in
> > genbank then you can try using the NCBI's Entrez tools
> > (http://www.ncbi.nlm.nih.gov/books/NBK25500/). If you want to do it in
> > R you can use rentrez, a library that I maintain as part of rOpenSci
> > (https://github.com/ropensci/rentrez)
> >  
> > Taking a quick look at some of the BOLD records, it seems they are not
> > consistently tagged in a way that makes them easy to search for.
> > Here's what I came up with for a solution
> >  
> > library(rentrez)
> > nuc_search <- entrez_search(db="nuccore", term="Solanum[Organism] rbcl[gene] BOLD[all]", retmax=40)
> > head(nuc_search$ids)
> > ##[1] "409977017" "409977015" "379134037" "379134035" "379133963" "326394567"
> >  
> > Those ids can then be passed to read.GenBank or entrez_fetch to
> > retrieve records.
> >  
> > If you want to do this for a bunch of genes you might want to wrap the
> > whole process up in a function:
> >  
> > # Search nuccore for records of one gene in one organism that mention
> > # BOLD in any field, then fetch them (FASTA by default)
> > fetch_gene <- function(organism_name, gene_name, file_format="fasta", max_recs=50){
> >     sterm <- sprintf("%s[organism] %s[gene] BOLD[all]", organism_name, gene_name)
> >     nuc_search <- entrez_search(db="nuccore", term=sterm, retmax=max_recs)
> >     return(entrez_fetch(db="nuccore", id=nuc_search$ids, rettype=file_format))
> > }
> > genera <- c("Solanum", "Terminalia")
> > recs <- lapply(genera, fetch_gene, gene_name="rbcl")
> >  
> > This gives you a list of character strings, each holding the contents of
> > a FASTA file. You can check that you have the right number of records
> > etc. if you want:
> >  
> > library(stringr)
> > sapply(recs, str_count, pattern=">")
> >  
> > Almost every language you might otherwise use in bioinformatics has a
> > wrapper for the Entrez API, so you could easily adapt this to Your
> > Favourite Language if you wanted to.
> >  
> > Hope that's some help to you
> >  
> > David
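A hedged extension of fetch_gene() for Yuxin's case of about 2,000 species, sketched in R on the assumption that rentrez's entrez_search() and entrez_fetch() behave as in the code above. It searches one species at a time, makes the BOLD filter optional (so a search that came back empty can be rerun without it), skips the fetch when a search has no hits, and pauses between requests, since NCBI asks E-utilities clients without an API key to send no more than three requests per second. The helper names build_term, fetch_species and parse_fasta, the quoting of "Genus species" names, and the 0.34 s pause are illustrative choices, not part of rentrez; only the offline parts are run here.

```r
# Build one Entrez query per species name. Quoting the name keeps
# "Genus species" together as a single [Organism] term (an assumption
# about how the names should be matched).
build_term <- function(species, gene, bold_only = FALSE) {
  term <- sprintf('"%s"[Organism] AND %s[Gene]', species, gene)
  if (bold_only) term <- paste(term, "AND BOLD[All Fields]")
  term
}

# Hypothetical per-species fetcher built on rentrez. Returns NULL when the
# search finds nothing instead of calling entrez_fetch() with no ids, and
# sleeps after each request to stay under ~3 requests per second.
fetch_species <- function(species, gene, bold_only = FALSE,
                          max_recs = 50, pause = 0.34) {
  hits <- rentrez::entrez_search(db = "nuccore",
                                 term = build_term(species, gene, bold_only),
                                 retmax = max_recs)
  Sys.sleep(pause)
  if (length(hits$ids) == 0) return(NULL)
  recs <- rentrez::entrez_fetch(db = "nuccore", id = hits$ids,
                                rettype = "fasta")
  Sys.sleep(pause)
  recs
}

# Split one FASTA-formatted string (as returned by entrez_fetch) into a
# named character vector: names are header lines, values are sequences.
parse_fasta <- function(txt) {
  lines <- sub("\r$", "", strsplit(txt, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]
  records <- split(lines, cumsum(startsWith(lines, ">")))
  seqs <- vapply(records, function(r) paste(r[-1], collapse = ""),
                 character(1))
  names(seqs) <- vapply(records, function(r) sub("^>", "", r[1]),
                        character(1))
  seqs
}

species <- c("Solanum lycopersicum", "Terminalia catappa")
print(build_term(species, "rbcL"))

# Offline check of the parser on a made-up two-record FASTA string
demo <- parse_fasta(">seq1 demo\nACGT\nAC\n>seq2 demo\nGGTT\n")
print(demo)

# With network access (not run here):
# recs <- lapply(species, fetch_species, gene = "rbcL")
# recs <- Filter(Negate(is.null), recs)
# seqs <- unlist(lapply(recs, parse_fasta))
```

Searching per species rather than per genus keeps each result tied to one name, at the cost of many more requests; that is why the pause is built into the fetcher rather than left to the caller.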
> >  
> >
> 
> _______________________________________________
> R-sig-phylo mailing list - [email protected]
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/[email protected]/
                                          