I'm using R (with the CCA package) to perform regularized canonical correlation analysis on two variable sets: species abundances and food abundances, stored as the matrices Y and X, respectively. The number of units (N = 15) is smaller than the number of variables in the matrices, which is >400 (most of them potential "explanatory" variables, with only 12-13 "response" variables).

González et al. (2008, http://www.jstatsoft.org/v23/i12/paper) note that the package "includes a regularized version of CCA to deal with data sets with more variables than units", which is certainly my situation with only 15 units. I have been following the procedure González et al. (2008) go through in their paper, but I run into this error message:

    Error in chol.default(Bmat) : the leading minor of order 12 is not positive definite

I do not know what it means or what to do about it. Here is the code; any ideas or knowledge on the subject would be appreciated.

    library(CCA)
    correl <- matcor(X, Y)
    img.matcor(correl, type = 2)
    res.regul <- estim.regul(X, Y, plt = TRUE,
                             grid1 = seq(0.0001, 0.2, l = 51),
                             grid2 = seq(0, 0.2, l = 51))
    Error in chol.default(Bmat) : the leading minor of order 12 is not positive definite

Dr. Mario E. Suárez Mota
What doesn't kill me makes me stronger...
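P.S. In case it is useful, here is a minimal sketch of the general situation, using simulated data rather than my real X and Y (the dimensions, the seed, and the lambda value are all made up for illustration). It only shows that a sample covariance matrix computed from fewer units than variables is rank deficient, and that adding a ridge term to its diagonal makes a Cholesky factorization possible. I am assuming, without having confirmed it, that this is related to the chol.default(Bmat) error in my question.

```r
# Simulated stand-in for a "wide" data set: 15 units, 40 variables.
# (Dimensions and data are invented; they are not my real matrices.)
set.seed(1)
n <- 15
p <- 40
X_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)

# The sample covariance of n centred rows has rank at most n - 1,
# so with p > n it cannot be positive definite.
S <- var(X_sim)
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
rank_S <- sum(ev > 1e-8 * max(ev))

# chol() needs a positive definite matrix, so it is expected to fail here;
# tryCatch() keeps the script running and stores the message instead.
chol_plain <- tryCatch(chol(S), error = function(e) conditionMessage(e))

# Adding a ridge term lambda * I shifts every eigenvalue up by lambda,
# which makes the matrix positive definite for any lambda > 0.
lambda <- 0.1
S_ridge <- S + diag(lambda, p)
chol_ridge <- chol(S_ridge)

print(rank_S)
print(chol_plain)
```

The regularization parameters searched over by estim.regul play the role of lambda here, as far as I understand the paper; whether the values in my grids are large enough for my data is exactly what I am unsure about.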
_______________________________________________ R-sig-phylo mailing list - [email protected] https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/[email protected]/
