This might give an impression of the scale of what the Bioconductor people are doing.

"The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository [as of 2005] for high-throughput molecular abundance data, primarily gene expression data."
http://www.ncbi.nlm.nih.gov/pubmed/15608262

"The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records as well as the relationships between them are parsed and stored in a local MySQL database. ... In addition, a Bioconductor package, GEOmetadb that utilizes a SQLite export of the entire GEOmetadb database is also available, rendering the entire GEO database accessible with full power of SQL-based queries from within R."
http://www.ncbi.nlm.nih.gov/pubmed/18842599

Annotation Database Interface
Bioconductor version: Release (3.0)
Provides user interface and database connection code for annotation data packages using SQLite data storage.
Author: Herve Pages, Marc Carlson, Seth Falcon, Nianhua Li
Maintainer: Bioconductor Package Maintainer <maintainer at bioconductor.org>
Citation (from within R, enter citation("AnnotationDbi")):
Pages H, Carlson M, Falcon S and Li N. *AnnotationDbi: Annotation Database Interface*. R package version 1.28.1.
http://master.bioconductor.org/packages/release/bioc/html/AnnotationDbi.html

To really understand the enormity of what they are attempting, you need a picture like the one in "Figure 1: Annotation Packages: the big picture" on the first page of this document:
http://master.bioconductor.org/packages/release/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf

Just to grasp the scale and complexity of what they are doing: one of the databases mentioned, GO.db, stores a gigantic directed acyclic graph (DAG).

"GOBPANCESTOR Annotation of GO Identifiers to their Biological Process Ancestors
Description
This data set describes associations between GO Biological Process (BP) terms and their ancestor BP terms, based on the directed acyclic graph (DAG) defined by the Gene Ontology Consortium. The format is an R object mapping the GO BP terms to all ancestor terms, where an ancestor term is a more general GO term that precedes the given GO term in the DAG (in other words, the parents, and all their parents, etc.)."

I get the idea that they are storing a DAG in a SQLite database for use in R, explaining "associations between GO Biological Process (BP) terms and their ancestor BP terms, based on the directed acyclic graph (DAG) defined by the Gene Ontology Consortium."

DAG, SQLite, R, Biological Processes and Gene Ontology in one paragraph; oh, my head hurts. I think I'll stick to simpler stuff.

Jim

On Wed, Feb 25, 2015 at 3:13 PM, Jim Callahan <jim.callahan.orlando at gmail.com> wrote:

> I first learned about SQLite in the Bioconductor branch of R. I figured if
> they could handle massive genetic databases in SQLite, SQLite ought to be
> able to handle a million (or even 12 million) voters in a voter file.
>
> Here is a brief article from 2006, "How to Use SQLite with R" by Seth
> Falcon.
>
> http://master.bioconductor.org/help/course-materials/2006/rforbioinformatics/labs/thurs/SQLite-R-howto.pdf
> Jim
>
> On Thu, Feb 19, 2015 at 2:08 PM, Jim Callahan <jim.callahan.orlando at gmail.com> wrote:
>
>> Strongly agree with using the R package Sqldf.
>> I used both RSQLite and Sqldf, both worked extremely well (and I am both
>> a lazy and picky end user). Sqldf had the advantage that it took you all
>> the way to your destination the workhorse R object the data frame (R can
>> define new objects, but the data frame as an in memory table is the
>> default).
>>
>> The SQLITE3 command line interface and the R command line had a nice
>> synergy; SQL was great for getting a subset of rows and columns or building
>> a complex view from multiple tables. Both RSQLite and Sqldf could
>> understand the query/view as a table and all looping in both SQL and R took
>> place behind the scenes in compiled code.
>>
>> Smart phone users say "there is an app for that". R users would say
>> "there is a package for that" and CRAN is the equivalent of the Apple app
>> store or Google Play.
>>
>> R has packages for graphics, classical statistics, Bayesian statistics
>> and machine learning. R also has packages for spatial statistics (including
>> reading ESRI shapefiles), for graph theory and for building decision trees.
>> There is another whole app store for biological applications "bioconductor".
>>
>> The CRAN website has "views" (pages or blogs) showing how packages solve
>> common problems in a variety of academic disciplines or application areas.
>>
>> Jim Callahan
>> On Feb 19, 2015 11:38 AM, "Gabor Grothendieck" <ggrothendieck at gmail.com>
>> wrote:
>>
>>> On Wed, Feb 18, 2015 at 9:53 AM, Richard Hipp <drh at sqlite.org> wrote:
>>> > On 2/18/15, Jim Callahan <jim.callahan.orlando at gmail.com> wrote:
>>> >> I would mention the open source statistical language R in the "data
>>> >> analysis" section.
>>> >
>>> > I've heard of R but never tried to use it myself. Is an SQLite
>>> > interface built into R, sure enough? Or is that something that has to
>>> > be added in separately?
>>> >
>>>
>>> RSQLite is an add-on package to R; however, for data analysis (as
>>> opposed to specific database manipulation) I would think most R users
>>> would use my sqldf R add-on package (which uses RSQLite by default and
>>> also can use driver packages of certain other databases) rather than
>>> RSQLite directly if they were going to use SQL for that.
>>>
>>> In R a data.frame is like an SQL table but in memory and sqldf lets
>>> you apply SQL statements to them as if they were all one big SQLite
>>> database. A common misconception is it must be slow but in fact it's
>>> sufficiently fast that some people use it to get a speed advantage
>>> over plain R. Others use it to learn SQL or to ease the transition to
>>> R and others use it to allow them to manipulate R data frames without
>>> knowing much about R provided they know SQL.
>>>
>>> If you have not tried R this takes you through installing R and
>>> running sqldf in about 5 minutes:
>>> https://sqldf.googlecode.com/#For_Those_New_to_R
>>>
>>> The rest of that page gives many other examples.
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users at mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>
>
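
For anyone who wants to see the idea Gabor describes (treating an in-memory table as if it lived in SQLite) without installing R, here is a rough sketch in Python using only the standard sqlite3 module. To be clear, this is not how sqldf is implemented and it is not R; the helper name sql_on_rows, the table name df, and the tiny voter rows are all made up for illustration.

```python
import sqlite3

def sql_on_rows(columns, rows, query, table="df"):
    """Copy rows into a throwaway in-memory SQLite table, run the query,
    and return the result rows. (Hypothetical helper, for illustration.)"""
    conn = sqlite3.connect(":memory:")
    try:
        col_defs = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()

# Made-up sample data standing in for a data frame of voters.
voters = [
    (1, "Orange", 34),
    (2, "Orange", 61),
    (3, "Seminole", 47),
]

result = sql_on_rows(
    ["voter_id", "county", "age"],
    voters,
    "SELECT county, COUNT(*) AS n FROM df GROUP BY county ORDER BY county",
)
print(result)  # [('Orange', 2), ('Seminole', 1)]
```

The point is only that the grouping and counting are expressed in SQL and executed inside SQLite's compiled code, while the caller keeps working with ordinary in-memory rows.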
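
On the GOBPANCESTOR description quoted at the top of this message: a small sketch of what "a term mapped to all of its ancestors in a DAG" can look like when the edges sit in a SQLite table. This is Python with the standard sqlite3 module, not R, and it is an assumption-laden toy: the table layout is not GO.db's actual schema, and the term names are invented rather than real GO identifiers. It walks the parent edges with a recursive common table expression.

```python
import sqlite3

# Hypothetical toy DAG as (child, parent) edges. These are NOT real GO terms.
EDGES = [
    ("T:leaf", "T:mid_a"),
    ("T:leaf", "T:mid_b"),
    ("T:mid_a", "T:root"),
    ("T:mid_b", "T:root"),
]

def build_db():
    """Create an in-memory database holding the edge list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parents (child TEXT NOT NULL, parent TEXT NOT NULL)")
    conn.executemany("INSERT INTO parents VALUES (?, ?)", EDGES)
    return conn

def ancestors(conn, term):
    """Return every ancestor of term: its parents, their parents, and so on."""
    sql = """
        WITH RECURSIVE anc(term) AS (
            SELECT parent FROM parents WHERE child = ?
            UNION  -- UNION (not UNION ALL) drops ancestors reached by two paths
            SELECT p.parent FROM parents AS p JOIN anc ON p.child = anc.term
        )
        SELECT term FROM anc ORDER BY term
    """
    return [row[0] for row in conn.execute(sql, (term,))]

conn = build_db()
print(ancestors(conn, "T:leaf"))  # ['T:mid_a', 'T:mid_b', 'T:root']
```

Note that T:root is reachable from T:leaf by two paths but appears once, which is the difference between a DAG and a simple tree.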