[sqlite] Appropriate Uses For SQLite

Jim Callahan Wed, 25 Feb 2015 16:16:41 -0500

This might give an impression of the scale of what the Bioconductor people
are doing.


"The Gene Expression Omnibus (GEO) at the National Center for Biotechnology
Information (NCBI) is the largest fully public repository [as of 2005] for
high-throughput molecular abundance data, primarily gene expression data."
http://www.ncbi.nlm.nih.gov/pubmed/15608262

"The NCBI Gene Expression Omnibus (GEO) represents the largest public
repository of microarray data. However, finding data in GEO can be
challenging. We have developed GEOmetadb in an attempt to make querying the
GEO metadata both easier and more powerful. All GEO metadata records as
well as the relationships between them are parsed and stored in a local
MySQL database. ... In addition, a Bioconductor package, GEOmetadb that
utilizes a SQLite export of the entire GEOmetadb database is also
available, rendering the entire GEO database accessible with full power of
SQL-based queries from within R."
http://www.ncbi.nlm.nih.gov/pubmed/18842599

Annotation Database Interface

Bioconductor version: Release (3.0)

Provides user interface and database connection code for annotation data
packages using SQLite data storage.

Author: Herve Pages, Marc Carlson, Seth Falcon, Nianhua Li

Maintainer: Bioconductor Package Maintainer <maintainer at bioconductor.org>

Citation (from within R, enter citation("AnnotationDbi")):

Pages H, Carlson M, Falcon S and Li N. *AnnotationDbi: Annotation Database
Interface*. R package version 1.28.1.

http://master.bioconductor.org/packages/release/bioc/html/AnnotationDbi.html

To really understand the enormity of what they are attempting, you need a
picture like the one, "Figure 1: Annotation Packages: the big picture", on
the first page of this document:
http://master.bioconductor.org/packages/release/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf

Just to grasp the scale and complexity of what they are doing: one of the
databases mentioned, GO.db, stores a gigantic directed acyclic graph (DAG).

"GOBPANCESTOR: Annotation of GO Identifiers to their Biological Process
Ancestors. Description: This data set describes associations between GO
Biological Process (BP) terms and their ancestor BP terms, based on the
directed acyclic graph (DAG) defined by the Gene Ontology Consortium. The
format is an R object mapping the GO BP terms to all ancestor terms, where
an ancestor term is a more general GO term that precedes the given GO term
in the DAG (in other words, the parents, and all their parents, etc.)."

I get the idea that they are storing a DAG in a SQLite database for use in
R, explaining "associations between GO Biological Process (BP) terms and
their ancestor BP terms, based on the directed acyclic graph (DAG) defined
by the Gene Ontology Consortium."
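
For anyone wondering what "a DAG in SQLite" can look like in practice, here
is a minimal sketch. It is not GO.db's actual schema (I have not checked
that); the table name "edges", the term names, and the helper function are
all made up for illustration. It stores one row per child/parent edge and
uses a recursive common table expression to collect "the parents, and all
their parents, etc." I wrote it in Python only because the sqlite3 module
ships with it; the SQL is the interesting part.

```python
import sqlite3

# Hypothetical toy DAG: each row is a (child, parent) edge.
# The term names are invented; they are not real GO identifiers.
EDGES = [
    ("term_d", "term_b"),
    ("term_d", "term_c"),
    ("term_b", "term_a"),
    ("term_c", "term_a"),
]


def ancestors(conn, term):
    """Return a sorted list of every ancestor of `term` in the edges table."""
    rows = conn.execute(
        """
        WITH RECURSIVE anc(term) AS (
            -- direct parents of the starting term
            SELECT parent FROM edges WHERE child = ?
            UNION
            -- parents of anything already found; UNION drops duplicates,
            -- so a term reachable by two paths is reported only once
            SELECT e.parent FROM edges AS e JOIN anc ON e.child = anc.term
        )
        SELECT term FROM anc ORDER BY term
        """,
        (term,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (child TEXT NOT NULL, parent TEXT NOT NULL)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", EDGES)

print(ancestors(conn, "term_d"))
```

Since the GOBPANCESTOR description talks about a mapping from each term to
all of its ancestors, the real package may well store that mapping
precomputed rather than walking the graph at query time; the sketch only
shows that either approach fits comfortably in SQLite.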

DAG, SQLite, R, Biological Processes and Gene Ontology in one paragraph;
oh, my head hurts, I think I'll stick to simpler stuff.

Jim





On Wed, Feb 25, 2015 at 3:13 PM, Jim Callahan <
jim.callahan.orlando at gmail.com> wrote:

> I first learned about SQLite in the Bioconductor branch of R. I figured if
> they could handle massive genetic databases in SQLite, SQLite ought to be
> able to handle a million (or even 12 million) voters in a voter file.
>
> Here is a brief article from 2006, "How to Use SQLite with R" by Seth
> Falcon.
>
> http://master.bioconductor.org/help/course-materials/2006/rforbioinformatics/labs/thurs/SQLite-R-howto.pdf
> Jim
>
> On Thu, Feb 19, 2015 at 2:08 PM, Jim Callahan <
> jim.callahan.orlando at gmail.com> wrote:
>
>> Strongly agree with using the R package sqldf.
>> I used both RSQLite and sqldf; both worked extremely well (and I am both
>> a lazy and picky end user). sqldf had the advantage that it took you all
>> the way to your destination: the workhorse R object, the data frame (R can
>> define new objects, but the data frame, as an in-memory table, is the
>> default).
>> The sqlite3 command line interface and the R command line had a nice
>> synergy; SQL was great for getting a subset of rows and columns or building
>> a complex view from multiple tables. Both RSQLite and sqldf could
>> understand the query/view as a table, and all looping in both SQL and R took
>> place behind the scenes in compiled code.
>>
>> Smart phone users say "there is an app for that". R users would say
>> "there is a package for that" and CRAN is the equivalent of the Apple app
>> store or Google Play.
>>
>> R has packages for graphics, classical statistics, Bayesian statistics
>> and machine learning. R also has packages for spatial statistics (including
>> reading ESRI shapefiles), for graph theory and for building decision trees.
>> There is another whole app store for biological applications, "Bioconductor".
>>
>> The CRAN website has "views" (pages or blogs) showing how packages solve
>> common problems in a variety of academic disciplines or application areas.
>>
>> Jim Callahan
>>  On Feb 19, 2015 11:38 AM, "Gabor Grothendieck" <ggrothendieck at gmail.com>
>> wrote:
>>
>>> On Wed, Feb 18, 2015 at 9:53 AM, Richard Hipp <drh at sqlite.org> wrote:
>>> > On 2/18/15, Jim Callahan <jim.callahan.orlando at gmail.com> wrote:
>>> >> I would mention the open source statistical language R in the "data
>>> >> analysis" section.
>>> >
>>> > I've heard of R but never tried to use it myself.  Is an SQLite
>>> > interface built into R, sure enough?  Or is that something that has to
>>> > be added in separately?
>>> >
>>>
>>> RSQLite is an add-on package to R; however, for data analysis (as
>>> opposed to specific database manipulation) I would think most R users
>>> would use my sqldf R add-on package (which uses RSQLite by default and
>>> also can use driver packages of certain other databases) rather than
>>> RSQLite directly if they were going to use SQL for that.
>>>
>>> In R a data.frame is like an SQL table but in memory and sqldf lets
>>> you apply SQL statements to them as if they were all one big SQLite
>>> database.  A common misconception is that it must be slow, but in fact it's
>>> sufficiently fast that some people use it to get a speed advantage
>>> over plain R.  Others use it to learn SQL or to ease the transition to
>>> R, and others use it to manipulate R data frames without
>>> knowing much about R, provided they know SQL.
>>>
>>> If you have not tried R this takes you through installing R and
>>> running sqldf in about 5 minutes:
>>> https://sqldf.googlecode.com/#For_Those_New_to_R
>>>
>>> The rest of that page gives many other examples.
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users at mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>
>
