On 14 Nov 2011, at 5:11pm, Gabor Grothendieck wrote:

> The requirement for a large number of columns is actually one thing
> that is often needed when using sqlite from R.  Typically the use case
> is that a user wishes to read a portion of an external file into R and
> that file has thousands of columns.  For example, each row might be an
> individual and each column is a gene.

That would be a text file, right ?  So SQLite isn't involved in that.  You have 
other routines to read text files.

> Or each row is a time point and
> each column is a security (stock, bond, etc.)  The file may too large
> to reasonably handle in memory so rather than deal with it in chunks
> at a time its easier to just read it into sqlite in its entirety and
> then pick off the portion you want into R using sql.

So I actually have R on my Mac and I went and had a look.  You're talking about 
using RSQLite ?  Yes, there are fast ways to move data between a SQLite table 
and an R matrix.  I can see the appeal.  But you can also execute arbitrary SQL 
commands.  So you can write your own import/export routine which takes a very 
wide matrix from R but stores it in a less wide table in SQLite.

What I think you're trying to do is use a SQLite table as a data frame.  If I 
understand correctly, this means you can use the commands you'd normally use 
with an R matrix, but with data still stored in a SQLite table, without having 
to rewrite the code of your program.  It's a really neat hack.

Trouble is, SQLite isn't efficient with such wide tables.  Your code is going 
to operate really slowly.  I'm going to have to get backup from SQLite experts 
here, but I understand that unlike how matrices are stored in R, a table's 
columns are stored in a structure like a linked list.  So if you try to access 
the 1,400th column it has to walk a list of 1,399 items to find it.  Which 
isn't efficient.

So fine.  Use SQLite to store tables as wide as you like.  But write your own 
import/export commands to fetch appropriate parts into memory.  There's no need 
to use a SQLite table 2000 columns wide just because your matrix is 2000 
columns wide.  Purely a recommendation for serious software intended for proper 
use.  Do anything you like in quick hacks: CPU time and memory usage can be 
stupid big for those.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to