Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-14 Thread Benilton Carvalho
wow! thanks. On 11/14/07, Kees Nuyt <[EMAIL PROTECTED]> wrote: > On Tue, 13 Nov 2007 20:46:06 -0500, "Benilton Carvalho" <[EMAIL PROTECTED]> wrote: > >I'm in the process of creating the whole db now and will give it a >try... maybe creation is the worst part, as I'm going to be accessing

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-14 Thread Kees Nuyt
On Tue, 13 Nov 2007 20:46:06 -0500, "Benilton Carvalho" <[EMAIL PROTECTED]> wrote: >I'm in the process of creating the whole db now and will give it a >try... maybe creation is the worst part, as I'm going to be accessing >contiguous sets of rows (all columns) at a time... something like: > >SEL

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-14 Thread John Stanton
I have only glanced at the problem, so I may have missed something, but my approach to a large matrix would be to realise it is a flat file and mmap it. Your program would then treat it as a memory-resident structure. The VM features of the OS would perform paging as necessary to keep a working
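A minimal sketch of the mmap approach described above, using Python's stdlib mmap and struct modules. The file path, toy dimensions, and cell-access helpers are illustrative assumptions, not from the original post:

```python
import mmap
import os
import struct
import tempfile

ROWS, COLS = 4, 3            # toy dimensions; the real matrix is ~1M x 2K
ITEM = struct.calcsize("d")  # bytes per float64 cell

# Back the matrix with a flat file of exactly the right size.
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
with open(path, "wb") as f:
    f.truncate(ROWS * COLS * ITEM)

# Map the file; the OS's virtual memory pages it in and out as needed,
# so the program can treat it as a memory-resident matrix.
f = open(path, "r+b")
mm = mmap.mmap(f.fileno(), 0)

def write_cell(r, c, v):
    # Row-major layout: cell (r, c) lives at byte offset (r*COLS + c)*ITEM.
    struct.pack_into("d", mm, (r * COLS + c) * ITEM, v)

def read_cell(r, c):
    return struct.unpack_from("d", mm, (r * COLS + c) * ITEM)[0]

write_cell(2, 1, 3.14)
print(read_cell(2, 1))  # 3.14
```

Updating a column then touches only the pages that actually hold those cells, instead of rewriting database rows.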

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-14 Thread Andreas
Hello Benilton, some years ago I came across pyTables ( http://www.pytables.org ). It's a wrapper for the HDF5 format. PyTables claims to handle high data throughput very well. It supports Matrix/Array formats, as these are typically used in scientific projects. PyTables does not provide any

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
That's what I'm testing right now... thanks a lot for the heads up... b On Nov 14, 2007 1:41 AM, Trevor Talbot <[EMAIL PROTECTED]> wrote: > On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote: > > > Then each column is added on the fly, using: > > > > ALTER TABLE table ADD COLUMN column REAL;

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Trevor Talbot
On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote: > Then each column is added on the fly, using: > > ALTER TABLE table ADD COLUMN column REAL; > > The columns are then filled using UPDATE as described previously. I haven't investigated the storage layout of this, but from what the docs impl

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
It appears my storage bandwidth would be awesome if I had only 10 columns (as my toy example had). My script creates the table and adds values to the primary key field. Then each column is added on the fly, using: ALTER TABLE table ADD COLUMN column REAL; The columns are then filled using UPDATE as
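The workflow described above (create the table with a primary key, ALTER in each column, then fill it with UPDATE) can be sketched with Python's sqlite3 module. The table and column names are placeholders, not the poster's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create the table with only the primary key; columns come later.
cur.execute("CREATE TABLE probes (id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO probes (id) VALUES (?)",
                [(i,) for i in range(1, 6)])

for col in ("sample1", "sample2"):
    # Add each sample column on the fly...
    cur.execute(f"ALTER TABLE probes ADD COLUMN {col} REAL")
    # ...then fill it with UPDATEs keyed on the primary key.
    cur.executemany(f"UPDATE probes SET {col} = ? WHERE id = ?",
                    [(i * 0.5, i) for i in range(1, 6)])

conn.commit()
print(cur.execute("SELECT sample1 FROM probes WHERE id = 4").fetchone()[0])  # 2.0
```

Note that each UPDATE rewrites the whole row record in SQLite, which is part of why filling 2K columns this way is slow.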

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Trevor Talbot
On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote: > I have "fixed" my problem, working late night and not sleeping usually > causes that: I missed a WHERE on the UPDATE statement, so I was > modifying multiple rows at a time. Basically what I did was create a > primary key (integer), which
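The bug described above (an UPDATE with no WHERE clause rewrites every row) is easy to demonstrate with Python's sqlite3 module; the table here is a toy stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v REAL)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(i, 0.0) for i in range(1, 4)])

# Missing WHERE: every row is rewritten, which is both wrong and slow.
cur.execute("UPDATE t SET v = 9.9")
print(cur.execute("SELECT COUNT(*) FROM t WHERE v = 9.9").fetchone()[0])  # 3

# With WHERE on the primary key, only the intended row changes.
cur.execute("UPDATE t SET v = 1.0 WHERE id = 2")
print(cur.execute("SELECT v FROM t WHERE id = 2").fetchone()[0])  # 1.0
```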

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
Unfortunately, the matrix is not sparse, nor can the direction of operations be changed. The reason is that I have 2K samples, which are given in one file each (2K binary files). Each of the files has roughly 7M numbers, which I process by running some algorithms I have, reducing from 7M to 1M eleme

RE: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Griggs, Donald
Regarding: building a 100 row x 2000 column matrix. If by chance these are sparse matrices (i.e., the majority of the values are empty), then conceivably you could store only the available values. The schema might then be something like: CREATE TABLE myTable ( matrixRow INTEGER,
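The original CREATE TABLE is cut off above; one plausible completion of the coordinate-format idea, sketched via Python's sqlite3 module (the column names beyond matrixRow are assumptions, not from the original post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Coordinate format: one database row per non-empty matrix cell.
cur.execute("""
    CREATE TABLE myTable (
        matrixRow INTEGER,
        matrixCol INTEGER,
        value     REAL,
        PRIMARY KEY (matrixRow, matrixCol)
    )
""")
# Only the non-empty cells are stored.
cur.executemany("INSERT INTO myTable VALUES (?, ?, ?)",
                [(0, 5, 1.5), (3, 2, -0.7)])

# Missing cells are simply absent; reads can treat them as empty/zero.
row = cur.execute(
    "SELECT value FROM myTable WHERE matrixRow = ? AND matrixCol = ?", (3, 2)
).fetchone()
print(row[0])  # -0.7
```

For a dense matrix this scheme stores more, not less, which is why the follow-up reply rejects it.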

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Trevor Talbot
On 11/13/07, John Stanton <[EMAIL PROTECTED]> wrote: > The point is that the more statements you have in each transaction the > better, up to a limit of perhaps a thousand or so. Actually, I get the impression that the number of statements is not really the problem here. On 11/13/07, Benilton Carva

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread John Stanton
The point is that the more statements you have in each transaction the better, up to a limit of perhaps a thousand or so. Benilton Carvalho wrote: I'm using RSQLite functions like: dbBeginTransaction(db2) dbGetPreparedQuery(db2, sql, bind.data=df) dbCommit(db2) where db2 is the connection to
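The batching advice above (group many statements per transaction, up to a thousand or so) can be sketched with Python's sqlite3 module; the table, row counts, and batch size are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v REAL)")

BATCH = 1000  # roughly the "thousand or so" suggested above
rows = [(i, i * 0.1) for i in range(10000)]

# One BEGIN/COMMIT pair per batch amortizes the per-transaction fsync cost
# over many statements, instead of paying it once per statement.
for start in range(0, len(rows), BATCH):
    cur.execute("BEGIN")
    cur.executemany("INSERT INTO t VALUES (?, ?)", rows[start:start + BATCH])
    cur.execute("COMMIT")

print(cur.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 10000
```

This is the same pattern the RSQLite calls in the quoted message (dbBeginTransaction / dbCommit around a prepared query) express in R.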

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
I'm using RSQLite functions like: dbBeginTransaction(db2) dbGetPreparedQuery(db2, sql, bind.data=df) dbCommit(db2) where db2 is the connection to the SQLite db, sql is either the INSERT or UPDATE statements I mentioned and df is the R object (data.frame) containing the data to be inserted/updated

Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread John Stanton
How are you using transactions? Benilton Carvalho wrote: Hi Everyone, I use R to create tools for analysis of microarrays (http://www.bioconductor.org). I'm in a situation where I need to handle 6 matrices with 1M rows and about 2K columns, i.e., each matrix takes roughly 15GB of RAM. The procedure

[sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
Hi Everyone, I use R to create tools for analysis of microarrays (http://www.bioconductor.org). I'm in a situation where I need to handle 6 matrices with 1M rows and about 2K columns, i.e., each matrix takes roughly 15GB of RAM. The procedure I'm working on can be divided into two parts: 1) I read an i