It appears my storage bandwidth is awesome... if only I had 10
columns (as my toy example did).

My script creates the table and adds values to the primary key field.

Then each column is added on the fly, using:

ALTER TABLE table ADD COLUMN column REAL;

The columns are then filled using UPDATE as described previously.
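As a sketch of that workflow (using Python's sqlite3 purely for illustration; the real script uses RSQLite, and the table/column names here are made up):

```python
import sqlite3

# In-memory database just for the sketch; the real script uses a file
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Create the table with only the primary key (the "row number of the matrix")
cur.execute("CREATE TABLE mat (id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO mat (id) VALUES (?)",
                [(i,) for i in range(1, 1001)])

# Each column is added on the fly...
cur.execute("ALTER TABLE mat ADD COLUMN col1 REAL")

# ...and then filled with UPDATE, matching rows by id
cur.executemany("UPDATE mat SET col1 = ? WHERE id = ?",
                [(float(i) * 0.5, i) for i in range(1, 1001)])
con.commit()
```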

I apologize if this is not of interest, but the following link shows
the time needed to add one extra column of 1M reals on our system:

http://www.biostat.jhsph.edu/~bcarvalh/sqlite.png

I totally understand that the amount of time should increase, but I
found it striking to see the jump when adding the 111th column.

I'll try to process and add a few columns at a time as suggested.
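The batching suggested below might look something like this (again only an illustrative sketch in Python's sqlite3, with made-up names; several columns are added, then filled with a single UPDATE per row):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE mat (id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO mat (id) VALUES (?)",
                [(i,) for i in range(1, 101)])

# Add a batch of columns up front
for name in ("col1", "col2", "col3"):
    cur.execute("ALTER TABLE mat ADD COLUMN %s REAL" % name)

# Fill all three columns with one UPDATE per row instead of three
rows = [(i * 1.0, i * 2.0, i * 3.0, i) for i in range(1, 101)]
cur.executemany(
    "UPDATE mat SET col1 = ?, col2 = ?, col3 = ? WHERE id = ?", rows)
con.commit()
```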

Thank you very much,

Kindest regards,

--benilton

On Nov 13, 2007 9:29 PM, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
>
> > I have "fixed" my problem, working late night and not sleeping usually
> > causes that: I missed a WHERE on the UPDATE statement, so I could
> > modify multiple rows at a time. Basically what I did was create a
> > primary key (integer), which is the "row number of my matrix" and run
> > the UPDATE like:
> >
> > UPDATE table SET column = <values> WHERE id = <ids>;
> >
> > shame on me... and I apologize for missing such a basic thing...
>
> No worries, we all miss that one at one time or another.
>
> > But, anyways, the first INSERT takes 4secs for all 1M rows.. the
> > updates take 9secs, each.
>
> I'm impressed with your storage bandwidth.  If you don't need the
> database to be persistent (you restart the whole process in case of
> failure), turning the synchronous pragma off may help your I/O times a
> bit.  SQLite won't try so hard to keep the data safe from
> interruptions that way.
>
> > I'm in the process of creating the whole db now and will give it a
> > try... maybe creation is the worse part, as I'm going to be accessing
> > contiguous sets of rows (all columns) at a time... something like:
> >
> > SELECT * FROM table LIMIT 10000 OFFSET 20000;
>
> If possible, I'd recommend a simple "SELECT * FROM table;" and only
> fetch 10000 at a time.  I don't know what the RSQLite interface is
> like though.  (The reasoning is that OFFSET must look at all the rows
> prior, so each query will take longer than the previous.)  But yes, I
> would expect this to work much faster than the updates, as it should
> be very nearly contiguous on disk.
>
> My only other suggestion at the moment would be to buffer and batch
> updates if possible.  Process a few columns at a time, and update them
> all at once like:
>
>     UPDATE table SET column1 = <value>, column2 = <value> ... WHERE id = <id>;
>
> Each update should take about the same amount of time (9s), but you'll
> get more done each time.
>
>
> -----------------------------------------------------------------------------
> To unsubscribe, send email to [EMAIL PROTECTED]
> -----------------------------------------------------------------------------
>
>
