Re: [sqlite] indexing large sqlite db

2006-08-15 Thread Joe Wilson
--- Andrew McCollum <[EMAIL PROTECTED]> wrote:
> I am building a very large (> 500M rows, ~6 GB) sqlite database that has
> three integer columns.  I find that inserting all the rows takes only a
> couple hours, but when I try to create an index on any of the columns the
> process will run for days without finishing.
> 
> If the entire operation can be done in RAM, then it is possible for it to
> finish in a few hours, but the second the sqlite process has to go to disk,
> it grinds to a slow crawl.
> 
> Is it possible to split the index operation into multiple phases?  Or better
> yet, flush the data when the cache is full and continue on?  Any other ideas
> on how to speed up the indexing other than reducing the size of the data so
> that the operation can be done in RAM?  ACID restrictions are unimportant
> during this construction phase of the database.

Create the index prior to the data being populated and then insert
a few thousand rows per transaction. Experiment with different sized 
batches. It will run much faster if the rows to be inserted are
pre-sorted in index order.

However, the problem with this incremental approach is that the 
database pages of the index will be scattered across the 6 Gig file 
leading to poor performance if the file is not pre-cached in RAM 
by the OS.


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] indexing large sqlite db

2006-08-15 Thread Andrew McCollum
Hello,

I am building a very large (> 500M rows, ~6 GB) sqlite database that has
three integer columns.  I find that inserting all the rows takes only a
couple hours, but when I try to create an index on any of the columns the
process will run for days without finishing.

If the entire operation can be done in RAM, then it is possible for it to
finish in a few hours, but the second the sqlite process has to go to disk,
it grinds to a slow crawl.

Is it possible to split the index operation into multiple phases?  Or better
yet, flush the data when the cache is full and continue on?  Any other ideas
on how to speed up the indexing other than reducing the size of the data so
that the operation can be done in RAM?  ACID restrictions are unimportant
during this construction phase of the database.

Thanks,
Andrew


PS - Here is what I am trying now:

> PRAGMA cache_size = 4194304
> PRAGMA synchronous = OFF
> PRAGMA temp_store = MEMORY
> CREATE INDEX col1_index ON table (col1)


-
To unsubscribe, send email to [EMAIL PROTECTED]
-