Re: [sqlite] Sqlite insertion performance

2007-09-18 Thread drh
John Stanton <[EMAIL PROTECTED]> wrote:
> I haven't looked at how Sqlite manages its cache, so my suggestion may 
> be irrelevant, but we have had success in such a caching situation by 
> crafting the cache with pages organized in most-recently-used order. 
> The least recently used page is the one replaced when there is no cache 
> hit.  When a page is accessed it is relinked to the most recently used 
> position.
> 
> B-Tree interior nodes stay in cache.
> 
> Another improvement was achieved by implementing enhanced B-Trees which 
> try to merge adjacent nodes before splitting or deleting a node.  This 
> keeps the trees flatter and less checkerboarded and cuts back on 
> expensive splits when inserting randomly ordered keys.
> 

SQLite already uses an LRU cache.  And, besides, I think the
problem is in the operating system file cache, not in SQLite's
secondary cache.  The correct solution, I believe, is the
4th bullet under "Code" on 

   http://www.sqlite.org/cvstrac/wiki?p=ToDo

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Sqlite insertion performance

2007-09-18 Thread John Stanton

[EMAIL PROTECTED] wrote:

> "Paul Harris" <[EMAIL PROTECTED]> wrote:
> 
> > A thought along the same lines, can sqlite create a unique index that
> > is hash-based?  This would provide the UNIQUE support, but it wouldn't
> > provide a sorted index.
> >
> > That should resolve the massive-insert-too-slow problem, and
> > afterwards he can create a sorted index on the column if he needs
> > ordered lookups.
> 
> SQLite *could* create a hash index.  But I seriously doubt
> that would do anything to solve this problem.  In fact, it
> would probably make the problem worse.  The problem arises
> from the loss of cache locality.  The size of the database
> has grown to the point where the disk cache can no longer
> hold the entire database file.  Each insert must check and
> update random pages within the database file.  Because the
> cache cannot hold the whole database, accessing the pages
> that must be checked and updated involves real disk I/O
> rather than just a copy of a page out of cache memory.
> Doing real disk I/O takes much, much longer than copying
> memory out of cache.
> 
> The way to fix this problem is to improve the locality of
> reference and thus get the cache working better.  This can
> be done using B-Trees.  I just haven't found the time to do
> it yet.  But because hashing is pseudo-random, hashing actually
> makes locality of reference worse and makes the problem much
> harder to solve.
> 
> --
> D. Richard Hipp <[EMAIL PROTECTED]>

I haven't looked at how Sqlite manages its cache, so my suggestion may 
be irrelevant, but we have had success in such a caching situation by 
crafting the cache with pages organized in most-recently-used order. 
The least recently used page is the one replaced when there is no cache 
hit.  When a page is accessed it is relinked to the most recently used 
position.


B-Tree interior nodes stay in cache.

Another improvement was achieved by implementing enhanced B-Trees which 
try to merge adjacent nodes before splitting or deleting a node.  This 
keeps the trees flatter and less checkerboarded and cuts back on 
expensive splits when inserting randomly ordered keys.






Re: [sqlite] Sqlite insertion performance

2007-09-18 Thread drh
"Paul Harris" <[EMAIL PROTECTED]> wrote:
> 
> 
> A thought along the same lines, can sqlite create a unique index that
> is hash-based?  this would provide the UNIQUE support, but it wouldn't
> provide a sorted index.
> 
> That should resolve the massive-insert-too-slow problem, and
> afterwards he can create a sorted index on the column if he needs
> ordered lookups.
> 

SQLite *could* create a hash index.  But I seriously doubt
that would do anything to solve this problem.  In fact, it
would probably make the problem worse.  The problem arises
from the loss of cache locality.  The size of the database
has grown to the point where the disk cache can no longer
hold the entire database file.  Each insert must check and
update random pages within the database file.  Because the
cache cannot hold the whole database, accessing the pages
that must be checked and updated involves real disk I/O
rather than just a copy of a page out of cache memory.
Doing real disk I/O takes much, much longer than copying
memory out of cache.

The way to fix this problem is to improve the locality of
reference and thus get the cache working better.  This can
be done using B-Trees.  I just haven't found the time to do
it yet.  But because hashing is pseudo-random, hashing actually
makes locality of reference worse and makes the problem much
harder to solve.
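[A toy model of that cache-locality point -- this is not SQLite's pager, just a seeded LRU simulation over synthetic page numbers, with made-up sizes (100 keys per index page, a cache of 200 pages for 1000 pages of index), to show why sorted inserts miss the cache far less often than random ones:]

```python
import random
from collections import OrderedDict

def lru_misses(accesses, cache_pages):
    # Tiny LRU page-cache model: count misses for a sequence of page ids.
    cache, misses = OrderedDict(), 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)      # refresh to most-recently-used
        else:
            misses += 1
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least-recently-used
    return misses

KEYS_PER_PAGE = 100
keys = list(range(100_000))              # 1000 distinct index pages
random.seed(1)
shuffled = random.sample(keys, len(keys))

pages_sorted = [k // KEYS_PER_PAGE for k in keys]      # ordered inserts
pages_random = [k // KEYS_PER_PAGE for k in shuffled]  # random inserts

cache = 200  # pages -- deliberately smaller than the 1000 pages touched
print(lru_misses(pages_sorted, cache), lru_misses(pages_random, cache))
```

Sorted access misses each page exactly once; random access misses roughly in proportion to how much of the index no longer fits in the cache.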

--
D. Richard Hipp <[EMAIL PROTECTED]>





Re: [sqlite] Sqlite insertion performance

2007-09-17 Thread Paul Harris
On 17/09/2007, John Machin <[EMAIL PROTECTED]> wrote:
> On 17/09/2007 1:07 PM, Joe Wilson wrote:
> > --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> > I have been struggling with the performance of insertion in sqlite.
> >
> > Here we have a very simple case :
> >
> > A table with an integer autoincrement primary key and a text
> > field that is
> > unique.
> >
> > CREATE TABLE my (id PRIMARY KEY, url);
> >
> > CREATE UNIQUE INDEX myurl ON my(url);
> >
> >
> > My application requires inserting up to 10 million records in
> > batches of
> > 20 thousand records.
> >> For each group of 20,000 records, first insert them into a TEMP table.
> >> Call the temp table t1.  Then transfer the records to the main table
> >> as follows:
> >>
> >>   INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
> >
> > I had no performance improvement with that temp store staging table
> > technique in my testing - actually it was slower.
> >
> >   http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html
> >
> > Mind you, the table I was testing against had 4 indexes, whereas the above
> > table has 2. I also wasn't using "OR IGNORE". There might be a difference.
> >
> > Just setting pragma cache_size to a huge value and inserting into
> > the table normally in large batches resulted in better performance in
> > my case. It may have already been mentioned, but having a big
> > database page_size value helps minimize the disk writes as well.
>
> A couple of thoughts:
>
> OTTOMH, time to search index is approx O(D * log(K)) where D = depth of
> tree and K = number of keys per block, and K ** D is O(N)
>
> So:
>
> (1) Big block means big K and thus small D
>
> (2) Long keys (like URLs!!) means small K and thus big D
>
> Further on point (2), the OP seems unsure about whether his URLs are
> unique or not. Perhaps storing another column containing a 64-bit hash
> with an index on that column might be the way to go -- a shorter key
> might well outweigh the extra cost of checking for duplicates.
>


A thought along the same lines, can sqlite create a unique index that
is hash-based?  this would provide the UNIQUE support, but it wouldn't
provide a sorted index.

That should resolve the massive-insert-too-slow problem, and
afterwards he can create a sorted index on the column if he needs
ordered lookups.



Alternatively, he can go without the UNIQUE index on the initial
inserts, and delete the duplicates later.  Off the top of my head, it
might go something like:

create a plain (non-unique) index on url, then:
select t2.id from my as t1, my as t2
 where t1.url = t2.url and t1.id < t2.id;

That would give you a list of the row ids that have repeated an
earlier row's url.  Then just build an id list (comma separated) and

delete from my where id in (id list);
NB: some DBs let you do the select and delete in a single statement;
not sure about sqlite.

and THEN drop the above index, and create the final unique index.
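[A minimal sketch of that dedup-then-index recipe in Python's sqlite3, using the thread's example schema; sqlite does in fact accept the combined DELETE ... WHERE id IN (SELECT ...) form, so no comma-separated id list is needed:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url)")

# Bulk-load with duplicates present; no UNIQUE index yet.
urls = ["a.com", "b.com", "a.com", "c.com", "b.com"]
conn.executemany("INSERT INTO my(url) VALUES (?)", [(u,) for u in urls])

# Plain (non-unique) index so the self-join below is cheap.
conn.execute("CREATE INDEX myurl_tmp ON my(url)")

# Delete every row whose url already appeared under a smaller id.
conn.execute("""
    DELETE FROM my WHERE id IN (
        SELECT t2.id FROM my AS t1, my AS t2
        WHERE t1.url = t2.url AND t1.id < t2.id)
""")

# Swap in the final unique index.
conn.execute("DROP INDEX myurl_tmp")
conn.execute("CREATE UNIQUE INDEX myurl ON my(url)")

print(sorted(r[0] for r in conn.execute("SELECT url FROM my")))
```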

see ya




Re: [sqlite] Sqlite insertion performance

2007-09-17 Thread John Machin

On 17/09/2007 1:07 PM, Joe Wilson wrote:
> --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> > > I have been struggling with the performance of insertion in sqlite.
> > >
> > > Here we have a very simple case :
> > >
> > > A table with an integer autoincrement primary key and a text
> > > field that is unique.
> > >
> > > CREATE TABLE my (id PRIMARY KEY, url);
> > >
> > > CREATE UNIQUE INDEX myurl ON my(url);
> > >
> > > My application requires inserting up to 10 million records in
> > > batches of 20 thousand records.
> >
> > For each group of 20,000 records, first insert them into a TEMP table.
> > Call the temp table t1.  Then transfer the records to the main table
> > as follows:
> >
> >   INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
>
> I had no performance improvement with that temp store staging table
> technique in my testing - actually it was slower.
>
>   http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html
>
> Mind you, the table I was testing against had 4 indexes, whereas the above
> table has 2. I also wasn't using "OR IGNORE". There might be a difference.
>
> Just setting pragma cache_size to a huge value and inserting into
> the table normally in large batches resulted in better performance in
> my case. It may have already been mentioned, but having a big
> database page_size value helps minimize the disk writes as well.


A couple of thoughts:

OTTOMH, time to search index is approx O(D * log(K)) where D = depth of 
tree and K = number of keys per block, and K ** D is O(N)


So:

(1) Big block means big K and thus small D

(2) Long keys (like URLs!!) means small K and thus big D

Further on point (2), the OP seems unsure about whether his URLs are 
unique or not. Perhaps storing another column containing a 64-bit hash 
with an index on that column might be the way to go -- a shorter key 
might well outweigh the extra cost of checking for duplicates.
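[A sketch of that hash-column idea in Python's sqlite3. The hash function here (a truncated SHA-1) is just one arbitrary choice of stable 64-bit hash, and the collision check is the part that replaces the UNIQUE constraint -- same-hash rows still need a url comparison:]

```python
import hashlib
import sqlite3

def url_hash64(url: str) -> int:
    # Any stable 64-bit hash works; truncated SHA-1 is just one choice.
    # Shift into signed range so it fits SQLite's 64-bit INTEGER.
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big") - 2**63

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url TEXT, h INTEGER)")
conn.execute("CREATE INDEX myhash ON my(h)")  # short fixed-size keys

def insert_unique(url: str) -> bool:
    h = url_hash64(url)
    # The index narrows candidates to same-hash rows; compare urls to
    # rule out the (rare) hash-collision case.
    dup = conn.execute(
        "SELECT 1 FROM my WHERE h = ? AND url = ?", (h, url)).fetchone()
    if dup:
        return False
    conn.execute("INSERT INTO my(url, h) VALUES (?, ?)", (url, h))
    return True

insert_unique("http://example.com/a")
insert_unique("http://example.com/a")   # duplicate, skipped
insert_unique("http://example.com/b")
```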





Re: [sqlite] Sqlite insertion performance

2007-09-16 Thread Joe Wilson
make that:

pragma default_cache_size=[half your machine RAM in bytes / page_size];

--- Joe Wilson <[EMAIL PROTECTED]> wrote:
> You don't have to recompile. 
> Just make a new database file with:
> 
> pragma page_size=32768;
> pragma default_cache_size=[your machine RAM in bytes / 32768];
> 
> --- RaghavendraK 70574 <[EMAIL PROTECTED]> wrote:
> > But there is a limit (3.4.0) which stops at 32KB.
> > the compile macro would do?



   





Re: [sqlite] Sqlite insertion performance

2007-09-16 Thread Joe Wilson
You don't have to recompile. 
Just make a new database file with:

pragma page_size=32768;
pragma default_cache_size=[your machine RAM in bytes / 32768];

--- RaghavendraK 70574 <[EMAIL PROTECTED]> wrote:
> But there is a limit (3.4.0) which stops at 32KB.
> the compile macro would do?


  






Re: [sqlite] Sqlite insertion performance

2007-09-16 Thread RaghavendraK 70574
But there is a limit (as of 3.4.0) which stops at 32KB.
Would the compile-time macro do?

regards
ragha

**
 This email and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!
 
*

- Original Message -
From: Joe Wilson <[EMAIL PROTECTED]>
Date: Monday, September 17, 2007 12:11 pm
Subject: Re: [sqlite] Sqlite insertion performance

> --- RaghavendraK 70574 <[EMAIL PROTECTED]> wrote:
> > But with large cache there is too much fragmentation observed
> > (pgin/pgout) over a period of time. Do you see this behaviour?
> 
> Fragmentation is not a function of page cache size, but yes,
> I also see this fragmentation if the secondary index exists before 
> the inserts. If you create the index after your inserts, you will 
> have far less (or no) fragmentation.
> 
> See the merge-sort point in:
> 
>  http://www.sqlite.org/cvstrac/wiki?p=ToDo
> 
> > My CacheSize: 32KB, PgSize: 8KB (to the limits)
> 
> Try a page size of 32768 and a value of half your RAM for cache size.
> 
> 




Re: [sqlite] Sqlite insertion performance

2007-09-16 Thread Joe Wilson
--- RaghavendraK 70574 <[EMAIL PROTECTED]> wrote:
> But with large cache there is too much fragmentation observed (pgin/pgout)
> over a period of time. Do you see this behaviour?

Fragmentation is not a function of page cache size, but yes,
I also see this fragmentation if the secondary index exists before 
the inserts. If you create the index after your inserts, you will 
have far less (or no) fragmentation.

See the merge-sort point in:

  http://www.sqlite.org/cvstrac/wiki?p=ToDo

> My CacheSize: 32KB, PgSize: 8KB (to the limits)

Try a page size of 32768 and a value of half your RAM for cache size.
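[From application code, that suggestion might look like the sketch below, in Python's sqlite3. Note the thread uses default_cache_size, which persists in the database file; plain cache_size (used here) is per-connection. Either way the value is counted in *pages*, so "half your RAM" must be divided by the page size, and page_size only takes effect on a fresh database. The 4 GB figure is the OP's machine, not a recommendation:]

```python
import sqlite3

PAGE_SIZE = 32768
RAM_BYTES = 4 * 1024**3          # the OP's 4 GB box

conn = sqlite3.connect(":memory:")
# page_size only takes effect on a fresh (or vacuumed) database,
# so set it before creating any tables.
conn.execute(f"PRAGMA page_size = {PAGE_SIZE}")
# cache_size is measured in pages, not bytes: half of RAM / page size.
conn.execute(f"PRAGMA cache_size = {RAM_BYTES // 2 // PAGE_SIZE}")
conn.execute("PRAGMA synchronous = OFF")   # as in the OP's setup
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url)")
```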


   





Re: [sqlite] Sqlite insertion performance

2007-09-16 Thread RaghavendraK 70574
But with large cache there is too much fragmentation observed (pgin/pgout)
over a period of time. Do you see this behaviour?

My CacheSize: 32KB, PgSize: 8KB (to the limits)

regards
ragha


- Original Message -
From: Joe Wilson <[EMAIL PROTECTED]>
Date: Monday, September 17, 2007 11:07 am
Subject: Re: [sqlite] Sqlite insertion performance

> --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> > >>> I have been struggling with the performance of insertion in sqlite.
> > >>>
> > >>> Here we have a very simple case :
> > >>>
> > >>> A table with an integer autoincrement primary key and a text
> > >>> field that is unique.
> > >>>
> > >>> CREATE TABLE my (id PRIMARY KEY, url);
> > >>>
> > >>> CREATE UNIQUE INDEX myurl ON my(url);
> > >>>
> > >>> My application requires inserting up to 10 million records in
> > >>> batches of 20 thousand records.
> >
> > For each group of 20,000 records, first insert them into a TEMP table.
> > Call the temp table t1.  Then transfer the records to the main table
> > as follows:
> >
> >   INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
> 
> I had no performance improvement with that temp store staging table
> technique in my testing - actually it was slower.
> 
>   http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html
> 
> Mind you, the table I was testing against had 4 indexes, whereas the above
> table has 2. I also wasn't using "OR IGNORE". There might be a difference.
> 
> Just setting pragma cache_size to a huge value and inserting into
> the table normally in large batches resulted in better performance in
> my case. It may have already been mentioned, but having a big
> database page_size value helps minimize the disk writes as well.
> 
> 
> 




Re: [sqlite] Sqlite insertion performance

2007-09-16 Thread Joe Wilson
--- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> >>>
> >>> I have been struggling with the performance of insertion in sqlite.
> >>>
> >>> Here we have a very simple case :
> >>>
> >>> A table with an integer autoincrement primary key and a text  
> >>> field that is
> >>> unique.
> >>>
> >>> CREATE TABLE my (id PRIMARY KEY, url);
> >>>
> >>> CREATE UNIQUE INDEX myurl ON my(url);
> >>>
> >>>
> >>> My application requires inserting up to 10 million records in  
> >>> batches of
> >>> 20 thousand records.
> 
> For each group of 20,000 records, first insert them into a TEMP table.
> Call the temp table t1.  Then transfer the records to the main table
> as follows:
> 
>   INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;

I had no performance improvement with that temp store staging table
technique in my testing - actually it was slower.

  http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html

Mind you, the table I was testing against had 4 indexes, whereas the above 
table has 2. I also wasn't using "OR IGNORE". There might be a difference.

Just setting pragma cache_size to a huge value and inserting into
the table normally in large batches resulted in better performance in 
my case. It may have already been mentioned, but having a big 
database page_size value helps minimize the disk writes as well.



   





Re: [sqlite] Sqlite insertion performance

2007-09-15 Thread Kefah T. Issa

Dear Kees and Richard,

Much appreciated.


I tried the ordered-urls insert; the results were better, but it is still 
taking a progressively longer time as the number of records increases.

A fundamental question to be asked here :

Shouldn't the time complexity (Big-O) of the insert operation be constant?

I even did a third test where the integer primary key is not auto-increment; 
the same problem is observed.

Time complexity is clearly constant when there are no indexes at all (except 
for the implicit auto-increment integer primary key).

But otherwise, the time taken increases steadily (if not exponentially) with 
the number of existing records.

Acknowledging my ignorance on the subject, I really don't see a reason why 
this is happening, other than it being a potential bug or a performance 
improvement opportunity.

Regards,
- Kefah.



On Saturday 15 September 2007 12:51:02 Kees Nuyt wrote:
> Hi Kefah,
>
> On Sat, 15 Sep 2007 04:43:46 +0300, you wrote:
> >Thanks Kees,
> >
> >In fact using integer instead of string gives very similar results.
> >
> >Dropping the unique index constraint and then creating it again when
> > needed sounds interesting, as insertion without the unique index is
> > satisfactory and constant over time.
>
> Thanks for the feedback.
>
> >I tried this, but got a trivial question :
> >When creating the unique index, sqlite gives me :
> >SQL error: indexed columns are not unique
> >
> >What should be done here?
>
> Apparently the data in the text column is not unique.
> That is the disadvantage of building the index after the
> insertions: the database can't exercise the contraints on your
> data, so you would have to do that yourself, for example by a
> sort --unique step. My second suggestion for speeding things was
> sorting the input data, so now you have two reasons for a sort.
>
> Such a data cleaning step will take considerable time, so time
> gained in inserting may be lost again in preprocessing.
> It might be better to use the database constraints, and live
> with the slow insertions. Your benchmarks will tell you what's
> best.
>
> >Thanks again,
> >- Kefah.
>
> Good luck.
>
> >On Saturday 15 September 2007 00:25:03 Kees Nuyt wrote:
> >> On Fri, 14 Sep 2007 23:20:53 +0300, you wrote:
> >> >Dear All,
> >> >
> >> >I have been struggling with the performance of insertion in sqlite.
> >> >
> >> >Here we have a very simple case :
> >> >
> >> >A table with an integer autoincrement primary key and a text field that
> >> > is unique.
> >> >
> >> >CREATE TABLE my (id PRIMARY KEY, url);
> >> >
> >> >CREATE UNIQUE INDEX myurl ON my(url);
> >> >
> >> >
> >> >My application requires inserting up to 10 million records in batches
> >> > of 20 thousand records.
> >> >
> >> >I use :
> >> >A. PRAGMA SYNCHRONOUS = OFF
> >> >B. Prepared Statement
> >> >C. setAutoCommit(false), then to true at the end of the batch.
> >> >
> >> >Using the above, the insertion starts off at a good speed, then
> >> > drastically slows down as more records are inserted.
> >> >
> >> >It goes like this :
> >> >
> >> >The first four insertions (first 4*20K -> 80K records)
> >> >0 took 786
> >> >1 took 944
> >> >2 took 1001
> >> >3 took 1303
> >> >
> >> >After the first 1 Million records
> >> >50 took 2560
> >> >51 took 1921
> >> >55 took 1900
> >> >53 took 3990
> >> >
> >> >After the 2nd million records
> >> >2 took 22393
> >> >3 took 16789
> >> >4 took 29419
> >> >5 took 13993
> >> >
> >> >By the end of the 3rd million records the insertion time goes up to
> >> > 30 seconds per 20K records.
> >> >
> >> >My app runs from Java code and I am using SqliteJDBC 0.37 (the
> >> > latest) on Sun JDK 1.6.0_02 on CentOS 5.0, 4GB RAM, a two dual-core-CPU
> >> > server with a RAID10 SATA-II hard disk.
> >> >
> >> >
> >> >I know I might be stretching sqlite far beyond its limits, I just want
> >> > to verify that there is nothing more that can be done to make a case
> >> > for sqlite in this scenario. I am not sure whats the best next thing
> >> > to do either.
> >> >
> >> >Your feedback and input will be highly appreciated,
> >> >
> >> >- Kefah.
> >>
> >> Most probably the UNIQUE INDEX on the TEXT column is the
> >> culprit.
> >>
> >> My first try would be to create and fill the table first, and
> >> create the UNIQUE INDEX on the TEXT column afterwards.
> >>
> >> The second suggestion would be to INSERT the rows in sorted
> >> order, the sort key being the TEXT column.



-- 
Kefah T. Issa
Manager

>/. freesoft technologies llc
freesoft technologies, LLC.
Cell : +962 777 80 90 50
Office : +962 6 55 23 967
Fax : +962 6 55 61 967
Jabber IM (XMPP) : [EMAIL PROTECTED] 




Re: [sqlite] Sqlite insertion performance

2007-09-15 Thread D. Richard Hipp


> I have been struggling with the performance of insertion in sqlite.
>
> Here we have a very simple case :
>
> A table with an integer autoincrement primary key and a text
> field that is unique.
>
> CREATE TABLE my (id PRIMARY KEY, url);
>
> CREATE UNIQUE INDEX myurl ON my(url);
>
> My application requires inserting up to 10 million records in
> batches of 20 thousand records.

For each group of 20,000 records, first insert them into a TEMP table.
Call the temp table t1.  Then transfer the records to the main table
as follows:

 INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
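[That recipe might look like the sketch below in Python's sqlite3 -- not the OP's actual Java/JDBC code, just the same staging idea with the thread's schema. Sorting each batch before the transfer is the point: the unique-index pages get visited in key order rather than at random:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url)")
conn.execute("CREATE UNIQUE INDEX myurl ON my(url)")
conn.execute("CREATE TEMP TABLE t1 (url)")

def insert_batch(urls):
    # Stage the batch, then move it over in sorted order so the
    # unique-index pages are touched sequentially, not randomly.
    with conn:  # one transaction per batch
        conn.executemany("INSERT INTO t1(url) VALUES (?)",
                         [(u,) for u in urls])
        conn.execute(
            "INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url")
        conn.execute("DELETE FROM t1")  # clear staging for the next batch

insert_batch(["z.com", "a.com", "m.com"])
insert_batch(["a.com", "b.com"])        # "a.com" silently ignored
```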


D. Richard Hipp
[EMAIL PROTECTED]







Re: [sqlite] Sqlite insertion performance

2007-09-15 Thread Kees Nuyt
Hi Kefah,

On Sat, 15 Sep 2007 04:43:46 +0300, you wrote:

>Thanks Kees,
>
>In fact using integer instead of string gives very similar results.
>
>Dropping the unique index constraint and then creating it again when needed 
>sounds interesting, as insertion without the unique index is satisfactory and 
>constant over time.

Thanks for the feedback.

>I tried this, but got a trivial question : 
>When creating the unique index, sqlite gives me : 
>SQL error: indexed columns are not unique
>
>What should be done here?

Apparently the data in the text column is not unique.
That is the disadvantage of building the index after the
insertions: the database can't enforce the constraints on your
data, so you would have to do that yourself, for example by a
sort --unique step. My second suggestion for speeding things was
sorting the input data, so now you have two reasons for a sort.

Such a data cleaning step will take considerable time, so time
gained in inserting may be lost again in preprocessing.
It might be better to use the database constraints, and live
with the slow insertions. Your benchmarks will tell you what's
best.

>Thanks again,
>- Kefah.

Good luck.

>On Saturday 15 September 2007 00:25:03 Kees Nuyt wrote:
>> On Fri, 14 Sep 2007 23:20:53 +0300, you wrote:
>> >Dear All,
>> >
>> >I have been struggling with the performance of insertion in sqlite.
>> >
>> >Here we have a very simple case :
>> >
>> >A table with an integer autoincrement primary key and a text field that is
>> >unique.
>> >
>> >CREATE TABLE my (id PRIMARY KEY, url);
>> >
>> >CREATE UNIQUE INDEX myurl ON my(url);
>> >
>> >
>> >My application requires inserting up to 10 million records in batches of
>> > 20 thousand records.
>> >
>> >I use :
>> >A. PRAGMA SYNCHRONOUS = OFF
>> >B. Prepared Statement
>> >C. setAutoCommit(false), then to true at the end of the batch.
>> >
>> >Using the above, the insertion starts off at a good speed, then
>> > drastically slows down as more records are inserted.
>> >
>> >It goes like this :
>> >
>> >The first four insertions (first 4*20K -> 80K records)
>> >0 took 786
>> >1 took 944
>> >2 took 1001
>> >3 took 1303
>> >
>> >After the first 1 Million records
>> >50 took 2560
>> >51 took 1921
>> >55 took 1900
>> >53 took 3990
>> >
>> >After the 2nd million records
>> >2 took 22393
>> >3 took 16789
>> >4 took 29419
>> >5 took 13993
>> >
>> >By the end of the 3rd million records the insertion time goes up to 30
>> >seconds per 20K records.
>> >
>> >My app runs from Java code and I am using SqliteJDBC 0.37 (the
>> > latest) on Sun JDK 1.6.0_02 on CentOS 5.0, 4GB RAM, a two dual-core-CPU
>> > server with a RAID10 SATA-II hard disk.
>> >
>> >
>> >I know I might be stretching sqlite far beyond its limits, I just want to
>> >verify that there is nothing more that can be done to make a case for
>> > sqlite in this scenario. I am not sure whats the best next thing to do
>> > either.
>> >
>> >Your feedback and input will be highly appreciated,
>> >
>> >- Kefah.
>>
>> Most probably the UNIQUE INDEX on the TEXT column is the
>> culprit.
>>
>> My first try would be to create and fill the table first, and
>> create the UNIQUE INDEX on the TEXT column afterwards.
>>
>> The second suggestion would be to INSERT the rows in sorted
>> order, the sort key being the TEXT column.
-- 
  (  Kees Nuyt
  )
c[_]




Re: [sqlite] Sqlite insertion performance

2007-09-14 Thread Kefah T. Issa
Thanks Kees,

In fact using integer instead of string gives very similar results.

Dropping the unique index constraint and then creating it again when needed 
sounds interesting, as insertion without the unique index is satisfactory and 
constant over time.

I tried this, but got a trivial question : 
When creating the unique index, sqlite gives me : 
SQL error: indexed columns are not unique

What should be done here?

Thanks again,
- Kefah.


On Saturday 15 September 2007 00:25:03 Kees Nuyt wrote:
> On Fri, 14 Sep 2007 23:20:53 +0300, you wrote:
> >Dear All,
> >
> >I have been struggling with the performance of insertion in sqlite.
> >
> >Here we have a very simple case :
> >
> >A table with an integer autoincrement primary key and a text field that is
> >unique.
> >
> >CREATE TABLE my (id PRIMARY KEY, url);
> >
> >CREATE UNIQUE INDEX myurl ON my(url);
> >
> >
> >My application requires inserting up to 10 million records in batches of
> > 20 thousand records.
> >
> >I use :
> >A. PRAGMA SYNCHRONOUS = OFF
> >B. Prepared Statement
> >C. setAutoCommit(false), then to true at the end of the batch.
> >
> >Using the above, the insertion starts off at a good speed, then
> > drastically slows down as more records are inserted.
> >
> >It goes like this :
> >
> >The first four insertions (first 4*20K -> 80K records)
> >0 took 786
> >1 took 944
> >2 took 1001
> >3 took 1303
> >
> >After the first 1 Million records
> >50 took 2560
> >51 took 1921
> >55 took 1900
> >53 took 3990
> >
> >After the 2nd million records
> >2 took 22393
> >3 took 16789
> >4 took 29419
> >5 took 13993
> >
> >By the end of the 3rd million records the insertion time goes up to 30
> >seconds per 20K records.
> >
> >My app runs from Java code and I am using SqliteJDBC 0.37 (the
> > latest) on Sun JDK 1.6.0_02 on CentOS 5.0, 4GB RAM, a two dual-core-CPU
> > server with a RAID10 SATA-II hard disk.
> >
> >
> >I know I might be stretching sqlite far beyond its limits, I just want to
> >verify that there is nothing more that can be done to make a case for
> > sqlite in this scenario. I am not sure whats the best next thing to do
> > either.
> >
> >Your feedback and input will be highly appreciated,
> >
> >- Kefah.
>
> Most probably the UNIQUE INDEX on the TEXT column is the
> culprit.
>
> My first try would be to create and fill the table first, and
> create the UNIQUE INDEX on the TEXT column afterwards.
>
> The second suggestion would be to INSERT the rows in sorted
> order, the sort key being the TEXT column.





Re: [sqlite] Sqlite insertion performance

2007-09-14 Thread Kees Nuyt
On Fri, 14 Sep 2007 23:20:53 +0300, you wrote:

>
>Dear All,
>
>I have been struggling with the performance of insertion in sqlite.
>
>Here we have a very simple case : 
>
>A table with an integer autoincrement primary key and a text field that is 
>unique.
>
>CREATE TABLE my (id PRIMARY KEY, url);
>
>CREATE UNIQUE INDEX myurl ON my(url);
>
>
>My application requires inserting up to 10 million records in batches of 20 
>thousand records.
>
>I use : 
>A. PRAGMA SYNCHRONOUS = OFF
>B. Prepared Statement
>C. setAutoCommit(false), then to true at the end of the batch.
>
>Using the above, the insertion starts off at a good speed, then drastically 
>slows down as more records are inserted.
>
>It goes like this : 
>
>The first four insertions (first 4*20K -> 80K records)
>0 took 786
>1 took 944
>2 took 1001
>3 took 1303
>
>After the first 1 Million records 
>50 took 2560
>51 took 1921
>55 took 1900
>53 took 3990
>
>After the 2nd million records
>2 took 22393
>3 took 16789
>4 took 29419
>5 took 13993
>
>By the end of the 3rd million records the insertion time goes up to 30 
>seconds per 20K records.
>
>My app runs from Java code and I am using SqliteJDBC 0.37 (the latest) 
>on Sun JDK 1.6.0_02 on CentOS 5.0, 4GB RAM, a two dual-core-CPU server with 
>a RAID10 SATA-II hard disk.
>
>
>I know I might be stretching sqlite far beyond its limits, I just want to 
>verify that there is nothing more that can be done to make a case for sqlite 
>in this scenario. I am not sure what's the best next thing to do either.
>
>Your feedback and input will be highly appreciated,
>
>- Kefah.

Most probably the UNIQUE INDEX on the TEXT column is the
culprit. 

My first try would be to create and fill the table first, and
create the UNIQUE INDEX on the TEXT column afterwards.

The second suggestion would be to INSERT the rows in sorted
order, the sort key being the TEXT column.
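[Both suggestions, sketched in Python's sqlite3 with the thread's schema. Note the catch Kefah later ran into: with the index built afterwards, nothing enforces uniqueness during the load, so the input must be deduplicated first or the final CREATE UNIQUE INDEX will fail:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url)")

batch = ["b.com", "a.com", "c.com", "a.com"]

# Suggestion 2: insert in sorted order. Dedup here as well, since the
# unique index does not exist yet to catch duplicates for us.
rows = [(u,) for u in sorted(set(batch))]
with conn:  # single transaction for the whole batch
    conn.executemany("INSERT INTO my(url) VALUES (?)", rows)

# Suggestion 1: build the UNIQUE index only after the bulk load.
conn.execute("CREATE UNIQUE INDEX myurl ON my(url)")
```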
-- 
  (  Kees Nuyt
  )
c[_]
